NGS-SNP - Installation

Prerequisites for NGS-SNP installation

Instructions for obtaining and installing these prerequisites and NGS-SNP are given below for Mac OS X and Linux. The instructions will likely need to be adjusted depending on the particular OS you are using, and as new versions of the prerequisites are released. To use NGS-SNP on a Windows system we recommend using the NGS-SNP virtual machine (see the "NGS-SNP virtual machine" section below). The virtual machine can also be used on Mac OS X and Linux systems. The advantage of using the virtual machine is that NGS-SNP and all the dependencies are already installed.

Installation on Mac OS X

Download NGS-SNP

  1. Download the NGS-SNP script collection and unzip the file.

  2. Place the NGS-SNP folder in a convenient location such as your home directory. Keep note of the location as it will be required when setting up your environment.

Install MySQL

  1. Download MySQL Community Server (mysql-5.1.46-osx10.6-x86_64.dmg for example) from the MySQL website.

  2. Double-click on the downloaded dmg file to mount it, then double-click on the MySQL PKG file to install MySQL. If you would like MySQL to start automatically during system startup, double-click on the MySQL Startup Item to install it.

Install Berkeley DB

  1. Download Berkeley DB (db-6.2.32.tar.gz for example) from the Oracle website (an Oracle account is required) to a convenient location such as your home directory.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following from the directory where Berkeley DB was downloaded to:

    tar xvfz db-6.2.32.tar.gz
    cd db-6.2.32
    ./dist/configure
    make
    sudo make install
            

Install Perl modules used by NGS-SNP

  1. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    sudo perl -MCPAN -e shell
            

    If you have not yet configured CPAN you will be prompted to do so.

    install File::Find::Rule
    install Tie::IxHash
    install DBI
    install DBD::mysql
    install BerkeleyDB
    install Memoize::ExpireLRU
    install Date::Calc
    install Parse::RecDescent
    install LWP::Protocol::https
    exit 
            

    Note that a running MySQL server is required for the installation of DBD::mysql. If the server is not running you can start it using:

    cd /usr/local/mysql
    sudo ./bin/mysqld_safe
    (ENTER YOUR PASSWORD, IF NECESSARY)
    (PRESS CONTROL-Z)
    bg
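
Once the modules are installed, it can be useful to confirm that Perl is able to load each of them. The following is a minimal sketch and is not part of NGS-SNP: the function name check_perl_modules is our own, and the module list simply mirrors the install commands above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NGS-SNP): report whether each
# Perl module from the list above can be loaded by the current perl.
check_perl_modules() {
    for module in File::Find::Rule Tie::IxHash DBI DBD::mysql BerkeleyDB \
                  Memoize::ExpireLRU Date::Calc Parse::RecDescent \
                  LWP::Protocol::https; do
        # "perl -MModule -e 1" exits non-zero if the module cannot be loaded.
        if perl -M"$module" -e 1 >/dev/null 2>&1; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
        fi
    done
}

check_perl_modules
```

Any module reported as MISSING can be installed again from the CPAN shell as described above.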
            

Install EMBOSS

  1. Download EMBOSS (EMBOSS-6.6.0.tar.gz for example) from the EMBOSS website into a convenient location such as your home directory.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following from the directory where EMBOSS was downloaded to:

    tar zxvf EMBOSS-6.6.0.tar.gz
    cd EMBOSS-6.6.0
    ./configure
    make
    make check
    sudo make install
            

Install T-Coffee

  1. Download T-Coffee (T-COFFEE_distribution.tar.gz for example) from the T-Coffee website into a convenient location such as your home directory.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following from the directory where T-Coffee was downloaded to:

    tar xvzf T-COFFEE_distribution.tar.gz
    cd T-COFFEE_distribution_Version_11.00.8cbe486/
    sudo ./install t_coffee -exec=/usr/local/bin/
    [Note: The screen output at the end of the installation regarding instructions to "FINALIZE YOUR INSTALLATION" may be ignored.]
            

Install Muscle

  1. Download Muscle (muscle3.8.31_i86darwin64.tar.gz for example) from the Muscle website into a convenient location such as your home directory.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following from the directory where Muscle was downloaded to:

    tar zxvf muscle3.8.31_i86darwin64.tar.gz 
    sudo mv -i muscle3.8.31_i86darwin64 /usr/local/bin/muscle
            

Install SIFT

  1. Download the SIFT source code (jcvi-sift-1.03.tar.gz for example) from the SIFT website into a convenient location such as your home directory.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following from the directory where SIFT was downloaded to:

    tar xvzf jcvi-sift-1.03.tar.gz
            

    Follow the installation guide in the included 'INSTALL' file. It is not necessary to complete the 'SETTING UP DATABASES' section. The commands you use may resemble the following:

    cd ~
    wget ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-macosx.tar.gz
    tar xvfz ncbi-blast-2.6.0+-x64-macosx.tar.gz
    cd ncbi-blast-2.6.0+/bin
    sudo mv * /usr/local/bin/
    cd ~
    sudo perl -MCPAN -e 'install DBD::SQLite'
    cd jcvi-sift-1.03
    export CFLAGS=-I/usr/include/malloc
    ./configure
    make
    make check
    sudo make install
            

Set up your environment

  1. Launch the Terminal application and enter the following (changing '/path/to/NGS-SNP' in the first line to the full path to the NGS-SNP directory on your system):

    NGS_SNP="/path/to/NGS-SNP"
    export NGS_SNP
    PERL5LIB=${NGS_SNP}/lib/bioperl-1.5.2_102_Matrix:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/bioperl-1.2.3:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-compara/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-variation/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-funcgen/modules:$PERL5LIB
    export PERL5LIB
            
  2. To avoid re-entering the above commands each time you launch the Terminal application, add the above commands to the end of your .bash_profile file, located in your home directory. To load the changes enter "source ~/.bash_profile".
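
A mistyped NGS_SNP path leaves PERL5LIB pointing at directories that do not exist, which usually only shows up later as "Can't locate ..." errors from Perl. As a rough check, the sketch below lists each PERL5LIB entry and flags the missing ones. The function name check_perl5lib is our own and is not part of NGS-SNP.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NGS-SNP): list each PERL5LIB entry
# and flag any that is not an existing directory.
# The body runs in a subshell so the IFS and globbing changes stay local.
check_perl5lib() (
    status=0
    IFS=:
    set -f                          # do not expand wildcards in path names
    for dir in ${PERL5LIB:-}; do
        [ -n "$dir" ] || continue   # skip empty entries such as a trailing ':'
        if [ -d "$dir" ]; then
            printf 'found    %s\n' "$dir"
        else
            printf 'missing  %s\n' "$dir"
            status=1
        fi
    done
    exit "$status"
)

check_perl5lib || echo "At least one PERL5LIB entry is missing; check the NGS_SNP path."
```

The function returns a non-zero status when any entry is missing, so it can also be used in a conditional at the end of a profile file.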

Test your setup

Launch the Terminal application, switch into the annotate_SNPs directory, and run the test.sh script:

  cd ${NGS_SNP}/scripts/annotate_SNPs
  ./test.sh
        

If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

Installation on Linux

Download NGS-SNP

  1. Download the NGS-SNP script collection and unzip the file.

  2. Place the NGS-SNP folder in a convenient location such as your home directory. Keep note of the location as it will be required when setting up your environment.

Install MySQL

  1. Open a Bash terminal and install MySQL using the package management tool included with the OS, for example enter:

    sudo apt-get install mysql-server
    sudo apt-get install libmysqlclient-dev          
            

Install Berkeley DB

  1. Open a Bash terminal and install Berkeley DB using the package management tool included with the OS, for example enter:

    sudo apt-get install libdb5.3-dev
            

Install Perl modules used by NGS-SNP

  1. Open a Bash terminal and enter the following:

    sudo perl -MCPAN -e shell          
            

    If you have not yet configured CPAN you will be prompted to do so.

    install File::Find::Rule
    install Tie::IxHash
    install DBI
    install DBD::mysql
    install BerkeleyDB
    install Memoize::ExpireLRU
    install Date::Calc
    install Parse::RecDescent
    install LWP::Protocol::https
    exit 
            

    Note that a running MySQL server is required for the installation of DBD::mysql. If the server is not running you can start it using:

    sudo /usr/bin/mysqld_safe
    (ENTER YOUR PASSWORD, IF NECESSARY)
    (PRESS CONTROL-Z)
    bg
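
Before installing DBD::mysql it may help to check whether the server is already running. The sketch below relies on "mysqladmin ping", which exits with status 0 when a server answers. The function name mysql_status is our own, and the check assumes that mysqladmin was installed along with the MySQL packages.

```shell
#!/bin/sh
# Hypothetical helper: report whether a local MySQL server answers.
# "mysqladmin ping" exits with status 0 when the server is reachable.
mysql_status() {
    if ! command -v mysqladmin >/dev/null 2>&1; then
        echo "mysqladmin not found"
    elif mysqladmin ping >/dev/null 2>&1; then
        echo "running"
    else
        echo "not running"
    fi
}

echo "MySQL server: $(mysql_status)"
```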
            

Install EMBOSS

  1. Download EMBOSS (EMBOSS-6.6.0.tar.gz for example) from the EMBOSS website into a convenient location such as your home directory.

  2. Open a Bash terminal and enter the following from the directory where EMBOSS was downloaded to:

    tar xvzf EMBOSS-6.6.0.tar.gz
    cd EMBOSS-6.6.0
    ./configure
    [Note: If you get an error that "X11 graphics have been selected but no X11 header files have been found", then run the command: sudo apt install libx11-dev followed by ./configure again.]
    make
    sudo ldconfig
    sudo make install
            

Install T-Coffee

  1. Download T-Coffee (T-COFFEE_distribution.tar.gz for example) from the T-Coffee website into a convenient location such as your home directory.

  2. Open a Bash terminal and enter the following from the directory where T-Coffee was downloaded to:

    tar xvzf T-COFFEE_distribution.tar.gz
    cd T-COFFEE_distribution_Version_11.00.8cbe486
    sudo ./install t_coffee -exec=/usr/local/bin/
    [Note: The screen output at the end of the installation regarding instructions to "FINALIZE YOUR INSTALLATION" may be ignored.]
            

Install Muscle

  1. Download Muscle (muscle3.8.31_i86linux64.tar.gz for example) from the Muscle website into a convenient location such as your home directory.

  2. Open a Bash terminal and enter the following from the directory where Muscle was downloaded to:

    tar xvzf muscle3.8.31_i86linux64.tar.gz
    sudo mv -i muscle3.8.31_i86linux64 /usr/local/bin/muscle
            

Install SIFT

  1. Download the SIFT source code (jcvi-sift-1.03.tar.gz for example) from the SIFT website into a convenient location such as your home directory.

  2. Open a Bash terminal and enter the following from the directory where SIFT was downloaded to:

    tar xvzf jcvi-sift-1.03.tar.gz
            

    Follow the installation guide in the included 'INSTALL' file (within the jcvi-sift-1.03 folder). It is not necessary to complete the 'SETTING UP DATABASES' section. The commands you use may resemble the following:

    cd ~
    wget ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.6.0/ncbi-blast-2.6.0+-x64-linux.tar.gz
    tar xvfz ncbi-blast-2.6.0+-x64-linux.tar.gz
    cd ncbi-blast-2.6.0+/bin
    sudo mv * /usr/local/bin/
    cd ~
    sudo perl -MCPAN -e 'install DBD::SQLite'
    cd jcvi-sift-1.03
    ./configure
    make
    make check
    sudo make install
    sudo ldconfig
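
With the external programs installed, a quick check that they can be found on the PATH may save time later. The following is a sketch only. The function name check_tools is our own, and the program names passed to it are assumptions based on the packages installed above (embossversion is one of the programs supplied with EMBOSS, blastp is part of BLAST+). SIFT is not included because its program names depend on how it was installed.

```shell
#!/bin/sh
# Hypothetical helper: check that each named program is on the PATH.
# Returns non-zero if any program is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "found    $tool ($path)"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Program names are assumptions: mysql (MySQL client), embossversion (EMBOSS),
# t_coffee (T-Coffee), muscle (Muscle), blastp (BLAST+).
check_tools mysql embossversion t_coffee muscle blastp \
    || echo "Install the missing programs before running NGS-SNP."
```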
            

Set up your environment

  1. Open a Bash terminal and enter the following (changing '/path/to/NGS-SNP' in the first line to the full path to the NGS-SNP directory on your system):

    NGS_SNP="/path/to/NGS-SNP"
    export NGS_SNP
    PERL5LIB=${NGS_SNP}/lib/bioperl-1.5.2_102_Matrix:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/bioperl-1.2.3:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-compara/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-variation/modules:$PERL5LIB
    PERL5LIB=${NGS_SNP}/lib/ensembl/ensembl-funcgen/modules:$PERL5LIB
    export PERL5LIB
            
  2. To avoid re-entering the above commands each time you open a terminal, add the above commands to the end of your .bashrc file, located in your home directory. To load the changes enter the following in the terminal:

    source ~/.bashrc

Test your setup

Open a Bash terminal, switch into the annotate_SNPs directory, and run the test.sh script:

  cd ${NGS_SNP}/scripts/annotate_SNPs
  ./test.sh          
        

If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

NGS-SNP virtual machine

An alternative to downloading and installing the NGS-SNP prerequisites is to use the NGS-SNP virtual machine consisting of a Linux operating system with all the dependencies preinstalled. The NGS-SNP virtual machine runs in VirtualBox, which is freely available for Windows, Linux, Macintosh, and Solaris hosts.

Depending on the virtualization product used to run the machine you may be able to drag and drop files from your host OS to the virtual machine. If this feature does not work you can easily transfer files using a USB drive, or share a directory between the host OS and virtual machine.

The NGS-SNP virtual machine has the following user account:

user: ubuntu
password: pw2016#
    

The MySQL server has the same root password as the virtual machine.

To use NGS-SNP log in as user 'ubuntu'.

Numerous programs are included with this machine, including emacs and vim for viewing and editing text files, and xpdf for viewing pdf files.

If you are not familiar with Linux you may want to read chapter 4 of An introduction to Linux for bioinformatics.

Install VirtualBox Software

Before downloading the VirtualBox virtual machine, install the VirtualBox software on your computer.

If you have problems with any of the following steps please contact us by email.

Download the NGS-SNP virtual machine

Download the virtual machine from here.

Run the virtual machine

  1. Launch VirtualBox and choose Import Appliance from the File menu.

  2. In the Appliance Import Wizard that opens use the Choose button to select the ubuntu-bioVM.ova file that was downloaded.

  3. Click Continue to view the virtual machine settings and click Done to begin importing the virtual machine.

  4. Select the newly imported virtual machine in the VM VirtualBox Manager and click Start.

  5. Log in to the virtual machine as user 'ubuntu' using the password 'pw2016#'.

Update NGS-SNP and test your setup

  1. Open the Terminal application by clicking on the Terminal icon, or by choosing Applications->Accessories->Terminal.

  2. Download and extract the latest version of NGS-SNP by entering the following in the terminal from the home directory:

    ./setup_NGS-SNP.sh
            
  3. Navigate to the annotate_SNPs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_SNPs/
    ./test.sh
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

  4. Navigate to the annotate_INDELs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_INDELs/
    ./test.sh
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

Creating a local copy of Ensembl for NGS-SNP

By default the scripts in NGS-SNP use a publicly accessible Ensembl MySQL server as the source of annotation information. The advantage of using the public server is that you don't need to download and set up a local Ensembl database. However, a local Ensembl database can increase performance dramatically and is recommended when annotating lists of more than 10,000 SNPs. In our own tests we were able to annotate about 4,000,000 SNPs in about 2 days on a standard Linux desktop when a local copy of Ensembl was used. This number of SNPs would take several months to annotate using the public Ensembl server.
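
To put the local figure in perspective, the throughput implied by "about 4,000,000 SNPs in about 2 days" can be worked out with shell arithmetic. This is only a rough estimate from the approximate numbers quoted above, not a benchmark.

```shell
#!/bin/sh
# Rough throughput implied by the figures quoted above (approximate values).
snps=4000000
seconds=$(( 2 * 24 * 60 * 60 ))     # 2 days expressed in seconds
rate=$(( snps / seconds ))          # integer division
echo "Approximate rate: ${rate} SNPs per second"
```

Integer division gives roughly 23 SNPs per second with a local database.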

The general procedure for setting up and using a local Ensembl database is as follows:

  1. Download the "core", "variation", and "funcgen" databases for the species that the SNPs were obtained from and the "core" and "variation" databases for the model species to be used during annotation (the model species is specified using the '-model' option of the annotate_SNPs.pl script). In the detailed commands below the SNPs to be annotated are from Bos taurus and the model species used to enhance the annotation is Homo sapiens.

  2. Download the "compara" and "ontology" databases.

  3. Create the necessary database tables and load the downloaded data.

  4. Adjust the commands used to run the NGS-SNP scripts, so that the scripts use the local Ensembl database instead of the public database.

Detailed example

In the example given below an Ensembl database for annotating cattle SNPs is created on a local hard drive, using data downloaded from ftp://ftp.ensembl.org/pub/. The commands may need to be adjusted to match the configuration of your system. Note that the example below uses release 87 of the databases. If you use a newer release of the databases you will need to update the API to the same version. See the "Updating the Ensembl API" section below.
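
Because the database release and the API version must agree, it can be worth comparing them before loading any data. The sketch below is an illustration, not part of NGS-SNP. The function names release_number and api_version are our own, and the check assumes that the Ensembl API on your PERL5LIB provides the Bio::EnsEMBL::ApiVersion module with its software_version function.

```shell
#!/bin/sh
# Hypothetical helpers (not part of NGS-SNP) for comparing the database
# release with the installed Ensembl Perl API version.

# Strip the "release-" prefix, e.g. "release-87" -> "87".
release_number() {
    echo "${1#release-}"
}

# Ask the Ensembl API for its version; prints "unknown" if the API
# cannot be loaded (for example, if PERL5LIB is not set up yet).
api_version() {
    perl -MBio::EnsEMBL::ApiVersion -e 'print software_version(), "\n"' 2>/dev/null \
        || echo "unknown"
}

db_release=$(release_number "release-87")
api_release=$(api_version)
if [ "$db_release" = "$api_release" ]; then
    echo "Database release $db_release matches the API version."
else
    echo "Database release $db_release, API version $api_release: use matching versions."
fi
```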

  1. Specify the databases you want to process, by setting some temporary environment variables:

    export CORE='bos_taurus_core_87_31'
    export VAR='bos_taurus_variation_87_31'
    export FUNC='bos_taurus_funcgen_87_31'
    export MODEL_CORE='homo_sapiens_core_87_38'
    export MODEL_VAR='homo_sapiens_variation_87_38'
    export COMPARA='ensembl_compara_87'
    export ONTOLOGY='ensembl_ontology_87'
    export RELEASE='release-87'
    export LOCATION='/home/ubuntu'
    	
  2. Create the databases in MySQL:

    mysql -uroot -e "CREATE DATABASE ${CORE};" 
    mysql -uroot -e "CREATE DATABASE ${VAR};" 
    mysql -uroot -e "CREATE DATABASE ${FUNC};" 
    mysql -uroot -e "CREATE DATABASE ${MODEL_CORE};" 
    mysql -uroot -e "CREATE DATABASE ${MODEL_VAR};" 
    mysql -uroot -e "CREATE DATABASE ${COMPARA};" 
    mysql -uroot -e "CREATE DATABASE ${ONTOLOGY};" 
            
  3. Download the databases:

    cd ${LOCATION}
    mkdir -p ensembl_databases/download
    cd ensembl_databases/download
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/*"
    find ./ -name "*.gz" | xargs -I{} gunzip {}
            
  4. Configure the databases:

    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${CORE}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${VAR}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${FUNC}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${MODEL_CORE}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${MODEL_VAR}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${COMPARA}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${ONTOLOGY}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "FLUSH PRIVILEGES;"
    
    mysql -u bioin -ppass ${CORE} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/${CORE}.sql
    mysql -u bioin -ppass ${VAR} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/${VAR}.sql
    mysql -u bioin -ppass ${FUNC} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/${FUNC}.sql
    mysql -u bioin -ppass ${MODEL_CORE} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/${MODEL_CORE}.sql
    mysql -u bioin -ppass ${MODEL_VAR} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/${MODEL_VAR}.sql
    mysql -u bioin -ppass ${COMPARA} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/${COMPARA}.sql
    mysql -u bioin -ppass ${ONTOLOGY} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/${ONTOLOGY}.sql
            
  5. Load data into the databases (this step may run for a day or so depending on the capabilities of your system):

    mysqlimport -u bioin -ppass ${CORE} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/*.txt
    mysqlimport -u bioin -ppass ${VAR} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/*.txt
    mysqlimport -u bioin -ppass ${FUNC} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/*.txt
    mysqlimport -u bioin -ppass ${MODEL_CORE} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/*.txt
    mysqlimport -u bioin -ppass ${MODEL_VAR} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/*.txt
    mysqlimport -u bioin -ppass ${COMPARA} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/*.txt
    mysqlimport -u bioin -ppass ${ONTOLOGY} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/*.txt
            
  6. Run the annotation script using the local Ensembl database:

    cd ~/NGS-SNP/scripts/annotate_SNPs
    perl annotate_SNPs.pl -n 'chr30=chrX' -s bos_taurus -v \
    -i test_input/bovine_GA_maq_transcripts.tab \
    -o test_output/bovine_GA_maq_transcripts_annotated.tab \
    -cs Homo_sapiens Mus_musculus -model Homo_sapiens \
    -host localhost -user bioin -pass pass
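
Steps 2, 4, and 5 repeat the same commands for each of the seven databases, so they can also be generated with a loop. The sketch below only prints the commands so that they can be reviewed first. It assumes the download in step 3 has already finished, reuses the values from step 1, and introduces the names DATABASES, DUMP_ROOT, and print_db_commands purely for illustration. The FLUSH PRIVILEGES statement from step 4 is left out of the loop.

```shell
#!/bin/sh
# Sketch: print (not run) the per-database commands from steps 2, 4 and 5.
# Values are copied from step 1; DATABASES and DUMP_ROOT are assumed names.
RELEASE='release-87'
LOCATION='/home/ubuntu'
DATABASES='bos_taurus_core_87_31 bos_taurus_variation_87_31 bos_taurus_funcgen_87_31 homo_sapiens_core_87_38 homo_sapiens_variation_87_38 ensembl_compara_87 ensembl_ontology_87'
DUMP_ROOT="${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql"

print_db_commands() {
    for db in $DATABASES; do
        # Step 2: create the database.
        printf '%s\n' "mysql -uroot -e \"CREATE DATABASE ${db};\""
        # Step 4: grant access, then load the table definitions.
        printf '%s\n' "mysql -uroot -e \"GRANT ALL PRIVILEGES ON ${db}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';\""
        printf '%s\n' "mysql -u bioin -ppass ${db} < ${DUMP_ROOT}/${db}/${db}.sql"
        # Step 5: load the data files.
        printf '%s\n' "mysqlimport -u bioin -ppass ${db} -L ${DUMP_ROOT}/${db}/*.txt"
    done
}

print_db_commands
```

Once the printed commands look right for your system, they could be run by piping the output to a shell, for example with "print_db_commands | sh".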
            

Updating the Ensembl API

The Ensembl API can be updated as follows:

  1. Open a terminal and navigate to the NGS-SNP/lib directory:

    cd NGS-SNP/lib
            
  2. Create a directory to contain the new API. For example, for release 88 create a directory called "ensembl_88":

    mkdir ensembl_88
            
  3. Go to Ensembl: Perl API Installation and download the following API packages to the directory created in the previous step: ensembl, ensembl-compara, ensembl-variation, ensembl-funcgen.

  4. Extract the API packages into the new directory:

    find ensembl_88 -name "*tar.gz" | xargs -I{} tar xvzf {} -C ensembl_88
            
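  5. Update the ensembl symbolic link in the NGS-SNP/lib directory so that it points to the new version of the Ensembl API. The -f and -n options replace the existing link instead of creating a new link inside the old API directory:

    ln -sfn ensembl_88 ensembl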
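
Because an ensembl link already exists in the NGS-SNP/lib directory, it is worth confirming where the link points after the change. The following is a minimal sketch. The function name switch_ensembl_api is our own, and the example path in the final comment is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: repoint the "ensembl" symbolic link at a new API
# directory and confirm where it points afterwards.
# Usage: switch_ensembl_api LIB_DIR NEW_API_DIR
switch_ensembl_api() {
    lib_dir=$1
    new_api=$2
    if [ ! -d "$lib_dir/$new_api" ]; then
        echo "No such directory: $lib_dir/$new_api" >&2
        return 1
    fi
    # -f replaces an existing link; -n stops ln from following the old
    # link into the directory it points to.
    ( cd "$lib_dir" && ln -sfn "$new_api" ensembl ) || return 1
    echo "ensembl -> $(readlink "$lib_dir/ensembl")"
}

# Example call (path is illustrative):
# switch_ensembl_api "$NGS_SNP/lib" ensembl_88
```

The previous API directory is left in place, so switching back is a matter of calling the function again with the old directory name.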