NGS-SNP - Installation

NGS-SNP requirements

Instructions for obtaining and installing these prerequisites and NGS-SNP are given below for Mac OS X and Linux. The instructions will likely need to be adjusted depending on the particular OS you are using, and as new versions of the prerequisites are released. To use NGS-SNP on a Windows system we recommend using the NGS-SNP virtual machine (see the "Linux virtual machine overview" section below). The virtual machine can also be used on Mac OS X and Linux systems. The advantage of using the virtual machine is that NGS-SNP and all the dependencies are already installed.

Mac OS X installation

Download NGS-SNP

  1. Download the NGS-SNP script collection.

  2. Unzip the file.

Install MySQL

  1. Download MySQL Community Server (mysql-5.1.46-osx10.6-x86_64.dmg for example) from the MySQL website.

  2. Double-click on the downloaded dmg file to mount it, then double-click on the MySQL PKG file to install MySQL. If you would like MySQL to start automatically during system startup, double-click on the MySQL Startup Item to install it.

Install Berkeley DB

  1. Download Berkeley DB (db-5.1.25.tar.gz for example) from the Oracle website.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    cd Desktop/
    tar xvfz db-5.1.25.tar.gz
    cd db-5.1.25
    ./dist/configure
    make
    sudo make install
            

Install Perl modules used by NGS-SNP

  1. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    sudo perl -MCPAN -e shell
            

    If you have not yet configured CPAN you will be prompted to do so.

    install File::Find::Rule
    install Tie::IxHash
    install DBI
    install DBD::mysql
    install BerkeleyDB
    install Memoize::ExpireLRU
    install Date::Calc
    install Parse::RecDescent
    install LWP::Protocol::https
            

    Note that a running MySQL server is required for the installation of DBD::mysql. If the server is not running you can start it using:

    cd /usr/local/mysql
    sudo ./bin/mysqld_safe
    (ENTER YOUR PASSWORD, IF NECESSARY)
    (PRESS CONTROL-Z)
    bg
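
Before installing DBD::mysql it can help to confirm that the server is actually answering. The following is a minimal sketch and not part of NGS-SNP: mysql_is_running is a hypothetical helper name, and it assumes the mysqladmin client that ships with MySQL is on your PATH. On Mac OS X the client may instead be at /usr/local/mysql/bin/mysqladmin, which can be supplied through the MYSQLADMIN variable (also an assumption of this sketch).

```shell
# Sketch only: mysql_is_running is a hypothetical helper, not part of NGS-SNP.
# MYSQLADMIN may be set to the full path of mysqladmin if it is not on PATH.
mysql_is_running() {
    # "mysqladmin ping" exits with status 0 when the server responds.
    "${MYSQLADMIN:-mysqladmin}" ping >/dev/null 2>&1
}
```

For example, entering "mysql_is_running && echo running" prints "running" only when the server responds.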
            

Install EMBOSS

  1. Download EMBOSS (EMBOSS-6.3.1.tar.gz for example) from the EMBOSS website.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    cd Desktop/
    tar xvzf EMBOSS-6.3.1.tar.gz
    cd EMBOSS-6.3.1
    ./configure
    make
    make test
    sudo make install
            

Install T-Coffee

  1. Download T-Coffee (T-COFFEE_distribution.tar.gz for example) from the T-Coffee website.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    cd Desktop/
    tar xvzf T-COFFEE_distribution.tar.gz
    cd T-COFFEE_distribution_Version_8.93
    sudo ./install t_coffee -exec=/usr/local/bin/
            

Install Muscle

  1. Download Muscle (muscle3.8.31_i86darwin32.tar.gz for example) from the Muscle website.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    cd Desktop/
    tar xvzf muscle3.8.31_i86darwin32.tar.gz
    sudo mv -i muscle3.8.31_i86darwin32 /usr/local/bin/muscle
            

Install SIFT

  1. Download the SIFT source code (jcvi-sift-1.03.tar.gz for example) from the SIFT website.

  2. Launch the Terminal application (located in the /Applications/Utilities folder) and enter the following:

    cd Desktop/
    tar xvzf jcvi-sift-1.03.tar.gz
            

    Follow the installation guide in the included 'INSTALL' file. It is not necessary to complete the 'SETTING UP DATABASES' section. The commands you use may resemble the following:

    cd Desktop/
    wget ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.2.23/ncbi-blast-2.2.23+-universal-macosx.tar.gz
    tar xvfz ncbi-blast-2.2.23+-universal-macosx.tar.gz
    cd ncbi-blast-2.2.23+/bin
    sudo mv * /usr/local/bin/
    cd ~/Desktop
    sudo perl -MCPAN -e 'install DBD::SQLite'
    cd jcvi-sift-1.03
    ./configure
    make
    make check
    sudo make install
            

Set up your environment

  1. Launch the Terminal application and enter the following (changing '/path/to/NGS-SNP' in the first line to the full path to the NGS-SNP directory on your system):

    NGS_SNP="/path/to/NGS-SNP"
    export NGS_SNP
    PERL5LIB="${NGS_SNP}"/lib/bioperl-1.5.2_102_Matrix:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/bioperl-1.2.3:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-compara/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-variation/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-funcgen/modules:"$PERL5LIB"
    export PERL5LIB
            
  2. To avoid re-entering the above commands each time you launch the Terminal application, add the above commands to the end of your .bash_profile file, located in your home directory. To load the changes enter "source ~/.bash_profile".
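
The same environment can be built with a loop, which may be easier to maintain in a .bash_profile. This is a sketch under stated assumptions: ngs_snp_env is a hypothetical function name, and the directory list is copied from the commands above. Each directory is prepended in the same order, so the resulting search order matches the original commands.

```shell
# Sketch only: ngs_snp_env is a hypothetical helper, not part of NGS-SNP.
# Usage: ngs_snp_env /path/to/NGS-SNP
ngs_snp_env() {
    NGS_SNP="$1"
    export NGS_SNP
    # Same directories, in the same order, as the commands above.
    # Each one is prepended, so the last entry ends up first in PERL5LIB.
    for _sub in \
        lib/bioperl-1.5.2_102_Matrix \
        lib/bioperl-1.2.3 \
        lib/ensembl/ensembl/modules \
        lib/ensembl/ensembl-compara/modules \
        lib/ensembl/ensembl-variation/modules \
        lib/ensembl/ensembl-funcgen/modules
    do
        # ${PERL5LIB:+...} adds the ":" separator only when PERL5LIB is non-empty.
        PERL5LIB="${NGS_SNP}/${_sub}${PERL5LIB:+:${PERL5LIB}}"
    done
    export PERL5LIB
}
```

After defining the function, enter "ngs_snp_env /path/to/NGS-SNP" (with your own path) and then "echo $PERL5LIB" to inspect the result.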

Test your setup

  1. Launch the Terminal application, switch to the annotate_SNPs directory, and run the test.sh script:

    cd ${NGS_SNP}/scripts/annotate_SNPs
    ./test.sh
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

Linux installation

Download NGS-SNP

  1. Download the NGS-SNP script collection.

  2. Unzip the file.

Install MySQL

  1. Open a Bash terminal and install MySQL using the package management tool included with the OS, for example enter:

    sudo apt-get install mysql-server
    sudo apt-get install libmysqlclient-dev          
            

Install Berkeley DB

  1. Open a Bash terminal and install Berkeley DB using the package management tool included with the OS, for example enter:

    sudo apt-get install libdb4.7-dev          
            

Install Perl modules used by NGS-SNP

  1. Open a Bash terminal and enter the following:

    sudo perl -MCPAN -e shell          
            

    If you have not yet configured CPAN you will be prompted to do so.

    install File::Find::Rule
    install Tie::IxHash
    install DBI
    install DBD::mysql
    install BerkeleyDB
    install Memoize::ExpireLRU
    install Date::Calc
    install Parse::RecDescent
    install LWP::Protocol::https
            

    Note that a running MySQL server is required for the installation of DBD::mysql. If the server is not running you can start it using:

    sudo /usr/bin/mysqld_safe
    (ENTER YOUR PASSWORD, IF NECESSARY)
    (PRESS CONTROL-Z)
    bg
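
Once the CPAN installs have finished, a quick way to see whether every module can be loaded is to ask Perl to load each one in turn. This is a minimal sketch: check_perl_modules is a hypothetical helper name, and the module list in the usage comment is the one given above.

```shell
# Sketch only: check_perl_modules is a hypothetical helper, not part of NGS-SNP.
# Prints one line per module and returns non-zero if any module fails to load.
# Example:
#   check_perl_modules File::Find::Rule Tie::IxHash DBI DBD::mysql BerkeleyDB \
#       Memoize::ExpireLRU Date::Calc Parse::RecDescent LWP::Protocol::https
check_perl_modules() {
    missing=0
    for mod in "$@"; do
        # "perl -MModule -e 1" loads the module and does nothing else.
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

Any module reported as MISSING can be installed again from the CPAN shell.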
            

Install EMBOSS

  1. Download EMBOSS (EMBOSS-6.3.1.tar.gz for example) from the EMBOSS website.

  2. Open a Bash terminal and enter the following:

    cd Desktop/
    tar xvzf EMBOSS-6.3.1.tar.gz
    cd EMBOSS-6.3.1
    ./configure
    make
    make test
    sudo make install
            

Install T-Coffee

  1. Download T-Coffee (T-COFFEE_distribution.tar.gz for example) from the T-Coffee website.

  2. Open a Bash terminal and enter the following:

    cd Desktop/
    tar xvzf T-COFFEE_distribution.tar.gz
    cd T-COFFEE_distribution_Version_8.93
    sudo ./install t_coffee -exec=/usr/local/bin/
            

Install Muscle

  1. Download Muscle (muscle3.8.31_i86linux32.tar.gz for example) from the Muscle website.

  2. Open a Bash terminal and enter the following:

    cd Desktop/
    tar xvzf muscle3.8.31_i86linux32.tar.gz
    sudo mv -i muscle3.8.31_i86linux32 /usr/local/bin/muscle
            

Install SIFT

  1. Download the SIFT source code (jcvi-sift-1.03.tar.gz for example) from the SIFT website.

  2. Open a Bash terminal and enter the following:

    cd Desktop/
    tar xvzf jcvi-sift-1.03.tar.gz
            

    Follow the installation guide in the included 'INSTALL' file. It is not necessary to complete the 'SETTING UP DATABASES' section. The commands you use may resemble the following:

    cd Desktop/
    wget ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.2.23/ncbi-blast-2.2.23+-ia32-linux.tar.gz
    tar xvfz ncbi-blast-2.2.23+-ia32-linux.tar.gz
    cd ncbi-blast-2.2.23+/bin
    sudo mv * /usr/local/bin/
    cd ~/Desktop
    sudo perl -MCPAN -e 'install DBD::SQLite'
    cd jcvi-sift-1.03
    ./configure
    make
    make check
    sudo make install
    sudo ldconfig
            

Set up your environment

  1. Open a Bash terminal and enter the following (changing '/path/to/NGS-SNP' in the first line to the full path to the NGS-SNP directory on your system):

    NGS_SNP="/path/to/NGS-SNP"
    export NGS_SNP
    PERL5LIB="${NGS_SNP}"/lib/bioperl-1.5.2_102_Matrix:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/bioperl-1.2.3:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-compara/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-variation/modules:"$PERL5LIB"
    PERL5LIB="${NGS_SNP}"/lib/ensembl/ensembl-funcgen/modules:"$PERL5LIB"
    export PERL5LIB
            
  2. To avoid re-entering the above commands each time you open a terminal, add the above commands to the end of your .bashrc file, located in your home directory. To load the changes enter "source ~/.bashrc".

Test your setup

  1. Open a Bash terminal, switch to the annotate_SNPs directory, and run the test.sh script:

    cd ${NGS_SNP}/scripts/annotate_SNPs
    ./test.sh          
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.
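
If test.sh fails, one thing worth ruling out is that an external program is not on your PATH. The sketch below is an illustration only: check_tools is a hypothetical helper, and the executable names in the usage comment (mysql, embossversion, t_coffee, muscle, blastp) are assumptions based on the installation steps above, so adjust them to match your system.

```shell
# Sketch only: check_tools is a hypothetical helper, not part of NGS-SNP.
# Prints one line per program and returns non-zero if any is not on PATH.
# Example (executable names are assumptions, adjust as needed):
#   check_tools mysql embossversion t_coffee muscle blastp
check_tools() {
    missing=0
    for tool in "$@"; do
        # "command -v" succeeds only if the shell can find the program.
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found    $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}
```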

Linux virtual machine overview

An alternative to downloading and installing the NGS-SNP prerequisites is to use an NGS-SNP virtual machine consisting of a Linux operating system with all the dependencies preinstalled. Two types of virtual machines are available for NGS-SNP: a VMware virtual machine that runs in VMware Fusion (Mac OS X) and VMware Player (Windows and Linux); and a VirtualBox virtual machine that runs in VirtualBox, which is freely available for Windows, Linux, Macintosh, and Solaris hosts.

Depending on the virtualization product used to run the machine you may be able to drag and drop files from your host OS to the virtual machine. If this feature does not work you can easily transfer files using a USB drive.

The NGS-SNP virtual machine has the following user accounts:

user: bioin
password: bioin1234

user: root
password: bioin1234
    

The MySQL server does not have a root password set.

To use NGS-SNP log in as user 'bioin'.

Numerous programs are included with this machine, including emacs and vim for viewing and editing text files, and xpdf for viewing pdf files.

If you are not familiar with Linux you may want to read chapter 4 of An introduction to Linux for bioinformatics.

Downloading and using the VMware Linux virtual machine

If you have problems with any of the following steps please contact Adriano Arantes at arantes@ualberta.ca.

Before downloading the VMware virtual machine, install the appropriate VMware virtualization software for your host system: VMware Fusion (Mac OS X) or VMware Player (Windows and Linux). Both programs are available from www.vmware.com.

Download the virtual machine

  1. Download the NGS-SNP Linux virtual machine for VMware.

  2. Unzip the file.

Run the virtual machine

  1. Launch VMware Fusion (Mac OS X) or VMware Player (Windows and Linux). If you are using VMware Fusion choose Open from the File menu to open the file you unzipped. If you are using VMware Player open the .vmx file that is located in the directory that was created when you unzipped the virtual machine. If asked whether the machine was moved or copied, choose copied.

  2. Start the virtual machine by clicking the play symbol.

  3. Log in to the virtual machine as user 'bioin' using the password 'bioin1234'.

    [Screenshot: login screen]

Update NGS-SNP and test your setup

  1. Open the Terminal application by clicking on the Terminal icon, or by choosing Applications->Accessories->Terminal.

  2. Download and extract the latest version of NGS-SNP:

    ./setup_NGS-SNP.sh
            
    [Screenshot: setup screen]

  3. Navigate to the annotate_SNPs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_SNPs/
    ./test.sh
            
    [Screenshot: annotation test screen]

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

  4. Navigate to the annotate_INDELs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_INDELs/
    ./test.sh
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

Downloading and using the VirtualBox Linux virtual machine

If you have problems with any of the following steps please contact Adriano Arantes at arantes@ualberta.ca.

Before downloading the VirtualBox virtual machine, install the VirtualBox software on your computer.

Download the virtual machine

  1. Download the NGS-SNP Linux virtual machine for VirtualBox.

  2. Unzip the file.

Run the virtual machine

  1. Launch VirtualBox and choose Import Appliance from the File menu.

  2. In the Appliance Import Wizard that opens use the Choose button to select the Ubuntu-vb.ovf file that is located in the directory that was created when you unzipped the virtual machine.

  3. Click Continue to view the virtual machine settings and click Done to begin importing the virtual machine.

  4. Select the newly imported virtual machine in the VM VirtualBox Manager and click Start.

  5. Log in to the virtual machine as user 'bioin' using the password 'bioin1234'.

    [Screenshot: login screen]

Update NGS-SNP and test your setup

  1. Open the Terminal application by clicking on the Terminal icon, or by choosing Applications->Accessories->Terminal.

  2. Download and extract the latest version of NGS-SNP:

    ./setup_NGS-SNP.sh
            
    [Screenshot: setup screen]

  3. Navigate to the annotate_SNPs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_SNPs/
    ./test.sh
            
    [Screenshot: annotation test screen]

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

  4. Navigate to the annotate_INDELs directory and run the test.sh script:

    cd ~/NGS-SNP/scripts/annotate_INDELs/
    ./test.sh
            

    If the script is working you should see various progress messages as the annotation is completed. To stop the annotation, press Control-C.

Creating a local copy of Ensembl for NGS-SNP

By default the scripts in NGS-SNP use a publicly accessible Ensembl MySQL server as the source of annotation information. The advantage of using the public server is that you don't need to download and set up a local Ensembl database. However, a local Ensembl database can increase performance dramatically and is recommended when annotating lists of more than 10,000 SNPs. In our own tests we were able to annotate about 4,000,000 SNPs in about 2 days on a standard Linux desktop when a local copy of Ensembl was used. This number of SNPs would take several months to annotate using the public Ensembl server.

The general procedure for setting up and using a local Ensembl database is as follows:

  1. Download the "core", "variation", and "funcgen" databases for the species that the SNPs were obtained from and the "core" and "variation" databases for the model species to be used during annotation (the model species is specified using the '-model' option of the annotate_SNPs.pl script). In the detailed commands below the SNPs to be annotated are from Bos taurus and the model species used to enhance the annotation is Homo sapiens.

  2. Download the "compara" and "ontology" databases.

  3. Create the necessary database tables and load the downloaded data.

  4. Adjust the commands used to run the NGS-SNP scripts, so that the scripts use the local Ensembl database instead of the public database.

Detailed example

In the example given below an Ensembl database for annotating cattle SNPs is created on a local hard drive, using data downloaded from ftp://ftp.ensembl.org/pub/. The commands may need to be adjusted to match the configuration of your system. Note that the example below uses release 61 of the databases. If you use a newer release of the databases you will need to update the API to the same version. See the "Updating the Ensembl API" section below.
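
Because the databases and the API must all be from the same release, it can be useful to check the release number embedded in each database name before downloading anything. The following is a sketch only: ensembl_release and check_release are hypothetical helpers, and the naming pattern they assume (a release number after the database type, optionally followed by an assembly suffix) is inferred from names such as bos_taurus_core_61_4j and ensembl_compara_61 and may not cover every Ensembl database.

```shell
# Sketch only: these helpers are hypothetical and not part of NGS-SNP.
# Prints the release number embedded in an Ensembl database name,
# e.g. bos_taurus_core_61_4j or ensembl_compara_61.
ensembl_release() {
    printf '%s\n' "$1" | sed -E 's/^[a-z_]+_([0-9]+)(_[0-9a-z]+)?$/\1/'
}

# Returns non-zero if any database name does not carry the expected release.
# Usage: check_release 61 bos_taurus_core_61_4j ensembl_compara_61
check_release() {
    expected="$1"; shift
    bad=0
    for db in "$@"; do
        if [ "$(ensembl_release "$db")" != "$expected" ]; then
            echo "release mismatch: $db (expected $expected)"
            bad=1
        fi
    done
    return "$bad"
}
```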

  1. Specify the databases you want to process, by setting some temporary environment variables:

    export CORE='bos_taurus_core_61_4j'
    export VAR='bos_taurus_variation_61_4j'
    export FUNC='bos_taurus_funcgen_61_4j'
    export MODEL_CORE='homo_sapiens_core_61_37f'
    export MODEL_VAR='homo_sapiens_variation_61_37f'
    export COMPARA='ensembl_compara_61'
    export ONTOLOGY='ensembl_ontology_61'
    export RELEASE='release-61'
    export LOCATION='/home/bioin'
    	
  2. Create the databases in MySQL:

    mysql -uroot -e "CREATE DATABASE ${CORE};" 
    mysql -uroot -e "CREATE DATABASE ${VAR};" 
    mysql -uroot -e "CREATE DATABASE ${FUNC};" 
    mysql -uroot -e "CREATE DATABASE ${MODEL_CORE};" 
    mysql -uroot -e "CREATE DATABASE ${MODEL_VAR};" 
    mysql -uroot -e "CREATE DATABASE ${COMPARA};" 
    mysql -uroot -e "CREATE DATABASE ${ONTOLOGY};" 
            
  3. Download the databases:

    cd ${LOCATION}
    mkdir -p ensembl_databases/download
    cd ensembl_databases/download
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/*"
    wget -r -t 45 -A.gz "ftp://ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/*"
    find ./ -name "*.gz" | xargs -I{} gunzip {}
            
  4. Configure the databases:

    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${CORE}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${VAR}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${FUNC}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${MODEL_CORE}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${MODEL_VAR}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${COMPARA}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "GRANT ALL PRIVILEGES ON ${ONTOLOGY}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
    mysql -uroot -e "FLUSH PRIVILEGES;"
    
    mysql -u bioin -ppass ${CORE} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/${CORE}.sql
    mysql -u bioin -ppass ${VAR} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/${VAR}.sql
    mysql -u bioin -ppass ${FUNC} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/${FUNC}.sql
    mysql -u bioin -ppass ${MODEL_CORE} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/${MODEL_CORE}.sql
    mysql -u bioin -ppass ${MODEL_VAR} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/${MODEL_VAR}.sql
    mysql -u bioin -ppass ${COMPARA} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/${COMPARA}.sql
    mysql -u bioin -ppass ${ONTOLOGY} < ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/${ONTOLOGY}.sql
            
  5. Load data into the databases (this step may run for a day or so depending on the capabilities of your system):

    mysqlimport -u bioin -ppass ${CORE} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${CORE}/*.txt
    mysqlimport -u bioin -ppass ${VAR} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${VAR}/*.txt
    mysqlimport -u bioin -ppass ${FUNC} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${FUNC}/*.txt
    mysqlimport -u bioin -ppass ${MODEL_CORE} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_CORE}/*.txt
    mysqlimport -u bioin -ppass ${MODEL_VAR} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${MODEL_VAR}/*.txt
    mysqlimport -u bioin -ppass ${COMPARA} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${COMPARA}/*.txt
    mysqlimport -u bioin -ppass ${ONTOLOGY} -L ${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql/${ONTOLOGY}/*.txt
            
  6. Run the annotation script using the local Ensembl database:

    cd ~/NGS-SNP/scripts/annotate_SNPs
    perl annotate_SNPs.pl -n 'chr30=chrX' -s bos_taurus -v \
    -i test_input/bovine_GA_maq_transcripts.tab \
    -o test_output/bovine_GA_maq_transcripts_annotated.tab \
    -cs Homo_sapiens Mus_musculus -model Homo_sapiens \
    -host localhost -user bioin -pass pass
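
Steps 2, 4, and 5 repeat the same commands for each database, so they could also be written as a loop. The sketch below is one possible way to do that, not the authors' method: run_or_print and load_ensembl_dbs are hypothetical helpers, DRY_RUN is a variable introduced here so the commands can be reviewed before anything is run, and the schema file is loaded with the mysql client's "source" command rather than the "<" redirect used above. It assumes LOCATION and RELEASE are set as in step 1 and that the download in step 3 has already finished.

```shell
# Sketch only: run_or_print and load_ensembl_dbs are hypothetical helpers.
# Set DRY_RUN=1 to print each command instead of running it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Creates each database, grants access, loads the schema, and imports the data.
# The FLUSH PRIVILEGES command from step 4 is left out of this sketch;
# add it after the GRANT if your setup needs it.
load_ensembl_dbs() {
    base="${LOCATION}/ensembl_databases/download/ftp.ensembl.org/pub/${RELEASE}/mysql"
    for db in "$@"; do
        run_or_print mysql -uroot -e "CREATE DATABASE ${db};"
        run_or_print mysql -uroot -e "GRANT ALL PRIVILEGES ON ${db}.* TO 'bioin'@'localhost' IDENTIFIED BY 'pass';"
        # "source" is a mysql client command; it stands in for the "<" redirect.
        run_or_print mysql -u bioin -ppass "${db}" -e "source ${base}/${db}/${db}.sql"
        run_or_print mysqlimport -u bioin -ppass "${db}" -L "${base}/${db}"/*.txt
    done
}
```

To preview the commands for all seven databases, set DRY_RUN=1 and enter: load_ensembl_dbs "$CORE" "$VAR" "$FUNC" "$MODEL_CORE" "$MODEL_VAR" "$COMPARA" "$ONTOLOGY"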
            

Updating the Ensembl API

The Ensembl API can be updated as follows:

  1. Open a terminal and navigate to the NGS-SNP/lib directory:

    cd NGS-SNP/lib
            
  2. Create a directory to contain the new API. For example, for release 61 create a directory called "ensembl_61":

    mkdir ensembl_61
            
  3. Go to Ensembl: Perl API Installation and download the following API packages to the directory created in the previous step: ensembl, ensembl-compara, ensembl-variation, ensembl-funcgen.

  4. Extract the API packages into the new directory:

    find ensembl_61 -name "*tar.gz" | xargs -I{} tar xvzf {} -C ensembl_61
            
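  5. Modify the ensembl symbolic link in the NGS-SNP/lib directory so that it points to the new version of the Ensembl API. The -f and -n options replace the existing link instead of creating a new link inside the directory it points to:

    ln -sfn ensembl_61 ensembl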
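
The link update can be wrapped in a small function that first checks that the target directory exists. This is a sketch under stated assumptions: switch_ensembl_api is a hypothetical helper, and it assumes that "ensembl" in the lib directory is a symbolic link, as described above.

```shell
# Sketch only: switch_ensembl_api is a hypothetical helper, not part of NGS-SNP.
# Usage: switch_ensembl_api /path/to/NGS-SNP/lib ensembl_61
switch_ensembl_api() {
    lib_dir="$1"
    api_dir="$2"
    # Refuse to repoint the link at a directory that does not exist.
    if [ ! -d "${lib_dir}/${api_dir}" ]; then
        echo "no such directory: ${lib_dir}/${api_dir}" >&2
        return 1
    fi
    # -f replaces an existing link; -n stops ln from following the old link
    # and creating the new one inside the directory it points to.
    (cd "$lib_dir" && ln -sfn "$api_dir" ensembl)
}
```

Entering "readlink ensembl" in the lib directory afterwards shows which API directory the link points to.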