NGS-SNP - Overview

Description

NGS-SNP is a collection of command-line scripts that provide rich annotations for SNPs identified by sequencing transcripts or whole genomes from organisms with reference sequences in Ensembl. The annotations, several of which are not available from any existing SNP annotation tool, include the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields give the names of overlapping protein domains or features, and the conservation of both the SNP site and its flanking regions. NCBI, Ensembl, and UniProt IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring the SNP in silico to an orthologous gene, typically in a well-characterized species.

NGS-SNP scripts

Using NGS-SNP

  1. Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.

  2. Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.

  3. Annotate the SNP list using the annotate_SNPs.pl script.

The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored, using the sample data included with NGS-SNP:

cd NGS-SNP/scripts

perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
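
Because the command above uses paths relative to NGS-SNP/scripts, it may help to confirm that the script, matrix, and input file are present before starting a run. The following is a minimal sketch, not part of NGS-SNP; the helper name check_files is hypothetical.

```shell
# Hypothetical helper (not part of NGS-SNP): report any path that is not a
# regular file and return non-zero if at least one is missing.
check_files() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

From NGS-SNP/scripts it could be used like this, so that the annotation only starts when all three files are found:

check_files annotate_SNPs/annotate_SNPs.pl \
annotate_SNPs/data/blosum62.mat \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab && \
perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab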

For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.
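
Once annotated_snps.tab has been written, the sorting and filtering mentioned in the description can be done with standard tools. The sketch below assumes that the output is tab-delimited and that its first line is a header row; check this against the sample_output files before relying on it. The function names list_columns and filter_min_abs are hypothetical, and no NGS-SNP column name is assumed: the column is chosen by its position.

```shell
# Assumption: the file is tab-delimited and line 1 holds the column names.

# Print each column name from the header with its 1-based position.
list_columns() {
  awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i; exit }' "$1"
}

# Keep the header plus every row whose value in the chosen column has an
# absolute value of at least the given minimum.
# $1 = file, $2 = 1-based column index, $3 = minimum absolute value
filter_min_abs() {
  awk -F'\t' -v col="$2" -v min="$3" '
    NR == 1 { print; next }      # always keep the header line
    {
      v = $col + 0               # non-numeric cells are treated as 0
      if (v < 0) v = -v
      if (v >= min) print
    }' "$1"
}
```

For example, list_columns annotated_snps.tab shows which position holds the field of interest, and filter_min_abs annotated_snps.tab N 2 (with N replaced by that position) keeps only the rows whose value there is at least 2 in magnitude.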

Using a local Ensembl database

Questions or Suggestions?

Email Paul Stothard at stothard@ualberta.ca.