NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of transcripts or whole genomes from organisms with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tool, are the results of detailed comparisons with orthologous sequences. These comparisons allow, for example, SNPs to be sorted or filtered based on how drastically the SNP changes the score of a protein alignment. Other fields indicate the names of overlapping protein domains or features, and the conservation of both the SNP site and flanking regions. NCBI, Ensembl, and UniProt IDs are provided for genes, transcripts, and proteins when applicable, along with Gene Ontology terms, a gene description, phenotypes linked to the gene, and an indication of whether the SNP is novel or known. A “Model_Annotations” field provides several annotations obtained by transferring the SNP in silico to an orthologous gene, typically in a well-characterized species.
Set up NGS-SNP. Note that the simplest approach is to follow the "Linux virtual machine" section of the installation guide.
Obtain a list of SNPs from SAMtools, Maq, the AB diBayes SNP package, or some other SNP calling software. The SNP list formats that can be parsed by the annotate_SNPs.pl script are described in the annotate_SNPs.pl documentation.
Annotate the SNP list using the annotate_SNPs.pl script.
The following commands illustrate a typical NGS-SNP session in which SNPs are annotated and then scored, using the sample data included with NGS-SNP:
cd NGS-SNP/scripts
perl annotate_SNPs/annotate_SNPs.pl -s bos_taurus -cs Homo_sapiens \
Mus_musculus -v -matrix annotate_SNPs/data/blosum62.mat -i \
annotate_SNPs/test_input/bovine_GA_maq_transcripts.tab -o annotated_snps.tab
For more information on the options available, input formats, and output formats, see the documentation for each script. Each script also comes with sample input and output files, located in directories called test_input and sample_output, respectively.
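Once an annotated file such as annotated_snps.tab exists, it can be sorted with standard shell tools so that SNPs with the largest change in alignment score appear first. The following is a minimal sketch, not part of NGS-SNP: the helper name sort_by_column is hypothetical, the column name Alignment_Score_Change is an assumption, and the file is assumed to be tab-delimited with a single header line. Check the annotate_SNPs.pl documentation for the actual output format before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with NGS-SNP): sort a tab-delimited file
# by one named numeric column, keeping the header line at the top.
# Assumes exactly one header line; column names are taken from that line.

sort_by_column() {
    file=$1      # path to the tab-delimited file
    column=$2    # exact header name of the numeric column to sort by

    # Look up the 1-based position of the named column in the header.
    idx=$(head -n 1 "$file" | tr '\t' '\n' \
          | grep -n -x -F -- "$column" | head -n 1 | cut -d: -f1)

    if [ -z "$idx" ]; then
        echo "column not found: $column" >&2
        return 1
    fi

    # Print the header, then the data rows sorted numerically (ascending),
    # so the most negative values come first.
    head -n 1 "$file"
    tail -n +2 "$file" | LC_ALL=C sort -t "$(printf '\t')" -k "${idx},${idx}n"
}

# Example call (the column name is an assumption; adjust to your header):
#   sort_by_column annotated_snps.tab Alignment_Score_Change > sorted_snps.tab
```

Looking the column up by name rather than by position keeps the sketch usable if the column order differs between NGS-SNP releases. Rows with an empty value in the chosen column are treated as zero by the numeric sort.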
To speed up the annotation process by using a local Ensembl database, see "Creating a local copy of Ensembl for NGS-SNP".
To update NGS-SNP so that it uses the latest release of Ensembl, see "Updating the Ensembl API".
Email Paul Stothard at email@example.com.