NGS-SNP - obtain_INDEL_characteristics.pl

Description

This script generates three figures, each a bar plot showing the distribution of INDELs by INDEL length: one for all INDELs, one for deletions, and one for insertions. For each figure, a separate plot is also created for the subset of INDELs occurring in coding regions (CDS).
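As a rough illustration of the kind of tally that underlies such bar plots, the Python sketch below counts INDELs by length for the three categories. It is not the script's own code: the (REF, ALT) allele pairs, the function names, and the choice to skip INDELs longer than the cutoff are all assumptions made for this example.

```python
from collections import Counter

def indel_length(ref, alt):
    """Signed length: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)

def length_distribution(variants, cutoff=12):
    """Tally INDEL sizes for all INDELs, insertions, and deletions.

    Assumption: INDELs longer than `cutoff` are skipped. How the real
    script applies its -c cutoff is not described in this document.
    """
    dist = {"all": Counter(), "insertion": Counter(), "deletion": Counter()}
    for ref, alt in variants:
        signed = indel_length(ref, alt)
        size = abs(signed)
        if size == 0 or size > cutoff:
            continue  # not an INDEL, or beyond the plotted range
        kind = "insertion" if signed > 0 else "deletion"
        dist["all"][size] += 1
        dist[kind][size] += 1
    return dist

# Hypothetical (REF, ALT) allele pairs, VCF style
variants = [("A", "ATTT"), ("ACGT", "A"), ("G", "GA"), ("CAAA", "C")]
dist = length_distribution(variants)
print(dict(dist["all"]))
```

Each of the three counters corresponds to one figure; a CDS-only plot would apply the same tally to the subset of INDELs annotated as coding.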

Previous studies have shown that INDELs whose length is a multiple of three (3n) are enriched in coding regions (first described in the Nature Genetics paper "Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing"). This is because 3n INDELs preserve the reading frame.
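The reading-frame argument comes down to a divisibility check. The following Python sketch, with hypothetical function names and made-up lengths, flags frame-preserving INDELs and computes the 3n fraction of a set of INDEL lengths. It is meant only to make the idea concrete, not to reproduce the script's logic.

```python
def preserves_reading_frame(indel_length):
    """True if an INDEL of this (signed) length keeps codons in frame."""
    return indel_length != 0 and abs(indel_length) % 3 == 0

def fraction_3n(lengths):
    """Fraction of INDELs whose length is a multiple of three."""
    lengths = [n for n in lengths if n != 0]
    if not lengths:
        return 0.0
    in_frame = sum(1 for n in lengths if preserves_reading_frame(n))
    return in_frame / len(lengths)

# Made-up signed lengths: positive = insertion, negative = deletion
example_lengths = [3, -6, 1, 2]
print(fraction_3n(example_lengths))
```

Comparing this fraction between CDS INDELs and all INDELs is one simple way to see the enrichment that the plots are intended to show.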

Setup

See setup instructions for NGS-SNP.

In addition, this script requires R. If you are using the NGS-SNP virtual machine, R is already installed; otherwise, see the R site for installation instructions.

Usage

Usage: perl obtain_INDEL_characteristics.pl 
Arguments required:
       -i [FILE] : input INDEL annotation file (Required).
       -o [DIRECTORY] : output folder (Required).
       -c [INT] : cutoff of length of INDELs (default is 12).
       -w [INT] : width of figure (default is 1400).
       -l [INT] : height of figure (default is 700).
       -r [FILE]: the location of the Length_Distribution_Plot.R script (Optional; 
                     default is to locate automatically).
example: perl obtain_INDEL_characteristics.pl -i indels.vcf.annotated -o test 
    

Input

An INDEL annotation file as created by the annotate_INDELs.pl script.

Output

Four files will be generated in the output directory: