Recent developments - Flower and Bamstats
by Ketil; January 6, 2012
Extracting statistics from paired alignments
If you do Illumina sequencing, you’re probably getting paired reads, i.e. you get two reads from each molecule, one from each end. This is usually pretty reliable, but for more complicated mate pair protocols, there’s often chimeric sequences where the two members of a pair actually come from different molecules, and sometimes the protocol fails for other reasons. In any case, it might be useful to look closer at the result when you map the reads to their reference.
To simplify this, I’ve uploaded a small tool, bamstats to Hackage. It uses Nick Ingolia’s samtools
wrapper to extract relevant information from BAM files, and then summarizes it. Here are the examples:
Insert size statistics:
% bam stats -n 1000000 test.bam
Alignment count prop mean stdev skew kurt
innies 484874 96.97% 365.3 24.1 -3.9 34.9
outies 141 0.03% 112.8 217.3 8.0 62.8
lefties 0 0.00% NaN NaN NaN NaN
righties 0 0.00% NaN NaN NaN NaN
Total reads: 1000000
Histogram of insert sizes:
% bam hist -n 1000000 -b 6 -m 600 test.bam
Alignment count prop 100 200 300 400 500 >
innies 484874 96.97% 0 2204 5136 471596 5847 91
outies 141 0.03% 138 1 0 0 0 2
lefties 0 0.00% 0 0 0 0 0 0
righties 0 0.00% 0 0 0 0 0 0
Total reads: 1000000
As usual, the --help
option will tell you about the available tunables.
Flower update
For 454 sequences (and, I guess, for Ion torrent?), it is often informative to look at the raw data (or flow values) in the SFF files. Flower, which now ships as a part of the biosff library, is one option for doing this. The output option -h
produces a table of flow values by nucleotide. The latest release adds -H
, which produces a table of flow values by flow position. This can e.g. be used to build empirical distributions for use with flowsim.
One way to plot the output is to run
flower -h flows.dat input.sff
And then, with gnuplot do:
gnuplot> set view map
gnuplot> set logscale cb
gnuplot> splot "flows.dat" matrix with image
comments powered by Disqus