Posts tagged ‘sequence analysis’
-
Presentation on the anatomy of de novo genome projects
- October, 2012
sequence analysis, de novo, genome sequencing
Slides from a presentation I just gave, illustrating de novo genome assembly and annotation. Hopefully useful to y'all.
-
Transitive alignments (and why they matter)
- October, 2012
sequence analysis, alignments, annotation
Transitive alignments are calculated using an intermediate sequence database to help identify relationships between query and target sequences that have diverged too far to align directly. This makes it possible to construct pairwise alignments well into the "twilight zone".
-
Calculating insert stats from BAM files
- September, 2012
sequence analysis, BAM
A small tool that reads a BAM file containing aligned reads (typically Illumina paired-end reads, but any paired type will do) and outputs various statistics on them.
-
Compressing biological sequences
- June, 2011
sequence analysis, SFF, 454, parsing
As sequence data continue to grow exponentially in volume, compression is becoming more interesting. Here's a quick test of a couple of popular algorithms, and how to get improved compression rates by simple rearrangements of the data.
-
Searching for poly(A) tails
- December, 2009
sequence analysis, fasta, EST
Poly-A tails are an inherent feature of mRNA sequences, and although they are useful for identifying the ends of transcripts, we often want to remove them before further analysis. Here is an algorithm to identify poly-A tails in FASTA sequences.
-
454 sequencing and parsing the SFF binary format
- November, 2008
sequence analysis, SFF, 454, parsing
Roche's 454 sequencing technology can produce biological sequence data on a scale that exceeds traditional Sanger sequencing by orders of magnitude. Due to the fundamentally different method used to generate the sequences, we would like to investigate the raw data and see if we can quantify -- and maybe also reduce the number or severity of -- errors. This means reading the binary SFF format. Below, we'll dissect the SFF format, and describe a Haskell implementation.
-
A plan for Bloom filters
- July, 2008
sequence analysis, bloom filters
The Bloom filter is an interesting data structure, providing a probabilistic set. Here are some ideas for using it in bioinformatics.
-
Optimization week: making Haskell go faster
- May, 2008
sequence analysis, annotation, optimization, profiling
Optimization being all the rage, we investigate how to make a program that annotates sequences with GO terms go faster (no pun intended). We use GHC profiling to identify the performance culprits, and try to improve them.
-
Can you spare five minutes?
- May, 2008
sequence analysis, clustering
...to write a simple, but useful and efficient bioinformatics program? How to implement a simple tool to process ACE files and extract clustering information and the contained sequences.
-
Cleaning up sequences
- May, 2008
sequence analysis, vector masking
Some notes about cleaning up sequences. Sanger sequencing has perhaps become a bit old-skool in the years after this was written, but it still offers long read lengths and good quality. The complicated protocol introduces a bit of junk, though, and removing this isn't as easy as it should be.

