Roche’s 454 sequencing technology can produce biological sequence data on a scale that exceeds traditional Sanger sequencing by orders of magnitude. Because the sequences are generated by a fundamentally different method, we would like to investigate the raw data and see whether we can quantify, and perhaps also reduce, the number or severity […]
Posted on November 14th, 2008 by ketil
Filed under: Optimization, Examples
I’ve been spending the last couple of weeks working on an indexing scheme for sequences, using Bryan O’Sullivan’s Bloom filters. When Bryan tried out the code, however, he found a curious problem: the indexing stage apparently scaled quadratically with sequence length. This wouldn’t have been so strange, were it not for […]
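For a fixed word length k, a sequence of n characters has only n - k + 1 overlapping k-mers, so an index built by hashing each k-mer ought to take time linear in n. The sketch below assumes the scheme indexes overlapping k-mers (the excerpt does not say what the keys are), and `kmers` and `kmersSlow` are hypothetical helpers, not functions from the bio library or the Bloom filter package. `kmersSlow` shows one ordinary way quadratic behaviour can creep into list-based Haskell code; it is an illustration, not a diagnosis of the problem in the post.

```haskell
import Data.List (tails)

-- | All overlapping words of length k, left to right. 'tails' shares the
-- suffixes of the input, so each position costs O(k) and the whole list
-- O(n * k): linear in the sequence length n for a fixed k.
kmers :: Int -> String -> [String]
kmers k s
  | k <= 0    = []
  | otherwise = [ w | w <- map (take k) (tails s), length w == k ]

-- | Same result, but 'drop i' walks i cells from the start of the list for
-- every position i, so the total work grows as n^2 / 2. Illustration only.
kmersSlow :: Int -> String -> [String]
kmersSlow k s
  | k <= 0    = []
  | otherwise = [ take k (drop i s) | i <- [0 .. length s - k] ]
```

In GHCi, `kmers 3 "ACGTAC"` evaluates to `["ACG","CGT","GTA","TAC"]`. Both versions return the same list; the difference is only that `tails` reuses each suffix, while `drop i` re-walks the list for every position.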
Posted on October 24th, 2008 by ketil
Filed under: Optimization, Examples, EST analysis
It was just brought to my attention that people have started to use a new file format for sequences. This format, called ‘FastQ’, combines the sequence data and the quality data in one file. That’s a nice idea, and I implemented support for it, tests, docs and all, in the bio library. Runs […]
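A minimal FastQ reader might look like the sketch below. It assumes the common four-line layout (`@name`, bases, a `+` line, qualities) with no line wrapping, and Sanger-style qualities encoded as the Phred score plus 33; some FastQ variants use a different offset. `FastQRecord`, `parseFastQ` and `expectedErrors` are hypothetical names, not the bio library's interface. A Phred score Q corresponds to an error probability of 10^(-Q/10).

```haskell
import Data.Char (ord)

-- | One FastQ entry. Hypothetical type for illustration only.
data FastQRecord = FastQRecord
  { fqName :: String   -- header line without the leading '@'
  , fqSeq  :: String   -- the bases
  , fqQual :: [Int]    -- one Phred quality score per base
  } deriving (Show, Eq)

-- | Parse unwrapped four-line FastQ records with Phred+33 qualities.
parseFastQ :: String -> Either String [FastQRecord]
parseFastQ = go . lines
  where
    go [] = Right []
    go (('@' : name) : bases : ('+' : _) : quals : rest)
      | length bases /= length quals =
          Left ("sequence and quality lengths differ in record " ++ name)
      | otherwise =
          (FastQRecord name bases (map phred quals) :) <$> go rest
    go (l : _) = Left ("malformed record starting at line: " ++ l)
    -- Sanger FastQ stores each Phred score as an ASCII character offset by 33.
    phred c = ord c - 33

-- | Error probability for a Phred score: 10 ^ (-Q / 10).
errorProb :: Int -> Double
errorProb q = 10 ** (negate (fromIntegral q) / 10)

-- | Expected number of base-call errors in a read.
expectedErrors :: FastQRecord -> Double
expectedErrors = sum . map errorProb . fqQual
```

In GHCi, `fmap (map fqQual) (parseFastQ "@r1\nACGT\n+\nII5!\n")` evaluates to `Right [[40,40,20,0]]`. Returning `Either` means the whole input is checked before any record comes back; a streaming reader would trade that guarantee for constant memory use.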
Posted on September 9th, 2008 by ketil
Filed under: Examples, EST analysis
Bloom filters are apparently a relatively old technology, dating from the 1970s or so, but they had somehow escaped my radar until Bryan O’Sullivan posted a message to the Haskell mailing list announcing a high-performance implementation in Haskell, perhaps to support a chapter in the upcoming book. You can read all about Bloom filters on […]
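A Bloom filter is an array of m bits plus k hash functions: adding a key sets the k bits it hashes to, and a lookup reports "possibly present" only if all k of those bits are set. Lookups can therefore give false positives but never false negatives. The sketch below is a minimal pure version assuming FNV-1a hashing and the double-hashing trick (bit i is h1 + i*h2 mod m); it is not the API of Bryan O’Sullivan’s package, and all names in it are hypothetical.

```haskell
import Data.Array.Unboxed (UArray, accumArray, bounds, (!))
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- | Hypothetical minimal Bloom filter: k hash functions over m bits.
data Bloom = Bloom
  { bloomHashes :: Int              -- k, the number of hash functions (k >= 1)
  , bloomBits   :: UArray Int Bool  -- the bit array, indexed 0 .. m-1
  }

-- | 32-bit FNV-1a over the character codes of a string, from a given seed.
fnv1a :: Word32 -> String -> Word32
fnv1a = foldl' step
  where step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- | The k bit positions for a key, by double hashing: h1 + i*h2 (mod m).
positions :: Int -> Int -> String -> [Int]
positions m k s = [ (h1 + i * h2) `mod` m | i <- [0 .. k - 1] ]
  where
    h1 = fromIntegral (fnv1a 2166136261 s)  -- standard FNV offset basis
    h2 = fromIntegral (fnv1a 0x9747b28c s)  -- arbitrary second seed

-- | Build a filter with m bits (m >= 1) and k hashes from a list of keys.
fromList :: Int -> Int -> [String] -> Bloom
fromList m k keys =
  Bloom k (accumArray (||) False (0, m - 1)
                      [ (p, True) | key <- keys, p <- positions m k key ])

-- | False means definitely absent; True means possibly present.
member :: String -> Bloom -> Bool
member key (Bloom k bits) = all (bits !) (positions m k key)
  where m = snd (bounds bits) + 1
```

`fromList` fills the whole array in one `accumArray` pass instead of copying it once per key. With n keys inserted, the false-positive rate is roughly (1 - e^(-kn/m))^k, which is how m and k are normally sized for an expected number of keys.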
Posted on July 31st, 2008 by ketil
Filed under: Optimization, Examples, EST analysis
My current development project is an EST pipeline. For various reasons, it is implemented in shell (bash, to be exact). In other words, the pipeline is a script, or rather a set of scripts, that ties together the various stages: masking, clustering, assembly, and annotation.
As in any program, there are many occasions where […]
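The way such scripts hand data from one stage to the next can be modelled as a plan in which each stage reads the file the previous stage wrote. This is a hypothetical Haskell sketch of that idea only; the actual pipeline is written in bash, and the file-naming convention in `suffix` is invented for illustration.

```haskell
-- | The stages of the EST pipeline, in execution order.
data Stage = Mask | Cluster | Assemble | Annotate
  deriving (Show, Eq, Enum, Bounded)

-- | Hypothetical naming convention: each stage appends a suffix to its input.
suffix :: Stage -> String
suffix Mask     = "masked"
suffix Cluster  = "clusters"
suffix Assemble = "contigs"
suffix Annotate = "annot"

-- | (stage, input file, output file) for every stage, where each stage
-- reads the file written by the stage before it.
plan :: FilePath -> [(Stage, FilePath, FilePath)]
plan input = zip3 stages files (drop 1 files)
  where
    stages = [minBound .. maxBound]
    files  = scanl (\f s -> f ++ "." ++ suffix s) input stages
```

`plan "est.fasta"` yields four steps; the first reads `est.fasta` and writes `est.fasta.masked`, and the last writes `est.fasta.masked.clusters.contigs.annot`.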
Posted on July 11th, 2008 by ketil
Filed under: Examples, EST analysis
As a consequence of International Whaling Commission (IWC) policies, the Institute of Marine Research is required to store a genetic identification of each minke whale that is hunted. This, of course, means that people will come to me for help in bridging the gap between test tubes and databases, by providing analysis tools that can extract […]
Posted on May 27th, 2008 by ketil
Filed under: Examples
It seems to be optimization week on the Haskell-Café mailing list. Inspired by a combination of Don Stewart’s blog post about how to optimize for speed and the sorry performance of my xml2x program, I thought this would be a good time to see whether things could be improved. In its current […]
Posted on May 18th, 2008 by ketil
Filed under: Optimization, Examples
…to write a simple but useful and efficient bioinformatics program? Here’s how to build a small tool that extracts a clustering from an ACE file, using functionality in the bioinformatics library. It takes about three lines of actual code, and it is both fast and efficient.
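For comparison, here is a sketch that does the same job without the library, assuming the usual ACE layout in which each `CO` line starts a contig and each `AF` line names one read assembled into the current contig; each contig's reads are taken as one cluster. `aceClusters` and `formatClusters` are hypothetical names, not the bioinformatics library's ACE interface, and the code ignores every other record type.

```haskell
import Data.List (foldl')

-- | Group read names by contig. Assumes the usual ACE layout: a "CO" line
-- opens a contig and each following "AF" line names one read placed in it.
aceClusters :: String -> [(String, [String])]
aceClusters = finish . foldl' step [] . lines
  where
    step acc l = case (words l, acc) of
      ("CO" : contig : _, _)          -> (contig, []) : acc
      ("AF" : rd : _, (c, rs) : more) -> (c, rd : rs) : more
      _                               -> acc  -- other lines, or AF before any CO
    -- Contigs and reads were consed on in reverse; restore file order.
    finish = reverse . map (\(c, rs) -> (c, reverse rs))

-- | One cluster per output line: the contig name followed by its reads.
formatClusters :: [(String, [String])] -> String
formatClusters = unlines . map (\(c, rs) -> unwords (c : rs))
```

Wired up as `main = interact (formatClusters . aceClusters)`, it reads an ACE file on standard input and prints one line per contig: the contig name followed by its reads.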
Posted on May 9th, 2008 by ketil
Filed under: Examples