BHLog

bioinformatics and haskell

454 sequencing and parsing the SFF binary format

Roche’s 454 sequencing technology can produce biological sequence data on a scale that exceeds traditional Sanger sequencing by orders of magnitude. Due to the fundamentally different method used to generate the sequences, we would like to investigate the raw data and see if we can quantify — and maybe also reduce the number or severity […]

Optimization again: befuddled by bytestrings

I’ve been spending the last couple of weeks working on an indexing scheme for sequences, using Bryan O’Sullivan’s Bloom filters. Now, it turned out that when Bryan tested out the code, he found a curious problem: apparently, the indexing stage scaled quadratically with sequence length. This wouldn’t have been so strange, were it not for […]

A plan for Bloom filters

Bloom filters are apparently a relatively old technology, dating from the 1970s or so, but they had somehow escaped my radar until Bryan O’Sullivan posted a message to the Haskell mailing list announcing a high-performance implementation in Haskell, perhaps to support a chapter in the upcoming book. You can read all about Bloom filters on […]

Optimization week: making Haskell go faster

It seems to be optimization week on the Haskell Café mailing list. Inspired by a combination of Don Stewart’s blog post about how to optimize for speed and the sorry performance of my xml2x program, I thought this would be a good time to see if things could be improved. In its current […]