Biohaskell

bioinformatics and haskell

Tools for pyrosequencing analysis

I recently did a brief presentation of the set of tools I’ve developed for analyzing pyrosequences (the Roche 454 variety).  Nothing spectacular, just an overview of various ways of slicing and dicing SFF files using tools written in Haskell.  For lack of a better place to put it, I’ll drop my slides below.
flowers

Searching for poly(A) tails

I’m currently involved in a project where we study, among other things, the 3′UTR and poly-A tails of certain genes. For this, is of course important to accurately identify the poly-A tail in each transcript, but I couldn’t find any program or tool to do just that. Presumably the task is considered too [...]

Installing the software on Ubuntu 9.10

Ubuntu 9.10, nicknamed Karmic Koala, is about to be released, and in a moment of idleness, I upgraded my old 9.04 install to the latest beta.  Upgrading is always generates a slight feeling of dread,  taking the plunge from the cozy stability of bugs I’ve learned to work around, into the great unknown, but it [...]

A (too) brief Biohaskell presentation

I was recently in Trondheim, and got an opportunity to present Haskell to an audience of bioinformaticians.  Alas, it is hard to describe Haskell in all its glory to the uninitiated in forty-five minutes, and especially when I also wanted to talk a bit about the application to bioinformatics.  I left in the belief that [...]

Parsing ints

A recurring theme on the Haskell mailing lists is how to quicly parse a file consisting of integers.  Often, this comes up in the contest of benchmarking, but a real example of integer-filled files are the quality data that often accompanies Fasta sequence files.  When investigating one of my programs that seemed a bit on [...]

454 sequencing and parsing the SFF binary format

Roche’s 454 sequencing technology can produce biological sequence data on a scale that exceeds traditional Sanger sequencing by orders of magnitude.   Due to the fundamentally different method used to generate the sequences, we would like to investigate the raw data and see if we can quantify — and maybe also reduce the number or severity [...]

Optimization again: befuddled by bytestrings

I’ve been spending the last couple of weeks working on an indexing scheme for sequences, using Bryan O’Sullivan’s Bloom filters.  Now, it turned out that when Bryan tested out the code, he found a curious problem:  Apparently, the indexing stage scaled quadratically with sequence length.  This wouldn’t have been so strange, were it not for [...]

The FastQ file format for sequences

It was just brought to my attention that people have started to use a new file format for sequences.  This format, called ‘FastQ’ combines both the sequence data itself and the quality data in one file.  That’s a nice idea, and I implemented support for it, tests, docs and all, in the bio library.  Runs [...]

A plan for Bloom filters

Bloom filters is apparently a relatively old technology, dating from the 1970s or so, but it has somehow escaped my radar until Bryan O’Sullivan posted a message to the haskell mailing list announcing a high-performance implementation in Haskell, perhaps to support a chapter in the upcoming book.  You can read all about Bloom filters on [...]

Functional bash: bracketing

My current development project is an EST pipeline.  For various reasons, it is implemented in shell — bash, to be exact.  In other words, the pipeline is a script, or rather a set of scripts, that will tie together the various stages: masking, clustering, assembly, and annotation.
As in any program, there are many occasions where [...]

Next Page »