Biohaskell

bioinformatics and haskell

31  07 2008

The Haskell Bioinformatics Library

This is a collection of data structures and algorithms that most of the other stuff on this site depends on. Much of the functionality is stable, robust, and even well-documented; some is less so. I largely apply the itch-scratching software development process (ISSDP), so the current feature set and its level of completeness depend on what I need in my other work. If you don’t like that, I am happy to accept your patches :-)

Acquiring it

Either:

  • Get the latest sources from the darcs repository: darcs get http://malde.org/biohaskell/biolib

or:

  • Download the tarball from Hackage

Building it

The easiest way to acquire this, or any other, Haskell library is probably to use cabal-install. Get it up and working, and a simple ‘cabal install bio’ should fetch the latest version from Hackage along with all its dependencies, compile everything, and install it in your home directory (under ~/.cabal; the library itself ends up in ~/.cabal/lib).

To install manually, you need to acquire a working GHC (or possibly another Haskell system).  You also need the following external libraries:

  • QuickCheck — for unit tests
  • binary — mainly for dealing with the TwoBit sequence format
  • tagsoup — for parsing XML output from Blast
  • Parsec — for dealing with various file formats

You should be able to get what you need from http://hackage.haskell.org/.

You can then build with ‘make’, running either ‘make install’ if you can sudo, or ‘make user_install’ if you cannot. Of course, the Makefile just proxies for the regular Cabal routine, which will work just as well:

runhaskell Setup configure
runhaskell Setup build
sudo runhaskell Setup install

(Add --prefix=$HOME to the configure step, and drop the sudo from the install step, if you don’t want to install as root.)

Using it

The best tutorial is probably my other code, though there isn’t much of it, I know. In particular, the cluster_tools package contains a bunch of small, self-contained utilities that can provide a starting point. Apart from that, there’s the code itself and the Haddock documentation. And of course I’ll try to answer any questions and help out any way I can. The current list of features includes:

Sequence data

Support for protein and nucleotide sequences and conversion between them, quality data, reading and writing FASTA- and FastQ-formatted files, and reading the TwoBit and PHD formats.
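To give an idea of what FASTA reading and writing involves, here is a minimal sketch in plain Haskell. It is not the bio library’s API: the FastaRecord type and the parseFasta and writeFasta functions are hypothetical names for illustration, and it works on String rather than the ByteString-based representation a real library would likely use.

```haskell
import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Hypothetical record type for illustration (not bio's Sequence type):
-- the header line without the leading '>' and the concatenated residues.
data FastaRecord = FastaRecord
  { fastaHeader :: String
  , fastaSeq    :: String
  } deriving (Show, Eq)

-- Split FASTA text into records. Lines before the first '>' are skipped,
-- and the sequence lines of each record are joined with all whitespace removed.
parseFasta :: String -> [FastaRecord]
parseFasta = go . dropWhile (not . isHeader) . lines
  where
    isHeader l = ">" `isPrefixOf` l
    go [] = []
    go (h:rest) =
      let (body, more) = break isHeader rest
      in FastaRecord (drop 1 h) (concatMap (filter (not . isSpace)) body) : go more

-- Render records back to FASTA, wrapping sequence lines at 'width'
-- characters (width must be positive).
writeFasta :: Int -> [FastaRecord] -> String
writeFasta width = unlines . concatMap render
  where
    render (FastaRecord h s) = ('>' : h) : chunks s
    chunks [] = []
    chunks xs = let (a, b) = splitAt width xs in a : chunks b
```

Because parseFasta is built on lazy lines and break, it consumes its input incrementally, which matters when a FASTA file is much larger than memory.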

Alignments

Rudimentary support for doing alignments (including dynamic adjustment of scores based on sequence quality), and parsing of BLAST, Bowtie and BLAT output.
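One way to picture quality-dependent scoring is a standard Needleman-Wunsch global alignment where each substitution score is scaled down when either base has a low Phred quality. This is only a sketch under stated assumptions: the linear weighting that saturates at Phred 30, the +1/-1 match/mismatch scores and the names qualWeight, subScore and globalScore are all hypothetical, not how the bio library necessarily does it.

```haskell
import Data.Array

-- A base paired with its Phred quality score.
type QBase = (Char, Int)

-- Hypothetical weighting: full weight at Phred 30 or above,
-- scaled down linearly below that.
qualWeight :: Int -> Double
qualWeight q = min 1 (fromIntegral (max 0 q) / 30)

-- Match (+1) or mismatch (-1), scaled by the lower of the two qualities,
-- so low-confidence bases contribute less in either direction.
subScore :: QBase -> QBase -> Double
subScore (a, qa) (b, qb) = qualWeight (min qa qb) * (if a == b then 1 else -1)

-- Needleman-Wunsch global alignment score with a linear gap penalty.
-- The lazy array lets each cell refer to its already-defined neighbours.
globalScore :: Double -> [QBase] -> [QBase] -> Double
globalScore gap xs ys = table ! (n, m)
  where
    n = length xs
    m = length ys
    xa = listArray (1, n) xs
    ya = listArray (1, m) ys
    table = array ((0, 0), (n, m))
      [ ((i, j), cell i j) | i <- [0 .. n], j <- [0 .. m] ]
    cell 0 j = fromIntegral j * negate gap
    cell i 0 = fromIntegral i * negate gap
    cell i j = maximum
      [ table ! (i - 1, j - 1) + subScore (xa ! i) (ya ! j)
      , table ! (i - 1, j) - gap
      , table ! (i, j - 1) - gap
      ]
```

With a gap penalty of 1, an A/A match at quality 15 scores only 0.5 instead of 1, and an A/C mismatch at the same quality costs 0.5 instead of 1.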

Support for reading ACE files, as output by, e.g., CAP3 and Phrap.

Partial implementations of single linkage clustering and multiple alignment.
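Single linkage clustering at a fixed threshold amounts to finding connected components: two items share a cluster whenever a chain of pairwise links connects them, even if they are not directly linked themselves. The naive, quadratic sketch below assumes a caller-supplied symmetric predicate (for instance "distance at most some cutoff"); the name singleLinkage is hypothetical and this is not the library’s own implementation.

```haskell
import Data.List (partition)

-- Single linkage clustering: grow each cluster breadth-first from a seed,
-- pulling in every remaining item linked to any member already in it.
-- 'linked' is assumed to be symmetric.
singleLinkage :: (a -> a -> Bool) -> [a] -> [[a]]
singleLinkage _ [] = []
singleLinkage linked (x:xs) = cluster : singleLinkage linked rest
  where
    (cluster, rest) = grow [x] [x] xs
    -- acc: cluster so far; frontier: members whose links are not yet
    -- explored; pool: items not assigned to any cluster yet
    grow acc [] pool = (acc, pool)
    grow acc (f:frontier) pool =
      let (near, far) = partition (linked f) pool
      in grow (acc ++ near) (frontier ++ near) far
```

For example, with the predicate \a b -> abs (a - b) <= 1 the list [1, 3, 2] forms a single cluster, because 2 links 1 and 3 even though 1 and 3 are not directly linked.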

Annotation information

In addition to BLASTX output, there is support for Gene Ontology (GO) data, and KEGG.


6 Responses to “The Haskell Bioinformatics Library”

  1. [...] to be different. Could it be bloomfilter not being a good consumer for the generated words?  The bio library messing up FASTA parsing?  Something else [...]

  2. While trying to install your bio library I ran into the following circular dependency problem:

    cabal install bio
    Resolving dependencies…
    cabal: cannot configure tagsoup-0.8. It requires QuickCheck ==2.1.*
    For the dependency on QuickCheck ==2.1.* there are these packages:
    QuickCheck-2.1, QuickCheck-2.1.0.1, QuickCheck-2.1.0.2 and QuickCheck-2.1.0.3.
    However none of them are available.
    QuickCheck-2.1 was excluded because bio-0.4 requires QuickCheck <2

    Is there anything to be done to fix this?
    j131

  3. The second way of installing it, i.e. getting the source through darcs and installing the dependencies manually, does not work either:

    Bio/Alignment/BlastXML.hs:41:12:
    `Tag' is not applied to enough type arguments
    Expected kind `*', but `Tag' has kind `* -> *'
    In the type signature for `getFrom':
    getFrom :: [Tag] -> String -> String

    Could you please state clearly which combination of compiler and libraries is recommended?

  4. I’ll look into this, but I think something changed in tagsoup between 0.6 and 0.8, and sloppiness on dependency versioning on my part makes this break. This is how ‘./Setup.hs configure -v’ on my system sees the world:

    Configuring bio-0.4.4...
    Dependency QuickCheck >=2: using QuickCheck-2.1.0.2
    Dependency array -any: using array-0.3.0.0
    Dependency base ==4.*: using base-4.2.0.0
    Dependency binary -any: using binary-0.5.0.2
    Dependency bytestring >=0.9.1: using bytestring-0.9.1.5
    Dependency containers -any: using containers-0.3.0.0
    Dependency mtl -any: using mtl-1.1.0.2
    Dependency old-time -any: using old-time-1.0.0.3
    Dependency parallel -any: using parallel-1.1.0.1
    Dependency parsec -any: using parsec-2.1.0.1
    Dependency random -any: using random-1.0.0.2
    Dependency tagsoup >=0.4: using tagsoup-0.6

  5. Hi j131,

    you need to download and install tagsoup 0.4

    download from here:

    http://hackage.haskell.org/package/tagsoup-0.4

    then install it using:

    runhaskell Setup configure (or runhaskell Setup configure --user)
    runhaskell Setup build
    runhaskell Setup install

    Next, download the bio-0.4 Haskell source tarball and unpack it.

    Back up bio.cabal and edit the following line:

    Build-Depends: base>=3 && =1.2.0.0, binary, tagsoup= 0.9.1, containers, array,
    parallel, parsec, random, old-time, mtl

    I changed QuickCheck to >=1.2 and tagsoup =1.2.0.0: using QuickCheck-2.1.0.3
    Dependency tagsoup <=0.4: using tagsoup-0.4

    OK, now build the biohaskell library using:

    runhaskell Setup configure --user
    runhaskell Setup build

    Oops, you will see an error:

    [18 of 43] Compiling Bio.Sequence.TwoBit ( Bio/Sequence/TwoBit.hs, dist/build/Bio/Sequence/TwoBit.o )

    Bio/Sequence/TwoBit.hs:37:31:
    Module `Test.QuickCheck' does not export `check'

    To fix the error, open the Haskell source file Bio/Sequence/TwoBit.hs in vi or emacs and go to line 37.

    By default you will see:

    import Test.QuickCheck hiding (check) -- QC 1.0
    -- import Test.QuickCheck hiding ((.&.)) -- QC 2.0

    Change it like this:

    -- import Test.QuickCheck hiding (check) -- QC 1.0
    import Test.QuickCheck hiding ((.&.)) -- QC 2.0

    This is because I am using QuickCheck 2.

    OK, now try to build once again:

    runhaskell Setup build

    Oops, another error:

    [35 of 43] Compiling Bio.Util.TestBase ( Bio/Util/TestBase.hs, dist/build/Bio/Util/TestBase.o )

    Bio/Util/TestBase.hs:81:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:85:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:90:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:98:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:105:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:109:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:117:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:125:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    Bio/Util/TestBase.hs:132:4:
    `coarbitrary' is not a (visible) method of class `Arbitrary'

    You will need to edit bio.cabal, find Bio.Util.TestBase, delete it, and save the file.

    runhaskell Setup install
    Installing library in $HOME/.cabal/lib/bio-0.4/ghc-6.12.1
    Registering bio-0.4…

    ls $HOME/.cabal/lib/bio-0.4/ghc-6.12.1/
    Bio HSbio-0.4.o libHSbio-0.4.a

    Ready, that's all.

    Ketil, what is your opinion?

    Ciao

  6. Hi hackob, and thanks for the extensive walk-through.

    There are two issues you are running into:

    1) the interface changed between tagsoup 0.6 and 0.8
    2) QuickCheck 2 is different from QC 1

    In the current biolib (i.e. my darcs repo), the dependencies look like:

    | Build-Depends: base>=4 && <5, QuickCheck>=2, binary==0.4.*, tagsoup>=0.4 && <0.8, bytestring >= 0.9.1,
    | containers, array, parallel, parsec, random, old-time, mtl

    I have also received a patch for biolib to work with both old and new tagsoup, which would obviate the need for the upper limit on tagsoup versions.

    I think the biggest problem here is that I haven’t kept up with Hackage; the 0.4 version is quite old. I’ve rolled out a new tarball (bio-0.4.4 on Hackage). You can usually get updated sources with “darcs get http://malde.org/biohaskell/biolib”.

    -k
