Presentation on the (lack of) data management practices

by Ketil Malde; June 2, 2011

I also gave a presentation at the ICES data center in Copenhagen last week. The data center is mostly concerned with storing oceanographic data (temperature, salinity and other parameters) and fisheries data for stock assessment and quotas, but I talked about storing data from (molecular and/or experimental) biology. So while the traditional data types are mainly tabular with a fairly fixed set of metadata (dates, positions, ships, and so on), my data comes in a large variety of file types and is produced by a wide range of methods, and thus my approach to managing it is much more ad hoc.

The most interesting part, to me, is my set of references: stuff that I think illustrates important points or concepts. Most of this material is commonly bandied about on the net, but perhaps not as commonly mentioned in all communities? Anyway, I consider them classics, and if more people would read them, the world would be a better place. Or at least, we’d waste a lot less effort. They are:

This book describes Brooks’ experiences with software development at IBM, and it deals with how productivity doesn’t scale with the number of people working on a problem. It also introduces concepts like the second-system effect: the second system tends to be much more ambitious than the first, and thus more likely to fail.

This essay discusses classification systems, and illustrates why ambitious taxonomies and ontologies often fail. In this day and age of web services and XML schemas, it’s an interesting read.

Another essay, which talks about the different design philosophies behind Lisp and Unix, how the Lisp one is obviously better, and how Unix is, equally obviously, more successful. Although both philosophies embrace the same concepts (simplicity, correctness, consistency, and completeness), the priorities are slightly different.

Oh, yes, if you got past those links, my slides are here.

Feedback? Please email ketil@malde.org.