Functional bash bracketing

by Ketil; July 11, 2008

My current development project is an EST pipeline. For various reasons, it is implemented in shell — bash, to be exact. In other words, the pipeline is a script, or rather a set of scripts, that will tie together the various stages: masking, clustering, assembly, and annotation.

As in any program, there are many occasions where you want to effect some particular change during some part of the program. The archetypical example is allocation of local variables. After allocation, the variables are available to the program until they go out of scope, at which point they are deallocated automatically. The technique can be generalized beyond this. For instance, you (or rather I) may want to set a $STAGE variable that indicates the current processing stage, and which should be unset when the stage has finished executing. Or you may want to run some processing in a different directory, in which case you really want to remember to return to the previous directory when you finish. The purpose of bracketing is to wrap a section of code with an initial part to be run in advance, and a final part to be run afterwards.
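To make the two cases just mentioned concrete, here is a rough sketch of what special-purpose wrappers for them might look like when written by hand. The names with_stage and in_dir are invented for illustration and are not taken from the actual pipeline.

```bash
# Hypothetical helper: run a command with $STAGE set, then reset it.
with_stage() {
    STAGE=$1                # "open": mark the current stage
    shift
    "$@"                    # the actual work
    STAGE=none              # "close": reset the marker
}

# Hypothetical helper: run a command in another directory, then return.
in_dir() {
    local olddir=$PWD       # remember where we started
    cd "$1" || return 1     # "open": enter the working directory
    shift
    "$@"                    # the actual work
    cd "$olddir"            # "close": go back
}

with_stage masking echo "masking would run here"
```

Each wrapper hard-codes its own setup and cleanup, which is the repetition a general bracket is meant to remove.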

Some examples

When I toyed with PHP ages ago, I’d often find myself building a section of a page by a) generating a header with some opening tags, b) generating some content, and c) generating a footer with some closing tags. And when the exact contents of these pieces depended on various factors, and such sections nested in complicated ways, it should come as no surprise that getting a) and c) to correspond exactly could be a challenge. With absolutely no enforcement by the language (which may have improved in later years, I wouldn’t know), this was very fragile.

For HTML the solution is simple: instead of generating open and close tags separately, have a function take the tag and its contents, and output the contents appropriately surrounded by open and close tags. If you build the whole document this way, you guarantee that each open tag will have a matching close tag, and that tags will be properly nested. Another example is Common Lisp’s with-open-file macro.
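Since this post is about shell, here is a minimal sketch of that idea in bash. The function name tag is made up for the example, and no escaping of the contents is attempted.

```bash
# tag NAME CONTENT...: print CONTENT wrapped in <NAME>...</NAME>.
# (Hypothetical helper; the content is emitted as-is, without HTML escaping.)
tag() {
    local name=$1
    shift
    printf '<%s>%s</%s>\n' "$name" "$*" "$name"
}

# Nesting falls out of command substitution: each inner call is
# complete before the outer call wraps it.
tag ul "$(tag li first)$(tag li second)"
# prints: <ul><li>first</li><li>second</li></ul>
```

Because the open and close tags are produced by the same printf, there is no way to emit one without the other.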

In Haskell, there’s bracket (in Control.Exception), which in addition to being vastly more general also is a regular function, thus once again proving Haskell’s vast technical and moral superiority over the more pedestrian languages….but I digress. I suspect the rather original name is supposed to allude to how brackets (as in those banana-shaped glyphs surrounding this text) consist of an opening bracket, some contents, and a closing bracket. Anyway, we like Haskell, so we use Haskell terminology.

As a final note, observe that brackets are similar to stack allocation, manual resource management is similar to manual memory management, while using finalizers/destructors is similar to garbage collection. (It’s tempting to add “pick any two”.)

Generalized bracket

While implementing the EST pipeline, I found myself needing, and implementing, several bracket-like functions (including the two previously mentioned: setting and unsetting a variable, and running a subcomputation in a separate directory). Thus, the question that poses itself is: Is it possible to do this in a more general way, akin to Haskell’s bracket? Here’s my current best attempt:

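bracket(){
    CLOSE=$2        # remember the "close" action for later
    eval "$1"       # run the "open" action in the current shell
    shift; shift    # drop "open" and "close", leaving the main action
    eval "$@"       # run the main action
    eval "$CLOSE"   # run the "close" action
}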

First we store the second parameter, which will be the “close” action, in a variable. We then execute the first parameter (the “open” action), using eval so that variables can be set etc. We then skip the first two arguments, using shift twice, then eval the main action, and finally, eval the “close” action.

This allows stuff like:

bracket "mkdir mytmpdir && pushd mytmpdir" "popd" mkfiles

where the mkfiles function is run inside a temporary directory, and where execution resumes in the original directory after completion. Another example is
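A self-contained way to try this out might look like the sketch below. The body of mkfiles and the scratch directory created with mktemp are assumptions added for the demonstration, and the bracket function is repeated (with quoting around the expansions) so the snippet runs on its own.

```bash
bracket() {
    CLOSE=$2
    eval "$1"
    shift; shift
    eval "$@"
    eval "$CLOSE"
}

# Hypothetical stand-in for the real work: create a couple of files.
mkfiles() {
    touch a.txt b.txt
}

scratch=$(mktemp -d)    # assumption: keep the demo out of the current directory
start=$PWD

# Single quotes delay the expansion of $scratch until the eval inside bracket.
bracket 'mkdir "$scratch/mytmpdir" && pushd "$scratch/mytmpdir" >/dev/null' \
        'popd >/dev/null' \
        mkfiles

[ "$PWD" = "$start" ] && echo "back where we started"
ls "$scratch/mytmpdir"  # lists a.txt and b.txt
```

Note that if the mkdir fails, the pushd is skipped but the popd still runs, so this is only safe when the "open" action is known to succeed.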

bracket "echo Entering first stage; STAGE=first" "STAGE=none" echo Current stage is '$STAGE'

Note the careful quoting of the variable with single quotes: we don’t want $STAGE to be evaluated before it is set in the bracket function. In other words, the single quotes let us pass the literal string $STAGE, giving a sort of pass-by-name semantics instead of the default pass-by-value.
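The difference between the two quoting styles can be seen side by side in the following sketch. The bracket function is repeated (with quoting around the expansions) so the snippet is runnable on its own; the labels "late" and "early" are just for the demonstration.

```bash
bracket() {
    CLOSE=$2
    eval "$1"
    shift; shift
    eval "$@"
    eval "$CLOSE"
}

STAGE=none

# Single quotes: bracket receives the literal text $STAGE, and the inner
# eval expands it only after the "open" action has run.
bracket "STAGE=first" "STAGE=none" echo late: '$STAGE'     # prints "late: first"

# Double quotes: the calling shell expands $STAGE before bracket is
# even invoked, so the main action sees the old value.
bracket "STAGE=first" "STAGE=none" echo early: "$STAGE"    # prints "early: none"
```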

Perfection is the enemy…

If you are an experienced shell programmer, you may at this point have formed an opinion that I am not. And you’d be right, of course, but even I can see that there’s (at least) one obvious bug: we define a global variable named CLOSE. Not only does this have the potential to clash with existing variables, it also prevents recursive calls to bracket. Possibly, we should generate variable names, or have $CLOSE be a stack, or something…But hey, Mr. Know-it-all, if you’re so damn good, why not post a comment explaining how it’s really done?

In other words: feedback and comments are most welcome.
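One possible direction for the CLOSE problem, offered as a sketch rather than the definitive answer, is bash's local builtin, which gives each call its own copy of the variable. The underscore-prefixed names, the early return when the "open" action fails, and the preserved exit status are all choices made for this example, not part of the original function.

```bash
bracket() {
    # Locals are per call, so a nested bracket cannot overwrite them.
    local _bracket_close=$2 _bracket_status
    eval "$1" || return     # assumption: skip the rest if "open" fails
    shift 2
    eval "$@"
    _bracket_status=$?      # remember how the main action went
    eval "$_bracket_close"
    return "$_bracket_status"
}

# Hypothetical nested use: the main action is itself a bracket call.
inner() {
    bracket 'echo open inner' 'echo close inner' echo body
}
bracket 'echo open outer' 'echo close outer' inner
# prints, one per line:
#   open outer, open inner, body, close inner, close outer
```

With the global CLOSE, the last line printed would be "close inner" a second time, since the inner call overwrites the outer call's close action. Because bash variables are dynamically scoped, an action that itself assigns to _bracket_close or _bracket_status would still interfere, so this narrows the clash rather than eliminating it.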


Feedback? Please email ketil@malde.org.