packman

Evaluation-orthogonal serialisation of Haskell data, as a library

In brief, this is the packing code of Eden and GpH, ripped out of the runtime system and rewritten to make it thread-safe and return exception codes when it fails.

Most of this work was already in place when I presented at HIW 2013 (see the slides from HIW 2013), but at that time all the code still lived in the runtime system.

The basic idea was described earlier, in a 2010 IFL paper.

Haskell API

A Haskell wrapper around this (C-implemented) functionality provides the following API:

trySerializeWith :: a -> Int -> IO (Serialized a) -- Int is maximum buffer size to use
trySerialize :: a -> IO (Serialized a)            -- uses default (maximum) buffer size
deserialize :: Serialized a -> IO a

Note that this serialisation is orthogonal to evaluation: the argument is serialised in its current state of evaluation. It might be entirely unevaluated (a thunk) or only partially evaluated (containing thunks).
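To illustrate what "orthogonal to evaluation" means in practice, here is a minimal round-trip sketch. It assumes the functions above are exported from a module named GHC.Packing and that the package is installed; the value `total` is purely illustrative.

```haskell
-- Assumption: the API above is exported from GHC.Packing.
import GHC.Packing (trySerialize, deserialize)

main :: IO ()
main = do
  -- Nothing forces this sum before packing, so it is (at least without
  -- aggressive optimisation) still a thunk at this point.
  let total = sum [1 .. 100000 :: Int]
  packed   <- trySerialize total   -- packs the value in its current state
  restored <- deserialize packed   -- rebuilds a copy of the graph on the heap
  print restored                   -- forces the restored copy
```

Because packing copies the graph, forcing `restored` does not update `total`: the two values no longer share their evaluation.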

The Serialized a type is an opaque representation of serialised Haskell data (it contains a ByteArray). Serialized a provides instances for Show and Read which satisfy read . show == id, and a Binary instance. These instances check types dynamically at run time, which is why they carry a Typeable context.

Advantages

The library enables sending and receiving data between different nodes of a distributed Haskell system. This is where the code originated: the Eden runtime system. You might want to read a related paper which describes parts of it (IFL 2006).

Apart from this obvious application, the functionality can be used to optimise programs by memoisation (across different program runs), and to checkpoint program execution in selected places. Both uses are exemplified in the slide set linked above.
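As a rough sketch of memoisation across program runs, the example below stores a packed result in a file using the Binary instance of Serialized a together with `encodeFile` and `decodeFile` from Data.Binary. The module name GHC.Packing, the file name `result.cache` and the computation `expensive` are assumptions made for illustration; this is not taken from the slides.

```haskell
import Data.Binary (decodeFile, encodeFile)
import GHC.Packing (Serialized, deserialize, trySerialize)  -- assumed module name
import System.Directory (doesFileExist)

-- Hypothetical cache location.
cacheFile :: FilePath
cacheFile = "result.cache"

-- Stand-in for a costly computation.
expensive :: Int
expensive = sum (map (\x -> x * x `mod` 7) [1 .. 5000000])

main :: IO ()
main = do
  cached <- doesFileExist cacheFile
  value <-
    if cached
      then do
        -- A previous run left a packed result behind: load and unpack it.
        packed <- decodeFile cacheFile :: IO (Serialized Int)
        deserialize packed
      else do
        -- First run: evaluate, then pack the evaluated result and store it.
        let v = expensive
        packed <- v `seq` trySerialize v
        encodeFile cacheFile packed
        return v
  print value
```

Forcing the value with `seq` before packing stores the result rather than the pending computation; dropping the `seq` would store the thunk instead, which is closer to a checkpoint. A cache file written this way should be deleted whenever the program is recompiled.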

Drawbacks

As serialisation essentially provides a way to duplicate data (and therefore destroy sharing), certain data should not be serialisable. Most prominently, these include the mutable types MVar, IORef, and all STM-related types. However, the presence of those types is sometimes not apparent: they can occur inside a thunk that computes a value of a different type. The most annoying example is lazy file I/O: lazily reading a file entails holding a "half-closed" file handle (essentially an MVar), which makes serialisation fail for the data that was read. The ugly solution in the library: the API signals such conditions as exceptions.
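A caller can guard against such failures with the usual exception machinery. The sketch below tries to pack an MVar and reports the outcome. It catches SomeException so as not to rely on the name of the library's own exception type; the module name GHC.Packing is again an assumption.

```haskell
import Control.Concurrent.MVar (MVar, newMVar)
import Control.Exception (SomeException, try)
import GHC.Packing (Serialized, trySerialize)  -- assumed module name

main :: IO ()
main = do
  box <- newMVar (42 :: Int)
  -- Catching SomeException is deliberately broad for this sketch; real code
  -- would match on the specific exception type exported by the library.
  result <- try (trySerialize box)
              :: IO (Either SomeException (Serialized (MVar Int)))
  case result of
    Left err -> putStrLn ("could not serialise: " ++ show err)
    Right _  -> putStrLn "serialised successfully"
```

The same pattern applies to data obtained through lazy file I/O, where the offending handle is hidden inside a thunk and the failure is harder to predict.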

Another limitation is that serialised data can only be used by the very same binary that produced it. This is, however, common to many approaches to distributed programming with functional languages.

If you find this library useful, I (Jost Berthold) would be happy to hear from you.

Acknowledgements:

Phil Trinder suggested separating serialisation from other functionality of the parallel runtime system in 2009. Hans-Wolfgang Loidl introduced me to the GUM packing code, worked with me on the parallel runtime system for a long time, and always provided valuable feedback. Michael Budde and Åsbjørn Jøkladal assembled the first cabalised library version as a student project in our 2014 course "Topics in programming languages" (where the topic was parallel functional programming).