Fixes in code comments and Haddock documentation (no code)

Jost Berthold 2015-11-27 23:44:26 +11:00
parent 34b56af22f
commit 0c2776b164
9 changed files with 99 additions and 75 deletions

@ -4,9 +4,9 @@
{- |
Module : GHC.Packing
Copyright : (c) Jost Berthold, 2010-2014,
Copyright : (c) Jost Berthold, 2010-2015,
License : BSD3
Maintainer : jb.diku@gmail.com
Maintainer : jost.berthold@gmail.com
Stability : experimental
Portability : no (depends on GHC internals)
@ -31,11 +31,12 @@ heap data:
The routine will throw a 'PackException' if an error occurs inside the
C code which accesses the Haskell heap (see @'PackException'@).
In presence of concurrent threads, another thread might be evaluating
data /referred to/ by the data to be serialised. It would be nice to
/block/ the calling thread in this case, but this is not possible in
the library version (see <#background Background Information> below).
'trySerialize' variant will instead signal the condition as
'PackException' 'P_BLACKHOLE'.
data /referred to/ by the data to be serialised. In this case, the calling
thread will /block/ on the ongoing evaluation and continue when evaluated
data is available.
Internally, there is a 'PackException' 'P_BLACKHOLE' to signal the
condition, but it is hidden inside the core library
(see <#background Background Information> below).
The inverse operation to serialisation is
@ -191,27 +192,29 @@ and better usability.
The original primitive @'serialize'@ is modified and now returns error
codes, leading to the following type (again paraphrasing):
> serialize# :: a -> IO ( Int# , ByteArray# )
> trySerialize# :: a -> IO ( Int# , ByteArray# )
where the @Int#@ encodes potential error conditions returned by the runtime.
A second primitive operation has been defined, which considers the presence
of concurrent evaluations of the serialised data by other threads:
A second primitive operation has been defined, which uses a pre-allocated
@ByteArray#@:
> trySerialize# :: a -> IO ( Int# , ByteArray# )
> trySerializeWith# :: a -> ByteArray# -> IO ( Int# , ByteArray# )
Further to returning error codes, this primitive operation will not block
Further to returning error codes, the newer primitive operations do not block
the calling thread when the serialisation encounters a blackhole in the
heap. While blocking is a perfectly acceptable behaviour (making packing
behave analogous to evaluation wrt. concurrency), the @'trySerialize'@
variant allows one to explicitly control it and avoid becoming unresponsive.
heap.
It would be possible to observe the existence of blackholes from Haskell by
the return code of these primitive operations. This could - in theory - be
used to explicitly control and avoid blocking (avoiding unresponsive behaviour).
In practice, however, making blackholes observable from Haskell is
certainly undesirable. Therefore, the primitive operation will return
the address of the blackhole. This makes it possible to encode blocking on the blackhole at the Haskell level (see code in the @GHC.Packing.Core@ module).
certainly undesirable. The primitive operations return the address of the
blackhole, and the caller will block on this blackhole at
the Haskell level (see code in the @GHC.Packing.Core@ module).
The Haskell layer and its types protect the interface function @'deserialize'@
from being applied to grossly wrong data (by checking a fingerprint of the
executable and the expected type), but deserialisation is fragile by nature
executable and the expected type), but deserialisation is still rather fragile
(unpacking code pointers and data).
The primitive operation in the runtime system will only detect grossly wrong
formats, and the primitive will return error code @'P_GARBLED'@ when data

@ -5,9 +5,9 @@
{-|
Module : GHC.Packing
Copyright : (c) Jost Berthold, 2010-2014,
Copyright : (c) Jost Berthold, 2010-2015,
License : BSD3
Maintainer : jb.diku@gmail.com
Maintainer : jost.berthold@gmail.com
Stability : experimental
Portability : no (depends on GHC internals)
@ -41,13 +41,13 @@ import Control.Exception(throw)
trySerialize :: a -> IO (Serialized a) -- throws PackException (RTS)
trySerialize x = trySerializeWith x defaultBufSize
-- | A default buffer size, used when using the old API
-- | default buffer size used by trySerialize
defaultBufSize :: Int
defaultBufSize = 10 * 2^20 -- 10 MB
-- | Extended interface function: Allocates a buffer of given size (in
-- bytes), serialises data into it, then truncates the buffer to the
-- actually required size before returning it (as @'Serialized' a@)
-- required size before returning it (as @'Serialized' a@)
trySerializeWith :: a -> Int -> IO (Serialized a) -- using instance PrimMonad IO
trySerializeWith dat bufsize
= do buf <- newByteArray bufsize

@ -3,9 +3,9 @@
{-|
Module : GHC.Packing.PackException
Copyright : (c) Jost Berthold, 2010-2014,
Copyright : (c) Jost Berthold, 2010-2015,
License : BSD3
Maintainer : jb.diku@gmail.com
Maintainer : jost.berthold@gmail.com
Stability : experimental
Portability : no (depends on GHC internals)
@ -13,19 +13,20 @@ Exception type for packman library, using magic constants #include'd
from a C header file shared with the foreign primitive operation code.
'PackException's can occur at Haskell level or in the foreign primop.
The Haskell-level exceptions all occur when reading in
'GHC.Packing.Serialised' data, and are:
* 'P_BinaryMismatch': the serialised data have been produced by a
All Haskell-level exceptions are cases of invalid data when /reading/
and /deserialising/ 'GHC.Packing.Serialised' data:
* 'P_BinaryMismatch': serialised data were produced by a
different executable (must be the same binary).
* 'P_TypeMismatch': the serialised data have the wrong type
* 'P_TypeMismatch': serialised data have the wrong type
* 'P_ParseError': serialised data could not be parsed (from binary or
text format)
The other exceptions are return codes of the foreign primitive
operation, and indicate errors at the C level. Most of them occur when
serialising data; the exception is 'P_GARBLED' which indicates corrupt
serialised data.
The exceptions caused by the foreign primops (return codes)
indicate errors at the C level. Most of them can occur when
serialising data; the exception is 'P_GARBLED' which indicates that
serialised data is garbled.
-}
@ -50,20 +51,21 @@ data PackException =
P_SUCCESS -- ^ no error, ==0.
-- Internal code, should never be seen by users.
| P_BLACKHOLE -- ^ RTS: packing hit a blackhole.
-- Used internally, should probably not be seen by users.
-- Used internally, not passed to users.
| P_NOBUFFER -- ^ RTS: buffer too small
| P_CANNOTPACK -- ^ RTS: contains closure which cannot be packed (MVar, TVar)
| P_UNSUPPORTED -- ^ RTS: contains unsupported closure type (implementation missing)
| P_IMPOSSIBLE -- ^ RTS: impossible case (stack frame, message,...RTS bug!)
| P_GARBLED -- ^ RTS: corrupted data for deserialisation
-- Error codes from inside Haskell
| P_ParseError -- ^ Haskell: Packet data could not be parsed
| P_BinaryMismatch -- ^ Haskell: Executable binaries do not match
| P_TypeMismatch -- ^ Haskell: Packet data encodes unexpected type
deriving (Eq, Ord, Typeable)
-- | decode an 'Int#' to a @'PackException'@. Magic constants are read
-- from file /cbits/Errors.h/.
-- | decodes an 'Int#' to a @'PackException'@. Magic constants are read
-- from file /cbits///Errors.h/.
decodeEx :: Int## -> PackException
decodeEx #{const P_SUCCESS}## = P_SUCCESS -- unexpected
decodeEx #{const P_BLACKHOLE}## = P_BLACKHOLE
@ -92,7 +94,7 @@ instance Show PackException where
instance Exception PackException
-- | internally used: checks if the given code indicates 'P_BLACKHOLE'
-- | internal: checks if the given code indicates 'P_BLACKHOLE'
isBHExc :: Int## -> Bool
isBHExc #{const P_BLACKHOLE}## = True
isBHExc e## = False

@ -3,9 +3,9 @@
{-|
Module : GHC.Packing.Type
Copyright : (c) Jost Berthold, 2010-2014,
Copyright : (c) Jost Berthold, 2010-2015,
License : BSD3
Maintainer : Jost Berthold <jb.diku@gmail.com>
Maintainer : Jost Berthold <jost.berthold@gmail.com>
Stability : experimental
Portability : no (depends on GHC internals)
@ -13,8 +13,8 @@ Portability : no (depends on GHC internals)
The data type @'Serialized' a@ includes a phantom type @a@ to ensure
type safety within one and the same program run. Type @a@ can be
polymorphic (at compile time, that is) when @Serialized a@ is not used
apart from being argument to @deserialize@.
polymorphic (at compile time, that is) when @'Serialized' a@ is not used
apart from being argument to @'deserialize'@.
The @Show@, @Read@, and @Binary@ instances of @Serialized a@ require an
additional @Typeable@ context (which requires @a@ to be monomorphic)
@ -67,7 +67,7 @@ import Control.Exception(throw)
import GHC.Packing.PackException
-- | The type of Serialized data. Phantom type 'a' ensures that we
-- unpack the expected type do not unpack rubbish.
-- unpack data as the expected type.
data Serialized a = Serialized { packetData :: ByteArray# }
{- $ShowReadBinary
@ -75,14 +75,14 @@ data Serialized a = Serialized { packetData :: ByteArray# }
The power of evaluation-orthogonal serialisation is that one can
/externalise/ partially evaluated data (containing thunks), for
instance write it to disk or send it over a network.
Therefore, the module defines a 'Binary' instance for 'Serialized a',
as well as instances for 'Read' and 'Show' which satisfy
@ read . show == id :: 'Serialized' a -> 'Serialized' a@.
> read . show == id :: 'Serialized' a -> 'Serialized' a
The phantom type is enough to ensure type-correctness when serialised
data remain in one single program run. However, when data from
previous runs are read in from an external source, their type needs to
previous runs are read from an external source, their type needs to
be checked at runtime. Type information must be stored together with
the (binary) serialisation data.
@ -90,7 +90,7 @@ The serialised data contain pointers to static data in the generating
program (top-level functions and constants) and very likely to
additional library code. Therefore, the /exact same binary/ must be
used when reading in serialised data from an external source. A hash
of the executable is therefore included in the representation as well.
of the executable is included in the representation to ensure this.
-}
@ -121,11 +121,11 @@ showWArray arr = unlines [ show i ++ ":" ++ unwords (map showH row)
where (first,rest) = splitAt 4 xs
-----------------------------------------------
-- | Reads the format generated by the (@'Show'@) instance, checks
-- | Reads the format generated by the 'Show' instance, checks
-- hash values for executable and type and parses exactly as much as
-- the included data size announces.
instance Typeable a => Read (Serialized a)
-- using ReadP parser (base-4.x), eats
-- using ReadP parser (base-4.x)
where readsPrec _ input
= case parseP input of
[] -> throw P_ParseError -- no parse
@ -138,13 +138,14 @@ instance Typeable a => Read (Serialized a)
other-> throw P_ParseError
-- ambiguous parse for packet
-- | Packet Parser: read header with size and type, then iterate over
-- array values, reading several hex words in one row, separated by
-- tab and space. Packet size needed to avoid returning a prefix.
-- | Packet Parser, reads the format generated by the @Show@ instance.
-- Could also consume other formats of the array (not implemented).
-- Returns: (data size in words, type fingerprint, array values)
parseP :: ReadS (Int, FP, [TargetWord])
parseP = readP_to_S $
-- read header with size and type, then iterate over array values,
-- reading several hex words in one row, separated by
-- tab and space. Packet size needed to avoid returning a prefix.
do string "Serialization Packet, size "
sz_str <- munch1 isDigit
let sz = read sz_str::Int
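
The parsing strategy described in the comments above (read a header that announces the size, then read exactly that many hex words so that no prefix is returned) can be sketched with `ReadP` from base alone. The packet format here is a hypothetical, simplified one, and the names `packetP` and `parsePacket` are illustrative; this is not the library's actual `parseP`.

```haskell
import Data.Char (isDigit, isHexDigit)
import Numeric (readHex)
import Text.ParserCombinators.ReadP

-- Hypothetical simplified format (an assumption, not packman's format):
-- "Packet, size N" followed by exactly N whitespace-separated hex words.
packetP :: ReadP (Int, [Integer])
packetP = do
  _  <- string "Packet, size "
  sz <- read <$> munch1 isDigit
  -- 'count sz' insists on exactly sz words; fewer words means no parse.
  ws <- count sz (skipSpaces >> hexWord)
  return (sz, ws)
  where
    hexWord = do
      digits <- munch1 isHexDigit
      case readHex digits of
        [(n, "")] -> return n
        _         -> pfail

parsePacket :: ReadS (Int, [Integer])
parsePacket = readP_to_S packetP

main :: IO ()
main = do
  print (parsePacket "Packet, size 2\n1f 2a rest")  -- [((2,[31,42])," rest")]
  print (parsePacket "Packet, size 3\n1f 2a")       -- []
```

An empty result list corresponds to the "no parse" case, which the `Read` instance above turns into `P_ParseError`.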
@ -222,7 +223,7 @@ instance Typeable a => Binary (Serialized a) where
-- fields, to be able to /read/ fingerprints
data FP = FP Word64 Word64 deriving (Read, Show, Eq)
-- | comparing 'FP's
-- | checks whether the type of the given expression matches the given Fingerprint
matches :: Typeable a => a -> FP -> Bool
matches x (FP c1 c2) = f1 == c1 && f2 == c2
where (GHC.Fingerprint.Fingerprint f1 f2) = typeRepFingerprint (typeOf x)
@ -233,11 +234,11 @@ typeRepFingerprint typeRep = ghcFP
where TypeRep ghcFP _ _ = typeRep
#endif
-- | creating an 'FP' from a GHC 'Fingerprint'
-- | creates an 'FP' from a GHC 'Fingerprint'
toFP :: GHC.Fingerprint.Fingerprint -> FP
toFP (GHC.Fingerprint.Fingerprint f1 f2) = FP f1 f2
-- | creating a type fingerprint
-- | returns the type fingerprint of an expression
typeFP :: Typeable a => a -> FP
typeFP = toFP . typeRepFingerprint . typeOf
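
As a rough illustration of the runtime type check behind `matches` and `typeFP`, the following stand-alone sketch restates the definitions using only base (it assumes base 4.8 or later, where `Data.Typeable` exports `typeRepFingerprint`). `matches` is written here via `typeFP`, which should be equivalent to the definition above; the `main` function is illustrative only.

```haskell
import Data.Typeable (Typeable, typeOf, typeRepFingerprint)
import Data.Word (Word64)
import GHC.Fingerprint (Fingerprint (..))

-- A readable and showable copy of GHC's Fingerprint, as in the module above.
data FP = FP Word64 Word64 deriving (Read, Show, Eq)

toFP :: Fingerprint -> FP
toFP (Fingerprint f1 f2) = FP f1 f2

-- Fingerprint of the type of an expression (the value is not inspected).
typeFP :: Typeable a => a -> FP
typeFP = toFP . typeRepFingerprint . typeOf

-- Does the type of the expression match a stored fingerprint?
matches :: Typeable a => a -> FP -> Bool
matches x fp = typeFP x == fp

main :: IO ()
main = do
  let stored = typeFP ([] :: [Int])      -- e.g. read from a packet header
  print (matches [1, 2, 3 :: Int] stored) -- True
  print (matches "text" stored)           -- False
  print (read (show stored) == stored)    -- True
```

Storing the fingerprint as two plain `Word64` fields is what makes it possible to write it out with `show` and compare it again after `read`.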

@ -1,5 +1,5 @@
The packman serialisation library for GHC
Copyright (c) 2014, Jost Berthold <jb.diku@gmail.com>
Copyright (c) 2014-15, Jost Berthold <jost.berthold@gmail.com>
All rights reserved.
Redistribution and use in source and binary forms, with or without

@ -1,18 +1,17 @@
packman
Packman
=======
Evaluation-orthogonal serialisation of Haskell data, as a library
In brief, this is the packing code of Eden and GpH, ripped out of the runtime system and rewritten to make it thread-safe and return exception codes when it fails.
Most of this work was already there when I presented at HIW last year, [see slides from HIW 2013](http://www.haskell.org/wikiupload/2/28/HIW2013PackingAPI.pdf) but all code was in the runtime system then.
The basic idea was described earlier, in a [2010 IFL paper](http://www.diku.dk/~berthold/papers/mainIFL10-withCopyright.pdf)
### Haskell API
A Haskell API around this (C-implemented) functionality provides the following API:
This package provides Haskell data serialisation independent of evaluation,
by accessing the Haskell heap using foreign primitive operations.
Any Haskell data structure (with a few limitations) can be serialised and
later deserialised in the same or a new run of the same program (that means,
the same executable file).
A Haskell layer around the C-implemented core provides the following API:
```
trySerializeWith :: a -> Int -> IO (Serialized a) -- Int is maximum buffer size to use
@ -20,11 +19,14 @@ trySerialize :: a -> IO (Serialized a) -- uses default (maximum) buff
deserialize :: Serialized a -> IO a
```
Note that this serialisation is orthogonal to evaluation: the argument is serialised **in its current state of evaluation**, it might be entirely unevaluated (a thunk) or only partially evaluated (containing thunks).
Note that this serialisation is orthogonal to evaluation: the argument is
serialised **in its current state of evaluation**; it might be entirely
unevaluated (a thunk) or only partially evaluated (containing thunks).
The `Serialized a` type is an opaque representation of serialised Haskell data (it contains a `ByteArray`).
`Serialized a` provides instances for `Show` and `Read` which satisfy `read . show == id`, and a `Binary` instance.
For these instances, types are checked dynamically type-safe, therefore the `Typeable` context.
The `Serialized a` type is an opaque representation of serialised Haskell data (it
contains a `ByteArray`). `Serialized a` provides instances for `Show` and `Read`
which satisfy `read . show == id`, and a `Binary` instance. For these instances,
types are checked dynamically at runtime, hence the `Typeable` context.
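
A minimal usage sketch follows. It assumes that `GHC.Packing` exports `trySerialize`, `deserialize`, `Serialized` and `PackException` as described above; the helper name `roundTrip` and the file name `data.packet` are made up for illustration.

```haskell
module Main where

import Control.Exception (try)
import GHC.Packing -- packman; assumed exports: trySerialize, deserialize,
                   -- Serialized, PackException

-- Serialise a value, externalise it as text, read it back and deserialise.
-- The file name is a hypothetical choice for this sketch.
roundTrip :: [Int] -> IO [Int]
roundTrip xs = do
  packet <- trySerialize xs                 -- may throw a PackException
  writeFile "data.packet" (show packet)     -- Show instance (needs Typeable)
  text <- readFile "data.packet"
  deserialize (read text :: Serialized [Int])

main :: IO ()
main = do
  -- The list is passed as a thunk; it is packed in its current state.
  result <- try (roundTrip (map (* 2) [1 .. 10]))
  case result of
    Left err -> putStrLn ("packing failed: " ++ show (err :: PackException))
    Right ys -> print (sum ys)
```

The file written here can only be read back by the very same executable; a different binary or a different expected type should surface as a `PackException`.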
### Advantages
@ -39,10 +41,29 @@ The ugly solution in the library: the API signals such conditions as exceptions.
Another limitation is that serialised data **can only be used by the very same binary**. This is however common for many approaches to distributed programming using functional languages.
If you find this library useful, I (Jost Berthold) would be happy to hear from you.
If you find this library useful, I would be happy to hear from you. Patches are welcome.
Acknowledgements:
-----------------
Phil Trinder suggested to separate serialisation from other functionality of the parallel runtime system in 2009.
#### Reading material
In brief, this is the packing code of Eden and GpH, ripped out of the runtime
system and rewritten to make it thread-safe and return exception codes when
it fails.
Most of what is provided by the library was already there when I
presented at HIW in 2013
([see slides from HIW 2013](http://www.haskell.org/wikiupload/2/28/HIW2013PackingAPI.pdf)),
but all code was in the runtime system then.
The basic idea was described earlier, in a
[2010 IFL paper](http://www.diku.dk/~berthold/papers/mainIFL10-withCopyright.pdf),
including a study of possible applications, especially checkpointing
and memoisation.
#### Acknowledgements
The idea to separate serialisation from other functionality of the parallel runtime system was suggested by Phil Trinder in 2009.
Hans-Wolfgang Loidl introduced me to the GUM packing code, worked with me on the parallel runtime system for a long time, and always provided valuable feedback.
Kevin Hammond is the original author of the packing code used by packman and the Eden RTS. It has been rewritten a few times and improved by a number of people (including Phil Trinder and Hans-Wolfgang Loidl).
Michael Budde and Ásbjørn Jøkladal assembled the first cabalised library version as a student project in our course "Topics in programming languages" 2014 (where the topic was parallel functional programming).

@ -1,6 +1,6 @@
module Main where
import GHC.Packing -- Data.Serialize.Packman
import GHC.Packing
import Control.Exception
data Foo = A | B | C | D deriving Show

@ -1,6 +1,3 @@
{-
Some tests to
-}
-- module TestSerialisation(tests)
-- where

@ -11,7 +11,7 @@ category: Serialization, Data, GHC
license: BSD3
license-file: LICENSE
author: Michael Budde, Ásbjørn V. Jøkladal, Jost Berthold
maintainer: jb.diku@gmail.com
maintainer: jost.berthold@gmail.com
build-type: Simple
cabal-version: >= 1.20
tested-with: GHC==7.8.2, GHC==7.8.3, GHC==7.10.2