More comprehensive README.

This commit is contained in:
Michael Walker 2016-08-25 17:24:00 +01:00
parent 8d6cdb6aff
commit 2065253f73

View File

@ -15,78 +15,295 @@ forbid, wanted to *test* it? Testing concurrency is normally a hard
problem, because of the nondeterminism of scheduling: you can run your
program ten times and get ten different results if you're unlucky.
Fortunately, there is a solution. By abstracting out the actual
implementation of concurrency through a typeclass, an alternative
implementation can be used for testing, allowing the *systematic*
exploration of the possible results of your program.
There is a solution! Through these libraries, you can write concurrent
programs and test them *deterministically*. By abstracting out the
actual implementation of concurrency through a typeclass, an
alternative implementation can be used for testing, allowing the
*systematic* exploration of the possible results of your program.
This repository contains dejafu, a concurrency testing library based
on a typeclass abstraction of concurrency, and related libraries.
- concurrency ([hackage 0.1.0.0][concurrency]): A typeclass for
concurrency.
Table of Contents
-----------------
- dejafu ([hackage 0.4.0.0][dejafu]): Systematic testing for Haskell
concurrency.
- [Packages](#packages)
- [Concurrent Programming](#getting-started)
- [Asynchronous IO](#asynchronous-io)
- [Threads and MVars](#threads-and-mvars)
- [Software Transactional Memory](#software-transactional-memory)
- [Relaxed Memory and IORefs](#relaxed-memory-and-iorefs)
- [Program Testing](#program-testign)
- [Porting](#porting)
- [Contributing](#contributing)
- [Bibliography](#bibliography)
- dpor ([hackage 0.2.0.0][dpor]): A generic implementation of dynamic
partial-order reduction (DPOR) for testing arbitrary models of
concurrency.
- async-dejafu ([hackage 0.1.3.0][async]): Run MonadConc operations
asynchronously and wait for their results.
Packages
--------
- hunit-dejafu ([hackage 0.3.0.2][hunit]): Deja Fu support for the
HUnit test framework.
| | Version | Intended Users | Summary |
| --- | ------- | -------------- | ------- |
| async-dejafu [[docs][d:async]] [[hackage][h:async]] | 0.1.3.0 | Authors | Run MonadConc operations asynchronously and wait for their results. |
| concurrency [[docs][d:conc]] [[hackage][h:conc]] | 1.0.0.0 | Authors | Typeclasses, functions, and data types for concurrency and STM. |
| dejafu [[docs][d:dejafu]] [[hackage][h:dejafu]] | 0.4.0.0 | Testers | Systematic testing for Haskell concurrency. |
| dpor [[docs][d:dpor]] [[hackage][h:dpor]] | 0.2.0.0 | Me[1] | A generic implementation of dynamic partial-order reduction (DPOR) for testing arbitrary models of concurrency. |
| hunit-dejafu [[docs][d:hunit]] [[hackage][h:hunit]] | 0.3.0.2 | Testers | Deja Fu support for the HUnit test framework. |
| tasty-dejafu [[docs][d:tasty]] [[hackage][h:tasty]] | 0.3.0.2 | Testers | Deja Fu support for the Tasty test framework. |
- tasty-dejafu ([hackage 0.3.0.2][tasty]): Deja Fu support for the
Tasty test framework.
Each package has its own README in its subdirectory.
There is also dejafu-tests, the test suite for dejafu. This is in a
separate package due to Cabal being bad with test suite transitive
dependencies.
[concurrency]: https://hackage.haskell.org/package/concurrency
[dejafu]: https://hackage.haskell.org/package/dejafu
[dpor]: https://hackage.haskell.org/package/dpor
[async]: https://hackage.haskell.org/package/async-dejafu
[hunit]: https://hackage.haskell.org/package/hunit-dejafu
[tasty]: https://hackage.haskell.org/package/tasty-dejafu
[1] Slightly tongue-in-cheek, but *dpor* is only really useful for
people experimenting with concurrency testing.
Features
--------
[d:conc]: https://docs.barrucadu.co.uk/concurrency/
[d:dejafu]: https://docs.barrucadu.co.uk/dejafu/
[d:dpor]: https://docs.barrucadu.co.uk/dpor/
[d:async]: https://docs.barrucadu.co.uk/async-dejafu/
[d:hunit]: https://docs.barrucadu.co.uk/hunit-dejafu/
[d:tasty]: https://docs.barrucadu.co.uk/tasty-dejafu/
dejafu supports most of the GHC concurrency abstraction found in
`Control.Concurrent` and, through the [exceptions][] package,
`Control.Exception`. A brief list of supported functionality:
[h:conc]: https://hackage.haskell.org/package/concurrency
[h:dejafu]: https://hackage.haskell.org/package/dejafu
[h:dpor]: https://hackage.haskell.org/package/dpor
[h:async]: https://hackage.haskell.org/package/async-dejafu
[h:hunit]: https://hackage.haskell.org/package/hunit-dejafu
[h:tasty]: https://hackage.haskell.org/package/tasty-dejafu
- Threads: the `forkIO*` and `forkOn*` functions, although bound
threads are not supported.
- Getting and setting capablities (testing default is two).
- Yielding and delaying.
- Mutable state: STM, `MVar`, and `IORef`.
- Relaxed memory for `IORef` operations: total store order (the
testing default) and partial store order.
- Atomic compare-and-swap for `IORef`.
- Exceptions.
- All of the data structures in `Control.Concurrent.*` and
`Control.Concurrent.STM.*` have typeclass-abstracted
equivalents.
This is quite a rich set of functionality, although it is not
complete. If there is something else you need, file an issue!
Concurrent Programming
----------------------
[exceptions]: https://hackage.haskell.org/package/exceptions
You should read [Parallel and Concurrent Programming in Haskell][parconc],
by Simon Marlow. It's very good, and the API of the *async-dejafu* and
*concurrency* packages is intentionally kept very similar to the
standard packages, so all the knowledge transfers.
Usage
-----
[parconc]: http://chimera.labs.oreilly.com/books/1230000000929/
The documentation for the latest development version is
[available online][docs].
### Asynchronous IO
The wonderful *[async][]* package by Simon Marlow greatly eases the
difficulty of writing programs which merely need to perform some
asynchronous IO. The *async-dejafu* package provides an almost-total
reimplementation of *async*, using the abstractions provided by
*concurrency*.
For example, assuming a suitable `getURL` function, to fetch the
contents of two web pages at the same time:
```haskell
withAsync (getURL url1) $ \a1 -> do
withAsync (getURL url2) $ \a2 -> do
page1 <- wait a1
page2 <- wait a2
-- ...
```
The `withAsync` function starts an operation in a separate thread, and
kills it if the inner action finishes before it completes.
Another example, this time waiting for any of a number of web pages to
download, and cancelling the others as soon as one completes:
```haskell
let download url = do
res <- getURL url
pure (url, res)
downloads <- mapM (async . download) urls
(url, res) <- waitAnyCancel downloads
printf "%s was first (%d bytes)\n" url (B.length res)
```
The `async` function starts an operation in another thread but, unlike
`withAsync` takes no inner action to execute: the programmer needs to
make sure the computation is waited for or cancelled as appropriate.
[async]: http://hackage.haskell.org/package/async
### Threads and MVars
The fundamental unit of concurrency is the thread, and the most basic
communication mechanism is the `MVar`:
```haskell
main = do
var <- newEmptyMVar
fork $ putMVar var 'x'
fork $ putMVar var 'y'
r <- takeMVar m
print r
```
The `fork` function starts executing a `MonadConc` action in a
separate thread, and `takeMVar`/`putMVar` are used to communicate
values (`newEmptyMVar` just makes an `MVar` with nothing in it). This
will either print `'x'` or `'y'`, depending on which of the two
threads "wins".
On top of the simple `MVar`, we can build more complicated concurrent
data structures, like channels. A collection of these are provided in
the *concurrency* package.
If a thread attempts to read from an `MVar` which is never written to,
or write to an `MVar` which is never read from, it blocks.
### Software Transactional Memory
Software transactional memory (STM) simplifies stateful concurrent
programming by allowing complex atomic state operations. Whereas only
one `MVar` can be modified atomically at a time, any number of `TVar`s
can be. STM is normally provided by the *stm* package, but the
*concurrency* package exposes it directly.
For example, we can swap the values of two variables, and read them in
another thread:
```haskell
main = do
var1 <- newTVar 'x'
var2 <- newTVar 'y'
fork . atomically $ do
a <- readTVar var1
b <- readTVar var2
writeTVar var2 a
writeTVar var1 b
a <- atomically $ readTVar var1
b <- atomically $ readTVar var2
print (a, b)
```
Even though the reads and writes appear to be done in multiple steps
inside the forked thread, the entire transaction is executed in a
single step, by the `atomically` function. This means that the main
thread will observe the values `('x', 'y')` or `('y', 'x')`, it can
never get `('x', 'x')` as naive `MVar` implementation would.
### Relaxed Memory and CRefs
There is a third type of communication primitive, the `CRef` (known in
normal Haskell as the `IORef`). These do not impose synchronisation,
and so the behaviour of concurrent reads and writes depends on the
memory model of the underlying processor.
```haskell
crefs = do
r1 <- newCRef False
r2 <- newCRef False
x <- spawn $ writeCRef r1 True >> readCRef r2
y <- spawn $ writeCRef r2 True >> readCRef r1
(,) <$> readMVar x <*> readMVar y
```
Here `spawn` forks a thread and gives an `MVar` which can be read from
to get the return value. Under a sequentially consistent memory model,
there are three possible results: `(True, True)`, `(True, False)`, and
`(False, True)`. Under the relaxed memory model of modern processors,
the result `(False, False)` is also possible. Relaxed memory models
allow for reads and writes to be re-ordered between threads.
For testing, three memory models are supported (with the default being
TSO):
- **Sequential Consistency:** A program behaves as a simple
interleaving of the actions in different threads. When a `CRef` is
written to, that write is immediately visible to all threads.
- **Total Store Order (TSO):** Each thread has a write buffer. A
thread sees its writes immediately, but other threads will only
see writes when they are committed, which may happen later. Writes
are committed in the same order that they are created.
- **Partial Store Order (PSO):** Each `CRef` has a write buffer. A
thread sees its writes immediately, but other threads will only
see writes when they are committed, which may happen later. Writes
to different `CRefs` are not necessarily committed in the same
order that they are created.
### Program Testing
If you just write your concurrent program using the `MonadConc` and
`MonadSTM` typeclasses (maybe with `MonadIO` if you need `IO` as
well), then it is testable with dejafu!
Testing is similar to unit testing, the programmer produces a
self-contained monadic action to execute. It is run under many
schedules, and the results gathered into a list. This is a little
different from normal unit testing, where you just return "true" or
"false", here you have *many* results, and need to decide if the
collection is valid or not.
For the simple cases, where you just want to check something is
deterministic and doesn't deadlock, there is a handy `autocheck`
function. For example:
```haskell
example = do
a <- newEmptyMVar
b <- newEmptyMVar
c <- newMVar 0
let lock m = putMVar m ()
let unlock = takeMVar
j1 <- spawn $ lock a >> lock b >> modifyMVar_ c (return . succ) >> unlock b >> unlock a
j2 <- spawn $ lock b >> lock a >> modifyMVar_ c (return . pred) >> unlock a >> unlock b
takeMVar j1
takeMVar j2
takeMVar c
```
The correct result is 0, as it starts out as 0 and is incremented and
decremented by threads 1 and 2, respectively. However, note the order
of acquisition of the locks in the two threads. If thread 2 pre-empts
thread 1 between the acquisition of the locks (or if thread 1
pre-empts thread 2), a deadlock situation will arise, as thread 1 will
have lock a and be waiting on b, and thread 2 will have b and be
waiting on a.
Here is what `autocheck` has to say about it:
```
> autocheck example
[fail] Never Deadlocks (checked: 5)
[deadlock] S0------------S1-P2--S1-
0: main
[pass] No Exceptions (checked: 12)
[fail] Consistent Result (checked: 11)
0 S0------------S2-----------------S1-----------------S0----
0: main
[deadlock] S0------------S1-P2--S1-
0: main
False
```
It identifies the deadlock, and also the possible results the
computation can produce, and displays a simplified trace leading to
each failing outcome. The traces contain thread numbers, and below is
displayed thread names (which can be set by the programmer when
forking them). It also returns false as there are test failures.
Note that if your test case does `IO`, the `IO` will be executed a lot
of times. It needs to be deterministic enough to not invalidate the
results of testing. That may seem a burden, but it's a requirement of
any form of testing.
### Porting
As a general rule of thumb, to convert some existing code to work with
dejafu:
- Depend on "concurrency".
- Import `Control.Concurrent.Classy.*` instead of `Control.Concurrent.*`
- Change `IO a` to `MonadConc m => m a`
- Change `STM a` to `MonadSTM stm => stm a`
@ -94,7 +311,6 @@ dejafu:
-> `TVar stm`, `IORef` -> `CRef m`, etc
- Fix the type errors.
[docs]: https://docs.barrucadu.co.uk/dejafu
Contributing
------------
@ -104,12 +320,13 @@ Bug reports, pull requests, and comments are very welcome!
Feel free to contact me on GitHub, through IRC (#haskell on freenode),
or email (mike@barrucadu.co.uk).
Bibliography
------------
Each paper has a short name in parentheses, which I use in non-haddock
comments. Haddock comments get the full citation. PDF links are
provided where non-paywalled ones are available.
These libraries wouldn't be possible without prior research, which I
mention in the documentation. Haddock comments get the full citation,
whereas in-line comments just get the shortened name:
- [BPOR] *Bounded partial-order reduction*, K. Coons, M. Musuvathi,
and K. McKinley (2013)