# Generic Boltzmann Brain
-------------------------
`generic-boltzmann-brain` is a Template Haskell library which allows its users
to automatically generate efficient multi-parametric analytic *Boltzmann
samplers* for algebraic data types.
**Disclaimer**

Please bear in mind that the current project is still under active development
and should therefore be treated as a working prototype. Comments, critique, and
feature requests are most welcome.
### Quick overview
`generic-boltzmann-brain` constructs [Boltzmann
samplers](https://en.wikipedia.org/wiki/Boltzmann_sampler) generating random
inhabitants of user-declared algebraic data types. For instance, given the
following `BinTree` data type
```hs
data BinTree
= Leaf
| Node BinTree BinTree
deriving (Generic, Show)
```
one can construct a corresponding Boltzmann sampler in a single line of code:
``` hs
mkDefBoltzmannSampler ''BinTree 1000
```
which makes `BinTree` an instance of the `BoltzmannSampler` type class:
``` hs
class BoltzmannSampler a where
  -- |
  -- Samples a random object of type @a@. If the object size is larger than
  -- the given upper bound parameter, @Nothing@ is returned instead.
  sample :: RandomGen g => Int -> MaybeT (BuffonMachine g) (a, Int)
```
The generated `sample` function implements a Boltzmann sampler for `BinTree`.
The sampler outcomes follow a distribution in which *all* binary trees of equal
size have the *exact same* (uniform) probability of being sampled. The *size* of
the outcomes is itself a random variable with the user-declared mean value of
`1000`.
Further control over the outcome size distribution can be achieved through
*rejection sampling*, which discards objects falling outside of the admissible
lower and upper size bounds.
``` hs
rejectionSampler ::
  (RandomGen g, BoltzmannSampler a) => LowerBound -> UpperBound -> BuffonMachine g a
```
The `BuffonMachine g a` type encapsulates computations consuming random bits and
is in that respect closely related to QuickCheck's generator type `Gen a`, to
which it can be converted, for instance using:
``` hs
hoistRejectionSampler ::
  BoltzmannSampler a => (Int -> (LowerBound, UpperBound)) -> Gen a
```
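For instance, the derived `BinTree` sampler can be reused as a regular
QuickCheck generator. The following is a minimal sketch rather than part of the
library: the bounds `800`/`1200` are arbitrary illustrative choices, and the
snippet assumes that `LowerBound` and `UpperBound` accept plain numeric literals
(wrap the values in the corresponding constructors otherwise).
``` hs
import Test.QuickCheck (Gen, generate)

-- A QuickCheck generator backed by the derived BinTree sampler. The function
-- argument maps QuickCheck's size parameter to (lower, upper) size bounds;
-- here it is ignored in favour of a fixed window around the declared mean
-- size of 1000 (the window itself is an arbitrary choice).
binTreeGen :: Gen BinTree
binTreeGen = hoistRejectionSampler (const (800, 1200))

-- Draw and print a single random binary tree.
main :: IO ()
main = generate binTreeGen >>= print
```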
### Features
#### Multiparametric samplers
In a more advanced setting, `generic-boltzmann-brain` supports, *inter alia*:
- systems of, possibly mutually recursive, algebraic data types,
- custom size definitions through user-declared constructor weights,
- outcome distribution tuning through constructors with user-prescribed frequencies.
Consider the following example of lambda terms in [DeBruijn notation](https://en.wikipedia.org/wiki/De_Bruijn_notation):
``` hs
data DeBruijn
  = Z
  | S DeBruijn
  deriving (Generic, Show)

data Lambda
  = Index DeBruijn
  | App Lambda Lambda
  | Abs Lambda
  deriving (Generic, Show)
```
Suppose that we want a size notion where all constructors have weight
`1` (*i.e.* contribute one to the overall term size) *except* for `Index` which,
as a simple type converter, should contribute no size. Furthermore, suppose that
we want to *skew* the default uniform distribution, and increase the expected number
of abstractions in a random lambda term while retaining the sampler's *fairness*, *i.e.*
lambda terms of equal size *and* equal number of abstractions should have the same
probability of being sampled. With `generic-boltzmann-brain` we can generate such
a sampler as follows:
``` hs
mkBoltzmannSampler
  System
    { targetType = ''Lambda
    , meanSize = 10_000
    , frequencies = ('Abs, 4_000) <:> def
    , weights =
        ('Index, 0)
          <:> $(mkDefWeights ''Lambda)
    }
```
Note that `mkDefWeights` generates default constructor weights, whereas `<:>`
*overrides* the default value of `('Index, 1)` to be `('Index, 0)`. Likewise,
`def` defines the default (empty) set of frequencies and `<:>` includes
the custom frequency for the `Abs` constructor.
Constructor tuning therefore makes it possible to distort the natural frequency
of each constructor in the given system. Note, however, that this non-trivial
tuning procedure significantly changes the underlying probability model. In
extreme cases, for instance when requiring *80%* of internal nodes in plane
binary trees, the sampler might be unavailable or virtually ineffective due to
the sparsity of the tuned structures.

**Please tune with caution!**
#### Multiple samplers for the same types
It is possible to have multiple samplers for the same underlying type, for
instance when different constructor frequencies or different size notions are
required. Simply create a `newtype` wrapper and let `generic-boltzmann-brain`
generate a new instance of `BoltzmannSampler`:
``` hs
newtype BinLambda = MkBinLambda Lambda
  deriving (Generic, Show)

mkBoltzmannSampler
  System
    { targetType = ''BinLambda
    , meanSize = 6_000
    , frequencies = ('Abs, 2340) <:> def
    , weights =
        ('Index, 0)
          <:> ('App, 2)
          <:> ('Abs, 2)
          <:> $(mkDefWeights ''Lambda)
    }
```
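Under the same assumptions as the `BinTree` sketch above, the new `BinLambda`
sampler can then be used independently of the `Lambda` one, e.g. as a QuickCheck
`Arbitrary` instance with an illustrative size window around the declared mean
size of `6_000`:
``` hs
import Test.QuickCheck (Arbitrary (..))

-- Hypothetical QuickCheck integration for the BinLambda wrapper. The size
-- window (5_000, 7_000) is an arbitrary choice made for illustration only.
instance Arbitrary BinLambda where
  arbitrary = hoistRejectionSampler (const (5_000, 7_000))
```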
### Installation
Currently, no pre-compiled binaries are available.
`generic-boltzmann-brain` uses an external [Python](https://www.python.org/) library
called [Paganini](https://github.com/maciej-bendkowski/paganini) for the construction
of Boltzmann samplers. A `python` executable with `paganini` installed is expected
to be available in the `PATH`.
We recommend using `stack` for compiling the `generic-boltzmann-brain` sources.
### Known limitations
- Polymorphic data types are *not* supported.
- Non-algebraic data type specifications are currently *not* supported, though
  some of their extensions, such as certain *Pólya structures*, are *feasible*.
- Nested lists such as `[[Lambda]]` are *not* supported, even though lists such
  as `[Lambda]` are (see the sketch below). Note, however, that this is not a
  conceptual limitation and could be supported in future versions of
  `generic-boltzmann-brain`.
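For illustration, the following hypothetical declarations (not part of the
library) contrast a supported and an unsupported specification:
``` hs
-- Supported: a monomorphic wrapper around a flat list of Lambda terms.
newtype LambdaList = MkLambdaList [Lambda]
  deriving (Generic, Show)

-- Not supported (currently): polymorphic parameters and nested lists, e.g.
--
--   data Forest a = MkForest [[a]]
```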
### Related work
*generic-boltzmann-brain* is a twin project of
[Boltzmann-brain](https://github.com/maciej-bendkowski/boltzmann-brain). If you
are interested in generating standalone samplers out of a textual
representation, consider using *boltzmann-brain* instead.
`generic-boltzmann-brain` heavily relies on the published work of numerous
excellent authors. Below, you can find a short (and certainly non-exhaustive)
list of papers on the subject:
- [P. Duchon, P. Flajolet, G. Louchard, G. Schaeffer: Boltzmann Samplers for
  the random generation of combinatorial
  structures](http://algo.inria.fr/flajolet/Publications/DuFlLoSc04.pdf)
- [O. Bodini, J. Lumbroso, N. Rolin: Analytic samplers and the combinatorial
  rejection method](https://dl.acm.org/citation.cfm?id=2790220&dl=ACM&coll=DL)
- [F. Saad, C. Freer, M. Rinard, V. Mansinghka: Optimal Approximate Sampling
  from Discrete Probability Distributions](https://arxiv.org/pdf/2001.04555.pdf)
- [M. Bendkowski, S. Dovgal, O. Bodini: Polynomial tuning of multiparametric
  combinatorial samplers](https://epubs.siam.org/doi/10.1137/1.9781611975062.9)