maciej-bendkowski/generic-boltzmann-brain

mirror of https://github.com/maciej-bendkowski/generic-boltzmann-brain.git synced 2024-11-22 01:53:37 +03:00

Analytic sampler framework for algebraic data types

starred-maciej-bendkowski-repo starred-repo

Go to file

Maciej Bendkowski 1c926cd2e8 Fix broken paper link		2023-05-12 18:36:35 +02:00
api/Data	Rename `hoist` samplers to `quickCheck` samplers	2022-04-05 19:05:16 +02:00
bench	Benchmark for random lambda terms.	2022-04-02 15:17:00 +02:00
internal/Data/Boltzmann	Upgrade to GHC 9.2.5	2023-02-13 17:36:23 +01:00
test	`MotzkinTree` sampler tests	2022-07-16 11:44:57 +02:00
.gitignore	Initial commit	2022-01-12 16:47:07 +01:00
default.nix	Bump version to `0.2.0.0`	2023-02-13 18:54:46 +01:00
flake.lock	Nix flake devShell and build system	2023-02-13 18:53:06 +01:00
flake.nix	Nix flake devShell and build system	2023-02-13 18:53:06 +01:00
fourmolu.yaml	Enable fourmolu code formatter	2022-01-23 14:13:59 +01:00
generic-boltzmann-brain.cabal	Bump version to `0.2.0.0`	2023-02-13 18:54:46 +01:00
hie.yaml	Fix `hie.yaml`	2022-07-10 15:35:10 +02:00
LICENSE	Initial commit	2022-01-12 16:47:07 +01:00
package.yaml	Bump version to `0.2.0.0`	2023-02-13 18:54:46 +01:00
README.md	Fix broken paper link	2023-05-12 18:36:35 +02:00
Setup.hs	Initial commit	2022-01-12 16:47:07 +01:00
stack.yaml	Upgrade to GHC 9.2.5	2023-02-13 17:36:23 +01:00
stack.yaml.lock	Upgrade to GHC 9.2.5	2023-02-13 17:36:23 +01:00

README.md

Generic Boltzmann Brain

generic-boltzmann-brain is a template Haskell library which allows its users to automatically generate efficient multi-parametric analytic Boltzmann samplers for algebraic data types.

Disclaimer

Please bear in mind that the current project is still under active development and should therefore be treated as a working prototype. Comments, critique, and feature requests are most welcome.

Citing Generic Boltzmann Brain

If you use Generic Boltzmann Brain or its components for published work, we encourage you to cite the accompanying paper:

Maciej Bendkowski

Automatic compile-time synthesis of entropy-optimal Boltzmann samplers

Quick overview

generic-boltzmann-brain constructs Boltzmann samplers generating random inhabitants of user-declared algebraic data types. For instance, given the following BinTree data type

import Data.Boltzmann

data BinTree
  = Leaf
  | Node BinTree BinTree
  deriving (Show)

one can construct a corresponding Boltzmann sampler in a single line of code

mkDefBoltzmannSampler ''BinTree 1000

which makes BinTree an instance of the BoltzmannSampler type class:

class BoltzmannSampler a where
  -- |
  --  Samples a random object of type @a@. If the object size is larger than
  --  the given upper bound parameter, @Nothing@ is returned instead.
  sample :: RandomGen g => MeanSize -> MaybeT (BuffonMachine g) (a, Int)

The so created sample function implements a Boltzmann sampler for BinTree. The sampler outcomes follow a distribution in which all binary trees of equal size have the exact same (uniform) probability of being sampled. The size of the outcomes is itself a random variable with the user-declared mean value of 1000.

Further control over the outcome size distribution can be achieved through rejection sampling discarding objects falling outside of the admissible lower and upper size bounds.

rejectionSampler ::
  (RandomGen g, BoltzmannSampler a) => LowerBound -> UpperBound -> BuffonMachine g a

The BuffonMachine g a type encapsulates computations using random bits and is in that respect closely related to QuickCheck's generator type Gen a to which it can be converted through, e.g.:

quickCheckRejectionSampler ::
  BoltzmannSampler a => (Int -> (LowerBound, UpperBound)) -> Gen a

Features

Multiparametric samplers

In a more advanced setting, generic-boltzmann-brain supports inter allia:

systems of, possibly mutually recursive, algebraic data types,
custom size definitions through user-declared constructor weights,
outcome distribution tuning through constructors with user-prescribed frequencies.

Consider the following example of lambda terms in DeBruijn notation:

data DeBruijn
  = Z
  | S DeBruijn
  deriving (Show)

data Lambda
  = Index DeBruijn
  | App Lambda Lambda
  | Abs Lambda
  deriving (Show)

Suppose that we want a size notion where all constructors have weight 1 (i.e. contribute one to the overall term size) except for Index which, as a simple type converter, should contribute no size. Furthermore, suppose that we want to skew the default uniform distribution, and increase the expected number of abstractions in a random lambda term while retaining the sampler's fairness, i.e. lambda terms of equal size and equal number of abstractions should have the same probability of being sampled. With generic-boltzmann-brain we can generate such a sampler as follows:

mkBoltzmannSampler
  System
    { targetType = ''Lambda
    , meanSize = 10_000
    , frequencies = ('Abs, 4_000) <:> def
    , weights =
        ('Index, 0)
          <:> $(mkDefWeights ''Lambda)
    }

Note that mkDefWeights generates default constructor weights, whereas <:> overrides the default value of ('Index, 1) to be ('Index, 0). Likewise, def defines the default (empty) set of frequencies and <:> includes the custom frequency for the Abs constructor.

Using constructor tuning, it is therefore possible to distort the natural frequency of each constructor in the given system. However, such an additional non-trivial tuning procedure causes a not insignificant change in the underlying probability model. In extreme cases, such as for instance requiring 80% of internal nodes in plane binary trees, the sampler might be unavailable or virtually ineffective due to the sparsity of tuned structures.

Please tune with caution!

Multiple samplers for the same types

It is possible to have multiple samplers for the same underlying type, for instance if different constructor frequencies are needed or different size notions are required. Simply create a newtype wrapper and generate a let generic-boltzmann-brain generate a new instance of BoltzmannSampler:

newtype BinLambda = MkBinLambda Lambda
  deriving (Show)

mkBoltzmannSampler
  System
    { targetType = ''BinLambda
    , meanSize = 6_000
    , frequencies = ('Abs, 2340) <:> def
    , weights =
        ('Index, 0)
          <:> ('App, 2)
          <:> ('Abs, 2)
          <:> $(mkDefWeights ''Lambda)
    }

Installation

Currently, no pre-compiled binaries are available.

generic-boltzmann-brain uses an external Python library called Paganini to do the construction of Boltzmann samplers. Python with available paganini are expected to be executable and present in the PATH.

We recommend using stack for compiling generic-boltzmann-brain sources.

Known limitations

Polymorphic data types are not supported.
Non-algebraic data type specifications are currently not supported, though some of their extensions, such as certain Pólya structures are feasible.
Nested lists such as [[Lambda]] are not supported, even though lists such as [Lambda] are. Note however that this is not a conceptual limitation, and could be supported in future versions of generic-boltzmann-brain.

As such, generic-boltzmann-brain is a continuation of (the now deprecated) Boltzmann-brain. generic-boltzmann-brain heavily relies on published work of numerous excellent authors. Below, you can find a short (and definitely inexhaustive) list of papers on the subject: