effectful/benchmarks/README.md
2022-08-04 05:10:42 +02:00

# Benchmarks
## Introduction
The benchmark suite of `effectful` compares the performance of the most popular
extensible effects libraries in several scenarios. It implements two benchmarks:
- **countdown** - a microbenchmark that effectively measures the performance of
monadic binds and effect dispatch.
- **filesize** - a more down-to-earth benchmark that does various things,
including I/O.
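To give a feel for what a countdown-style workload looks like, here is a minimal sketch of a loop that decrements a counter in the `ST` monad, which is the kind of reference implementation the results below are compared against. The name `countdownST` and the exact loop shape are assumptions made for illustration, not the code used in the suite.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical reference-style countdown: keep a counter in an STRef and
-- decrement it until it is no longer positive, then return the final value.
countdownST :: Int -> Int
countdownST n = runST $ do
  ref <- newSTRef n
  loop ref
  where
    loop :: STRef s Int -> ST s Int
    loop ref = do
      x <- readSTRef ref
      if x <= 0
        then pure x
        else writeSTRef ref (x - 1) >> loop ref
```

Every iteration is just a read, a write and a bind, so almost all of the measured time in such a loop comes from the monad and the state operations themselves.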
Each benchmark has two flavours that affect the number of effects available in
the context:
- **shallow** - only effects necessary for the benchmark.
- **deep** - necessary effects + 5 redundant effects put into the context before
and after the relevant ones (10 in total). This simulates a typical scenario
in which the code uses only a portion of the total number of effects available
to the application.
Moreover, the benchmarked code was annotated with `NOINLINE` pragmas to prevent
GHC from inlining it and/or specializing away type class constraints related to
effects. This is crucial for getting realistic results: in any non-trivial,
multi-module application the compiler will not be able to do this, since it
would essentially mean performing whole-program specialization.
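As a rough illustration of the annotation described above, the sketch below marks a monad-polymorphic function as `NOINLINE`, so that callers go through its general, constraint-taking version instead of an inlined copy. The function `sumDown` and its helper are hypothetical and exist only for this example; the real benchmark code differs.

```haskell
import Data.Functor.Identity (runIdentity)

-- Hypothetical monad-polymorphic loop: sums n, n-1, ..., 1 with explicit binds.
-- The `Monad m` constraint stays abstract for every caller.
sumDown :: Monad m => Int -> m Int
sumDown n = go n 0
  where
    go k acc
      | k <= 0 = pure acc
      | otherwise = pure (acc + k) >>= go (k - 1)
-- Ask GHC not to inline this function at its call sites.
{-# NOINLINE sumDown #-}

-- A caller at a concrete monad; it still calls the polymorphic `sumDown`.
sumDownPure :: Int -> Int
sumDownPure = runIdentity . sumDown
```

Without the pragma, a small single-module example like this one is easy for GHC to optimize into a tight loop, which is exactly the effect the benchmarks are trying to rule out.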
## Results
The code was compiled with GHC 9.2.4 and run on a Ryzen 9 5950x.
*Note:* the results below are from a 1000-iteration run. Runs with more
iterations are not included in the analysis since they are proportionally the
same, but can be found
[here](https://github.com/haskell-effectful/effectful/tree/master/benchmarks).
### Countdown
<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_countdown_1000.png">
Analysis:
1. `effectful` takes the lead. Its static dispatch is on par with the reference
implementation that uses the `ST` monad, so it adds no additional
overhead. Its dynamic dispatch is also very fast.
2. `cleff` uses implementation techniques similar to those of `effectful` (the
only major difference is that its internal environment that stores effects is
immutable), so they trade blows:
- Its thread-local `State` is only slightly slower than that of `effectful`.
- Its `State` implemented via `IORef` is the fastest of the dynamically
dispatched effects, but it's worth noting that it's neither properly
thread-local nor properly shared: the underlying `IORef` is shared between
threads, yet it can't be safely accessed with `get` and `put` from multiple
threads.
3. `freer-simple` does surprisingly well for a solution that's based on free
monads.
4. `mtl` comes next and unfortunately this is where the conventional wisdom
stating that it is fast crumbles. The deep version is **50 times** slower than
the reference implementation!
This is a direct consequence of how type classes are compiled. To be more
precise, during compilation type class constraints are translated by the
compiler to regular arguments. These arguments are class dictionaries,
i.e. records containing all the methods of the type class.
Now, because usage of `mtl` style effects requires the monad to be
polymorphic, such functions are passed a dictionary of `Monad` methods at
runtime and have to call them. **In particular, this applies to the
monadic bind**. That's the crux of the problem: bind is called in between
every monadic operation, so making it an unknown function call has a
disastrous effect on performance.
Why is the result for the deep stack so much worse than for the shallow one
though? It's because in reality, each call to bind performs *O(n)* function
calls, where *n* is the number of monad transformers on the stack. That's
because the implementation of bind for every monad transformer refers to the
bind of the monad it transforms.
Compare that to `effectful`, where monadic binds are known function calls and
can be eliminated by the compiler. What is more, the only data passed via
class constraints are the dictionaries of `:>`, each represented by a
single `Int` pointing at the place in the stack where the relevant effect is
located.
5. `fused-effects` exhibits behavior similar to `mtl`. This comes as no surprise
since it uses the same implementation techniques. It augments them with
additional machinery for convenience, which seems to add even more overhead.
6. `polysemy` is based on free monads just like `freer-simple` and performs
similarly, though with a much higher initial overhead.
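The dictionary-passing argument from the `mtl` item above can be made concrete with a hand-written model. The sketch below is not GHC's actual output: `MonadDict`, `LayerT`, `layerDict` and `countdownD` are hypothetical names that only mimic what a `Monad m` constraint roughly turns into. It shows a polymorphic function receiving its bind as a record field, and a transformer's dictionary being built from the dictionary of the monad underneath it.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

-- Hand-written stand-in for the dictionary built for a `Monad m` constraint.
data MonadDict m = MonadDict
  { dPure :: forall a. a -> m a
  , dBind :: forall a b. m a -> (a -> m b) -> m b
  }

-- A minimal reader-like transformer layer (hypothetical, for illustration).
newtype LayerT r m a = LayerT { runLayerT :: r -> m a }

-- Dictionary for the base monad.
identityDict :: MonadDict Identity
identityDict = MonadDict
  { dPure = Identity
  , dBind = \(Identity a) k -> k a
  }

-- The dictionary of a layer is built from the dictionary of the monad below,
-- and its bind calls the inner bind. Stacking n layers therefore chains n
-- such calls for every bind performed at the top of the stack.
layerDict :: MonadDict m -> MonadDict (LayerT r m)
layerDict inner = MonadDict
  { dPure = \a -> LayerT (\_ -> dPure inner a)
  , dBind = \m k ->
      LayerT (\r -> dBind inner (runLayerT m r) (\a -> runLayerT (k a) r))
  }

-- Roughly what `Monad m => Int -> m Int` looks like once the constraint has
-- become an ordinary argument: every bind is a call through a record field.
countdownD :: MonadDict m -> Int -> m Int
countdownD d n
  | n <= 0 = dPure d n
  | otherwise = dBind d (dPure d (n - 1)) (countdownD d)
```

Running `countdownD` with `identityDict` models the shallow case, while `layerDict (layerDict identityDict)` models a deeper stack in which each step of the loop goes through one extra bind per layer.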
### Filesize
<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_filesize_1000.png">
The results are similar to the ones of the *countdown* benchmark. It's worth
noting though that the introduction of other effects and I/O makes the
difference in performance between libraries much less pronounced.