2022-03-20 10:48:33 +03:00
|
|
|
# Benchmarks
|
|
|
|
|
|
|
|
## Introduction
|
|
|
|
|
|
|
|
The benchmark suite of `effectful` compares performance of the most popular
|
|
|
|
extensible effects libraries in several scenarios. It implements two benchmarks:
|
|
|
|
|
|
|
|
- **countdown** - a microbenchmark that effectively measures performance of
|
|
|
|
monadic binds and the effect dispatch.
|
|
|
|
|
|
|
|
- **filesize** - a more down to earth benchmark that does various things,
|
|
|
|
including I/O.
|
|
|
|
|
|
|
|
Each benchmark has two flavours that affect the amount of effects available in
|
|
|
|
the context:
|
|
|
|
|
|
|
|
- **shallow** - only effects necessary for the benchmark.
|
|
|
|
|
|
|
|
- **deep** - necessary effects + 5 redundant effects put into the context before
|
|
|
|
and after the relevant ones (10 in total). This simulates a typical scenario
|
|
|
|
in which the code uses only a portion of the total amount of effects available
|
|
|
|
to the application.
|
|
|
|
|
|
|
|
Moreover, the benchmarked code was annotated with `NOINLINE` pragmas to prevent
|
|
|
|
GHC from inlining it and/or specializing away type class constraints related to
|
|
|
|
effects. This is crucial in order to get realistic results, as for any
|
|
|
|
non-trivial, multi-module application the compiler will not be able to do this
|
|
|
|
as that would essentially mean performing whole program specialization.
|
|
|
|
|
|
|
|
## Results
|
|
|
|
|
2022-06-22 18:30:36 +03:00
|
|
|
The code was compiled with GHC 9.2.3 and run on a Ryzen 9 5950x.
|
2022-03-20 12:55:52 +03:00
|
|
|
|
2022-06-29 07:46:17 +03:00
|
|
|
*Note:* below results are from a 1000 iteration run. Runs with more iterations
|
|
|
|
are not included in the analysis since they are proportionally the same, but can
|
|
|
|
be found
|
|
|
|
[here](https://github.com/haskell-effectful/effectful/tree/master/benchmarks).
|
|
|
|
|
2022-03-20 10:48:33 +03:00
|
|
|
### Countdown
|
|
|
|
|
|
|
|
<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_countdown_1000.png">
|
|
|
|
|
|
|
|
Analysis:
|
|
|
|
|
|
|
|
1. `effectful` takes the lead. Its static dispatch is on par with the reference
|
|
|
|
implementation that uses the `ST` monad, so it offers no additional
|
2022-06-22 18:30:36 +03:00
|
|
|
overhead. Its dynamic dispatch is also very fast.
|
|
|
|
|
|
|
|
2. `cleff` uses similar implementation techniques as `effectful` (the only major
|
|
|
|
difference is that its internal environment that stores effects is immutable),
|
|
|
|
so they trade blows:
|
|
|
|
- Its thread-local `State` is only slightly slower than `effectful`.
|
2022-06-29 05:50:34 +03:00
|
|
|
- Its `State` implemented via `IORef` is the fastest of the dynamically
|
|
|
|
dispatched effects, but it's worth noting that it's neither properly
|
|
|
|
thread-local nor shared as the underlying `IORef` is shared, but can't be
|
2022-06-29 07:46:17 +03:00
|
|
|
safely accessed with `get` and `put` from multiple threads.
|
2022-03-20 10:48:33 +03:00
|
|
|
|
|
|
|
3. `freer-simple` does surprisingly well for a solution that's based on free
|
|
|
|
monads.
|
|
|
|
|
|
|
|
4. `mtl` comes next and unfortunately here's when the conventional wisdom stating
|
|
|
|
that it is fast crumbles. The deep version is **50 times** slower than the
|
|
|
|
reference implementation!
|
|
|
|
|
|
|
|
This is a direct consequence of how type classes are compiled. To be more
|
|
|
|
precise, during compilation type class constraints are translated by the
|
|
|
|
compiler to regular arguments. These arguments are class dictionaries,
|
|
|
|
i.e. data types containing all functions that the type class contains.
|
|
|
|
|
|
|
|
Now, because usage of `mtl` style effects requires the monad to be
|
|
|
|
polymorphic, such functions at runtime are passed a dictionary of `Monad`
|
|
|
|
specific methods and have to call them. **In particular, this applies to the
|
|
|
|
monadic bind**. That's the crux of a problem - bind is called in between
|
|
|
|
every monadic operation, so making it a function call has a disastrous effect
|
|
|
|
on performance.
|
|
|
|
|
2022-03-20 12:58:10 +03:00
|
|
|
Why is the result for the deep stack so much worse than for the shallow one
|
2022-03-20 10:48:33 +03:00
|
|
|
though? It's because in reality, each call to bind performs *O(n)* function
|
|
|
|
calls, where *n* is the number of monad transformers on the stack. That's
|
|
|
|
because the implementation of bind for every monad transformer refers to the
|
|
|
|
bind of a monad it transforms.
|
|
|
|
|
|
|
|
Compare that to `effectful`, where monadic binds are known function calls and
|
|
|
|
can be eliminated by the compiler. What is more, the only piece of data
|
|
|
|
passed via class constraints are dictionaries of `:>`, each represented by a
|
|
|
|
single `Int` pointing at the place in the stack where the relevant effect is
|
|
|
|
located.
|
|
|
|
|
2022-03-20 13:09:36 +03:00
|
|
|
5. `fused-effects` exhibits similar behavior as `mtl`. This comes as no surprise
|
|
|
|
since it uses the same implementation techniques. It augments them with
|
2022-03-20 10:48:33 +03:00
|
|
|
additional machinery for convenience, which seems to add even more overhead
|
|
|
|
though.
|
|
|
|
|
|
|
|
6. `polysemy` is based on free monads just as `freer-simple` and performs
|
|
|
|
similarly, though with a much higher initial overhead.
|
|
|
|
|
|
|
|
### Filesize
|
|
|
|
|
|
|
|
<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_filesize_1000.png">
|
|
|
|
|
|
|
|
The results are similar to the ones of the *countdown* benchmark. It's worth
|
|
|
|
noting though that introduction of other effects and I/O makes the difference in
|
|
|
|
performance between libraries not nearly as pronounced.
|