# Benchmarks

## Introduction

The benchmark suite of `effectful` compares performance of the most popular
extensible effects libraries in several scenarios. It implements two benchmarks:

- **countdown** - a microbenchmark that effectively measures performance of
  monadic binds and the effect dispatch.

- **filesize** - a more down-to-earth benchmark that does various things,
  including I/O.

Each benchmark has two flavours that affect the number of effects available in
the context:

- **shallow** - only effects necessary for the benchmark.

- **deep** - necessary effects + 5 redundant effects put into the context before
  and after the relevant ones (10 in total). This simulates a typical scenario
  in which the code uses only a portion of the total number of effects available
  to the application.
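
As a rough illustration of the two flavours, the sketch below runs the same
countdown loop in a shallow and a deep context using the public `effectful`
API. It is not the benchmark code itself: the names `countdownShallow` and
`countdownDeep` and the choice of `Reader ()` as the redundant effect are
assumptions made for this example.

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, TypeOperators #-}

import Effectful
import Effectful.Reader.Static (runReader)
import Effectful.State.Static.Local (State, get, put, runState)

-- The loop only asks for a State effect somewhere in the context.
countdown :: State Integer :> es => Eff es Integer
countdown = do
  n <- get
  if n <= 0
    then pure n
    else put (n - 1) >> countdown

-- Shallow: the context holds only the effect the loop needs.
countdownShallow :: Integer -> (Integer, Integer)
countdownShallow n = runPureEff . runState n $ countdown

-- Deep: 5 redundant effects on each side of State (10 in total).
-- Using 'Reader ()' as the filler is an assumption of this sketch.
countdownDeep :: Integer -> (Integer, Integer)
countdownDeep n = runPureEff
  . runReader () . runReader () . runReader () . runReader () . runReader ()
  . runState n
  . runReader () . runReader () . runReader () . runReader () . runReader ()
  $ countdown
```

Both functions return the final result paired with the final state, so
`countdownShallow 10` and `countdownDeep 10` should agree; only the size of the
effect context differs.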

Moreover, the benchmarked code was annotated with `NOINLINE` pragmas to prevent
GHC from inlining it and/or specializing away type class constraints related to
effects. This is crucial for getting realistic results: in any non-trivial,
multi-module application the compiler will not be able to do this, since that
would essentially mean performing whole-program specialization.
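
A minimal sketch of what such an annotation looks like is given below. The
class `MonadCounter`, the `Counter` monad and the function `ticks` are
hypothetical and exist only for this example; they are not taken from the
benchmark sources.

```haskell
-- Hypothetical effect-like class, standing in for a library's constraint.
class Monad m => MonadCounter m where
  tick :: m ()

-- A tiny hand-written counter monad so the sketch needs only base.
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }

instance Functor Counter where
  fmap f (Counter g) = Counter $ \n -> let (a, n') = g n in (f a, n')

instance Applicative Counter where
  pure a = Counter $ \n -> (a, n)
  Counter f <*> Counter g = Counter $ \n ->
    let (h, n')  = f n
        (a, n'') = g n'
    in (h a, n'')

instance Monad Counter where
  Counter g >>= k = Counter $ \n -> let (a, n') = g n in runCounter (k a) n'

instance MonadCounter Counter where
  tick = Counter $ \n -> ((), n + 1)

-- The pragma asks GHC not to inline 'ticks' at its call sites, so the
-- function stays polymorphic in 'm' and receives its class dictionary at
-- runtime, as it would when called across module boundaries.
{-# NOINLINE ticks #-}
ticks :: MonadCounter m => Int -> m ()
ticks k
  | k <= 0    = pure ()
  | otherwise = tick >> ticks (k - 1)

-- Run the polymorphic code at a concrete monad and read the counter.
countTicks :: Int -> Int
countTicks k = snd (runCounter (ticks k) 0)
```

Without the pragma, GHC would be free to inline or specialize `ticks` for
`Counter` inside this one module, which would hide the cost the benchmarks are
trying to measure.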

## Results

The code was compiled with GHC 9.2.4 and run on a Ryzen 9 5950X.

*Note:* the results below are from a 1000-iteration run. Runs with more
iterations are not included in the analysis since they are proportionally the
same, but can be found
[here](https://github.com/haskell-effectful/effectful/tree/master/benchmarks).

### Countdown

<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_countdown_1000.png">

Analysis:

1. `effectful` takes the lead. Its static dispatch is on par with the reference
   implementation that uses the `ST` monad, so it adds no overhead. Its dynamic
   dispatch is also very fast.

2. `cleff` uses implementation techniques similar to those of `effectful` (the
   only major difference is that its internal environment that stores effects is
   immutable), so they trade blows:
   - Its thread-local `State` is only slightly slower than that of `effectful`.
   - Its `State` implemented via `IORef` is the fastest of the dynamically
     dispatched effects. It's worth noting, though, that it's neither properly
     thread-local nor properly shared: the underlying `IORef` is shared between
     threads, but can't be safely accessed with `get` and `put` from more than
     one of them.

3. `freer-simple` does surprisingly well for a solution that's based on free
   monads.
   
4. `mtl` comes next, and unfortunately this is where the conventional wisdom
   that it is fast crumbles. The deep version is **50 times** slower than the
   reference implementation!
   
   This is a direct consequence of how type classes are compiled. To be more
   precise, during compilation the compiler translates type class constraints
   into regular arguments. These arguments are class dictionaries, i.e. records
   containing all the methods of the type class.
   
   Now, because usage of `mtl` style effects requires the monad to be
   polymorphic, such functions are passed a dictionary of `Monad` methods at
   runtime and have to call them through it. **In particular, this applies to
   the monadic bind**. That's the crux of the problem: bind is called in between
   every monadic operation, so turning it into a call through a dictionary has
   a disastrous effect on performance.
   
   Why is the result for the deep stack so much worse than for the shallow one
   though? It's because in reality, each call to bind performs *O(n)* function
   calls, where *n* is the number of monad transformers on the stack. That's
   because the implementation of bind for every monad transformer refers to the
   bind of the monad it transforms.
   
   Compare that to `effectful`, where monadic binds are known function calls and
   can be eliminated by the compiler. What is more, the only data passed via
   class constraints are the dictionaries of `:>`, each represented by a single
   `Int` pointing at the place in the stack where the relevant effect is
   located.

5. `fused-effects` exhibits behavior similar to `mtl`. This comes as no surprise
   since it uses the same implementation techniques. It augments them with
   additional machinery for convenience, which seems to add even more
   overhead.

6. `polysemy` is based on free monads just like `freer-simple` and performs
   similarly, though with a much higher initial overhead.
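
The dictionary-passing argument from the `mtl` discussion above can be made
concrete with a small hand-written model. The sketch below is not what GHC
literally generates and does not use `mtl` or `transformers`: the types
`MonadDict`, `StateDict`, `StateT`, `ReaderT` and `Identity` are simplified,
hypothetical stand-ins defined here. It shows that polymorphic code receives
bind as a record field, and that the bind of each transformer layer calls the
bind of the layer below it.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Stand-in for the dictionary of a 'Monad' instance (hypothetical; the
-- dictionary GHC builds also carries superclasses and more methods).
data MonadDict m = MonadDict
  { dReturn :: forall a. a -> m a
  , dBind   :: forall a b. m a -> (a -> m b) -> m b
  }

-- Stand-in for the dictionary of an mtl-style state class.
data StateDict s m = StateDict
  { dGet :: m s
  , dPut :: s -> m ()
  }

-- Simplified, locally defined transformers.
newtype Identity a    = Identity { runIdentity :: a }
newtype StateT s m a  = StateT { runStateT :: s -> m (a, s) }
newtype ReaderT r m a = ReaderT { runReaderT :: r -> m a }

identityDict :: MonadDict Identity
identityDict = MonadDict
  { dReturn = Identity
  , dBind   = \(Identity a) k -> k a
  }

-- The dictionary of a transformer is built from the dictionary of the
-- monad it transforms: its bind calls the inner bind.
stateDict :: MonadDict m -> MonadDict (StateT s m)
stateDict inner = MonadDict
  { dReturn = \a -> StateT $ \s -> dReturn inner (a, s)
  , dBind   = \m k -> StateT $ \s ->
      dBind inner (runStateT m s) (\(a, s') -> runStateT (k a) s')
  }

readerDict :: MonadDict m -> MonadDict (ReaderT r m)
readerDict inner = MonadDict
  { dReturn = \a -> ReaderT $ \_ -> dReturn inner a
  , dBind   = \m k -> ReaderT $ \r ->
      dBind inner (runReaderT m r) (\a -> runReaderT (k a) r)
  }

stateOps :: MonadDict m -> StateDict s (StateT s m)
stateOps inner = StateDict
  { dGet = StateT $ \s -> dReturn inner (s, s)
  , dPut = \s -> StateT $ \_ -> dReturn inner ((), s)
  }

-- Lifting the state operations through a reader layer.
liftReaderOps :: StateDict s m -> StateDict s (ReaderT r m)
liftReaderOps sd = StateDict
  { dGet = ReaderT $ \_ -> dGet sd
  , dPut = \s -> ReaderT $ \_ -> dPut sd s
  }

-- Polymorphic countdown after the translation: every bind is a call
-- through a record field that is only known at runtime.
countdown :: MonadDict m -> StateDict Integer m -> m Integer
countdown md sd =
  dBind md (dGet sd) (\n ->
    if n <= 0
      then dReturn md n
      else dBind md (dPut sd (n - 1)) (\_ -> countdown md sd))

-- Two layers: each bind goes through StateT, then Identity.
runShallow :: Integer -> (Integer, Integer)
runShallow n = runIdentity (runStateT (countdown md sd) n)
  where
    md = stateDict identityDict
    sd = stateOps identityDict

-- Three layers: each bind goes through ReaderT, then StateT, then Identity.
runDeep :: Integer -> (Integer, Integer)
runDeep n = runIdentity (runStateT (runReaderT (countdown md sd) ()) n)
  where
    md = readerDict (stateDict identityDict)
    sd = liftReaderOps (stateOps identityDict)
```

Both runners compute the same result, but in `runDeep` every bind in
`countdown` triggers one more nested call than in `runShallow`. Adding further
layers extends that chain, which is the *O(n)* behavior described for the deep
`mtl` stack.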

### Filesize

<img src="https://raw.githubusercontent.com/haskell-effectful/effectful/master/benchmarks/bench_filesize_1000.png">

The results are similar to those of the *countdown* benchmark. It's worth
noting, though, that the introduction of other effects and I/O makes the
difference in performance between libraries not nearly as pronounced.