Regenerate haddock from README.md

Bodigrim 2021-02-03 22:47:29 +00:00
parent b4d26a2afe
commit be37589b47


@@ -3,231 +3,227 @@ Module: Test.Tasty.Bench
Copyright: (c) 2021 Andrew Lelechenko
Licence: MIT
Featherlight benchmark framework (only one file!) for performance
measurement with API
mimicking [@criterion@](http://hackage.haskell.org/package/criterion)
and [@gauge@](http://hackage.haskell.org/package/gauge).
=== How lightweight is it?
There is only one source file "Test.Tasty.Bench" and no external
dependencies except [@tasty@](http://hackage.haskell.org/package/tasty). So
if you already depend on @tasty@ for a test suite, there is nothing else
to install.
Compare this to @criterion@ (10+ modules, 50+ dependencies) and @gauge@
(40+ modules, depends on @basement@ and @vector@).
=== How is it possible?
Our benchmarks are literally regular @tasty@ tests, so we can leverage
all existing machinery for command-line options, resource management,
structuring, listing and filtering benchmarks, running and reporting
results. It also means that @tasty-bench@ can be used in conjunction
with other @tasty@ ingredients.
Unlike @criterion@ and @gauge@, we use a very simple statistical model
described below. This is arguably a questionable choice, but it works
pretty well in practice. Few developers are sufficiently well-versed in
probability theory to make sense of and use all the numbers generated by
@criterion@.
=== How to switch?
<https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-mixins Cabal mixins>
allow you to taste @tasty-bench@ instead of @criterion@ or @gauge@ without
changing a single line of code:
> cabal-version: 2.0
>
> benchmark foo
>   ...
>   build-depends:
>     tasty-bench
>   mixins:
>     tasty-bench (Test.Tasty.Bench as Criterion)
This works vice versa as well: if you use @tasty-bench@, but at some
point need a more comprehensive statistical analysis, it is easy to
switch temporarily back to @criterion@.
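For the reverse direction, a sketch (assuming only the
@criterion@-compatible subset of the API, such as @defaultMain@,
@bgroup@, @bench@ and @nf@, is used, so that @Criterion.Main@ can stand
in for "Test.Tasty.Bench") could look roughly like this:

> benchmark foo
>   ...
>   build-depends:
>     criterion
>   mixins:
>     criterion (Criterion.Main as Test.Tasty.Bench)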
=== How to write a benchmark?
Benchmarks are declared in a separate section of the @cabal@ file:
> cabal-version: 2.0
> name: bench-fibo
> version: 0.0
> build-type: Simple
> synopsis: Example of a benchmark
>
> benchmark bench-fibo
>   main-is: BenchFibo.hs
>   type: exitcode-stdio-1.0
>   build-depends: base, tasty-bench
And here is @BenchFibo.hs@:
> import Test.Tasty.Bench
>
> fibo :: Int -> Integer
> fibo n = if n < 2 then toInteger n else fibo (n - 1) + fibo (n - 2)
>
> main :: IO ()
> main = defaultMain
>   [ bgroup "fibonacci numbers"
>     [ bench "fifth"     $ nf fibo  5
>     , bench "tenth"     $ nf fibo 10
>     , bench "twentieth" $ nf fibo 20
>     ]
>   ]
Since @tasty-bench@ provides an API compatible with @criterion@, one can
refer to
<http://www.serpentine.com/criterion/tutorial.html#how-to-write-a-benchmark-suite its documentation>
for more examples.
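For instance, besides @nf@, a combinator such as @whnf@ (which evaluates
the result to weak head normal form only) is expected to behave as it
does in @criterion@. A minimal sketch, assuming these combinators are
available:

> import Test.Tasty.Bench
>
> main :: IO ()
> main = defaultMain
>   [ bgroup "sums"
>     [ bench "whnf" $ whnf (\n -> sum [1 .. n :: Int]) 1000
>     , bench "nf"   $ nf   (\n -> [1 .. n :: Int])     1000
>     ]
>   ]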
=== How to read results?
Running the example above (@cabal@ @bench@ or @stack@ @bench@) results in
the following output:
> All
>   fibonacci numbers
>     fifth:     OK (2.13s)
>       63 ns ± 3.4 ns
>     tenth:     OK (1.71s)
>       809 ns ± 73 ns
>     twentieth: OK (3.39s)
>       104 μs ± 4.9 μs
>
> All 3 tests passed (7.25s)
The output says that, for instance, the first benchmark was repeatedly
executed for 2.13 seconds (wall time), its mean time was 63 nanoseconds
and, assuming ideal precision of a system clock, execution time does not
often diverge from the mean further than ±3.4 nanoseconds (double
standard deviation, which for normal distributions corresponds to
<https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule 95%>
probability). Take standard deviation numbers with a grain of salt;
there are lies, damned lies, and statistics.
Note that this data is not directly comparable with @criterion@ output:
> benchmarking fibonacci numbers/fifth
> time                 62.78 ns   (61.99 ns .. 63.41 ns)
>                      0.999 R²   (0.999 R² .. 1.000 R²)
> mean                 62.39 ns   (61.93 ns .. 62.94 ns)
> std dev              1.753 ns   (1.427 ns .. 2.258 ns)
One might interpret the second line as saying that 95% of measurements
fell into the 61.99–63.41 ns interval, but this is wrong. It states that the
<https://en.wikipedia.org/wiki/Ordinary_least_squares OLS regression> of
execution time (which is not exactly the mean time) is most probably
somewhere between 61.99 ns and 63.41 ns, but does not say a thing about
individual measurements. To understand how far away a typical
measurement deviates you need to add\/subtract double standard deviation
yourself (which gives 62.78 ns ± 3.506 ns, similar to @tasty-bench@
above).
To add to the confusion, @gauge@ in @--small@ mode outputs not the
second line of the @criterion@ report, as one might expect, but the mean
value from the penultimate line and a standard deviation:
> fibonacci numbers/fifth mean 62.39 ns ( +- 1.753 ns )
The interval ±1.753 ns covers only
<https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule 68%> of
samples; double it to estimate the behavior in 95% of cases.
=== Statistical model
Here is a procedure used by @tasty-bench@ to measure execution time:
1. Set \( n \leftarrow 1 \).
2. Measure execution time \( t_n \) of \( n \) iterations and execution time
\( t_{2n} \) of \( 2n \) iterations.
3. Find \( t \) which minimizes deviation of \( (nt, 2nt) \) from
\( (t_n, t_{2n}) \).
4. If deviation is small enough (see @--stdev@ below), return \( t \) as a
mean execution time.
5. Otherwise set \( n \leftarrow 2n \) and jump back to Step 2.
This is roughly similar to the linear regression approach which
@criterion@ takes, but we fit only the last two points. This allows us to
simplify away all heavy-weight statistical analysis. More importantly,
earlier measurements, which are presumably shorter and noisier, do not
affect the overall result. This is in contrast to @criterion@, which fits
all measurements and is biased to use more data points corresponding to
shorter runs (it employs an \( n \leftarrow 1.05n \) progression).
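A minimal sketch of Step 3 (not the actual @tasty-bench@ internals),
assuming the deviation is measured by ordinary least squares: minimizing
\( (nt - t_n)^2 + (2nt - t_{2n})^2 \) over \( t \) yields
\( t = (t_n + 2 t_{2n}) / (5n) \).

> -- A sketch of the two-point fit, assuming a least-squares notion of
> -- deviation. n is the iteration count; tN and t2N are wall-clock times
> -- measured for n and 2*n iterations respectively.
> fitMeanTime :: Double -> Double -> Double -> Double
> fitMeanTime n tN t2N = (tN + 2 * t2N) / (5 * n)
>
> -- One possible relative deviation of the fit, for comparison against
> -- a target such as --stdev below.
> relStdev :: Double -> Double -> Double -> Double
> relStdev n tN t2N = sqrt ((tN - n * t) ^ 2 + (t2N - 2 * n * t) ^ 2) / (n * t)
>   where t = fitMeanTime n tN t2N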
An alert reader could object that we measure standard deviation for
samples with \( n \) and \( 2n \) iterations, but report it scaled to a single
iteration. Strictly speaking, this is justified only if we assume that
deviating factors are either roughly periodic (e. g., coarseness of a
system clock, garbage collection) or are likely to affect several
successive iterations in the same way (e. g., a slowdown caused by another
concurrent process).
Obligatory disclaimer: statistics is a tricky matter; there is no
one-size-fits-all approach. In the absence of a good theory, simplistic
approaches are as (un)sound as obscure ones. Those who seek statistical
soundness should rather collect raw data and process it themselves using
a proper statistical toolbox. Data reported by @tasty-bench@ is only of
indicative and comparative significance.
=== Memory usage
Passing @+RTS@ @-T@ (via @cabal@ @bench@ @--benchmark-options@ @\'+RTS@ @-T\'@ or
@stack@ @bench@ @--ba@ @\'+RTS@ @-T\'@) enables @tasty-bench@ to estimate and
report memory usage such as allocated and copied bytes:
> All
>   fibonacci numbers
>     fifth:     OK (2.13s)
>       63 ns ± 3.4 ns, 223 B allocated, 0 B copied
>     tenth:     OK (1.71s)
>       809 ns ± 73 ns, 2.3 KB allocated, 0 B copied
>     twentieth: OK (3.39s)
>       104 μs ± 4.9 μs, 277 KB allocated, 59 B copied
>
> All 3 tests passed (7.25s)
=== Command-line options
Use @--help@ to list command-line options.
[@-p@, @--pattern@]:
    This is a standard @tasty@ option, which allows filtering benchmarks
    by a pattern or @awk@ expression. Please refer
    to [@tasty@ documentation](https://github.com/feuerbach/tasty#patterns)
    for details.
[@--csv@]:
    File to write results in CSV format.
[@-t@, @--timeout@]:
    This is a standard @tasty@ option, setting a timeout for individual
    benchmarks in seconds. Use it when benchmarks tend to take too long:
    @tasty-bench@ will make an effort to report results (even if of
    subpar quality) before the timeout. Setting the timeout too tight
    (insufficient for at least three iterations) will result in a
    benchmark failure.
[@--stdev@]:
    Target relative standard deviation of measurements in percent (5%
    by default). Large values correspond to fast and loose benchmarks,
    and small ones to long and precise ones. If benchmarks take far too
    long, consider setting @--timeout@, which will interrupt benchmarks,
    potentially before reaching the target deviation.
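These options can be combined in a single run. A hypothetical invocation
(file name and option values are illustrative only) might look like this:

> cabal bench --benchmark-options '--csv results.csv --stdev 2 --timeout 100'

or, with @stack@:

> stack bench --ba '--csv results.csv --stdev 2 --timeout 100'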
-}