Now they are in the FileSystem.Handle module, corresponding to the source
module with the same name. The benchmarks are also arranged by space
complexity now, so that we can apply RTS memory restrictions when running
them. Also, longer benchmarks now use a shorter file.
Some of the benchmarks were an order of magnitude off due to missing INLINE
pragmas on type class operations. Now all of them are within reasonable
limits. Benchmarks affected for serial streams:
* Functor, Applicative, Monad, transformers
We need to do a similar exercise for the other stream types and for
folds/parsers as well.
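The INLINE issue above can be sketched as follows. This is a minimal
stand-in (`Stream` here is just a list wrapper, not the library's actual
type): without INLINE pragmas on the instance methods, GHC may leave the
type class operations as out-of-line dictionary calls, which blocks
specialization and fusion at the call site and can cost an order of
magnitude.

```haskell
-- Hypothetical minimal stream type; only the pragma placement is the point.
newtype Stream a = Stream { runStream :: [a] }

instance Functor Stream where
    -- The missing pragma: without INLINE the method stays an out-of-line
    -- dictionary call and downstream optimizations cannot fire.
    {-# INLINE fmap #-}
    fmap f (Stream xs) = Stream (map f xs)

instance Applicative Stream where
    {-# INLINE pure #-}
    pure x = Stream [x]
    {-# INLINE (<*>) #-}
    Stream fs <*> Stream xs = Stream [f x | f <- fs, x <- xs]

instance Monad Stream where
    {-# INLINE (>>=) #-}
    Stream xs >>= f = Stream (concatMap (runStream . f) xs)
```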
* Add 3 interesting cases for each concatMap variant
* For mapM, map concurrently on a serial stream so that we measure only the
  concurrency overhead of mapM, and not concurrent generation plus mapM
* For Async streams add some benchmarks involving the `async` combinator.
* Add a benchmark for `foldrS`
* Benchmark modules now correspond to source modules. The Prelude module in
  the source corresponds to several benchmark modules, one per stream type.
* Benchmarks are in the same order/groupings as they appear in the source
* All benchmarks are now divided according to space complexity
* Refactoring reduces a lot of code duplication, especially in the stream
  generation and elimination functions.
* The RTS options are now set entirely in the shell script that runs the
  benchmarks.
* RTS options can be set on a per-benchmark basis, and they now work
  correctly.
* The set of streaming/infinite stream benchmarks is now complete and we can
  run all such benchmarks conveniently.
* Benchmark "quick"/"speed" options can now be specified on a per-benchmark
  basis. Longer benchmarks can use fewer iterations or a quicker run time.
* Benchmarks are organized into several groups that can be run on a
  per-group basis. Comparison groups are also defined for convenient
  comparison of different modules (e.g. arrays, or streamD vs. streamK).
* The benchmark namespaces are grouped consistently. Benchmark executables
  have consistent names based on module names.
CPS performs much better for parsing operations like "sequence" and
"choice" on large containers. Given that applicative "sequence" does
not scale, I guess the Monad instance of the direct implementation won't
scale either.
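The contrast can be sketched as follows (these are illustrative types, not
the library's actual parser types). In the direct style each bind must
allocate and pattern-match a result wrapper, so a long "sequence" re-wraps
and re-inspects results at every level; in the CPS style bind only composes
continuations, so the per-step overhead stays flat.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Direct style: every step allocates and inspects a Maybe result.
newtype ParserD a b = ParserD { runD :: [a] -> Maybe (b, [a]) }

-- CPS style: success and failure continuations, no intermediate wrapping.
newtype ParserK a b = ParserK
    { runK :: forall r. [a] -> (b -> [a] -> r) -> r -> r }

instance Functor (ParserK a) where
    fmap f p = ParserK $ \inp ok err -> runK p inp (ok . f) err

instance Applicative (ParserK a) where
    pure b = ParserK $ \inp ok _ -> ok b inp
    pf <*> px = ParserK $ \inp ok err ->
        runK pf inp (\f inp' -> runK px inp' (ok . f) err) err

instance Monad (ParserK a) where
    -- bind is pure continuation composition: no result constructor to
    -- allocate or scrutinize at each level of nesting
    p >>= f = ParserK $ \inp ok err ->
        runK p inp (\b inp' -> runK (f b) inp' ok err) err

-- consume one token, failing on empty input
item :: ParserK a a
item = ParserK $ \inp ok err ->
    case inp of
        (x:xs) -> ok x xs
        []     -> err

parseK :: ParserK a b -> [a] -> Maybe b
parseK p inp = runK p inp (\b _ -> Just b) Nothing
```

With these instances, `sequence (replicate n item)` composes n parsers
without building n layers of wrapped results, which is why the CPS form
scales on large containers.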
Also, instead of returning the wrapped state in the "Stop" return type,
return the final extracted value.
a) This makes a lot of code simpler because extract and error handling
are not required at Stop.
b) Not returning internal state indicates that the parse has ended and the
state is not supposed to be used from this point onwards.
c) It may add a little complexity to the parser code, as the parser itself
has to extract the value from the state at Stop.
d) However, extract is still needed, but only if the fold remains partial
when the stream stops. In that case we may have to use exception handling
to detect the error, and to distinguish parse errors from other types of
errors. But since this is not the common case it does not fall on the
fast path.
Note that we could remove extract altogether, but that would mean storing
the intermediate fold values in the driver loop, which impacts performance.
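A minimal sketch of the revised contract, using hypothetical names rather
than the library's actual constructors, with a list-based driver loop
standing in for the stream driver:

```haskell
-- Sketch of the revised step result: Stop carries the final extracted
-- value b instead of wrapped internal state (names are hypothetical).
data Step s b
    = Partial s      -- continue folding with the new state
    | Stop b         -- parse finished; value already extracted (points a, b)
    | Error String   -- parse error

-- The driver needs no extract or error handling at Stop; extract is only
-- called when the stream ends while the parse is still partial (point d).
parseList
    :: (s -> a -> Step s b)       -- step function
    -> s                          -- initial state
    -> (s -> Either String b)     -- extract, for the partial-at-end case
    -> [a]
    -> Either String b
parseList step initial extract = go initial
  where
    go s []     = extract s        -- stream ended, parse still partial
    go s (x:xs) = case step s x of
        Partial s' -> go s' xs
        Stop b     -> Right b      -- fast path: no state to unwrap
        Error e    -> Left e

-- Example: take exactly 2 elements and sum them. The parser itself
-- extracts the final value at Stop (point c).
sumTwo :: Num a => (Int, a) -> a -> Step (Int, a) a
sumTwo (n, acc) x
    | n + 1 >= 2 = Stop (acc + x)
    | otherwise  = Partial (n + 1, acc + x)
```

Here the driver's happy path never touches extract, matching point (d):
the partial-at-end case is the only place the fallback runs.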