"Streamly.Internal.Data.Strict" is replaced by:
Streamly.Internal.Data.Tuple.Strict
Streamly.Internal.Data.Maybe.Strict
Streamly.Internal.Data.Either.Strict
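A sketch of the kind of strict types these modules provide (the strictness
annotations on the fields are the point; the constructor names mirror the
streamly convention but the exact APIs may differ):

```haskell
-- Strict pair: both components are forced when the constructor is forced,
-- avoiding thunk buildup in fold accumulators.
data Tuple' a b = Tuple' !a !b deriving (Show, Eq)

-- Strict Maybe: the payload of Just' is forced.
data Maybe' a = Nothing' | Just' !a deriving (Show, Eq)

-- Strict Either: both payloads are forced.
data Either' a b = Left' !a | Right' !b deriving (Show, Eq)
```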
This commit also has some formatting changes to imports.
This commit results in worse performance because we are now double
buffering: once in ParserD and once in ParserK. This can potentially be
fixed, but it would require bigger changes to unify the backtracking buffer
management of ParserD and ParserK.
Use --stream-size for all the array benchmarks now
Replace the following files:
- benchmark/Streamly/Benchmark/Data/Array.hs
- benchmark/Streamly/Benchmark/Data/ArrayOps.hs
- benchmark/Streamly/Benchmark/Data/Prim/Array.hs
- benchmark/Streamly/Benchmark/Data/Prim/ArrayOps.hs
- benchmark/Streamly/Benchmark/Data/SmallArray.hs
- benchmark/Streamly/Benchmark/Data/SmallArrayOps.hs
- benchmark/Streamly/Benchmark/Memory/Array.hs
- benchmark/Streamly/Benchmark/Memory/ArrayOps.hs
With:
- benchmark/Streamly/Benchmark/Array.hs
- benchmark/Streamly/Benchmark/ArrayOps.hs
Now the benchmarks are in a FileSystem.Handle module corresponding to the
source module of the same name. They are also arranged by space complexity
so that we can apply RTS memory restrictions when running them. In addition,
the longer benchmarks now use a shorter file.
"allocated" is much more stable for regression comparisons because it stays
the same across runs, whereas "time" varies with factors such as CPU
frequency, other processes running on the machine, context switches, etc.
bytesCopied measures long-lived data retained across GCs, which is also a
good measure of performance.
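These counters can be read programmatically from GHC's RTS. A small sketch
(stats are only collected when the program is run with +RTS -T -RTS, so the
code checks for that first):

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

main :: IO ()
main = do
    enabled <- getRTSStatsEnabled
    if not enabled
        then putStrLn "RTS stats not enabled; rerun with +RTS -T -RTS"
        else do
            s <- getRTSStats
            -- allocated_bytes: total bytes allocated (stable across runs)
            -- copied_bytes: bytes copied during GC, i.e. long-lived data
            putStrLn $ "allocated bytes: " ++ show (allocated_bytes s)
            putStrLn $ "copied bytes:    " ++ show (copied_bytes s)
```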
Some of the benchmarks were an order of magnitude off due to missing INLINE
pragmas on type class operations. Now all of them are within reasonable
limits. Benchmarks affected for serial streams:
* Functor, Applicative, Monad, transformers
We need to do a similar exercise for other types of streams and for
folds/parsers as well.
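The fix pattern looks like this (a toy wrapper standing in for the real
stream type; the point is only the INLINE pragma placement on instance
methods, which lets GHC specialize the dictionary method at call sites so
fusion can kick in):

```haskell
-- Toy stream type, illustrative only.
newtype Pipe a = Pipe { runPipe :: [a] }

instance Functor Pipe where
    -- Without this pragma the class method may be called through a
    -- dictionary and never specialize, defeating fusion.
    {-# INLINE fmap #-}
    fmap f (Pipe xs) = Pipe (map f xs)
```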
* Add 3 interesting cases for each concatMap case
* For mapM, map concurrently on a serial stream so that we measure only the
  concurrency overhead of mapM, not concurrent generation plus mapM.
* For Async streams add some benchmarks involving the `async` combinator.
* Add a benchmark for `foldrS`
* Now benchmark modules correspond to source modules. The Prelude module in
source corresponds to several modules one for each stream type.
* Benchmarks in the same order/groupings as they appear in source
* All benchmarks now have division according to space complexity
* Refactoring reduces a lot of code duplication especially the stream
generation and elimination functions.
* The RTS options are now completely set in the shell script to run the
benchmarks.
* RTS options can be set on a per benchmark basis. RTS options work correctly
now.
* The set of streaming/infinite stream benchmarks is now complete and we can
  run all such benchmarks conveniently.
* Benchmark "quick"/"speed" options can now be specified on a per benchmark
basis. Longer benchmarks can have fewer iterations/quick run time.
* Benchmarks are grouped in several groups which can be run on a per group
basis. Comparison groups are also defined for convenient comparisons of
different modules (e.g. arrays or streamD/K).
* The benchmark namespaces are grouped in a consistent manner. Benchmark
executables have a consistent naming based on module names.
Add: intersperseSuffix_, delay, timeIndexed
Change the APIs: times, absTimes, relTimes, timestamped
The new APIs have a default clock granularity of 10 ms.
add: times, relTimes, timestamped
unimplemented skeletons: durations, ticks, timeout
Changes to the original currentTime combinator: remove the delay before the
first event, and cap the granularity at 1 ms to guarantee reasonable CPU
usage.
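What a clock granularity means can be sketched without streamly: read the
clock on a fixed tick rather than as precisely as possible. The function
name and the per-element tick below are illustrative, not streamly's actual
implementation:

```haskell
import Control.Concurrent (threadDelay)
import GHC.Clock (getMonotonicTime)

-- Pair each element with the seconds elapsed since the start, sampling
-- the monotonic clock on a 10 ms tick. A coarser tick means lower CPU
-- usage at the cost of timestamp precision -- the trade-off behind the
-- 10 ms and 1 ms granularity choices above.
timeIndexedSketch :: [a] -> IO [(Double, a)]
timeIndexedSketch xs = do
    t0 <- getMonotonicTime
    mapM (\x -> do
            threadDelay 10000          -- 10 ms granularity (illustrative)
            t <- getMonotonicTime
            pure (t - t0, x)) xs
```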
CPS performs much better for parsing operations like "sequence" and
"choice" on large containers. Given that applicative "sequence" does not
scale in the direct implementation, the Monad instance is unlikely to scale
there either.
Also, instead of returning the wrapped state in the "Stop" return type,
return the final extracted value.
a) This makes a lot of code simpler because extract and error handling
is not required at Stop.
b) Not returning internal state indicates that the parse has ended and
state is not supposed to be used from this point onwards.
c) It may add a little complexity in the parser code as the parser has to
extract the value from state at Stop.
d) However, extract is still needed, but only if the fold remains partial
   when the stream stops. In that case we may have to use exception handling
   to detect errors and to distinguish parse errors from other types of
   errors. Since this is not the common case, it does not affect the fast
   path.
Note that we could remove extract altogether but that means we will have to
store the intermediate fold values in the driver loop which impacts
performance.
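A minimal sketch of the scheme described above, with illustrative
constructor names (the actual streamly type differs, e.g. it also tracks
backtracking counts): Done carries the final extracted value rather than
internal state, and extract is consulted only when the input ends while the
parser is still partial, keeping it off the fast path.

```haskell
data Step s b
    = Partial s       -- need more input; carry the state forward
    | Done b          -- parse finished: final value, state is gone for good
    | Error String    -- parse failed

-- Driver over a list standing in for a stream. extract runs only in the
-- end-of-input-while-partial case.
runParser :: s -> (s -> a -> Step s b) -> (s -> b) -> [a] -> Either String b
runParser s0 step extract = go s0
  where
    go s []       = Right (extract s)
    go s (x : xs) = case step s x of
        Partial s' -> go s' xs
        Done b     -> Right b
        Error e    -> Left e
```

For example, a parser summing numbers until the first negative one:
`runParser 0 (\acc x -> if x < 0 then Done acc else Partial (acc + x)) id`.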
This is the initial version of stream parsing. It implements a "Parse"
type, some parsers based on that type, applicative composition, operations to
run parsers (parse) and to run parsers over chunks of a stream (parseChunks).
Parsers are just an extension of Folds in Streamly.Data.Fold. Parsers just add
backtracking and failure capabilities to folds.
Operations like splitOn, which splits a stream on a predicate, can now be
expressed as a parser applied repeatedly to a stream. For example, line
splitting can be done using parsers, and these parsers are as fast as the
fastest custom code for line splitting.
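The "parser applied repeatedly" idea, sketched over a list in place of a
stream (here `break` plays the role of a take-until-newline parser; the
actual streamly combinators differ):

```haskell
-- Split into lines by repeatedly running a "take until '\n'" parser and
-- resuming after the separator; behaves like Prelude's lines.
linesViaParser :: String -> [String]
linesViaParser [] = []
linesViaParser s = case break (== '\n') s of
    (l, [])       -> [l]                      -- input ended mid-line
    (l, _ : rest) -> l : linesViaParser rest  -- skip the '\n', go again
```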
1. Removed RTS options at compile time for the fold benchmark.
2. Added fold-o-1-space and fold-o-1-heap benchmarks to bench.sh
* Remove lib/B/Group.hs
* The idea: have all the groups in Prelude and export only the groups.
  That way, the functions aren't compiled multiple times (once without and
  once with inlining).