Support running groups of benchmarks e.g. ARRAY/SERIAL/CONCURRENT
Add missing array benchmarks and "parallel" benchmark
For "--long" case run only INFINITE stream benchmarks
The predicate gives the combinator more power by allowing you to count
only certain elements in the stream and filter out others. This is more
efficient than an async tap followed by a filter and a fold, because here the
filtering is done at the source.
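The idea can be sketched on plain lists (this is an illustrative sketch, not streamly's actual combinator; the name `countIf` is hypothetical):

```haskell
import Data.List (foldl')

-- Hypothetical sketch: count only the elements satisfying a predicate,
-- in a single pass over the source, instead of tap + filter + fold.
countIf :: (a -> Bool) -> [a] -> Int
countIf p = foldl' (\n x -> if p x then n + 1 else n) 0

main :: IO ()
main = print (countIf even [1 .. 10 :: Int])  -- counts 2,4,6,8,10
```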
The number of elements may be incorrect if the stream size is specified on the
command line. It also becomes inconsistent if the defaults in the benchmarks
are changed.
Do not use "tail" in zip benchmarks, otherwise the performance of "tail" can
affect these benchmarks. This also makes them consistent with other such
benchmarks.
Use half the stream size for each side of zip so that we effectively
create a stream of a size similar to the other linear benchmarks.
The "after", "finally" and "bracket" combinators did not run the "cleanup"
handler when the stream was only lazily, partially evaluated, e.g. using the
lazy right fold (foldrM); "Streamly.Prelude.head" is an example of such a fold.
Since we run the cleanup action when the stream stops, the action won't run
if the stream is not fully drained.
In the new implementation, we use a GC hook to run the cleanup action in case
the stream is garbage collected before finishing. This takes care of the lazy
right fold cases mentioned above.
This is to make it the same as other benchmarks. We had to change the size of
the extreme concatMapWith benchmark to keep it in reasonable time limits.
* We are testing two extreme cases, add a middle case as well where the outer
and inner streams are of equal size.
* Enable some pure benchmarks as well
* Move the zip benchmarks into a separate group, as they are scalable
  (memory consumption does not increase with stream size) while the parallel
  benchmarks are not.
Streaming benchmarks take constant memory whereas the buffering ones take
memory proportional to the stream size, so we would like to test them
separately. For streaming benchmarks we can use a very large stream size to
make sure there is no space leak and we keep running in constant memory for a
long time.
gauge --measure-with forks a child process for each benchmark. The options
passed to the main gauge process are not passed to the child process. We use an
environment variable to set the stream-size and pass it on to the child
process.
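A minimal sketch of the mechanism, using only base (the variable name "STREAM_SIZE" and the helper `getStreamSize` are illustrative, not the actual names used):

```haskell
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv, setEnv)
import Text.Read (readMaybe)

-- Hypothetical sketch: the parent process sets the variable before gauge
-- forks the child; the child reads it back, falling back to a default.
getStreamSize :: Int -> IO Int
getStreamSize def = do
  mv <- lookupEnv "STREAM_SIZE"
  return (fromMaybe def (mv >>= readMaybe))

main :: IO ()
main = do
  setEnv "STREAM_SIZE" "10000000"   -- done in the parent process
  getStreamSize 100000 >>= print    -- read back in the child
```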
* Document the precise behavior, some changes were made to the earlier behavior
* Make some changes to implementation according to (newly) documented behavior
* takeByTime: perform the time check before generating the element so that we
  do not drop an element after generation.
* takeByTime now yields at least one element if the duration is non-zero
* dropByTime does not check the time after the drop duration is over
* Add inspection tests
* Make the tests run for a shorter duration; the earlier tests took too long
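The documented take semantics can be modeled deterministically by pairing each element with the time at which it would be produced (an illustrative model, not the actual implementation):

```haskell
-- Deterministic model of the documented semantics: the time check happens
-- before "generating" an element, so no element is dropped after generation,
-- and the element at t = 0 is always yielded when the duration is non-zero.
takeByTimeModel :: Double -> [(Double, a)] -> [a]
takeByTimeModel d = map snd . takeWhile (\(t, _) -> t < d)

main :: IO ()
main = putStrLn (takeByTimeModel 2.5 (zip [0 ..] "abcde"))
```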
Use --stream-size to accept the stream size. The CLI argument position
is not fixed anymore so it can be specified anywhere on the command line. The
remaining arguments are used by gauge.
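A sketch of the argument handling (the helper name `extractStreamSize` is hypothetical): the option is pulled out of the list wherever it appears and the rest is handed to gauge.

```haskell
import Text.Read (readMaybe)

-- Hypothetical sketch: find "--stream-size N" anywhere in the argument
-- list, returning the parsed size and the remaining arguments for gauge.
extractStreamSize :: [String] -> (Maybe Int, [String])
extractStreamSize = go
  where
    go [] = (Nothing, [])
    go ("--stream-size" : v : rest) = (readMaybe v, rest)
    go (a : rest) = let (m, rest') = go rest in (m, a : rest')

main :: IO ()
main = print (extractStreamSize ["--foo", "--stream-size", "100", "--bar"])
```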
* pollCounts to poll the element count in another thread
* delayPost to introduce a delay in polling
* rollingMap to compute diff of successive elements
These combinators can be used to compute and report the element
processing rate in a stream.
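For rollingMap, a list-based sketch conveys the semantics (the streamly version works on streams, but the element sequence is the same):

```haskell
-- Illustrative list version of rollingMap: apply f to each pair of
-- successive elements, e.g. to turn cumulative counts into per-interval
-- counts when computing a processing rate.
rollingMap :: (a -> a -> b) -> [a] -> [b]
rollingMap f xs = zipWith f (drop 1 xs) xs

main :: IO ()
main = print (rollingMap (-) [1, 3, 6, 10 :: Int])  -- successive diffs
```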
* Instead of using hard coded numbers scale them based on the stream size.
* Add concatMapWith benchmarks for concurrent streams
* Add a linear-async-long benchmark that works on streams of 10 million
elements
We need large streams to detect memory leak issues. Specifically, we
could not figure out the concatMapWith memory leak issue without a
stream of at least tens of millions of elements.
For long benchmarks we use gauge defaults so that we do not run many
iterations/samples.
Keep related benchmark code together and try to keep the order in sync with
the Streamly.Prelude module.
This is purely a reorganization of code by moving related code together
and adding comments for subsections of code. There is no functionality
change at all.
The change to the StreamK/Type.serial implementation makes this test fail
because the IsStream dictionary is still around. However, there is not much
difference in the actual benchmark results, and the serial implementation
change yields much better results in other benchmarks.
Functor instances of various stream types are directly expressed in terms of
StreamD map impl.
Applicative instances now use concatMap instead of using "ap" from Monad.
Added rewrite rules to specialize unfoldrM to WSerial and ZipSerial as well so
that it performs well for those streams. Benchmarks for these were much
slower because the unfoldrM used in them was slow. For this, the zipWith*
implementations had to be moved into the Zip module so that the ZipSerial type
can be imported in Prelude.
* Rearrange, update benchmarks for words/unwords, lines/unlines
* Add interpose/intercalate/interleaveInfix/interleaveSuffix to Prelude
* Add unfoldWords to Data.String
Use the newly added routines to write words/unwords and lines/unlines more
idiomatically and with improved performance.
* We are using "unfold" for unfolding; "runUnfold" is longer and sounds a bit
  weird. So, similar to "unfold", we should use "fold" for folding.
* Another reason is that "runFold" ends in "unFold", which can sound
  confusing.
In particular, add chaining of unfolds and outerProduct operations.
outerProduct is just the cartesian product of two streams; it is like
concatMap/bind for streams. In contrast to concatMap, the unfold
nested-looping operations are amenable to complete fusion, giving performance
equivalent to linear stream operations.
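The semantics of outerProduct can be sketched on lists (the real API works on unfolds, but the element order is the familiar nested-loop order):

```haskell
-- Semantic sketch: outerProduct is the cartesian product of two streams,
-- shown here on lists. The streamly version nests two unfolds so that the
-- whole loop fuses, instead of going through concatMap.
outerProduct :: [a] -> [b] -> [(a, b)]
outerProduct xs ys = [(x, y) | x <- xs, y <- ys]

main :: IO ()
main = print (outerProduct [1, 2 :: Int] "ab")
```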
The performance is similar to flattenArrays; the code fuses perfectly.
However, in the linecount benchmark the Unfold case performs 50% slower, not
because of a fusion failure but because less optimal code is generated.
Each container type (e.g. Handle/Socket/File) may have similar nested/stream
level operations. We need a standardized way of naming the combinators related
to streams of containers. Also, we cannot have a separate module for such
combinators for each container type. Therefore it makes sense to put them in
the same module.
Rename toArray/toArrayN to writeF/writeNF to keep the API names consistent
with their regular counterparts. Now the only difference in the fold APIs is
the F suffix.
Measure space usage in interesting cases.
The motivation for these benchmarks is parallel stream consuming a lot of stack
space in concatMap case. These benchmarks will hopefully catch any such
problems in future.
These benchmarks may take a lot of time to allow memory consumption to slowly
build up to a noticeable amount in case there is a problem. Therefore they are
enabled only in dev builds. We can use the `--include-first-iter` and
`--min-duration 0` options in gauge to run them in a reasonable time. They
need to be run with
`--measure-with` option to allow isolated runs, otherwise memory measurement
won't be correct.
The implementation of fromStreamDArraysOf is now 3x more efficient than the
earlier one. This makes byte-stream level operations almost as efficient as
array level operations.
Other than this the following changes are added in this commit:
* Add insertAfterEach
* Add writeArraysPackedUpto to Handle IO
* Implement `wc -l` example more efficiently using arrays
* Add benchmark for lines/unlines using arrays
* Add tests for splitArraysOn
* Rename some array/file/handle APIs
* Error handling when the group size in grouping operations is 0 or negative
* Streamly.FileSystem.FD module for unbuffered IO. Buffering can be controlled
from the stream itself.
* Support IO using writev as well
Array APIs include:
* coalesceChunksOf to coalesce smaller arrays in a stream into bigger ones
* unlinesArraysBy to join a stream of arrays using a line separator
* splitArraysOn to split a stream of arrays on a separator byte
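The semantics of splitArraysOn can be sketched on lists of bytes (an illustrative model only: the real implementation works on arrays without flattening the chunks):

```haskell
import Data.Word (Word8)

-- Semantic sketch of splitArraysOn: split the logical byte stream on a
-- separator byte, even when a fragment spans chunk boundaries. Shown on
-- lists for clarity; the actual API operates on streams of arrays.
splitArraysOn :: Word8 -> [[Word8]] -> [[Word8]]
splitArraysOn sep = go . concat
  where
    go bs = case break (== sep) bs of
      (pre, [])       -> [pre]
      (pre, _ : rest) -> pre : go rest

main :: IO ()
main = print (splitArraysOn 10 [[104, 105, 10], [98, 121, 101]])
```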
* Implement a stream flavor specific (concurrent/wSerial etc.) concatMap
(concatMapBy). Now concatMap has the same capabilities that the monad bind of
different streams has. In fact the monad bind is implemented using concatMap
* Use foldr/build fusion for concatMap (currently disabled) and for map/fmap
and mapM (enabled)
* Add benchmarks for bind/concatMap performance
Unsafe use of unsafeInlineIO caused every array allocated in the toArrayN fold
to share the same memory.
This fix uses the IO monad to make sure that the code is not pure and
therefore we always allocate new memory. All such uses of unsafePerformIO
have been fixed; the remaining ones have been reviewed and are safe.
After performance measurement, these seem to perform the same as a scan
followed by a map, therefore we have not exposed them but kept them for
performance comparison and just-in-case use.
* Deprecate `scanx`, `foldx`, `foldxM`
* Remove deprecated APIs `scan`, `foldl`, `foldlM`
* Fix the signature of foldrM
* Implement some custom folds in terms of foldr
* Document folds and scans better
* Reorganize the documentation in Prelude
* Add foldrS and foldrT for transforming folds
* add toRevList
* Add benchmarks and tests for the new folds
APIs
----
Removed:
merge
lazy left scans: scanl, scanlM, scanl1, scanl1M
Renamed:
generate and generateM renamed to fromIndices and fromIndicesM
Added:
replicate
mergeByM, mergeAsyncBy, mergeAsyncByM
`intFrom`, `intFromTo`, `intFromThen`, `intFromThenTo`,
`intFromStep`, `fracFrom`, `fracFromThen`, `fracFromThenTo`,
`numFromStep`
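The element sequences of the integral enumerations can be illustrated with list semantics (an illustrative model; the actual APIs produce streams, but the sequences match Haskell ranges):

```haskell
-- Illustrative list semantics for two of the enumerations above; the
-- real APIs produce streams with the same element sequences.
intFromThenTo :: Int -> Int -> Int -> [Int]
intFromThenTo from next to = [from, next .. to]

intFromStep :: Int -> Int -> [Int]
intFromStep from step = iterate (+ step) from  -- infinite stream

main :: IO ()
main = do
  print (intFromThenTo 0 2 8)
  print (take 4 (intFromStep 5 3))
```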
Added StreamD version of replicateM and a rewrite rule for replicateMSerial.
Added but not exposed:
postscanl and prescanl ops
Rewrote mergeByS in StreamK, simplified quite a bit and got some perf
improvement too.
Added @since notations to new APIs.
Fixed lines exceeding 80 columns.
Tests
-----
Added tests for the new enumeration APIs.
Improved some tests by generating values randomly using quickcheck forAll. We
can improve more tests similarly.
Removed some redundant transformOps tests.
Reorganized test code into groups so as to keep similar functionality
together, and added header lines so that we can find relevant code easily.
Benchmarks
----------
Added benchmarks for the enumeration primitives added above. Added benchmarks
for scan and fold mixed ops. Added a benchmark for concatMap. Fixed the foldr
and foldrM benchmarks to use a (+) operation instead of a list operation for a
fair comparison with the other folds.
Kept only one benchmark each for deleteBy, insertBy, isPrefixOf and
isSubsequenceOf.
Documentation
-------------
Updated documentation, added examples for the new primitives as well as many
old ones. Especially the documentation of folds and scans was rewritten.
Reordered and re-organized the groups of APIs in the doc.
Refactoring
-----------
Some related and unrelated refactoring.
Hlint
-----
Fixed some hlint hints introduced recently.
TBD
---
Some APIs need concurrent versions. I have added "XXX" notes for those.
Some more tests have to be added.
Some more benchmarks have to be added.