rename toArray/toArrayN to writeF/writeNF to keep the API name consistent with
their regular counterparts. Now the only difference in fold APIs is a suffix F.
Measure space usage in interesting cases.
The motivation for these benchmarks is parallel stream consuming a lot of stack
space in concatMap case. These benchmarks will hopefully catch any such
problems in future.
These benchmarks may take a lot of time to allow memory consumption to slowly
buildup to noticeable amount in case there is a problem. Therefore they are
enabled only in dev builds. We can use `--include-first-iter`, `--min-duration
0` options in gauge to run them in reasonable time. They need to be run with
`--measure-with` option to allow isolated runs, otherwise memory measurement
won't be correct.
The implementation of fromStreamDArraysOf is now 3x more efficient compared to
the earlier implementation. This makes byte stream level operations almost as
efficient as array level operations.
Other than this the following changes are added in this commit:
* Add insertAfterEach
* Add writeArraysPackedUpto to Handle IO
* Implement `wc -l` example more efficiently using arrays
* Add benchmark for lines/unlines using arrays
* Add tests for splitArraysOn
* Rename some array/file/handle APIs
* Error handling when the group size in grouping operations is 0 or negative
* Streamly.FileSystem.FD module for unbuffered IO. Buffering can be controlled
from the stream itself.
* Support IO using writev as well
Array APIs include:
* coalesceChunksOf to coalesce smaller arrays in a stream into bigger ones
* unlinesArraysBy to join a stream of arrays using a line separator
* splitArraysOn to split a stream of arrays on a separator byte
* Implement a stream flavor specific (concurrent/wSerial etc.) concatMap
(concatMapBy). Now concatMap has the same capabilities that the monad bind of
different streams has. In fact the monad bind is implemented using concatMap
* Use foldr/build fusion for concatMap (currently disabled) and for map/fmap
and mapM (enabled)
* Add benchmarks for bind/concatMap performance
unsafe use of unsafeInlineIO caused each array allocated in the toArrayN fold
to share the same memory.
This fix uses the IO monad to make sure that the code is not pure and therefore
we always allocate new memory. All such usage of unsafePerformIo have been
fixed. The remaining ones are reviewed to be safe.
After perf measurement these seems to perform the same as a scan followed by
map therefore we have not exposed these but kept for perf comparison and just
in case use.
* Deprecate `scanx`, `foldx`, `foldxM`
* Remove deprecated APIs `scan`, `foldl`, `foldlM`
* Fix the signature of foldrM
* Implement some custom folds in terms of foldr
* Document folds and scans better
* Reorganize the documentation in Prelude
* Add foldrS and foldrT for transforming folds
* add toRevList
* Add benchmarks and tests for the new folds
APIs
----
Removed:
merge
lazy left scans: scanl, scanlM, scanl1, scanl1M
Renamed:
generate and generateM renamed to fromIndices and fromIndicesM
Added:
replicate
mergeByM, mergeAsyncBy, mergeAsyncByM
`intFrom`, `intFromTo`, `intFromThen`, `intFromThenTo`,
`intFromStep`, `fracFrom`, `fracFromThen`, `fracFromThenTo`,
`numFromStep`
Added StreamD version of replicateM and a rewrite rule for replicateMSerial.
Added but not exposed:
postscanl and prescanl ops
Rewrote mergeByS in StreamK, simplified quite a bit and got some perf
improvement too.
Added @since notations to new APIs.
Fixed lines exceeding 80 columns.
Tests
-----
Added tests for the new enumeration APIs.
Improved some tests by generating values randomly using quickcheck forAll. We
can improve more tests similarly.
Removed some redundant transformOps tests.
reorganized test code in groups so as to keep similar functionality together
and added header lines so that we can find relevant code easily.
Benchmarks
----------
Added benchmarks for enumeration primitives added above. Added benchmarks for
scan and fold mixed ops. Added benchmark for concatMap. Fixed foldr and foldrM
benchmarks to use a (+) operation instead of a list operation for fair
comparision with other folds.
Kept only one benchmark each for deleteBy, insertBy, isPrefixOf and
isSubsequenceOf.
Documentation
-------------
Updated documentation, added examples for the new primitives as well as many
old ones. Especially the documentation of folds and scans was rewritten.
Reordered and re-organized the groups of APIs in the doc.
Refactoring
-----------
Some related and urelated refactoring.
Hlint
-----
Fixed some hlint hints introduced recently.
TBD
---
Some APIs need concurrent versions. I have added "XXX" notes for those.
Some more tests have to be added.
Some more benchmarks have to be added.
1) Add foldr1, findIndices, lookup, find, sequence functions
in StreamD, earlier they were present only in StreamK.
2) Change the above functions to use S. in Prelude instead
of K. .
3) Add tests for (!!), insertBy, splitAt, the.
1) Define the instances manually instead of deriving. Deriving picks up
incorrect implementations of methods that have been redefined in the newtypes.
2) Fix the Traversable instance.
3) Define the instances only for Serial stream types. Other streams cannot be
instantiated for Foldable monads.
4) Add test cases
5) add benchmarks
Also, disable monadic versions of drop/take. Pure versions use the monadic
ones internally so the monadic benchmarks are redundant unless we change the
implementations in future.
The main change is a single line change in StreamK.hs in foldxM routine.
Major changes in this commit are due to:
1) Added strictness tests for all foldl and scanl rotuines
2) refactoring to enable independent benchmarking for StreamK, to measure the
impact of the change.
- monadic stream generation functions are now concurrent
- monadic stream transformation (mapM, sequence) functions are now concurrent
- fixed a race which caused blockedindefinitely on MVar in rare cases
The trigger for this change is that parallel stream is not really parallel it is
a concurrent lookahead like just in a different traversal style. So we make
parallel and coparallel as parAhead and coparAhead instead and later introduce
a new style for parallel which would be strictly parallel with one thread for
each stream started right away rather than speculatively.
This was earlier changed from StreamT to SerialT. However we have made this the
default type now and it makes more sense to call it StreamT now. A bigger
motivation for the change is that StreamT immediately conveys that it is a
stream which is helpful and intuitive for new learners. Also we have a plan to
have a specialized type called "Stream a = StreamT IO a", so the name StreamT
is in line with that common default type "Stream a". Calling that type as
"Serial a" does not sound as intuitive as calling it "Stream a".
Monoid instances should not be derived from underlying type where we want them
to be type specific. Added tests to make sure they are correct.
Other typeclass instances that are dependent on type specific behavior should
also be independently defined rather than derived.
Removed Alternative instances as they are not correct yet.
Removed redundancy by using CPP macros to define instances of different types.
Also rename asyncly to aparallely and runAsyncT to runAParallelT
The name Async is confusing and does not convey the right meaning. The 'A' in
AParallelT stands for 'Adaptive' and 'Parallel' indicates that this is a
variant of Parallel.
Another choice for the name was 'MParallelT' where 'M' stands for 'maybe', it
is maybe parallel since it may or may not start parallel threads depending on
demand but Adaptive fits better as it adapts to the demand.