We already run the whole thread with the captured monad state (in doFork), so
running with captured state again in the parallel stream is unnecessary. It also
causes memory to build up on the thread stack, growing continuously. With this
call removed, the concatMapBy parallel benchmark takes 8 seconds vs 34 seconds
before.
Memory usage is reduced by 7% (930 vs 870 MB) and time improves by 9% (34 sec
vs 31 sec) when concatenating 100 concurrent threads with 500K elements in each
thread's stream.
Measure space usage in interesting cases.
The motivation for these benchmarks is that parallel streams consume a lot of
stack space in the concatMap case. These benchmarks should catch any such
problems in the future.
These benchmarks may take a lot of time, so that memory consumption can slowly
build up to a noticeable amount when there is a problem. Therefore they are
enabled only in dev builds. We can use the `--include-first-iter` and
`--min-duration 0` options in gauge to run them in a reasonable time. They need
to be run with the `--measure-with` option so that each benchmark runs in an
isolated process; otherwise the memory measurement won't be correct.
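Assuming a gauge-based benchmark executable (the name `bench` below is
hypothetical, standing in for the actual benchmark binary), the combined
invocation could look like:

```shell
# Hypothetical invocation; 'bench' stands in for the real benchmark
# executable. --measure-with re-invokes the given program so that each
# benchmark runs in an isolated process, keeping memory measurements
# unpolluted by earlier benchmarks in the same run.
./bench --include-first-iter --min-duration 0 --measure-with ./bench
```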
* Earlier, ParallelT was unaffected by the `maxBuffer` directive; now
  `maxBuffer` can limit the buffer of a ParallelT stream as well. When the
  buffer becomes full, the producer threads block.
* ParallelT streams no longer have an unlimited buffer by default. The buffer
  for parallel streams is now limited to 1500 elements by default, the same as
  for other concurrent stream types.
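The blocking behaviour can be illustrated with a minimal sketch in plain
Haskell (this is not streamly's actual implementation; the names and the
QSem/Chan encoding are illustrative only): a producer acquires a buffer slot
before writing and therefore blocks once the limit's worth of items are in
flight, while the consumer frees a slot per item it takes out.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Monad (forM_, replicateM)

main :: IO ()
main = do
  let bufLimit = 4 :: Int   -- stands in for the maxBuffer limit
      total    = 20 :: Int
  slots <- newQSem bufLimit -- counts free buffer slots
  buf   <- newChan          -- the buffer itself
  -- Producer thread: waitQSem blocks once bufLimit items are in
  -- flight, so the buffer can never hold more than the limit.
  _ <- forkIO $ forM_ [1 .. total] $ \x -> do
         waitQSem slots
         writeChan buf x
  -- Consumer: each read frees one slot, unblocking the producer.
  xs <- replicateM total $ do
          x <- readChan buf
          signalQSem slots
          return x
  print (length xs, sum xs)
```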
Currently, maxThreads is independent of maxBuffer. If maxThreads is larger
than maxBuffer, we may dispatch that many workers. Since we check the buffer
limit only after a worker yields a value, the check may come too late and the
buffer may already have been exceeded. To make sure that we never exceed the
buffer, we restrict the number of workers to the maxBuffer value.
As a result, the invariant `maxThreads <= maxBuffer` always holds.
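The clamping step can be sketched as a pure function (the name and the
convention that a non-positive value means "unlimited" are assumptions for
illustration; the actual code keeps these limits in the scheduler state):

```haskell
-- Hypothetical sketch: clamp the worker limit to the buffer limit so
-- that dispatched workers can never overfill the buffer. A value <= 0
-- is taken to mean "unlimited" for either setting.
effectiveMaxThreads :: Int -> Int -> Int
effectiveMaxThreads maxThreads maxBuffer
  | maxBuffer  <= 0 = maxThreads            -- unlimited buffer: no clamp
  | maxThreads <= 0 = maxBuffer             -- unlimited threads: buffer caps them
  | otherwise       = min maxThreads maxBuffer

main :: IO ()
main = print ( effectiveMaxThreads 2000 1500
             , effectiveMaxThreads 100 1500
             , effectiveMaxThreads 0 1500 )
```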