Move explanatory paragraphs in "Stream" to user docs

Add the fusion limitation section from the Fold module to stream dev doc
Harendra Kumar 2024-01-26 16:05:58 +05:30
parent 914a76f50d
commit ce02e14644
2 changed files with 134 additions and 120 deletions


@@ -8,22 +8,55 @@
-- Stability : released
-- Portability : GHC
--
-- Streams are represented as state machines that fuse together when composed
-- statically, eliminating function calls or intermediate constructor
-- allocations - generating tight, efficient loops. Suitable for high
-- performance looping operations.
-- Streamly is a framework for modular data flow based programming and
-- declarative concurrency. The powerful stream fusion framework in streamly
-- enables high performance combinatorial programming even when using
-- byte-level streams. The Streamly API is similar to that of Haskell lists.
--
-- If you need to call these operations recursively in a loop (i.e. composed
-- dynamically) then it is recommended to use the continuation passing style
-- (CPS) stream operations from the "Streamly.Data.StreamK" module. 'Stream'
-- and 'StreamK' types are interconvertible. See more details in the
-- documentation below regarding 'Stream' vs 'StreamK'.
-- Streams are represented as state machines that fuse together when composed
-- statically, eliminating function calls or intermediate constructor
-- allocations and generating tight, efficient loops similar to the code
-- generated by low-level languages like C. This makes them suitable for high
-- performance looping operations.
--
-- Operations in this module are not meant to be used recursively. In other
-- words, they are supposed to be composed statically rather than dynamically.
-- For dynamic, recursive composition, use the continuation passing style
-- (CPS) stream operations from the "Streamly.Data.StreamK" module. The
-- 'Stream' and 'Streamly.Data.StreamK.StreamK' types are interconvertible.
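--
-- For example, a stream constructed dynamically with StreamK primitives can
-- be converted to a fused 'Stream' for processing. A minimal sketch (assuming
-- "Streamly.Data.StreamK" is imported qualified as @StreamK@):
--
-- > k = StreamK.cons 1 (StreamK.cons 2 (StreamK.cons 3 StreamK.nil))
-- > Stream.toList (StreamK.toStream k) -- yields [1,2,3]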
--
-- Please refer to "Streamly.Internal.Data.Stream" for more functions that have
-- not yet been released.
--
-- Check out the <https://github.com/composewell/streamly-examples>
-- repository for many more real world examples of stream programming.
--
-- == Console Echo Example
--
-- Here is an example of a program that reads lines from the console and
-- writes them back to the console. It is a simple example of a declarative
-- loop written using streaming combinators. Compare it with an imperative
-- @while@ loop used to write a similar program.
--
-- >>> import Data.Function ((&))
-- >>> :{
-- echo =
--       Stream.repeatM getLine      -- Stream IO String
--     & Stream.mapM putStrLn        -- Stream IO ()
--     & Stream.fold Fold.drain      -- IO ()
-- :}
--
-- In this example, 'repeatM' generates an infinite stream of 'String's by
-- repeatedly performing the 'getLine' IO action. 'mapM' then applies
-- 'putStrLn' to each element in the stream, converting it to a stream of
-- '()'. Finally, 'Streamly.Data.Fold.drain' 'fold's the stream to IO,
-- discarding the () values and thus producing only effects.
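--
-- For comparison, the same loop can be sketched imperatively using monadic
-- recursion via 'Control.Monad.forever' (@echoLoop@ is a hypothetical name,
-- not defined in this module):
--
-- >>> import Control.Monad (forever)
-- >>> echoLoop = forever (getLine >>= putStrLn)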
--
-- Hopefully, this gives you an idea of how we can program declaratively by
-- representing loops using streams. In this module, you can find all
-- "Data.List"-like functions and many more powerful combinators to perform
-- common programming tasks.
--
module Streamly.Data.Stream
(
@@ -51,12 +84,10 @@ module Streamly.Data.Stream
-- >>> fromFoldableM = Stream.sequence . fromFoldable
-- ** Primitives
-- | A fused 'Stream' is never constructed using these primitives; streams are
-- typically generated by converting containers like lists into streams, or
-- generated using custom functions provided in this module. The 'cons'
-- primitive in this module has a rare use in fusing a small number of
-- elements. On the other hand, it is common to construct a 'StreamK' stream
-- using the StreamK.'StreamK.cons' primitive.
-- | These primitives are meant to statically fuse a small number of stream
-- elements. The 'Stream' type is never constructed at large scale using
-- these primitives. Use 'StreamK' if you need to construct a stream from
-- primitives.
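--
-- A minimal sketch of statically fusing a few elements with 'cons' and 'nil'
-- (assuming this module is imported qualified as @Stream@):
--
-- >>> Stream.toList (Stream.cons 1 (Stream.cons 2 Stream.nil))
-- [1,2]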
, nil
, nilM
, cons
@@ -638,108 +669,3 @@ import qualified Streamly.Internal.Data.Array.Type as Array
#include "DocTestDataStream.hs"
-- $overview
--
-- Streamly is a framework for modular data flow based programming and
-- declarative concurrency. The powerful stream fusion framework in streamly
-- enables high performance combinatorial programming even when using
-- byte-level streams. The Streamly API is similar to that of Haskell lists.
--
-- == Console Echo Example
--
-- In the following example, 'repeatM' generates an infinite stream of
-- 'String's by repeatedly performing the 'getLine' IO action. 'mapM' then
-- applies 'putStrLn' to each element in the stream, converting it to a stream
-- of '()'. Finally, 'drain' folds the stream to IO, discarding the () values
-- and thus producing only effects.
--
-- >>> import Data.Function ((&))
--
-- >>> :{
-- echo =
--       Stream.repeatM getLine      -- Stream IO String
--     & Stream.mapM putStrLn        -- Stream IO ()
--     & Stream.fold Fold.drain      -- IO ()
-- :}
--
-- This is a console echo program. It is an example of a declarative loop
-- written using streaming combinators. Compare it with an imperative @while@
-- loop.
--
-- Hopefully, this gives you an idea of how we can program declaratively by
-- representing loops using streams. In this module, you can find all
-- "Data.List"-like functions and many more powerful combinators to perform
-- common programming tasks.
--
-- == Stream Fusion
--
-- The fused 'Stream' type in this module employs stream fusion for C-like
-- performance when looping over data. It represents the stream as a state
-- machine using an explicit state, and a step function working on the state. A
-- typical stream operation consumes elements from the previous state machine
-- in a stream pipeline, transforms the elements and yields new values for the
-- next stage to consume. The stream operations are modular, each representing
-- a single task; they have no knowledge of the previous or next operation on
-- the elements.
--
-- A typical stream pipeline consists of a stream producer, several stream
-- transformation operations and a stream consumer. All these operations taken
-- together form a closed loop processing the stream elements. Elements are
-- transferred between stages using a boxed data constructor. However, all the
-- stages of the pipeline are fused together by GHC, eliminating the boxing of
-- intermediate constructors, and thus forming a tight C like loop without any
-- boxed data being used in the loop.
--
-- Stream fusion works effectively when:
--
-- * the stream pipeline is composed statically (known at compile time)
-- * all the operations forming the loop are inlined
-- * the loop is not recursively defined, recursion breaks inlining
--
-- If these conditions cannot be met, the CPS style stream type 'StreamK' may
-- turn out to be a better choice than the fused stream type 'Stream'.
--
-- == Stream vs StreamK
--
-- The fused stream model avoids constructor allocations and function call
-- overheads. However, the stream is represented as a state machine, and to
-- generate stream elements it has to navigate the decision tree of the state
-- machine. Moreover, the state machine is cranked for each element in the
-- stream. This performs extremely well when the number of states is limited.
-- The state machine starts getting expensive as the number of states
-- increases.
-- For example, generating a stream from a list requires a single state and is
-- very efficient, even if it has millions of elements. However, using 'cons'
-- to construct a million element stream would be a disaster.
--
-- A typical worst case scenario for fused stream model is a large number of
-- `cons` or `append` operations. A few static `cons` or `append` operations
-- are very fast and much faster than a CPS style stream because CPS involves a
-- function call for each element whereas fused stream involves a few
-- conditional branches in the state machine. However, constructing a large
-- stream using `cons` introduces as many states in the state machine as the
-- number of elements. If we compose `cons` as a balanced binary tree it will
-- take @n * log n@ time to navigate the tree, and @n * n@ if it is a right
-- associative composition.
--
-- Operations like 'cons' or 'append' are typically called recursively to
-- construct a lazy infinite stream. For such use cases the CPS style 'StreamK'
-- should be used. CPS streams do not have a state machine that needs to be
-- cranked for each element; past state has no effect on future element
-- processing. CPS does incur a function call overhead for each element
-- processed, and that overhead can be large compared to a fused state machine
-- even if the latter has many states. However, because of its linear
-- performance characteristics, beyond a certain threshold of stream
-- compositions the CPS stream performs much better than the quadratic fused
-- stream operations.
--
-- As a general guideline, you need to use 'StreamK' when you have to use
-- 'cons', 'append' or other operations having quadratic complexity at a large
-- scale. Typically, in such cases you need to compose the stream recursively,
-- by calling an operation in a loop. The decision to compose the stream is
-- taken at run time rather than statically at compile time.
--
-- Typically you would compose a 'StreamK' of chunks of data so that the
-- StreamK overhead is not high, and then process the chunks using 'Stream' by
-- using statically fused stream pipeline operations on the chunks.
--
-- 'Stream' and 'StreamK' types can be interconverted. See
-- "Streamly.Data.StreamK" module for conversion operations.


@@ -0,0 +1,88 @@
# Stream Fusion
The fused 'Stream' type employs stream fusion for C-like performance when
looping over data. It represents the stream as a state machine using an
explicit state, and a step function working on the state. A typical stream
operation consumes elements from the previous state machine in a stream
pipeline, transforms the elements and yields new values for the next stage to
consume. One stream operation typically represents one stage of a modular
pipeline and performs a single task; stages in the pipeline have no knowledge
of the state of the previous or next stage.
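
As an illustration, here is a simplified sketch of this representation. The
actual internal type in streamly carries an extra state argument in the step
function, and `mapStream` is just an illustrative name:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A step either yields an element and a next state, skips to a next state,
-- or stops the stream.
data Step s a = Yield a s | Skip s | Stop

-- A stream is a step function paired with an initial state.
data Stream m a = forall s. Stream (s -> m (Step s a)) s

-- Mapping composes a new step function over the old one; no intermediate
-- list or per-element closure is created.
mapStream :: Monad m => (a -> b) -> Stream m a -> Stream m b
mapStream f (Stream step state) = Stream step' state
  where
    step' st = do
        r <- step st
        return $ case r of
            Yield x s -> Yield (f x) s
            Skip s -> Skip s
            Stop -> Stop
```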
A typical stream pipeline consists of a stream producer, several stream
transformation operations and a stream consumer. All these operations taken
together form a closed loop (like a for or while loop) processing the stream
elements. Elements are transferred between stages using a boxed data
constructor. However, all the stages of the pipeline are "fused" together by
GHC, eliminating the boxing of intermediate constructors, and thus forming a
tight C like loop without any boxed data being used in the loop.
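
For instance, the following minimal sketch (`sumOfSquares` is an illustrative
name, using the public streamly-core API) builds a complete
producer/transform/consumer pipeline; with optimizations and inlining GHC
fuses it into a single loop:

```haskell
import Data.Function ((&))

import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream

-- Sum of the squares of the first million integers: producer
-- (enumerateFromTo), transformation (fmap), and consumer (fold) form one
-- closed pipeline.
sumOfSquares :: IO Int
sumOfSquares =
    Stream.enumerateFromTo 1 (1000000 :: Int) -- producer
        & fmap (\x -> x * x)                  -- transformation
        & Stream.fold Fold.sum                -- consumer
```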
Stream fusion works effectively when:
* the stream pipeline is composed statically (known at compile time)
* all the operations forming the loop are inlined
* the loop is not recursively defined, recursion breaks inlining
If these conditions cannot be met, the CPS style stream type 'StreamK' may
turn out to be a better choice than the fused stream type 'Stream'.
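
In practice this means that custom pipeline stages should carry INLINE pragmas
so they do not become fusion barriers. A minimal sketch (`squareEvens` is an
illustrative name):

```haskell
import qualified Streamly.Data.Stream as Stream

-- A custom transformation stage. Without the INLINE pragma GHC may not inline
-- it at the use site, breaking fusion across this stage.
{-# INLINE squareEvens #-}
squareEvens :: Monad m => Stream.Stream m Int -> Stream.Stream m Int
squareEvens = fmap (\x -> x * x) . Stream.filter even
```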
## Stream vs StreamK
The fused stream model avoids constructor allocations and function call
overheads. However, the stream is represented as a state machine, and to
generate stream elements it has to navigate the decision tree of the state
machine. Moreover, the state machine is cranked for each element in the stream.
This performs extremely well when the number of states is limited. The state
machine starts getting expensive as the number of states increases. For
example, generating a stream from a list requires a single state and is very
efficient even if the list has millions of elements, because there is only one
state machine. However, using 'cons' to construct a million-element stream
would be a disaster, because it statically fuses a million state machines.
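
A small sketch of the two extremes (illustrative names, public API):

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream

-- A few static 'cons' operations fuse well and are very fast.
smallStatic :: IO Int
smallStatic =
    Stream.fold Fold.sum
        (Stream.cons 1 (Stream.cons 2 (Stream.cons 3 Stream.nil)))

-- Generating from a list needs a single state, so this is efficient even for
-- a million elements. Building the same stream with a million nested 'cons'
-- calls would instead add one state per element and would not scale.
sumFromList :: IO Int
sumFromList = Stream.fold Fold.sum (Stream.fromList [1 .. 1000000 :: Int])
```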
A typical worst case scenario for fused stream model is a large number of
`cons` or `append` operations. A few static `cons` or `append` operations
are very fast, much faster than a CPS style stream because CPS involves a
function call for each element whereas fused stream involves a few
conditional branches in the state machine. However, constructing a large
stream using `cons` introduces as many states in the state machine as the
number of elements. If we compose `cons` as a balanced binary tree it will
take @n * log n@ time to navigate the tree, and @n * n@ if it is a right
associative composition.
Operations like 'cons' or 'append' are typically recursively called to
construct a lazy infinite stream. For such use cases the CPS style 'StreamK'
should be used. CPS streams do not use an explicit state machine that needs to
be cranked for each element; past state has no effect on future element
processing. CPS does incur a function call overhead for each element processed,
and that overhead can be large compared to a fused state machine even if the
latter has many states. However, because of its linear performance
characteristics, beyond a certain threshold of stream compositions the CPS
stream performs much better than the quadratic fused stream operations.
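
A minimal sketch of recursive construction with StreamK (`repeatK` and
`firstTen` are illustrative names; `StreamK.toStream` is assumed to be
available from "Streamly.Data.StreamK" for the conversion):

```haskell
import qualified Streamly.Data.Stream as Stream
import qualified Streamly.Data.StreamK as StreamK

-- An infinite stream built by calling 'StreamK.cons' recursively. Doing the
-- same with the fused 'Stream.cons' would keep growing the state machine.
repeatK :: a -> StreamK.StreamK m a
repeatK x = StreamK.cons x (repeatK x)

-- Convert to the fused 'Stream' type for consumption.
firstTen :: IO [Int]
firstTen =
    Stream.toList (Stream.take 10 (StreamK.toStream (repeatK (1 :: Int))))
```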
As a general guideline, you need to use 'StreamK' when you have to use
'cons', 'append' or other operations having quadratic complexity at a large
scale. Typically, in such cases you need to compose the stream recursively,
by calling an operation in a loop. The decision to compose the stream is
taken at run time rather than statically at compile time.
Typically you would compose a 'StreamK' of chunks of data so that the StreamK
function call overhead is mitigated, and then process each chunk using 'Stream'
type operations by using statically fused stream pipeline operations on the
chunks.
'Stream' and 'StreamK' types can be interconverted. See
"Streamly.Data.StreamK" module for conversion operations.
## Fold Fusion
Folds support stream fusion for generating loops comparable to the speed of C.
However, this has some limitations. For fusion to work, the folds must be
inlined, must be statically known rather than generated dynamically, and must
not be passed around recursively.
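
For example, a statically known, inlined fold composition fuses with the
stream (a minimal sketch with illustrative names):

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Stream as Stream

-- 'Fold.tee' distributes the input to both folds; being statically composed
-- and inlined, the whole thing fuses with the stream into a single loop.
{-# INLINE sumAndLength #-}
sumAndLength :: Monad m => Fold.Fold m Int (Int, Int)
sumAndLength = Fold.tee Fold.sum Fold.length

demo :: IO (Int, Int)
demo = Stream.fold sumAndLength (Stream.enumerateFromTo 1 (100 :: Int))
```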
Another limitation is the quadratic complexity that causes a slowdown when too
many nested compositions are used. In particular, the performance of the
Applicative instance and splitting operations (e.g. 'splitWith') degrades
quadratically (O(n^2)) when combined @n@ times; roughly 8 or fewer sequenced
operations are fine. For such cases, folds can be converted to parsers and
then used as 'ParserK'.
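
A sketch of the fold-to-parser conversion (illustrative names; the composed
parser would then be run with one of the parser drivers, converting it to
'ParserK' when a large number of compositions is involved):

```haskell
import qualified Streamly.Data.Fold as Fold
import qualified Streamly.Data.Parser as Parser

-- Sequencing folds with the Fold Applicative degrades quadratically; the
-- equivalent parser composition scales linearly.
header :: Monad m => Parser.Parser Int m Int
header = Parser.fromFold (Fold.take 4 Fold.sum)

body :: Monad m => Parser.Parser Int m Int
body = Parser.fromFold Fold.sum

record :: Monad m => Parser.Parser Int m (Int, Int)
record = (,) <$> header <*> body
```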