2019-07-25 22:43:13 +03:00
## INLINE Phases

A missing `INLINE` pragma, or an `INLINE` pragma that fires in the wrong GHC
simplifier phase, can adversely impact performance. We use three built-in
phases of the GHC simplifier for inlining, namely phases 2, 1 and 0. GHC runs
them in decreasing order, so phase 2 is the earliest and phase 0 is the latest.
We have defined them as follows in `inline.h`:

```
#define INLINE_EARLY INLINE [2]
#define INLINE_NORMAL INLINE [1]
#define INLINE_LATE INLINE [0]
```

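
To make the phase numbering concrete, here is a minimal sketch that uses the plain GHC pragmas the macros expand to. The functions `early`, `normal` and `late` are hypothetical names chosen only for illustration; they are not part of streamly.

```haskell
module Main (main) where

-- INLINE [2]: not inlined before phase 2, eligible from phase 2 onwards.
-- This corresponds to INLINE_EARLY.
{-# INLINE [2] early #-}
early :: Int -> Int
early x = normal x + 1

-- INLINE [1]: eligible from phase 1 onwards (INLINE_NORMAL).
{-# INLINE [1] normal #-}
normal :: Int -> Int
normal x = late x * 2

-- INLINE [0]: eligible only in the final phase 0 (INLINE_LATE).
{-# INLINE [0] late #-}
late :: Int -> Int
late x = x - 3

main :: IO ()
main = print (early 10) -- prints 15
```

The pragmas only change when GHC is willing to inline each function under optimization; the result of the program is the same in every case.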
## Low Level `fromStreamD/toStreamD` Fusion

The combinators in `Streamly.Prelude` are defined in terms of combinators in
`Streamly.Internal.Data.Stream.StreamD` (direct style streams) or
`Streamly.Internal.Data.Stream.StreamK` (CPS style streams). In some cases we
convert the stream from the `StreamD` to the `StreamK` representation or vice
versa.

In the first inlining phase (`INLINE_EARLY` or `INLINE`) we expand the
combinators in `Streamly.Prelude` into
`fromStreamD`/`fromStreamK`/`toStreamD`/`toStreamK` and combinators defined in
the `StreamD` or `StreamK` modules. Once we do that, `fromStreamD`/`toStreamD`
get exposed and we can apply rewrite rules to rewrite transformations like
`fromStreamK . toStreamK` to `id`. A plain `INLINE` pragma is usually enough on
combinators in `Streamly.Prelude`.

```
{-# RULES "fromStreamK/toStreamK fusion"
    forall s. toStreamK (fromStreamK s) = s #-}
```

We also have to prevent `fromStreamK` and `toStreamK` themselves from being
inlined in this phase so that the rewrite rules can match on them. Therefore,
we annotate these functions with `INLINE_LATE`.

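
The interplay between early inlining of the wrappers and late inlining of the conversions can be sketched with toy types. Everything below (`D`, `K`, `fromK`, `toK`, `mapD`, `mapK`, `toListK`) is a hypothetical stand-in built on lists, not streamly's actual definitions.

```haskell
module Main (main) where

-- Toy stand-ins for the two stream representations.
newtype D a = D [a] -- "direct style"
newtype K a = K [a] -- "CPS style"

-- The conversions are held back until phase 0 so that the rules below
-- can still see them by name in the earlier phases.
{-# INLINE [0] fromK #-}
fromK :: K a -> D a
fromK (K xs) = D xs

{-# INLINE [0] toK #-}
toK :: D a -> K a
toK (D xs) = K xs

-- Cancel redundant round trips in either direction.
{-# RULES
"toK/fromK" forall s. toK (fromK s) = s
"fromK/toK" forall s. fromK (toK s) = s
  #-}

mapD :: (a -> b) -> D a -> D b
mapD f (D xs) = D (map f xs)

-- The user-facing wrapper inlines early and exposes fromK/toK.
{-# INLINE mapK #-}
mapK :: (a -> b) -> K a -> K b
mapK f s = toK (mapD f (fromK s))

toListK :: K a -> [a]
toListK (K xs) = xs

main :: IO ()
main = print (toListK (mapK (+ 1) (mapK (* 2) (K [1, 2, 3])))) -- prints [3,5,7]
```

After both `mapK` calls are inlined, the expression contains `fromK (toK ...)` in the middle, which is the pattern the second rule removes. Rules only fire when compiling with optimization; the printed result does not depend on them.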
## Fallback Rules

In some cases, if an operation could not fuse, we want to use a fallback
rewrite rule in the next phase. For such cases we use the `INLINE_EARLY` phase
for the first rewrite and the `INLINE_NORMAL` phase for the fallback rules.

The fallback rules make sure that if the direct style operations could not be
fused, we use the CPS style operations instead, because unfused direct style
operations perform worse than their CPS style counterparts.

```
{-# INLINE_EARLY unfoldr #-}
unfoldr :: (Monad m, IsStream t) => (b -> Maybe (a, b)) -> b -> t m a
unfoldr step seed = fromStreamS (S.unfoldr step seed)
{-# RULES "unfoldr fallback to StreamK" [1]
    forall a b. S.toStreamK (S.unfoldr a b) = K.unfoldr a b #-}
```

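
A self-contained sketch of the same fallback pattern follows, again using hypothetical list-backed types (`D`, `K`) and hypothetical names (`unfoldrD`, `unfoldrK`, `unfoldrP`, `countdown`). It only illustrates how the phase annotation on a rule delays it; it does not reproduce streamly's modules.

```haskell
module Main (main) where

import Data.List (unfoldr)

newtype D a = D [a] -- toy direct style stream
newtype K a = K [a] -- toy CPS style stream

{-# INLINE [0] unfoldrD #-}
unfoldrD :: (b -> Maybe (a, b)) -> b -> D a
unfoldrD f z = D (unfoldr f z)

{-# INLINE [0] unfoldrK #-}
unfoldrK :: (b -> Maybe (a, b)) -> b -> K a
unfoldrK f z = K (unfoldr f z)

{-# INLINE [0] toK #-}
toK :: D a -> K a
toK (D xs) = K xs

-- Fallback: active only from phase 1 onwards. If the direct style
-- producer is still sitting under a conversion by then, switch to the
-- CPS style producer instead.
{-# RULES "unfoldrD fallback to K" [1]
    forall f z. toK (unfoldrD f z) = unfoldrK f z #-}

-- The user-facing wrapper is expanded in the early phase.
{-# INLINE [2] unfoldrP #-}
unfoldrP :: (b -> Maybe (a, b)) -> b -> K a
unfoldrP f z = toK (unfoldrD f z)

toListK :: K a -> [a]
toListK (K xs) = xs

countdown :: Int -> Maybe (Int, Int)
countdown n = if n <= 0 then Nothing else Just (n, n - 1)

main :: IO ()
main = print (toListK (unfoldrP countdown 3)) -- prints [3,2,1]
```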
## High Level Operation Fusion

Since each high level combinator in `Streamly.Prelude` is wrapped in
`fromStreamD/toStreamD` etc., combinator fusion cannot work until we have
removed those wrappers and exposed consecutive operations, e.g. a `map`
followed by another `map`. Assuming that the redundant
`fromStreamK/toStreamK` pairs have been removed in the `INLINE_EARLY` phase, we
can then apply the combinator fusion rules in the `INLINE_NORMAL` phase. For
example, we can fuse two `map` operations into a single `map` operation. Note
that at this point the `StreamD/StreamK` implementations of the combinators are
exposed, so the rules apply to those.

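
A combinator fusion rule of this kind might look like the sketch below. The type `D`, the functions `mapD` and `toListD`, and the rule name are hypothetical; the rule is given phase `[1]` to mirror the `INLINE_NORMAL` phase mentioned above.

```haskell
module Main (main) where

newtype D a = D [a] -- toy direct style stream

-- Held back until phase 0 so the rule can match on it in phase 1.
{-# INLINE [0] mapD #-}
mapD :: (a -> b) -> D a -> D b
mapD f (D xs) = D (map f xs)

-- Two consecutive maps become one map over the composed function.
{-# RULES "mapD/mapD" [1]
    forall f g s. mapD f (mapD g s) = mapD (f . g) s #-}

toListD :: D a -> [a]
toListD (D xs) = xs

main :: IO ()
main = print (toListD (mapD (+ 1) (mapD (* 2) (D [1, 2, 3])))) -- prints [3,5,7]
```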
## Inlining Higher Order Functions

GHC inlines an `INLINE` function only when it is applied to at least as many
arguments as appear on the left-hand side of its definition, so a partially
applied function cannot be inlined. So if we have code like this:

```
concatMap1 src = runStream $ S.concatMap (S.replicate 3) src
```

We want to ensure that `concatMap` gets inlined before `replicate` so that
`replicate` becomes fully applied before it gets inlined. Currently, ensuring
that both of them are inlined in the same phase (`INLINE_NORMAL`) seems to be
enough to achieve that. In general, we should try to ensure that higher order
functions are inlined before, or in the same phase as, the functions they can
consume as arguments. This means that `StreamD` combinators should not be
marked as `INLINE` or `INLINE_EARLY`. Instead, they should all be marked as
`INLINE_NORMAL` because higher order functions like `concatMap`/`map`/`mapM`
etc. are marked as `INLINE_NORMAL`. `StreamD` functions in other modules like
`Streamly.Memory.Array` should also follow the same rules.

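
The saturation issue can be seen in a small list-based sketch. `concatMapL` and `replicateL` are hypothetical stand-ins for the stream combinators, both given `INLINE [1]` to mirror `INLINE_NORMAL`.

```haskell
module Main (main) where

{-# INLINE [1] concatMapL #-}
concatMapL :: (a -> [b]) -> [a] -> [b]
concatMapL f xs = foldr (\x acc -> f x ++ acc) [] xs

-- Defined with two arguments on the left-hand side, so it can only be
-- inlined at call sites that supply both of them.
{-# INLINE [1] replicateL #-}
replicateL :: Int -> a -> [a]
replicateL n x = go n
  where
    go i
      | i <= 0 = []
      | otherwise = x : go (i - 1)

-- Here `replicateL 3` is partially applied. Once concatMapL is inlined,
-- the body contains `replicateL 3 x`, which is fully applied and can
-- then be inlined too.
concatMap1 :: [Int] -> [Int]
concatMap1 src = concatMapL (replicateL 3) src

main :: IO ()
main = print (concatMap1 [1, 2]) -- prints [1,1,1,2,2,2]
```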
## Stream Fusion

In `StreamD` combinators, inlining the inner step or loop functions too early,
i.e. in the same phase as or before the outer function is inlined, may block
stream fusion opportunities. Therefore, the inner step functions and folding
loops are marked as `INLINE_LATE`.

## Specialization

In some cases, the `step` function in `StreamD` does not get specialized when
inlined unless it is given an explicit signature or written as a lambda. For
example, in the `replicate/replicateM` combinator we need the type annotation
on `i` to get it specialized:

```
{-# INLINE_LATE step #-}
step _ (i :: Int) =
  if i <= 0
  then return Stop
  else do
      x <- action
      return $ Yield x (i - 1)
```

`-flate-specialise` also helps in this case.

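
For context, the sketch below places such a `step` function inside a complete toy combinator. The `Step` and `Stream` types and the `toList` driver are simplified, hypothetical versions written for this example (the real `step` takes an extra state argument, omitted here), and the macro is replaced by the plain `INLINE [0]` pragma.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main (main) where

-- Simplified direct style stream: a step function plus a hidden state.
data Step s a = Yield a s | Stop
data Stream m a = forall s. Stream (s -> m (Step s a)) s

replicateM :: Monad m => Int -> m a -> Stream m a
replicateM n action = Stream step n
  where
    -- The pattern signature on `i` pins the state type to Int.
    {-# INLINE [0] step #-}
    step (i :: Int) =
      if i <= 0
      then return Stop
      else do
          x <- action
          return $ Yield x (i - 1)

-- Minimal driver that runs a stream to a list.
toList :: Monad m => Stream m a -> m [a]
toList (Stream step s0) = go s0
  where
    go s = do
      r <- step s
      case r of
        Yield x s' -> (x :) <$> go s'
        Stop -> return []

main :: IO ()
main = toList (replicateM 3 (return 'x')) >>= print -- prints "xxx"
```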
## Stream and Fold State Data Structures

Since state is an internal data structure threaded around in the loop, it is
good practice to use strict unboxed fields for state data structures where
possible. In most cases it is not necessary, but in some cases it may affect
fusion and make a performance difference of 10x or more. For example, using
non-strict fields can increase the code size of the internal join points and
functions created during transformations. This can affect the inlining of
these code blocks, which in turn can affect stream fusion.

See https://gitlab.haskell.org/ghc/ghc/issues/17075 .

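
As an illustration of strict unboxed state, here is a minimal sketch of a fold whose accumulator uses `UNPACK`ed strict fields. `MeanState`, `step`, `finish` and `mean` are hypothetical names and the fold runs over a plain list rather than a stream.

```haskell
module Main (main) where

import Data.List (foldl')

-- Both fields are strict and unpacked, so the loop state can be kept
-- as raw machine values instead of pointers to possibly lazy thunks.
data MeanState = MeanState {-# UNPACK #-} !Double {-# UNPACK #-} !Int

step :: MeanState -> Double -> MeanState
step (MeanState total count) x = MeanState (total + x) (count + 1)

finish :: MeanState -> Double
finish (MeanState total count)
  | count == 0 = 0
  | otherwise = total / fromIntegral count

mean :: [Double] -> Double
mean = finish . foldl' step (MeanState 0 0)

main :: IO ()
main = print (mean [1, 2, 3, 6]) -- prints 3.0
```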