Describe issues with monad transformers and mtl

Andrzej Rybczak 2022-04-08 21:20:44 +02:00
parent ba4798f1eb
commit 765e455589

transformers.md Normal file

@ -0,0 +1,470 @@
# Issues with the `transformers`/`mtl` library
TL;DR:
Of the monad transformers used most of the time:
- `ExceptT` can't produce stack traces and its errors are not runtime
exceptions, which can easily introduce subtle bugs leading to resource
exhaustion.
- `StateT` discards state updates when interacting with runtime exceptions and
`ExceptT`-specific errors in very surprising ways.
- `WriterT` has too many variants, choosing the right variant for your use case
is extremely tricky unless you're an expert in the language, and overall it has
very niche applications.
- `RWST` inherits issues of both `StateT` and `WriterT`, which makes it resemble
a spike trap.
That leaves `ReaderT`, which is the only one with predictable behavior.
## ExceptT
Errors returned by `ExceptT` lack a very important feature: the ability to
obtain associated stack traces. It is simply impossible to get them with errors
produced by `ExceptT` in an automatic manner, which combined with wonky
interactions with various libraries (as demonstrated below) makes its usability
extremely limited.
### Interaction with the `exceptions` library
Consider the following:
```haskell
{-# LANGUAGE TypeApplications #-}
import Control.Monad (void) -- not re-exported by Control.Monad.Except in newer mtl
import Control.Monad.Catch
import Control.Monad.Except
import Control.Monad.IO.Class

data Resource = Resource

acquireResource :: MonadIO m => m Resource
acquireResource = do
  liftIO $ putStrLn "acquireResource"
  pure Resource

releaseResourceOnSuccess :: MonadIO m => Resource -> m ()
releaseResourceOnSuccess _ = liftIO $ putStrLn "releaseResourceOnSuccess"

releaseResourceOnFailure :: MonadIO m => Resource -> m ()
releaseResourceOnFailure _ = liftIO $ putStrLn "releaseResourceOnFailure"

withResource :: (MonadMask m, MonadIO m) => (Resource -> m a) -> m a
withResource action = mask $ \unmask -> do
  r <- acquireResource
  a <- unmask (action r) `onException` do
    releaseResourceOnFailure r
  releaseResourceOnSuccess r
  pure a

main :: IO ()
main = do
  putStrLn "1. IO - no exception"
  test . withResource $ \Resource -> pure ()
  putStrLn "2. IO - exception"
  test . withResource $ \Resource -> error "oops"
  putStrLn "3. ExceptT IO - no exception"
  test . runExceptT @String . withResource $ \Resource -> pure ()
  putStrLn "4. ExceptT IO - exception"
  test . runExceptT @String . withResource $ \Resource -> error "oops"
  putStrLn "5. ExceptT IO - error"
  test . runExceptT @String . withResource $ \Resource -> throwError "oops"
  where
    -- Swallow any runtime exception so that the remaining cases still run.
    test :: IO a -> IO ()
    test = void . try @_ @SomeException
```

Does `withResource` correctly handle resource management in all cases?

**No**.

Here's the output:

```
1. IO - no exception
acquireResource
releaseResourceOnSuccess
2. IO - exception
acquireResource
releaseResourceOnFailure
3. ExceptT IO - no exception
acquireResource
releaseResourceOnSuccess
4. ExceptT IO - exception
acquireResource
releaseResourceOnFailure
5. ExceptT IO - error
acquireResource
```
Note that the resource is never released when an `ExceptT`-specific error is
raised.
The issue here is the use of `onException`, which doesn't capture
transformer-specific errors **because they are not exceptions**.
Granted, this is more of a problem with the API of the `exceptions` library. It
mentions this caveat in the documentation and provides a function `onError` that
should be used instead, but overall the library makes it far too easy to write
code that looks correct, but is subtly broken and will sneakily leak resources
until they are exhausted and your application grinds to a halt.
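
As a minimal sketch of the `onError` alternative mentioned above, the variant below swaps `onException` for `onError` and records events in an `IORef` instead of printing them. The names `logEvent`, `withResourceOnError` and `demo` are hypothetical and exist only for this illustration.

```haskell
import Control.Monad.Catch (MonadMask, mask, onError)
import Control.Monad.Except (ExceptT, runExceptT, throwError)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical helper: append an event to a log instead of printing it.
logEvent :: MonadIO m => IORef [String] -> String -> m ()
logEvent ref msg = liftIO $ modifyIORef' ref (++ [msg])

-- Same shape as withResource, but the failure path is guarded by onError,
-- which also fires when the action aborts with a transformer-specific error.
withResourceOnError :: (MonadMask m, MonadIO m) => IORef [String] -> m a -> m a
withResourceOnError ref action = mask $ \unmask -> do
  logEvent ref "acquire"
  a <- unmask action `onError` logEvent ref "releaseOnFailure"
  logEvent ref "releaseOnSuccess"
  pure a

-- Run the failing ExceptT case and return the recorded events.
demo :: IO [String]
demo = do
  ref <- newIORef []
  _ <- runExceptT (withResourceOnError ref (throwError "oops") :: ExceptT String IO ())
  readIORef ref

main :: IO ()
main = demo >>= mapM_ putStrLn
```

With `onError` the failing `ExceptT` case records `acquire` followed by `releaseOnFailure`, so the resource is no longer leaked. This still relies on every call site remembering to pick `onError` over `onException`.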
## StateT
### Interaction with the `exceptions` library
What is the output of the following program?
```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (ErrorCall)
import Control.Monad.Catch
import Control.Monad.Trans.State

main :: IO ()
main = do
  s <- (`execStateT` (0::Int)) $ do
    (modify (+1) >> error "oops") `catch` \(e::ErrorCall) -> modify (+2)
  putStrLn $ show s
```
It would be reasonable to expect `3`, but that's not the case.
```
$ ./test
2
```
The problem is that state updates tracked by `StateT` within a computation
wrapped in `catch` are discarded when an exception is raised. This is confusing
and will lead to bugs if one doesn't know about this subtle behavior.
The same thing happens when `lifted-base` (backed by `monad-control`) is used:
```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception.Lifted
import Control.Monad.Trans.State

main :: IO ()
main = do
  s <- (`execStateT` (0::Int)) $ do
    (modify (+1) >> error "oops") `catch` \(e::ErrorCall) -> modify (+2)
  putStrLn $ show s
```
```
$ ./test
2
```
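
The result `2` becomes less surprising once `catch` is written out by hand for `StateT` over `IO`. The sketch below is an illustration under the assumption that the lifting works roughly this way; `catchStateT` is a hypothetical name, not a library function. The handler is resumed with the state captured before the protected action started, because the state produced by the failed action is never returned.

```haskell
import Control.Exception (ErrorCall, catch)
import Control.Monad.Trans.State (StateT (..), execStateT, modify)

-- Hand-written lifting of 'catch' through StateT (hypothetical helper).
-- Note that the handler receives 's', the state from before 'action' ran.
catchStateT :: StateT s IO a -> (ErrorCall -> StateT s IO a) -> StateT s IO a
catchStateT action handler = StateT $ \s ->
  runStateT action s `catch` \e -> runStateT (handler e) s

-- Same scenario as above: the (+1) update is lost, only (+2) survives.
demo :: IO Int
demo = (`execStateT` (0 :: Int)) $
  (modify (+ 1) >> error "oops") `catchStateT` \_ -> modify (+ 2)

main :: IO ()
main = demo >>= print
```

Since `runStateT action s` either returns a new state or throws, there is no intermediate state for the handler to pick up, which is why the update made before the exception disappears.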
### Interaction with `ExceptT`
The initial state is `0`. What is the value of the state after `test` runs?
```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except
import Control.Monad.State
test :: (MonadError String m, MonadState Int m) => m ()
test = (modify (+1) >> throwError "oops") `catchError` \_ -> modify (+2)
```
After the previous section you will most likely be cautious about the
answer. However, neither `2` nor `3` is correct on its own!
It depends on the order of the transformer stack:
```haskell
main :: IO ()
main = do
  putStrLn $ "1. StateT Int (ExceptT IO)"
  putStrLn . show =<< (runExceptT . (`runStateT` (0::Int)) $ test)
  putStrLn $ "2. ExceptT (StateT Int IO)"
  putStrLn . show =<< ((`runStateT` (0::Int)) . runExceptT $ test)
```
```
$ ./test
1. StateT Int (ExceptT IO)
Right ((),2)
2. ExceptT (StateT Int IO)
(Right (),3)
```
This is even worse than the previous section, because:
- You can't predict the behavior of code based on the definition of `test`
alone.
- A seemingly unrelated code change, e.g. rearranging the order of monad
transformers in the stack, will lead to a subtle change of behavior in a
completely different part of the application.
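
The difference is easier to see once both stacks are unrolled into plain functions. The following sketch uses pure base monads (`Either String` and `State Int`) instead of `IO`, which is an assumption made only to keep the example small; `stateOverExcept` and `exceptOverState` are hypothetical names.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, catchError, runExceptT, throwError)
import Control.Monad.State (MonadState, modify, runState, runStateT)

test :: (MonadError String m, MonadState Int m) => m ()
test = (modify (+ 1) >> throwError "oops") `catchError` \_ -> modify (+ 2)

-- StateT Int (Either String) unrolls to Int -> Either String (a, Int):
-- a Left carries no state, so the handler restarts from the initial state.
stateOverExcept :: Int -> Either String ((), Int)
stateOverExcept = runStateT test

-- ExceptT String (State Int) unrolls to Int -> (Either String a, Int):
-- the state is returned next to the Left, so updates survive the error.
exceptOverState :: Int -> (Either String (), Int)
exceptOverState = runState (runExceptT test)

main :: IO ()
main = do
  print (stateOverExcept 0)
  print (exceptOverState 0)
```

The first call yields `Right ((),2)` and the second `(Right (),3)`, matching the `IO`-based output above.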
## WriterT
### Control.Monad.Trans.Writer.Lazy
The excerpt of its definition:
```haskell
newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

writer :: Monad m => (a, w) -> WriterT w m a
writer = WriterT . return

instance (Monoid w, Monad m) => Monad (WriterT w m) where
  m >>= k = WriterT $ do
    ~(a, w) <- runWriterT m
    ~(b, w') <- runWriterT (k a)
    pure (b, w `mappend` w')
```
Usage of lazy `WriterT w m` makes sense only when:
1. The bind of `m` is lazy (e.g. `Identity` qualifies, `IO` doesn't) as then the
bind of `WriterT w m` is also lazy.
2. `w` can be produced and consumed lazily (e.g. `[a]` qualifies, `Sum Int`
doesn't).
If both conditions are met, it'll run in constant space:
```haskell
import Control.Monad.Trans.Writer.Lazy
import Data.Foldable

main :: IO ()
main = do
  let xs = execWriter $ forM_ [1..1000000::Int] $ \n -> tell [n]
  putStrLn . show $ sum xs
```
```
$ ./test +RTS -s
500000500000
456,052,432 bytes allocated in the heap
1,792 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
```
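
Lazy production and consumption of `w` can also be observed directly. In the sketch below (`firstThree` is a hypothetical name), the writer computation is infinite, yet a consumer that only needs a prefix of the output still terminates, because the base monad is `Identity` and the list is produced on demand.

```haskell
import Control.Monad.Trans.Writer.Lazy (execWriter, tell)

-- The computation tells every natural number, one element at a time.
-- Only the first three elements of the output are ever demanded.
firstThree :: [Int]
firstThree = take 3 . execWriter $ mapM_ (\n -> tell [n]) [1 ..]

main :: IO ()
main = print firstThree
```

This prints `[1,2,3]`. It also shows how fragile the arrangement is, since it works only while every layer involved stays lazy.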
If the first one is not met, it'll leak space:
```haskell
import Control.Monad.Trans.Writer.Lazy
import Data.Foldable
main :: IO ()
main = execWriterT $ forM_ [1..1000000::Int] $ \_ -> pure ()
```
```
$ ./test +RTS -s
520,687,800 bytes allocated in the heap
2,640 bytes copied during GC
87,127,800 bytes maximum residency (8 sample(s))
1,706,248 bytes maximum slop
235 MiB total memory in use (0 MB lost due to fragmentation)
```
If the second one is not met, it'll leak space:
```haskell
import Control.Monad.Trans.Writer.Lazy
import Data.Foldable
import Data.Monoid

main :: IO ()
main = do
  let Sum xs = execWriter $ forM_ [1..1000000::Int] $ \n -> tell $ Sum n
  putStrLn $ show xs
```
```
$ ./test +RTS -s
500000500000
473,892,576 bytes allocated in the heap
2,360 bytes copied during GC
51,145,184 bytes maximum residency (6 sample(s))
29,400 bytes maximum slop
100 MiB total memory in use (0 MB lost due to fragmentation)
```
### Control.Monad.Trans.Writer.Strict
The excerpt of its definition:
```haskell
newtype WriterT w m a = WriterT { runWriterT :: m (a, w) }

writer :: (Monad m) => (a, w) -> WriterT w m a
writer = WriterT . return

instance (Monoid w, Monad m) => Monad (WriterT w m) where
  m >>= k = WriterT $ do
    (a, w) <- runWriterT m
    (b, w') <- runWriterT (k a)
    pure (b, w `mappend` w')
```
Usage of the strict `WriterT` **never makes sense**. It always leaks space,
because:
1. Its bind is not tail recursive, and strict pattern matches force computation
of `k` even if the bind of `m` is lazy.
2. ``w `mappend` w'`` is never evaluated by the bind itself, which results in
accumulation of thunks.
All three tests leak space:
```haskell
import Control.Monad.Trans.Writer.Strict
import Data.Foldable

main :: IO ()
main = do
  let xs = execWriter $ forM_ [1..1000000::Int] $ \n -> tell [n]
  putStrLn . show $ sum xs
```
```
$ ./test +RTS -s
500000500000
433,958,096 bytes allocated in the heap
1,808 bytes copied during GC
104,548,664 bytes maximum residency (6 sample(s))
247,496 bytes maximum slop
187 MiB total memory in use (0 MB lost due to fragmentation)
```
```haskell
import Control.Monad.Trans.Writer.Strict
import Data.Foldable
main :: IO ()
main = execWriterT $ forM_ [1..1000000::Int] $ \_ -> pure ()
```
```
$ ./test +RTS -s
472,687,776 bytes allocated in the heap
2,448 bytes copied during GC
57,333,760 bytes maximum residency (7 sample(s))
83,968 bytes maximum slop
116 MiB total memory in use (0 MB lost due to fragmentation)
```
```haskell
import Control.Monad.Trans.Writer.Strict
import Data.Foldable
import Data.Monoid

main :: IO ()
main = do
  let Sum xs = execWriter $ forM_ [1..1000000::Int] $ \n -> tell $ Sum n
  putStrLn $ show xs
```
```
$ ./test +RTS -s
500000500000
435,217,632 bytes allocated in the heap
9,720 bytes copied during GC
79,255,400 bytes maximum residency (7 sample(s))
346,264 bytes maximum slop
195 MiB total memory in use (0 MB lost due to fragmentation)
```
### Control.Monad.Trans.Writer.CPS
What the strict `WriterT` should have been.
Here is the excerpt of its definition:
```haskell
newtype WriterT w m a = WriterT { unWriterT :: w -> m (a, w) }

writer :: (Monoid w, Monad m) => (a, w) -> WriterT w m a
writer (a, w') = WriterT $ \ w ->
  let wt = w `mappend` w' in wt `seq` return (a, wt)

instance Monad m => Monad (WriterT w m) where
  m >>= k = WriterT $ \ w -> do
    (a, w') <- unWriterT m w
    unWriterT (k a) w'
```
It's essentially a `StateT` with a restricted API.
1. Its bind is tail recursive and strict, so it will run in constant space.
2. ``w `mappend` w'`` is continuously evaluated, so thunks will not be
accumulated.
The downside of (2) is that the time complexity of the first test degrades to
`O(n^2)`, because each `tell` has to append to the end of an already evaluated
list.
However, the rest run in constant space:
```haskell
import Control.Monad.Trans.Writer.CPS
import Data.Foldable
main :: IO ()
main = execWriterT $ forM_ [1..1000000::Int] $ \_ -> pure ()
```
```
$ ./test +RTS -s
352,041,696 bytes allocated in the heap
1,360 bytes copied during GC
35,984 bytes maximum residency (2 sample(s))
29,552 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
```
```haskell
import Control.Monad.Trans.Writer.CPS
import Data.Foldable
import Data.Monoid

main :: IO ()
main = do
  let Sum xs = execWriter $ forM_ [1..1000000::Int] $ \n -> tell $ Sum n
  putStrLn $ show xs
```
```
$ ./test +RTS -s
500000500000
376,052,496 bytes allocated in the heap
1,464 bytes copied during GC
44,328 bytes maximum residency (2 sample(s))
29,400 bytes maximum slop
5 MiB total memory in use (0 MB lost due to fragmentation)
```
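
The `O(n^2)` behavior with `[a]` comes from left-nested appends, so one possible workaround is a monoid whose append at the end is cheap. The sketch below assumes that `Seq` from the `containers` package is an acceptable replacement for the list; `collect` is a hypothetical name. It is an illustration, not a measured benchmark.

```haskell
import Control.Monad.Trans.Writer.CPS (execWriter, tell)
import Data.Foldable (forM_, toList)
import qualified Data.Sequence as Seq

-- Accumulate into a Seq, which supports efficient appends at the end,
-- so the left-associated chain of appends built by 'tell' stays cheap.
collect :: Int -> Seq.Seq Int
collect n = execWriter $ forM_ [1 .. n] $ \i -> tell (Seq.singleton i)

main :: IO ()
main = do
  print (toList (collect 5))
  print (sum (collect 1000000))
```

The order of the elements is preserved, so `collect` can stand in for the list-based version wherever a `Foldable` container is sufficient.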
### Summary
1. Lazy `WriterT` makes sense in niche scenarios like lazy production and
consumption of `w` (arguably in such case it's better to use a dedicated
streaming library instead of relying on laziness, which is quite fragile).
2. Strict `WriterT` never makes sense.
3. CPS `WriterT` makes sense if the left-associated chain of `mappend`s is
efficient for the `w` of your choice (in particular it's not for `[a]`, which
tends to be often used with `WriterT` by inexperienced users).
In conclusion, `WriterT` flavors range from "useless" to "full of traps", so
they are best avoided.
## RWST
It inherits all issues of `StateT` and `WriterT`, so it is best avoided.