renderBox: avoid redundant cropResultToContext calls

In profiling output I noticed that `renderBox` crops the child widgets
three times.  For example:

```
                                       individual      inherited
                            entries  %time %alloc   %time %alloc

vLimit                        32637    1.4    1.5    10.4   11.5
 cropResultToContext          32637    0.4    1.5     1.4    2.6
  crop                        32637    0.9    1.0     0.9    1.0
   setCoordinates             32637    0.0    0.0     0.0    0.0
  crop                        32637    0.0    0.1     0.2    0.1
   cropBottom                 32637    0.2    0.0     0.2    0.0
    imageHeight               32637    0.0    0.0     0.0    0.0
   cropRight                  32637    0.0    0.0     0.0    0.0
    imageWidth                32637    0.0    0.0     0.0    0.0
  availHeight                 32626    0.0    0.0     0.0    0.0
 hLimit                           0    0.9    1.3     7.5    7.5
  cropResultToContext         65274    2.0    2.9     5.9    5.2
 …
```

Note the calls to `cropResultToContext`.  It is called once for each
call to the enclosing `vLimit`, and a further two times in the inner
`hLimit` call.  In the implementation, that pattern that causes this is:

```haskell
render $ limitPrimary br remainingPrimary
       $ limitSecondary br availSecondary
       $ cropToContext widget
```

`limitPrimary` and `limitSecondary` are `vLimit` and `hLimit` or
vice-versa, depending on whether we are rendering a `vBox` or `hBox`.
Both `vLimit` and `hLimit` modify the available space in the render
context and call `cropToContext`, so the explicit `cropToContext` is
redundant.  The `limitSecondary` is also redundant, because
`availSecondary` is taken directly from the render context.  So, remove
the redundant operations.

As a result of this change, total program allocation, and allocation for
the `vBox` cost centre that is the parent of the cost centres table
shown above, improve from:

```
total time  =        0.61 secs   (614 ticks @ 1000 us, 1 processor)
total alloc = 856,554,568 bytes  (excludes profiling overheads)

                                 individual      inherited
                      entries  %time %alloc   %time %alloc

vBox                        0    5.0    9.0    23.9   25.9
```

to

```
total time  =        0.61 secs   (606 ticks @ 1000 us, 1 processor)
total alloc = 773,138,088 bytes  (excludes profiling overheads)

                                 individual      inherited
                      entries  %time %alloc   %time %alloc

vBox                        0    5.3    8.9    19.0   19.9
```

Exactly the same number of `vLimit` calls occurs; this is the profiling
caused by navigating around a specific, large mail message in Purebred.
Total program CPU time and allocation have non-deterministic factors, so
take them with a grain of salt.

Allocations in `renderBox` are reduced significantly for long lists of
widgets.  In my scenario (rendering and navigating around a long email
in Purebred), this changes eliminates ~10% of all program allocation.

Surprisingly, although the cost centres report lower CPU time, the total
program CPU time did not change significantly.  I don't have a theory to
explain this.
This commit is contained in:
Fraser Tweedale 2022-10-14 17:27:57 +10:00
parent 99d41c28fd
commit 386aedadc0

View File

@ -639,18 +639,13 @@ renderBox br ws =
(his, lows) = partition (\p -> (primaryWidgetSize br $ snd p) == Fixed)
pairsIndexed
let availPrimary = c^.(contextPrimary br)
availSecondary = c^.(contextSecondary br)
renderHi prim = do
remainingPrimary <- get
result <- lift $ render $ limitPrimary br remainingPrimary
$ limitSecondary br availSecondary
$ cropToContext prim
result <- lift $ render $ limitPrimary br remainingPrimary prim
result <$ (put $! remainingPrimary - (result^.imageL.(to $ imagePrimary br)))
(renderedHis, remainingPrimary) <-
runStateT (traverse (traverse renderHi) his) availPrimary
runStateT (traverse (traverse renderHi) his) (c ^. contextPrimary br)
renderedLows <- case lows of
[] -> return []
@ -660,10 +655,7 @@ renderBox br ws =
primaries = replicate rest (primaryPerLow + 1) <>
replicate (length ls - rest) primaryPerLow
let renderLow ((i, prim), pri) =
(i,) <$> (render $ limitPrimary br pri
$ limitSecondary br availSecondary
$ cropToContext prim)
let renderLow ((i, prim), pri) = (i,) <$> render (limitPrimary br pri prim)
if remainingPrimary > 0 then mapM renderLow (zip ls primaries) else return []