In profiling output I noticed that `renderBox` crops the child widgets
three times. For example:
```
individual inherited
entries %time %alloc %time %alloc
vLimit 32637 1.4 1.5 10.4 11.5
cropResultToContext 32637 0.4 1.5 1.4 2.6
crop 32637 0.9 1.0 0.9 1.0
setCoordinates 32637 0.0 0.0 0.0 0.0
crop 32637 0.0 0.1 0.2 0.1
cropBottom 32637 0.2 0.0 0.2 0.0
imageHeight 32637 0.0 0.0 0.0 0.0
cropRight 32637 0.0 0.0 0.0 0.0
imageWidth 32637 0.0 0.0 0.0 0.0
availHeight 32626 0.0 0.0 0.0 0.0
hLimit 0 0.9 1.3 7.5 7.5
cropResultToContext 65274 2.0 2.9 5.9 5.2
…
```
Note the calls to `cropResultToContext`. It is called once for each
call to the enclosing `vLimit`, and a further two times in the inner
`hLimit` call. In the implementation, that pattern that causes this is:
```haskell
render $ limitPrimary br remainingPrimary
$ limitSecondary br availSecondary
$ cropToContext widget
```
`limitPrimary` and `limitSecondary` are `vLimit` and `hLimit` or
vice-versa, depending on whether we are rendering a `vBox` or `hBox`.
Both `vLimit` and `hLimit` modify the available space in the render
context and call `cropToContext`, so the explicit `cropToContext` is
redundant. The `limitSecondary` is also redundant, because
`availSecondary` is taken directly from the render context. So, remove
the redundant operations.
As a result of this change, total program allocation, and allocation for
the `vBox` cost centre that is the parent of the cost centres table
shown above, improve from:
```
total time = 0.61 secs (614 ticks @ 1000 us, 1 processor)
total alloc = 856,554,568 bytes (excludes profiling overheads)
individual inherited
entries %time %alloc %time %alloc
vBox 0 5.0 9.0 23.9 25.9
```
to
```
total time = 0.61 secs (606 ticks @ 1000 us, 1 processor)
total alloc = 773,138,088 bytes (excludes profiling overheads)
individual inherited
entries %time %alloc %time %alloc
vBox 0 5.3 8.9 19.0 19.9
```
Exactly the same number of `vLimit` calls occurs; this is the profiling
caused by navigating around a specific, large mail message in Purebred.
Total program CPU time and allocation have non-deterministic factors, so
take them with a grain of salt.
Allocations in `renderBox` are reduced significantly for long lists of
widgets. In my scenario (rendering and navigating around a long email
in Purebred), this changes eliminates ~10% of all program allocation.
Surprisingly, although the cost centres report lower CPU time, the total
program CPU time did not change significantly. I don't have a theory to
explain this.
`renderBox` computes translations by summing the primary dimensions
of widgets preceding the current widget. This has O(n²) complexity
and causes poor performance with long lists of widgets. In one
purebred scenario `renderBox` contributes ~30% of the CPU time of
the program.
Use a State computation to accumulate the total offset during the
traversal. In the scenario described above, this change caused
`renderBox` to drop to ~4% of the program CPU time.
Also remove an occurence of `sum` that calculateed the remaining
primary dimension space after rendering of greedy widgets. We
already calculate this value during the greedy widget render
traversal, so reuse it.
Using the strict state monad to calculate `renderedHis` brings
another small performance increase.
I find the StateT version a little easier to comprehend too,
although it's a subjective matter.
`renderBox` uses dlist to grow the result list while doing a render
traversal. An ordinary fmap'd list construction suffices and offers
a very small but measurable performance boost.
Furthermore this is the only place the *dlist* library was used so
*brick* no longer depends on it.