From 386aedadc09086e9b6f45e9a4b95999e6bc4a6ed Mon Sep 17 00:00:00 2001 From: Fraser Tweedale Date: Fri, 14 Oct 2022 17:27:57 +1000 Subject: [PATCH] renderBox: avoid redundant cropResultToContext calls MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In profiling output I noticed that `renderBox` crops the child widgets three times. For example: ``` individual inherited entries %time %alloc %time %alloc vLimit 32637 1.4 1.5 10.4 11.5 cropResultToContext 32637 0.4 1.5 1.4 2.6 crop 32637 0.9 1.0 0.9 1.0 setCoordinates 32637 0.0 0.0 0.0 0.0 crop 32637 0.0 0.1 0.2 0.1 cropBottom 32637 0.2 0.0 0.2 0.0 imageHeight 32637 0.0 0.0 0.0 0.0 cropRight 32637 0.0 0.0 0.0 0.0 imageWidth 32637 0.0 0.0 0.0 0.0 availHeight 32626 0.0 0.0 0.0 0.0 hLimit 0 0.9 1.3 7.5 7.5 cropResultToContext 65274 2.0 2.9 5.9 5.2 … ``` Note the calls to `cropResultToContext`. It is called once for each call to the enclosing `vLimit`, and a further two times in the inner `hLimit` call. In the implementation, that pattern that causes this is: ```haskell render $ limitPrimary br remainingPrimary $ limitSecondary br availSecondary $ cropToContext widget ``` `limitPrimary` and `limitSecondary` are `vLimit` and `hLimit` or vice-versa, depending on whether we are rendering a `vBox` or `hBox`. Both `vLimit` and `hLimit` modify the available space in the render context and call `cropToContext`, so the explicit `cropToContext` is redundant. The `limitSecondary` is also redundant, because `availSecondary` is taken directly from the render context. So, remove the redundant operations. As a result of this change, total program allocation, and allocation for the `vBox` cost centre that is the parent of the cost centres table shown above, improve from: ``` total time = 0.61 secs (614 ticks @ 1000 us, 1 processor) total alloc = 856,554,568 bytes (excludes profiling overheads) individual inherited entries %time %alloc %time %alloc vBox 0 5.0 9.0 23.9 25.9 ``` to ``` total time = 0.61 secs (606 ticks @ 1000 us, 1 processor) total alloc = 773,138,088 bytes (excludes profiling overheads) individual inherited entries %time %alloc %time %alloc vBox 0 5.3 8.9 19.0 19.9 ``` Exactly the same number of `vLimit` calls occurs; this is the profiling caused by navigating around a specific, large mail message in Purebred. Total program CPU time and allocation have non-deterministic factors, so take them with a grain of salt. Allocations in `renderBox` are reduced significantly for long lists of widgets. In my scenario (rendering and navigating around a long email in Purebred), this changes eliminates ~10% of all program allocation. Surprisingly, although the cost centres report lower CPU time, the total program CPU time did not change significantly. I don't have a theory to explain this. --- src/Brick/Widgets/Core.hs | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/src/Brick/Widgets/Core.hs b/src/Brick/Widgets/Core.hs index 433434b..955cb63 100644 --- a/src/Brick/Widgets/Core.hs +++ b/src/Brick/Widgets/Core.hs @@ -639,18 +639,13 @@ renderBox br ws = (his, lows) = partition (\p -> (primaryWidgetSize br $ snd p) == Fixed) pairsIndexed - let availPrimary = c^.(contextPrimary br) - availSecondary = c^.(contextSecondary br) - renderHi prim = do remainingPrimary <- get - result <- lift $ render $ limitPrimary br remainingPrimary - $ limitSecondary br availSecondary - $ cropToContext prim + result <- lift $ render $ limitPrimary br remainingPrimary prim result <$ (put $! remainingPrimary - (result^.imageL.(to $ imagePrimary br))) (renderedHis, remainingPrimary) <- - runStateT (traverse (traverse renderHi) his) availPrimary + runStateT (traverse (traverse renderHi) his) (c ^. contextPrimary br) renderedLows <- case lows of [] -> return [] @@ -660,10 +655,7 @@ renderBox br ws = primaries = replicate rest (primaryPerLow + 1) <> replicate (length ls - rest) primaryPerLow - let renderLow ((i, prim), pri) = - (i,) <$> (render $ limitPrimary br pri - $ limitSecondary br availSecondary - $ cropToContext prim) + let renderLow ((i, prim), pri) = (i,) <$> render (limitPrimary br pri prim) if remainingPrimary > 0 then mapM renderLow (zip ls primaries) else return []