1
1
mirror of https://github.com/wez/wezterm.git synced 2024-09-19 18:57:59 +03:00
wezterm/mux
Wez Furlong 00ddfbf9b8 perf: cache quads by line
Introduces a heap-based quad allocator that we cache on a per-line
basis, so if a line is unchanged we simply need to copy the previously
computed set of quads for it into the gpu quad buffer.

The results are encouraging wrt. constructing those quads; the
`quad_buffer_apply` is the cost of the copy operation, compare with
`render_screen_line_opengl` which is the cost of computing the quads;
it's 300x better at the p50 and >100x better at p95 for a full-screen
updating program:

full 2880x1800 screen top:

```
STAT                                             p50      p75      p95
Key(quad_buffer_apply)                           2.26µs   5.22µs   9.60µs
Key(render_screen_line_opengl)                   610.30µs 905.22µs 1.33ms
Key(gui.paint.opengl)                            35.39ms  37.75ms  45.88ms
```

However, the extra buffering does increase the latency of
`gui.paint.opengl` (the overall cost of painting a frame); contrast the
above with the latency in the same scenario with the current `main`
(rather than this branch):

```
Key(gui.paint.opengl)                            19.14ms  21.10ms  28.18ms
```

Note that for an idle screen this latency is ~1.5ms but that is also true
of `main`.

While the overall latency in the histogram isn't a slam dunk,
running `time cat bigfile` is ~10% faster on my mac.

I'm sure there's something that can be shaved off to get a more
convincing win.
2022-08-23 06:37:12 -07:00
..
src perf: cache quads by line 2022-08-23 06:37:12 -07:00
Cargo.toml cargo update 2022-08-21 08:51:16 -07:00