wezterm

mirror of https://github.com/wez/wezterm.git synced 2024-09-19 18:57:59 +03:00

src	perf: cache quads by line	2022-08-23 06:37:12 -07:00
Cargo.toml	cargo update	2022-08-21 08:51:16 -07:00

Wez Furlong 00ddfbf9b8 perf: cache quads by line

Introduces a heap-based quad allocator that we cache on a per-line
basis, so if a line is unchanged we simply need to copy the previously
computed set of quads for it into the gpu quad buffer.
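The per-line caching idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not wezterm's actual implementation: `Quad`, `LineQuadCache`, `render_line`, and `compute_quads` are hypothetical names, a plain `Vec<Quad>` stands in for the GPU quad buffer, and comparing the line text stands in for whatever change detection the real code uses.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one GPU quad (position and size only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Quad {
    pub x: f32,
    pub y: f32,
    pub width: f32,
    pub height: f32,
}

/// Quads computed for one line, tagged with the text they were built from.
struct CachedLine {
    text: String,
    quads: Vec<Quad>,
}

/// Per-line quad cache. `computed` counts cache misses, for illustration only.
#[derive(Default)]
pub struct LineQuadCache {
    lines: HashMap<usize, CachedLine>,
    pub computed: usize,
}

impl LineQuadCache {
    /// Appends the quads for `text` at `row` to `frame` (the stand-in quad buffer).
    /// An unchanged line only pays for the copy; a changed line is recomputed.
    pub fn render_line(&mut self, row: usize, text: &str, frame: &mut Vec<Quad>) {
        let hit = matches!(self.lines.get(&row), Some(c) if c.text == text);
        if !hit {
            // Assumption: text equality is a simplification of real change detection.
            let quads = compute_quads(row, text);
            self.computed += 1;
            self.lines.insert(row, CachedLine { text: text.to_string(), quads });
        }
        // The cheap path: copy the previously computed quads into the buffer.
        frame.extend_from_slice(&self.lines[&row].quads);
    }
}

/// Hypothetical "expensive" step: one quad per non-blank cell on an 8x16 grid.
fn compute_quads(row: usize, text: &str) -> Vec<Quad> {
    const CELL_W: f32 = 8.0;
    const CELL_H: f32 = 16.0;
    text.chars()
        .enumerate()
        .filter(|(_, c)| !c.is_whitespace())
        .map(|(col, _)| Quad {
            x: col as f32 * CELL_W,
            y: row as f32 * CELL_H,
            width: CELL_W,
            height: CELL_H,
        })
        .collect()
}
```

Calling `render_line` twice with the same row and text computes the quads once and copies them twice; only a changed line triggers `compute_quads` again.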

The results are encouraging with respect to constructing those quads.
`quad_buffer_apply` is the cost of the copy operation; compare it with
`render_screen_line_opengl`, which is the cost of computing the quads.
The copy is roughly 270x cheaper at the p50 and >100x cheaper at the p95
for a full-screen updating program:

Full-screen `top` at 2880x1800:

```
STAT                                             p50      p75      p95
Key(quad_buffer_apply)                           2.26µs   5.22µs   9.60µs
Key(render_screen_line_opengl)                   610.30µs 905.22µs 1.33ms
Key(gui.paint.opengl)                            35.39ms  37.75ms  45.88ms
```
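As a quick check of the ratios, the figures in the table can be divided directly. The helper names below (`speedup`, `table_speedups`) are hypothetical and the inputs are simply the table values converted to microseconds; this gives about 270x at p50, 173x at p75, and 139x at p95.

```rust
/// Ratio of the cost of recomputing a line's quads to the cost of copying cached ones.
pub fn speedup(compute_us: f64, copy_us: f64) -> f64 {
    compute_us / copy_us
}

/// Speedups for the three percentiles in the table above.
/// Inputs are `render_screen_line_opengl` vs `quad_buffer_apply`, in microseconds
/// (1.33ms is written as 1330.0µs).
pub fn table_speedups() -> [(&'static str, f64); 3] {
    [
        ("p50", speedup(610.30, 2.26)),
        ("p75", speedup(905.22, 5.22)),
        ("p95", speedup(1330.0, 9.60)),
    ]
}
```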

However, the extra buffering does increase the latency of
`gui.paint.opengl` (the overall cost of painting a frame); contrast the
above with the latency in the same scenario with the current `main`
(rather than this branch):

```
STAT                                             p50      p75      p95
Key(gui.paint.opengl)                            19.14ms  21.10ms  28.18ms
```

Note that for an idle screen this latency is ~1.5ms, but that is also true
of `main`.

While the overall latency in the histogram isn't a slam dunk,
running `time cat bigfile` is ~10% faster on my Mac.

I'm sure there's something that can be shaved off to get a more
convincing win.

2022-08-23 06:37:12 -07:00
