The tcgetpgrp call appears to have high variance in latency, ranging
from 200-700us on my system.
If you have 10 tabs and mouse over the tab bar, that's around 7ms
spent per frame just figuring out the foreground process; that doesn't
include actually extracting the process executable or current working
directory paths.
This was exacerbated by the mouse move events triggering a tab bar
recompute on every pixel of mouse movement.
This commit takes the following steps to resolve this:
* We now only re-compute the tab bar when a mouse movement actually
  changes the hovered UI item
* A simple single-item cache is now used on unix that allows the caller
  to proceed quickly with stale-but-probably-still-mostly-accurate data
  while queuing up an update to a background thread which can absorb
  the latency (sketched below).
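A minimal sketch of that cache, assuming a hypothetical global
single-entry cache and calling `libc::tcgetpgrp` directly (the real
code caches richer per-pty process info):
```
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical cache entry; the real code caches more than the pgrp.
#[derive(Clone)]
struct CachedFg {
    pid: libc::pid_t,
    updated: Instant,
}

static CACHE: Mutex<Option<CachedFg>> = Mutex::new(None);

/// Return the possibly-stale cached value immediately; if it is older
/// than `max_age`, queue a refresh on a background thread so that the
/// caller never blocks on the slow tcgetpgrp call.
fn cached_foreground_pid(fd: libc::c_int, max_age: Duration) -> Option<libc::pid_t> {
    let cached = CACHE.lock().unwrap().clone();
    let stale = cached
        .as_ref()
        .map(|c| c.updated.elapsed() > max_age)
        .unwrap_or(true);
    if stale {
        std::thread::spawn(move || {
            let pid = unsafe { libc::tcgetpgrp(fd) };
            if pid > 0 {
                *CACHE.lock().unwrap() = Some(CachedFg {
                    pid,
                    updated: Instant::now(),
                });
            }
        });
    }
    cached.map(|c| c.pid)
}
```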
The result of this is that hovering over several tabs in quick
succession no longer takes a noticeable length of time to render the
hover, but the consequence is that the contents of a given tab may be
stale by 300-400ms.
I think that trade-off is worthwhile.
We already have a similar trade-off on Windows, although we don't
yet do the updates in a different thread on Windows. Perhaps in
a follow-up commit?
refs: https://github.com/wez/wezterm/issues/2991
Avoids:
```
warning: the following packages contain code that will be rejected by a future version of Rust: ntapi v0.3.7
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 36`
```
This allows removing a bunch of unwrap/expect calls.
However, my primary motive was to replace the cases where we used
Mux::get() == None to indicate that we were not on the main thread.
A separate API has been added to test for that explicitly rather than
implicitly.
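For illustration, one way to make that test explicit (names are
hypothetical, not wezterm's actual API): record the main thread id at
startup and compare against it, rather than inferring "wrong thread"
from `Mux::get()` returning `None`.
```
use std::sync::OnceLock;
use std::thread::{self, ThreadId};

static MAIN_THREAD: OnceLock<ThreadId> = OnceLock::new();

/// Called once from the main thread during startup.
pub fn register_main_thread() {
    MAIN_THREAD
        .set(thread::current().id())
        .expect("register_main_thread called twice");
}

/// Explicit check, usable instead of `Mux::get().is_none()`.
pub fn is_main_thread() -> bool {
    MAIN_THREAD.get().copied() == Some(thread::current().id())
}
```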
This is a step towards making it Send+Sync.
I'm a little cagey about this in the long term, as there are some mux
operations that may technically require multiple fields to be locked for
their duration: allowing free-threaded access may introduce some subtle
(or not so subtle!) interleaving conditions where the overall mux state
is not yet consistent.
I'm thinking of prune_dead_windows kicking in while the mux is in the
middle of being manipulated.
I did try an initial pass of just moving everything under one lock, but
there is already quite a lot of mixed read/write access to different
aspects of the mux.
We'll see what bubbles up later!
Now that we use Arc<Pane> we can directly pass the pane to the
background thread that we're using to parse the terminal output, cutting
out some context switching and reducing the latency between output and
rendering that output.
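Roughly the shape of that change, with an illustrative stand-in trait
rather than the real `Pane`:
```
use std::sync::{mpsc, Arc};

// Stand-in for the real Pane trait; Send + Sync is required so the
// Arc can be shared with the parser thread.
trait Pane: Send + Sync {
    fn perform_bytes(&self, bytes: &[u8]);
}

// With the pane behind an Arc we can hand a clone straight to the
// thread that parses pty output and have it apply actions directly,
// instead of parsing on one thread and scheduling the results back
// onto the pane's owning thread.
fn spawn_parser(pane: Arc<dyn Pane>, rx: mpsc::Receiver<Vec<u8>>) {
    std::thread::spawn(move || {
        while let Ok(chunk) = rx.recv() {
            pane.perform_bytes(&chunk);
        }
    });
}
```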
I spent a few hours in heap profilers. What I found was:
* Inefficient use of heap when building up runs of
`Action::Print(char)`.
-> Solve by adding `Action::PrintString(String)`
and accumulating utf8 bytes rather than u32 codepoints.
* Inefficient use of heap when building Quad buffers: the default
exponential growth of `Vec` tended to waste 40%-75% of the allocated
capacity, and since we could keep ~1024 of these in cache, there's
a lot of potential for waste.
-> Solve by bounding the growth to 64 at a time. This has similar
characteristics to exponential growth at the default 80x24 terminal
size. May need to add a config option for this step size for users
with very large terminals.
* Lazy eviction from the LFU caches. The underlying cache advisor is
somewhat probabilistic and has a minimum cache size of 256, making
it difficult to maintain low heap utilization.
-> Solve by replacing it with a very simple LFU algorithm. It doesn't
seem to hurt much at the default terminal size with the default
cache sizes. If we make the cache sizes smaller, its overhead is
reduced.
Some further experimentation is needed to adjust defaults, but this
should help reduce heap usage.
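As a sketch of the bounded-growth point above (a hypothetical helper,
not wezterm's actual code):
```
/// Grow a quad buffer in fixed steps of 64 rather than letting Vec
/// double its capacity, so cached per-line buffers don't carry large
/// amounts of unused capacity.
fn reserve_bounded<T>(v: &mut Vec<T>, additional: usize) {
    const STEP: usize = 64;
    let needed = v.len() + additional;
    if needed > v.capacity() {
        // round the target capacity up to the next multiple of STEP
        let target = ((needed + STEP - 1) / STEP) * STEP;
        v.reserve_exact(target - v.len());
    }
}
```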
refs: https://github.com/wez/wezterm/issues/2626
`iter_panes` returns the renderable set of panes, but most functions
in the mux want to operate on the full set of panes.
Notably, when closing a tab, we were not killing panes other than
the zoomed pane, which caused wezterm to linger in the background.
refs: https://github.com/wez/wezterm/issues/2548
According to its benchmarks, it's almost 2x faster than
unicode_segmentation. It doesn't appear to make a visible
difference to `time cat bigfile`, but I'll take anything
that gives more headroom for so little switching effort.
There were two problems:
* We weren't correctly invalidating when the hover state changed
(a recent regression caused by recent caching changes)
* We'd underline every link with the same destination on hover,
not just the one under the mouse (longstanding wart)
Recent changes allow the application layer to reference the underlying
Lines directly, so we can restore the original and expected
only-highlight-under-the-mouse by switching to those newer APIs.
Adjust the cache values so that we know to also verify the current
highlight and invalidate.
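Illustrative only (not the actual cache structure): making the hovered
link part of the per-line cache key means a change in hover state
misses the cache and re-renders the affected line.
```
// Hypothetical cache key; the hovered link is represented here as its
// target string for simplicity.
#[derive(Clone, PartialEq, Eq, Hash)]
struct LineRenderCacheKey {
    pane_id: usize,
    stable_row: isize,
    hovered_link: Option<String>,
}
```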
I was a little surprised to see that this also works with mux client
panes: I was expecting to need to do some follow up on those because
they return copies of Line rather than references to them. That happens
to work because the mux client updates the hyperlinks at the time it
inserts lines into its cache. The effect of that is that lines in mux
client panes won't update to new hyperlink rules if they were received
prior to a change in the config.
refs: https://github.com/wez/wezterm/issues/2496
There are caveats to determining this, but when we think
password entry is enabled, switch the cursor to the font-awesome
lock glyph instead of the normal cursor sprite.
fa_lock is used because it is monochrome and can thus be tinted
to the configured cursor color, and it respects blinking/easing.
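One common heuristic for this on unix, hedged as a sketch rather than
wezterm's exact logic: inspect the pty's termios and treat echo-off
plus canonical mode as "probably reading a password".
```
use std::os::unix::io::RawFd;

// Sketch only: a program such as `sudo` typically disables ECHO while
// leaving ICANON set when it prompts for a password.
fn looks_like_password_entry(fd: RawFd) -> bool {
    let mut term: libc::termios = unsafe { std::mem::zeroed() };
    if unsafe { libc::tcgetattr(fd, &mut term) } != 0 {
        return false;
    }
    (term.c_lflag & libc::ECHO) == 0 && (term.c_lflag & libc::ICANON) != 0
}
```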
refs: https://github.com/wez/wezterm/issues/2460
The idea here is that different kinds of panes may want to expose
additional metadata to lua scripts. It would be a bit weird to add
a Pane method for each of those and plumb it all the way through
the various APIs, so just allowing a pane impl to return a dynamic
value (likely an Object) allows a bunch of flexibility.
This commit exposes the clientpane is_tardy boolean and the time
since the last data was received (since_last_response_ms) from
the mux client pane implementation: these are used to show the
tardiness indicator in the client pane.
Exposing this data enables the user to add that info to their
status bar if they wish.
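A hedged sketch of that shape, using a local stand-in for the dynamic
value type rather than the real one:
```
use std::collections::BTreeMap;

// Stand-in dynamic value type for illustration.
#[derive(Debug, Clone)]
enum Value {
    Null,
    Bool(bool),
    U64(u64),
    Object(BTreeMap<String, Value>),
}

trait Pane {
    // Default impl means most pane impls don't need to care.
    fn get_metadata(&self) -> Value {
        Value::Null
    }
}

struct ClientPane {
    is_tardy: bool,
    since_last_response_ms: u64,
}

impl Pane for ClientPane {
    fn get_metadata(&self) -> Value {
        let mut obj = BTreeMap::new();
        obj.insert("is_tardy".to_string(), Value::Bool(self.is_tardy));
        obj.insert(
            "since_last_response_ms".to_string(),
            Value::U64(self.since_last_response_ms),
        );
        Value::Object(obj)
    }
}
```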
The logic in the exit_behavior case was a bit smarter than that
in emit_output_for_pane, so adopt the former in the latter, then
use the latter for the former!
Tidy some things up to avoid some dead code and redundant impls.
Make it easier to select whether you want to implement the new
methods in terms of the old, or the old methods in terms of
the new in a given pane impl.
Adds Pane::for_each_logical_line_in_stable_range_mut and
Pane::with_lines_mut which allow iterating mutably over lines.
The idea is that this will allow the renderer to directly cache
data in the Line via its appdata, without having to build cumbersome
external caching logic or manage cache keys.
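A rough sketch of what such an API can look like (the real trait
signatures in wezterm use their own callback types and differ in
detail):
```
use std::ops::Range;

type StableRowIndex = isize;

// Stand-in Line type; the real Line carries cells plus an appdata slot
// that the renderer can use for its cache.
struct Line;

trait PaneLines {
    /// Invoke `func` with a mutable reference to each line in `range`,
    /// letting the renderer stash cached data directly on the Line
    /// instead of keying an external cache.
    fn with_lines_mut(
        &mut self,
        range: Range<StableRowIndex>,
        func: &mut dyn FnMut(StableRowIndex, &mut Line),
    );
}
```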
This commit just swaps the implementation around for localpane
and sanity checks that the renderer functions.
Various overlays and the mux client don't properly implement these
yet and currently warn at compile time and panic at runtime.
To follow is the logic to cache the data and make sure that it
works the way that I think before converting the other Pane
implementations.
It didn't really belong there; it was added as a bit of a hack
to propagate screen reverse video mode.
Move that to the RenderableDims struct and remove the related
bits from Line.
Introduces a heap-based quad allocator that we cache on a per-line
basis, so if a line is unchanged we simply need to copy the previously
computed set of quads for it into the gpu quad buffer.
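Conceptually (illustrative types only, not the real vertex layout):
```
// Quads previously computed for one line, kept on the heap and reused
// while the line is unchanged.
#[derive(Copy, Clone)]
struct Vertex {
    position: [f32; 2],
    tex: [f32; 2],
    color: [f32; 4],
}

struct CachedLineQuads {
    vertices: Vec<Vertex>,
}

/// The cheap path measured below as `quad_buffer_apply`: just copy the
/// cached quads into the mapped gpu vertex buffer.
fn quad_buffer_apply(dest: &mut [Vertex], cached: &CachedLineQuads) {
    dest[..cached.vertices.len()].copy_from_slice(&cached.vertices);
}
```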
The results are encouraging wrt. constructing those quads; the
`quad_buffer_apply` is the cost of the copy operation, compare with
`render_screen_line_opengl` which is the cost of computing the quads;
it's 300x better at the p50 and >100x better at p95 for a full-screen
updating program:
full 2880x1800 screen top:
```
STAT p50 p75 p95
Key(quad_buffer_apply) 2.26µs 5.22µs 9.60µs
Key(render_screen_line_opengl) 610.30µs 905.22µs 1.33ms
Key(gui.paint.opengl) 35.39ms 37.75ms 45.88ms
```
However, the extra buffering does increase the latency of
`gui.paint.opengl` (the overall cost of painting a frame); contrast the
above with the latency in the same scenario with the current `main`
(rather than this branch):
```
Key(gui.paint.opengl) 19.14ms 21.10ms 28.18ms
```
Note that for an idle screen this latency is ~1.5ms but that is also true
of `main`.
While the overall latency in the histogram isn't a slam dunk,
running `time cat bigfile` is ~10% faster on my mac.
I'm sure there's something that can be shaved off to get a more
convincing win.
It's not the first time that I've solved a problem by slowing things
down... in this situation, a couple of very inefficient TUI programs had
flickering outputs in wezterm because they were filling a buffer with a
bunch of spaces to erase a screen before sending the main body of their
updates in a subsequent buffer chunk. wezterm would render the
intervening partially blank frame and appear to flicker.
The resolution is to add a small delay (3ms by default) before sending
data to the terminal model. If the output is readable in that time
we'll accumulate it with the pending set of actions so that the
whole batch can be applied "more atomically".
Take care: `time cat bigfile` is sensitive to this, so we want to
keep the latency as small as possible, and we also want to avoid
accumulating actions and only flushing them at the end of the file.
We use the existing buffer size (~1MB) as a threshold: we bump
a count of the number of input bytes that resulted in the current
set of actions, and if that count exceeds the buffer size we flush it.
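A hedged sketch of the coalescing idea (accumulating raw bytes rather
than parsed actions, and with illustrative names):
```
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

const COALESCE_DELAY: Duration = Duration::from_millis(3);
const FLUSH_THRESHOLD: usize = 1024 * 1024; // matches the ~1MB buffer size

// Illustrative reader loop: wait briefly for more output so that an
// "erase screen" chunk and the repaint that follows it are applied to
// the terminal model in one batch, but flush early once ~1MB has
// accumulated so `time cat bigfile` doesn't buffer the whole file.
fn read_loop(rx: Receiver<Vec<u8>>, mut apply: impl FnMut(&[u8])) {
    let mut pending: Vec<u8> = Vec::new();
    while let Ok(chunk) = rx.recv() {
        pending.extend_from_slice(&chunk);
        while pending.len() < FLUSH_THRESHOLD {
            match rx.recv_timeout(COALESCE_DELAY) {
                Ok(more) => pending.extend_from_slice(&more),
                Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        apply(&pending);
        pending.clear();
    }
}
```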
refs: https://github.com/wez/wezterm/issues/2443
d2892c6 switched to using recency only, but neglected to verify that
the edges of the candidate panes were actually touching, leading to
some weird results.
This commit only uses recency when the edges actually intersect;
otherwise the candidate scores 0.
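In effect (illustrative helper, not the actual scoring code):
```
// Recency only counts when the candidate's edge actually touches the
// edge we're moving across; otherwise the candidate is not considered.
fn candidate_score(edges_intersect: bool, recency: usize) -> usize {
    if edges_intersect {
        1 + recency
    } else {
        0
    }
}
```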
refs: #2374
This breaking API change allows us to explicitly generate EOF when the
taken writer is dropped.
The examples have been updated to show how to manage read, write
and waiting without deadlock for both linux and windows.
Need to confirm that this is still good on macOS, but my
confidence is high.
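A hedged usage sketch based on the description above, assuming the
crate in question is portable-pty; the method names reflect my
understanding of its API and may differ from the shipped examples.
```
use portable_pty::{native_pty_system, CommandBuilder, PtySize};
use std::io::{Read, Write};

fn demo() -> anyhow::Result<()> {
    let pty_system = native_pty_system();
    let pair = pty_system.openpty(PtySize::default())?;
    let mut child = pair.slave.spawn_command(CommandBuilder::new("cat"))?;

    // Reading happens on its own thread so the writes below can't
    // deadlock against a full pty buffer.
    let mut reader = pair.master.try_clone_reader()?;
    let reader_thread = std::thread::spawn(move || {
        let mut out = Vec::new();
        let _ = reader.read_to_end(&mut out);
        out
    });

    // Take the writer; dropping it is now the explicit way to signal
    // EOF to the child's stdin.
    let mut writer = pair.master.take_writer()?;
    writer.write_all(b"hello\n")?;
    drop(writer);

    child.wait()?;
    drop(pair.master); // release the pty so the reader also sees EOF
    let _output = reader_thread.join().unwrap();
    Ok(())
}
```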
I've also removed ssh2 support from this crate as part of this
change. We haven't used it directly in wezterm in a long while
and removing it from here means that there is slightly less code
to keep compiling over and over.
refs: https://github.com/wez/wezterm/discussions/2392
refs: https://github.com/wez/wezterm/issues/1396