Previously, when resizing a tab, we'd unzoom it, recompute the resize
deltas and adjust every pane's non-zoomed position and re-zoom the
original pane.
When the alt screen is active, wezterm doesn't reflow resized lines,
and there a number of situations where the only effective change to
the line was updating a seqno; the content of those panes doesn't
actually update until the application(s) attached to the PTY
receive SIGWINCH from the kernel.
Since we were resizing the zoomed pane twice in quick succession
we could double-tap SIGWINCH and the application might coalesce
and process only one of the resize events.
The result of that was that we might see the state from either
the first or second resize event and then not get any other updates
until the application repainted itself.
This commit re-structures the resize behavior around zooms so that
we only resize the zoomed pane. When unzooming we'll fixup the
no-zoomed sizes for the whole tab. That means that we need to
store the pre-zoom size in order to correctly calculate those
sizes for the case where a pane was zoomed, the tab resized, and
then the pane was unzoomed again.
refs: https://github.com/wez/wezterm/issues/3068
Threads through a GuiPosition from mux window creation to allow it to be
used when the corresponding gui window is created.
SpawnCommand now has an optional position field to use for that purpose.
```lua
wezterm.mux.spawn_window {
position = {
x = 10,
y = 300,
-- Optional origin to use for x and y.
-- Possible values:
-- * "ScreenCoordinateSystem" (this is the default)
-- * "MainScreen" (the primary or main screen)
-- * "ActiveScreen" (whichever screen hosts the active/focused window)
-- * {Named="HDMI-1"} - uses a screen by name. See wezterm.gui.screens()
-- origin = "ScreenCoordinateSystem"
},
}
```
refs: https://github.com/wez/wezterm/issues/2976
The tcgetpgrp call appears to have high variance in latency, ranging
from 200-700us on my system.
If you have 10 tabs and mouse over the tab bar, that's around 7ms
spent per frame just figuring out the foreground process; that doesn't
include actually extracting the process executable or current working
directory paths.
This was exacerbated by the mouse move events triggering a tab bar
recompute on every pixel of mouse movement.
This commit takes the following steps to resolve this:
* We now only re-compute the tab bar when the UI item is changed by
a mouse movement
* A simple single-item cache is now used on unix that allows the caller
to proceed quickly with stale-but-probably-still-mostly-accurate data
while queuing up an update to a background thread which can absorb
the latency.
The result of this is that hovering over several tabs in quick
succession no longer takes a noticeable length of time to render the
hover, but the consequence is that the contents of a given tab may be
stale by 300-400ms.
I think that trade-off is worth while.
We already have a similar trade-off on Windows, although we don't
yet do the updates in a different thread on Windows. Perhaps in
a follow up commit?
refs: https://github.com/wez/wezterm/issues/2991
Avoids:
```
warning: the following packages contain code that will be rejected by a future version of Rust: ntapi v0.3.7
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 36`
```
This allows removing a bunch of unwrap/expect calls.
However, my primary motive was to replace the cases where we used
Mux::get() == None to indicate that we were not on the main thread.
A separate API has been added to test for that explicitly rather than
implicitly.
This is a step towards making it Send+Sync.
I'm a little cagey about this in the long term, as there are some mux
operations that may technically require multiple fields to be locked for
their duration: allowing free-threaded access may introduce some subtle
(or not so subtle!) interleaving conditions where the overall mux state
is not yet consistent.
I'm thinking of prune_dead_windows kicking in while the mux is in the
middle of being manipulated.
I did try an initial pass of just moving everything under one lock, but
there is already quite a lot of mixed read/write access to different
aspects of the mux.
We'll see what bubbles up later!
Now that we use Arc<Pane> we can directly pass the pane to the
background thread that we're using to parse the terminal output, cutting
out some context switching and reducing the latency between output and
rendering that output.
I spent a few hours in heap profilers. What I found was:
* Inefficient use of heap when building up runs of
`Action::Print(char)`.
-> Solve by adding `Action::PrintString(String)`
and accumulating utf8 bytes rather than u32 codepoints.
* Inefficient use of heap when building Quad buffers: the default
exponential growth of `Vec` tended to waste 40%-75% of the allocated
capacity, and since we could keep ~1024 of these in cache, there's
a lot of potential for waste.
-> Solve by bounding the growth to 64 at a time. This has similar
characteristics to exponential growth at the default 80x24 terminal
size. May need to add a config option for this step size for users
with very large terminals.
* Lazy eviction from the LFU caches. The underlying cache advisor is
somewhat probabilistic and has a minimum cache size of 256, making
it difficult to maintain low heap utilization.
-> Solve by replacing it with a very simple LFU algorithm. It doesn't
seem to hurt much at the default terminal size with the default
cache sizes. If we make the cache sizes smaller, its overhead is
reduced.
Some further experimentation is needed to adjust defaults, but this
should help reduce heap usage.
refs: https://github.com/wez/wezterm/issues/2626
`iter_panes` returns the renderable set of panes, but most functions
in the mux want to operate on the full set of panes.
Notably, when closing a tab, we were not killing panes other than
the zoomed pane, which caused wezterm to linger in the background.
refs: https://github.com/wez/wezterm/issues/2548
According to its benchmarks, it's almost 2x faster than
unicode_segmentation. It doesn't appear to make a visible
difference to `time cat bigfile`, but I'll take anything
that gives more headroom for such little effort of switching.
There were two problems:
* We weren't correctly invalidating when the hover state changed
(a recent regression caused by recent caching changes)
* We'd underline every link with the same destination on hover,
not just the one under the mouse (longstanding wart)
Recent changes allow the application layer to reference the underlying
Lines directly, so we can restore the original and expected
only-highlight-under-the-mouse by switching to those newer APIs.
Adjust the cache values so that we know to also verify the current
highlight and invalidate.
I was a little surprised to see that this also works with mux client
panes: I was expecting to need to do some follow up on those because
they return copies of Line rather than references to them. That happens
to work because the mux client updates the hyperlinks at the time where
it inserts into its cache. The effect of that is that lines in mux
client panes won't update to new hyperlink rules if they were received
prior to a change in the config.
refs: https://github.com/wez/wezterm/issues/2496
There are caveats to determining this, but when we think
password entry is enabled, switch the cursor to the font-awesome
lock glyph instead of the normal cursor sprite.
fa_lock is used because it is monochrome and can thus be tinted
to the configured cursor color, and it respects blinking/easing.
refs: https://github.com/wez/wezterm/issues/2460
The idea here is that different kinds of panes may want to expose
additional metadata to lua scripts. It would be a bit weird to add
a Pane method for each of those and plumb it all the way through
the various APIs, so just allowing a pane impl to return a dynamic
value (likely an Object) allows a bunch of flexibility.
This commit exposes the clientpane is_tardy boolean and the time
since the last data was recevied (since_last_response_ms) from
the mux client pane implementation: these are used to show the
tardiness indicator in the client pane.
Exposing this data enables the user to add that info to their
status bar if they wish.
The logic in the exit_behavior case was a bit smarter than that
in emit_output_for_pane, so adopt the former in the latter, then
use the latter for the former!
Tidy some things up to avoid some dead code and redundant impls.
Make it easier to select whether you want to implement the new
methods in terms of the old, or the old methods in terms of
the new in a given pane impl.
Adds Pane::for_each_logical_line_in_stable_range_mut and
Pane::with_lines_mut which allow iterating mutably over lines.
The idea is that this will allow the renderer to directly cache
data in the Line via its appdata without having to build cumbersome
external caching logic and managing cache keys.
This commit just swaps the implementation around for localpane
and sanity checks that the renderer functions.
Various overlays and the mux client don't properly implement these
yet and current warn at compile time and panic at runtime.
To follow is the logic to cache the data and make sure that it
works the way that I think before converting the other Pane
implementations.
It didn't really belong there; it was added as a bit of a hack
to propagate screen reverse video mode.
Move that to the RenderableDims struct and remove the related
bits from Line
Introduces a heap-based quad allocator that we cache on a per-line
basis, so if a line is unchanged we simply need to copy the previously
computed set of quads for it into the gpu quad buffer.
The results are encouraging wrt. constructing those quads; the
`quad_buffer_apply` is the cost of the copy operation, compare with
`render_screen_line_opengl` which is the cost of computing the quads;
it's 300x better at the p50 and >100x better at p95 for a full-screen
updating program:
full 2880x1800 screen top:
```
STAT p50 p75 p95
Key(quad_buffer_apply) 2.26µs 5.22µs 9.60µs
Key(render_screen_line_opengl) 610.30µs 905.22µs 1.33ms
Key(gui.paint.opengl) 35.39ms 37.75ms 45.88ms
```
However, the extra buffering does increase the latency of
`gui.paint.opengl` (the overall cost of painting a frame); contrast the
above with the latency in the same scenario with the current `main`
(rather than this branch):
```
Key(gui.paint.opengl) 19.14ms 21.10ms 28.18ms
```
Note that for an idle screen this latency is ~1.5ms but that is also true
of `main`.
While the overall latency in the histogram isn't a slam dunk,
running `time cat bigfile` is ~10% faster on my mac.
I'm sure there's something that can be shaved off to get a more
convincing win.