Do it "more properly": use intrusive list linkage to manage three
indices:
* hash lookup
* recency
* frequency
eviction is frequency based, but in order to avoid things that were
super hot in the past and that are no longer hot clogging up the cache,
eviction will incrementally age out least recently used entries that
haven't been active in the span of some number of get/put operations.
Aging scales down the frequency value to reduce its strength, so an aged
item isn't necessarily immediately a candidate for removal, it just
makes it more likely to be picked up by the frequency based removal as
it goes unused for an extended period.
This sequence forms a grapheme with cell_width=2, but harfbuzz returns
it as two distinct clusters, causing our prior logic to shape them
independently from different fonts, but our logic for assessing width
would resolve them both to the same cell and double-count their width,
leading to issues with the rendered result.
This commit revises our clustering logic to add a pass that compares
the harfbuzz cluster positions with the cell-based positions from
the presentation_width that may have been provided. We use the starting
cell positions from that to order and de-dup so that clusters aren't
split in the wrong place.
refs: https://github.com/wez/wezterm/issues/2572
I was going to upgrade to the unicode 15 font, but in testing this I
decided that the logic is slightly complex and the glyphs are often
difficult to see at most terminal font sizes, which generates questions
from users, so just fall back to notdef.
The clustered line storage impl of get_cell() is O(column-index)
as cells have to be iterated in order.
The render pass was calling it 3 times to resolve information
about the cursor; reduce that to just once.
It was also calling it once per cell to determine whether the
cell needed to be replaced with a custom glyph if custom block
glyphs were enabled, making it accidentally quadratic!
While it wasn't especially expensive, it did show up in profiling,
so this commit removes that call: we can cache the block glyph
key in the shaper info which is a better place for it anyway,
so that's what we do.
Similarly, there was some extraneous work to call get_cell
when computing some shaper info; remove that too!
That one might be slightly contentious: the is-followed-by-space
logic used to check the successor cell by index to see if it
was a space, but now looks at the successor shaped glyph to see
if it was a space. That might actually be a better choice, but
it may have some ripple effects.
According to its benchmarks, it's almost 2x faster than
unicode_segmentation. It doesn't appear to make a visible
difference to `time cat bigfile`, but I'll take anything
that gives more headroom for such little effort of switching.
When falling back to presentation-ambivalent font lookup, restart
our shaper run from font_idx=0 rather than the current font index.
For:
```
wezterm ls-fonts --text "$(printf \\U1faf0)"
```
the current font index was the last resort font so we'd never
search over anything useful in the second pass.
Looking back at 4fc8dfb374, I can't
see a good reason for starting at the later offset.
We were getting data like this back from harfbuzz:
shaped font_idx=0 "파인더에서 만든 폴더" as: [.notdef=0+448|.notdef=0+448|.notdef=6+448|.notde=6+448|.notdef=6+448|.notdef=15+448|.notdef=15+448|.notdef=21+448|.notdef=21+448|.notdef=27+448|.notdef=27+448|space=33+448|.notdef=34+448|.notdef=34+448|.notdef=34+448|.notdef=43+448|.notdef=43+448|.notdef=43+448|space=52+448|.notdef=53+448|.notdef=53+448|.notdef=53+448|.notdef=62+448|.notdef=62+448]
which had a series of discontiguous and repeated runs for the same
cluster/region, but due to a logic error, we weren't coalescing them
together and were passing somewhat nonsensical ranges to the
next font for shaping.
This commit "simply" ensures that we code the checks for codepoint==0
(the `.nodef` seen above) and the results are at least correct
for this particular case.
refs: https://github.com/wez/wezterm/issues/2482
This commit is a result of debugging a problem with a particular font
where a VS2 variation of a glyph produced incorrect shaping.
Further investigation showed that this was specific to using freetype
font functions and that switching to opentype functions produces the
correct shaping information in that case.
Further-further investigation showed that this difference in behavior
was really due to the font file being out of spec and freetype returning
unexpected data as a result.
This commit allows switching to using harfbuzz's own opentype font+face
(rather than freetype) and/or switching the freetype-based font to using
opentype font functions. These two options don't yield equivalent
results in the wezterm integration: a couple of our shaping tests
fail due to the x-advances not being the same. Those can be drastically
different in some cases (eg: I seem to get 0 for certain bitmap strikes,
and for Roboto the value is off by a factor of about 1.5).
This is incomplete and not something I want to turn on by default at
the moment, but I don't want to lose this work, hence this commit.
refs: https://github.com/wez/wezterm/issues/2475
refs: https://github.com/harfbuzz/harfbuzz/issues/3806
harfbuzz can return incomplete overlapping runs when it processes
text in unicode NFD. Add another check for the case where we've
accumulated the bytes in the range 0-12 and then harfbuzz returns
another range of 6-12. We coalesce the two together so that we can
pass the full unresolved sequence to the next fallback pass.
refs: https://github.com/wez/wezterm/issues/2032
Go directly to the underlying env_logger crate, as pretty_env_logger
hasn't been updated in some time, and I'd like to be able to redirect
the log output to a file more directly, and that feature is in a newer
version of the env logger than pretty_env_logger was pulling in.
The gist of the issue is that when setting eg: scale=1.2 to draw
a larger CJK glyph, it is drawn at the same descender level, which
makes it more likely to leave the top of the cell.
This commit adjusts the y position by the difference between the
original and the scaled descender so that is less likely to cause
problems.
refs: https://github.com/wez/wezterm/issues/1803
The broot icon font has glyphs with horizontal advance set to 0. That
would cause us to consider the glyph to be zero width, so handle that as
a special case. Note that it is legit for certain cells to end up with
a zero advance/width during shaping if they represent combining
characters: this is more common in Arabic scripts.
refs: https://github.com/wez/wezterm/issues/1787
This commit allows specifying a scaling factor as part of the font
attribute definition. This scaling factor is fed through to the
rasterizer and the shaper to adjust the actual font size that is
loaded.
The intent is to provide manual control for situations where the
fallback font has a different scale to the primary font and renders
either too small or too large.
The concrete example is
https://github.com/wez/wezterm/issues/1761#issuecomment-1079708207 where
the CJK fallback looks too small.
The scaling factor doesn't influence font metrics so it may also be
desirable to configure line height.
```lua
local wezterm = require 'wezterm'
return {
line_height = 1.2,
font = wezterm.font_with_fallback({
"JetBrains Mono",
{family="Microsoft YaHei", scale=1.5},
}),
}
```
There are three slants that are broadly recognized; normal, italic and
oblique. Prior to this commit, we only considered normal or italic.
This is mostly a mechanical change to replace the boolean with the enum,
and rename the field from `italic` to `slant`.
For the sake of backwards compatibility with existing configs, the lua
helpers for working with fonts continue to accept boolean italic values
and rewrite them to the equivalent slant value.
refs: #1646
For a sequence like `e U+20d7` the intent is to render the `e` with
a vector arrow over the top.
This is typically implemented by fonts as an `e` followed by the
vector glyph (or vice versa), where either one of those may have
a zero advance so that the two elements are combined.
There were two problems here:
* During shaping we'd see the zero advance and assume that the entry
was useless and skip it
* During rendering, if we didn't think it had any cell width, we'd
not render it
Cursoring through that particular sequence can hide the vector
mark if the cursor is set to the default block cursor due to annoyances
in how the block cursor is rendered (it changes the fg color to match
the bg, but for elements outside where we think the cursor is, this
makes those elements invisible).
refs: https://github.com/wez/wezterm/issues/1617
This is a more robust approach; we make a separate pass to figure
out information about the (harfbuzz) cluster for a sequence of glyphs,
and then map that sequence back to the original cell sequence, and
from there compute the total cell width for the run, then distribute
the glyphs across the run.
This should yield more sane results for bidi.
Fixup the x-position math; it was still wonky despite the
efforts in 5f2c905db8 and
af92265ffb
refs: #1570
refs: #1607
refs: #1563
This allows unicode_version to be respected again when rendering.
The updated emoji-presentation.sh script now highlights this slightly
better by putting `.` characters after the emoji; unicode version 14
emoji presentation will show the `.` in the 3rd column, while earlier
versions will show it in the 2nd column for glyphs that are sensitive
to the version.
refs: #1607
refs: #1563
This commit is larger than it appears to due fanout from threading
through bidi parameters. The main changes are:
* When clustering cells, add an additional phase to resolve embedding
levels and further sub-divide a cluster based on the resolved bidi
runs; this is where we get the direction for a run and this needs
to be passed through to the shaper.
* When doing bidi, the forced cluster boundary hack that we use to
de-ligature when cursoring through text needs to be disabled,
otherwise the cursor appears to push/rotate the text in that
cluster when moving through it! We'll need to find a different
way to handle shading the cursor that eliminates the original
cursor/ligature/black issue.
* In the shaper, the logic for coalescing unresolved runs for font
fallback assumed LTR and needed to be adjusted to cluster RTL.
That meant also computing a little index of codepoint lengths.
* Added `experimental_bidi` boolean option that defaults to false.
When enabled, it activates the bidi processing phase in clustering
with a strong hint that the paragraph is LTR.
This implementation is incomplete and/or wrong for a number of cases:
* The config option should probably allow specifying the paragraph
direction hint to use by default.
* https://terminal-wg.pages.freedesktop.org/bidi/recommendation/paragraphs.html
recommends that bidi be applied to logical lines, not physical
lines (or really: ranges within physical lines) that we're doing
at the moment
* The paragraph direction hint should be overridden by cell attributes
and other escapes; see 85a6b178cf
and probably others.
However, as of this commit, if you `experimental_bidi=true` then
```
echo This is RTL -> عربي فارسی bidi
```
(that text was sourced from:
https://github.com/microsoft/terminal/issues/538#issuecomment-677017322)
then wezterm will display the text in the same order as the text
renders in Chrome for that github comment.
```
; ./target/debug/wezterm --config experimental_bidi=false ls-fonts --text "عربي فارسی ->"
LeftToRight
0 ع \u{639} x_adv=8 glyph=300 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
2 ر \u{631} x_adv=3.78125 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
4 ب \u{628} x_adv=4 glyph=244 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
6 ي \u{64a} x_adv=4 glyph=363 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
8 \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
9 ف \u{641} x_adv=11 glyph=328 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا \u{627} x_adv=4 glyph=240 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر \u{631} x_adv=3.78125 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س \u{633} x_adv=10 glyph=278 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
17 ی \u{6cc} x_adv=4 glyph=664 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
19 \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
20 - \u{2d} x_adv=8 glyph=276 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
21 > \u{3e} x_adv=8 glyph=338 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
```
```
; ./target/debug/wezterm --config experimental_bidi=true ls-fonts --text "عربي فارسی ->"
RightToLeft
17 ی \u{6cc} x_adv=9 glyph=906 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
15 س \u{633} x_adv=10 glyph=277 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
13 ر \u{631} x_adv=4.78125 glyph=272 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
11 ا \u{627} x_adv=4 glyph=241 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
9 ف \u{641} x_adv=5 glyph=329 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
8 \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
6 ي \u{64a} x_adv=9 glyph=904 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
4 ب \u{628} x_adv=4 glyph=243 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
2 ر \u{631} x_adv=5 glyph=273 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
0 ع \u{639} x_adv=6 glyph=301 wezterm.font(".Geeza Pro Interface", {weight="Regular", stretch="Normal", italic=false})
/System/Library/Fonts/GeezaPro.ttc index=2 variation=0, CoreText
LeftToRight
0 \u{20} x_adv=8 glyph=2 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
1 - \u{2d} x_adv=8 glyph=480 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
2 > \u{3e} x_adv=8 glyph=470 wezterm.font("Operator Mono SSm Lig", {weight="DemiLight", stretch="Normal", italic=false})
/Users/wez/.fonts/OperatorMonoSSmLig-Medium.otf, FontDirs
;
```
refs: https://github.com/wez/wezterm/issues/784
`wezterm.font` and `wezterm.font_with_fallback` can now specify
harfbuzz_features and freetype load/render target and flags as
options on a per-font basis.
This allows you to do things such as adjust shaping (eg: ligatures) or
rendering (eg: disable bitmaps, or adjust hinting) for a single font in
a fallback rather than globally for all fonts.
This is a fairly far-reaching commit. The idea is:
* Introduce a unicode_version config that specifies the default level
of unicode conformance for each newly created Terminal (each Pane)
* The unicode_version is passed down to the `grapheme_column_width`
function which interprets the width based on the version
* `Cell` records the width so that later calculations don't need to
know the unicode version
In a subsequent diff, I will introduce an escape sequence that allows
setting/pushing/popping the unicode version so that it can be overridden
via eg: a shell alias prior to launching an application that uses a
different version of unicode from the default.
This approach allows output from multiple applications with differing
understanding of unicode to coexist on the same screen a little more
sanely.
Note that the default `unicode_version` is set to 9, which means that
emoji presentation selectors are now by-default ignored. This was
selected to better match the level of support in widely deployed
applications.
I expect to raise that default version in the future.
Also worth noting: there are a number of callers of
`unicode_column_width` in things like overlays and lua helper functions
that pass `None` for the unicode version: these will assume the latest
known-to-wezterm/termwiz version of unicode to be desired. If those
overlays do things with emoji presentation selectors, then there may be
some alignment artifacts. That can be tackled in a follow up commit.
refs: #1231
refs: #997
We rely on using freetype in order to support more fonts in more
situations, and we have a deeper existing integration with harfbuzz.
I'm unlikely to come back to allsorts to complete our integration,
and in the meantime, it just adds overhead to build/test and those
builds are taking longer and longer.
I loved the idea of using pure rust for all the font stuff, but
its time is not now.
closes: #587closes: #66
We now compute the cap-height from the rasterized glyph data.
Moved the scaling action of use_cap_height_to_scale_fallback_fonts from
glyphcache into the font resolver: when enabled, and we have data
about the baseline font and the font being resolved, then the resolving
font will be scaled such that the cap-height of both fonts has the same
pixel size.
The effect of this is that `I` glyphs from both fonts should appear to
have the same height.
Added a row of `I`'s in differing styles at the bottom of styles.txt
to make this easier to visualize.
refs: #1189
This is to handle situations such as some versions of the Terminus
bitmap font, where the individual bitmap strike sizes are broken
out across multiple individual files.
Font matching now passes down the nominal pixel height based on
the current DPI and font scale factor, and will use that to select
the font file that has the closest pixel size.
Previously, it would be potentially undefined which of the Terminus
font files would be selected.
refs: https://github.com/wez/wezterm/issues/1189
This was a bit of a PITA to run down; the essence of the problem
was that the shaper was returning an x_advance of 0 for U+3000,
which caused wezterm's shaping layer to elide that glyph.
I eventually tracked down the x_advance to be the result of
scaling by an x_scale of 0, which in turn is the result of
harfbuzz not knowing the font size.
The critical portion of this diff is the line that advises
harfbuzz that the font has changed after we've applied the
font size.
The rest is just stuff to make it easier to debug and verify.
This:
```
printf "x\u3000x."
```
Now correctly renders on screen as "x x".
fixes: #1161
The introduction of the Emoji vs Text VS processing means that we might
in some cases not find a glyph with the requested presentation.
In that case, we'd rather show the emoji presentation glyph than none at
all, so we'll retry fallback processing with unspecified presentation.
refs: #997
The recent presentation logic needs to be tweaked to ensure that
we ignore presentation when we reach the fallback font, otherwise
we'll end up in a bad error stack and crash the program.
This commit annotates fonts with a boolean that indicates whether
we think it contains glyphs with emoji presentation, and then
passes the cluster.presentation field down to the shaper.
If the presentation doesn't match the current font in the fallback,
then it will be skipped until we exhaust its options.
`wezterm ls-fonts` also shows whether we think a font has emoji
presentation.
refs: #997
This commit introduces the knowledge about whether a font is
scalable or was using bitmap strikes (eg: color emoji bitmaps).
Then that information is used to help figure out whether and
how to scale a glyph.
refs: https://github.com/wez/wezterm/issues/685
we now compute the ratio of the cap height (the height of a capital
letter) vs. the em-square (which relates to our chosen point size) to
understand what proportion of the font point-size that a given font
occupies when rendered.
When rendering glyphs from secondary fonts we can use the cap height
ratios of both to scale the secondary font such that its effective
cap height matches that of the primary font.
In plainer-english: if you mix say bold, italic and regular text
style in the same line, and you have different font families for
those fonts, then they will now appear to be the same height where
previously they may have varied more noticeably.
For emoji and symbol fonts there may not be a cap-height metric
encoded in the font. We can however, improve our scaling: prior
to this commit we'd use the ratio of the cell metrics of the two
fonts to scale the icon/emoji glyph, but this could cause the glyph
to be slightly oversized as seen in https://github.com/wez/wezterm/issues/624
If we know the cap-height of the primary font then we can additionaly
apply that factor to scale the emoji to better fit the cell.
While looking at this, I noticed that the aspect ratio calculation
for when to apply to the allow_square_glyphs_to_overflow_width option
had width and height flipped :-(
See also: https://tonsky.me/blog/font-size/
refs: https://github.com/wez/wezterm/issues/624
When we process the system fallback list, we can produce a long list
of fonts to be speculatively processed by the shaper.
Until this commit, the shaper would always keep the associated
freetype face open forever, which increases the number of open
files and the amount of allocated memory.
This commit allows the shaper to release a font if it has never
produced any valid shaper results, which keeps the list down
to just the fonts that are in use.