// wezterm/termwiz/benches/wcwidth.rs
// Source: mirror of https://github.com/wez/wezterm.git (synced 2024-11-29 21:44:24 +03:00)

use criterion::{black_box, criterion_group, criterion_main, Criterion};
// termwiz: micro-optimize grapheme_column_width (2022-04-30 07:10:58 +03:00)
//
// While profiling `time cat bigfile` I noted that a big chunk of the time is
// spent computing widths, so I wanted to dig into it a bit. After playing
// around with a few options, I settled on the approach in this commit. The
// key observations:
//
// * WcWidth::from_char performs a series of binary searches. The fast path
//   was for ASCII, but anything outside that range suffered in terms of
//   latency.
// * Binary search does a lot more work than a simple table lookup, so it is
//   desirable to use a lookup, and more so to combine the different tables
//   into a single table, so that classification is an O(1), very fast lookup
//   in the most common cases.
//
// Here are some benchmarking results comparing the prior implementation
// (grapheme_column_width) against the new precomputed table implementation
// (grapheme_column_width_tbl). The ASCII case is more than 5x faster than
// before, at a reasonably snappy ~3.5 ns, and the more complex cases are
// closer to a constant ~20 ns, down from as much as ~120 ns.
//
// There are changes here to widechar_width.rs that should get upstreamed.
//
//   column_width ASCII/grapheme_column_width
//       time: [23.413 ns 23.432 ns 23.451 ns]
//   column_width ASCII/grapheme_column_width_tbl
//       time: [3.4066 ns 3.4092 ns 3.4121 ns]
//   column_width variation selector/grapheme_column_width
//       time: [119.99 ns 120.13 ns 120.28 ns]
//   column_width variation selector/grapheme_column_width_tbl
//       time: [21.185 ns 21.253 ns 21.346 ns]
//   column_width variation selector unicode 14/grapheme_column_width
//       time: [119.44 ns 119.56 ns 119.69 ns]
//   column_width variation selector unicode 14/grapheme_column_width_tbl
//       time: [21.214 ns 21.236 ns 21.264 ns]
//   column_width WidenedIn9/grapheme_column_width
//       time: [99.652 ns 99.905 ns 100.18 ns]
//   column_width WidenedIn9/grapheme_column_width_tbl
//       time: [21.394 ns 21.419 ns 21.446 ns]
//   column_width Unassigned/grapheme_column_width
//       time: [82.767 ns 82.843 ns 82.926 ns]
//   column_width Unassigned/grapheme_column_width_tbl
//       time: [24.230 ns 24.319 ns 24.428 ns]
//
// After this diff was cleaned up for commit, grapheme_column_width is the
// table-based implementation; the benchmark summary shows roughly a 72-85%
// improvement in these cases:
//
//   ; cargo criterion -- column_width
//   column_width ASCII/grapheme_column_width
//       time:   [3.4237 ns 3.4347 ns 3.4463 ns]
//       change: [-85.401% -85.353% -85.302%] (p = 0.00 < 0.05)
//       Performance has improved.
//   column_width variation selector/grapheme_column_width
//       time:   [20.918 ns 20.935 ns 20.957 ns]
//       change: [-82.562% -82.384% -82.152%] (p = 0.00 < 0.05)
//       Performance has improved.
//   column_width variation selector unicode 14/grapheme_column_width
//       time:   [21.190 ns 21.210 ns 21.233 ns]
//       change: [-82.294% -82.261% -82.224%] (p = 0.00 < 0.05)
//       Performance has improved.
//   column_width WidenedIn9/grapheme_column_width
//       time:   [21.603 ns 21.630 ns 21.662 ns]
//       change: [-78.429% -78.375% -78.322%] (p = 0.00 < 0.05)
//       Performance has improved.
//   column_width Unassigned/grapheme_column_width
//       time:   [23.283 ns 23.355 ns 23.435 ns]
//       change: [-71.826% -71.734% -71.641%] (p = 0.00 < 0.05)
//       Performance has improved.
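// A minimal sketch of the idea described above, under stated assumptions.
// Everything in `lookup_table_sketch` is hypothetical and is NOT termwiz's
// WcLookupTable: the range tables are small illustrative subsets loosely
// modeled on real Unicode width data, and the flat table covers only
// U+0000..=U+FFFF (an assumption that keeps it at 64 KiB). The "before" path
// does one binary search per range table; the "after" path precomputes every
// BMP answer once, so classifying a BMP character is a single index, with a
// fallback to binary search for code points above U+FFFF.
#[allow(dead_code)]
mod lookup_table_sketch {
    use std::cmp::Ordering;

    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    #[repr(u8)]
    pub enum Class {
        Narrow,
        Wide,
        Combining,
    }

    // Sorted, non-overlapping, inclusive ranges (illustrative subsets only).
    const WIDE: &[(u32, u32)] = &[
        (0x1100, 0x115F),   // Hangul Jamo initial consonants
        (0x2E80, 0x303E),   // CJK radicals and punctuation (simplified)
        (0xAC00, 0xD7A3),   // Hangul syllables
        (0x1F300, 0x1F64F), // pictographs and emoticons (simplified)
    ];
    const COMBINING: &[(u32, u32)] = &[
        (0x0300, 0x036F), // Combining Diacritical Marks
        (0x20D0, 0x20FF), // Combining Diacritical Marks for Symbols
    ];

    // Binary search for a range that contains `cp`.
    fn in_ranges(table: &[(u32, u32)], cp: u32) -> bool {
        table
            .binary_search_by(|&(lo, hi)| {
                if hi < cp {
                    Ordering::Less
                } else if lo > cp {
                    Ordering::Greater
                } else {
                    Ordering::Equal
                }
            })
            .is_ok()
    }

    /// "Before": a series of binary searches, one per table.
    pub fn classify_by_search(c: char) -> Class {
        let cp = c as u32;
        if in_ranges(COMBINING, cp) {
            Class::Combining
        } else if in_ranges(WIDE, cp) {
            Class::Wide
        } else {
            Class::Narrow
        }
    }

    /// "After": every BMP answer precomputed into one flat table.
    pub struct FlatTable {
        bmp: Vec<Class>,
    }

    impl FlatTable {
        pub fn new() -> Self {
            // Surrogates (U+D800..=U+DFFF) are not valid `char`s; store Narrow.
            let bmp = (0u32..=0xFFFF)
                .map(|cp| char::from_u32(cp).map_or(Class::Narrow, classify_by_search))
                .collect();
            FlatTable { bmp }
        }

        pub fn classify(&self, c: char) -> Class {
            match self.bmp.get(c as u32 as usize) {
                Some(&class) => class,         // O(1) index for the BMP
                None => classify_by_search(c), // above U+FFFF: fall back
            }
        }
    }
}
// Building the table costs one pass over the BMP up front; afterwards the
// common case avoids all branching on range boundaries, which is the trade
// the benchmarks below measure.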
use termwiz::cell::{grapheme_column_width, UnicodeVersion};
include!("../src/widechar_width.rs");
pub fn criterion_benchmark(c: &mut Criterion) {
    // Combined, precomputed classification table from widechar_width.rs.
    let table = WcLookupTable::new();
    // The "Classify" groups compare the binary-search classifier
    // (WcWidth::from_char) against a single lookup in the combined table
    // (WcLookupTable::classify). black_box keeps the compiler from
    // constant-folding the fixed input.
    {
        let mut group = c.benchmark_group("Classify ASCII");
        group.bench_function("WcWidth", |b| b.iter(|| WcWidth::from_char(black_box('a'))));
        group.bench_function("WcLookupTable", |b| {
            b.iter(|| table.classify(black_box('a')))
        });
        group.finish();
    }
    {
        // U+1100 HANGUL CHOSEONG KIYEOK, an East Asian Wide character.
        let mut group = c.benchmark_group("Classify DoubleWidth");
        group.bench_function("WcWidth", |b| {
            b.iter(|| WcWidth::from_char(black_box('\u{1100}')))
        });
        group.bench_function("WcLookupTable", |b| {
            b.iter(|| table.classify(black_box('\u{1100}')))
        });
        group.finish();
    }
    {
        // U+231A WATCH, whose East Asian Width became Wide in Unicode 9.
        let mut group = c.benchmark_group("Classify WidenedIn9");
        group.bench_function("WcWidth", |b| {
            b.iter(|| WcWidth::from_char(black_box('\u{231a}')))
        });
        group.bench_function("WcLookupTable", |b| {
            b.iter(|| table.classify(black_box('\u{231a}')))
        });
        group.finish();
    }
    {
        // U+FBC9, an unassigned code point.
        let mut group = c.benchmark_group("Classify Unassigned");
        group.bench_function("WcWidth", |b| {
            b.iter(|| WcWidth::from_char(black_box('\u{fbc9}')))
        });
        group.bench_function("WcLookupTable", |b| {
            b.iter(|| table.classify(black_box('\u{fbc9}')))
        });
        group.finish();
    }
    // The "column_width" groups measure the full grapheme_column_width path.
    {
        let mut group = c.benchmark_group("column_width ASCII");
        group.bench_function("grapheme_column_width", |b| {
            b.iter(|| grapheme_column_width(black_box("a"), None))
        });
        group.finish();
    }
    {
        // U+00A9 COPYRIGHT SIGN + U+FE0F VARIATION SELECTOR-16, with no
        // explicit Unicode version (None).
        let mut group = c.benchmark_group("column_width variation selector");
        group.bench_function("grapheme_column_width", |b| {
            b.iter(|| grapheme_column_width(black_box("\u{00a9}\u{FE0F}"), None))
        });
        group.finish();
    }
    {
        // Same grapheme as the previous group, but with an explicit Unicode 14
        // version and East Asian Ambiguous characters treated as narrow.
        let mut group = c.benchmark_group("column_width variation selector unicode 14");
        let version = UnicodeVersion {
            version: 14,
            ambiguous_are_wide: false,
        };
        group.bench_function("grapheme_column_width", |b| {
            b.iter(|| grapheme_column_width(black_box("\u{00a9}\u{FE0F}"), Some(version)))
        });
        group.finish();
    }
    {
        // U+231A WATCH, whose East Asian Width became Wide in Unicode 9.
        let mut group = c.benchmark_group("column_width WidenedIn9");
        group.bench_function("grapheme_column_width", |b| {
            b.iter(|| grapheme_column_width(black_box("\u{231a}"), None))
        });
        group.finish();
    }
    {
        // U+FBC9, an unassigned code point.
        let mut group = c.benchmark_group("column_width Unassigned");
        group.bench_function("grapheme_column_width", |b| {
            b.iter(|| grapheme_column_width(black_box("\u{fbc9}"), None))
        });
        group.finish();
    }
}
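// A hypothetical sketch, not termwiz's implementation: the "WidenedIn9"
// groups above use U+231A WATCH, one of the characters whose East Asian Width
// became Wide in Unicode 9. Assuming a width function receives a Unicode
// version (as the "unicode 14" group does via `UnicodeVersion`), one simple
// way to honor it is to gate that class on the version number. The enum, the
// function name `column_width`, and the per-class widths are illustrative
// assumptions only.
#[allow(dead_code)]
mod version_gate_sketch {
    #[derive(Copy, Clone, Debug, PartialEq, Eq)]
    pub enum Class {
        Narrow,
        Wide,
        Combining,
        WidenedIn9,
    }

    /// Sums assumed per-class column widths for one grapheme's characters.
    pub fn column_width(classes: &[Class], unicode_version: u8) -> usize {
        classes
            .iter()
            .map(|class| -> usize {
                match class {
                    Class::Narrow => 1,
                    Class::Wide => 2,
                    Class::Combining => 0,
                    // Wide only when the caller targets Unicode 9 or later.
                    Class::WidenedIn9 if unicode_version >= 9 => 2,
                    Class::WidenedIn9 => 1,
                }
            })
            .sum()
    }
}
// Passing the version explicitly lets a terminal match whatever width rules
// the application on the other side of the pty was built against.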
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);