While profiling `time cat bigfile` I noticed that a big chunk of the
time is spent computing widths, so I wanted to dig into it a bit.
After playing around with a few options, I settled on the approach
in this commit.
The key observations:
* WcWidth::from_char performs a series of binary searches.
The fast path was for ASCII, but anything outside that range
suffered in terms of latency.
* Binary search does a lot more work than a simple table lookup,
so it is desirable to use a lookup, and more so to combine the
different tables into a single table so that classification
is a single O(1) indexed lookup in the most common cases.
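To illustrate the second observation, here is a minimal sketch of the
idea, not the code in this commit. `WidthClass`, `WidthTable`,
`classify_search` and the range data are all hypothetical and made up
for the example, and the assumption that the table covers only
U+0000..=U+FFFF is mine. It contrasts a chain of binary searches with
a single combined table indexed by code point.

```rust
use std::cmp::Ordering;

/// Hypothetical width classes; the real classification has more cases.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WidthClass {
    One,
    Two,
    Combining,
}

// Illustrative, sorted, non-overlapping ranges. NOT the real Unicode data.
const WIDE: &[(u32, u32)] = &[(0x1100, 0x115F), (0x4E00, 0x9FFF), (0xAC00, 0xD7A3)];
const COMBINING: &[(u32, u32)] = &[(0x0300, 0x036F), (0xFE00, 0xFE0F)];

/// Binary search for `cp` in a sorted list of inclusive ranges.
fn in_ranges(ranges: &[(u32, u32)], cp: u32) -> bool {
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// The "series of binary searches" shape: one search per table.
fn classify_search(c: char) -> WidthClass {
    let cp = c as u32;
    if in_ranges(COMBINING, cp) {
        WidthClass::Combining
    } else if in_ranges(WIDE, cp) {
        WidthClass::Two
    } else {
        WidthClass::One
    }
}

/// All tables merged into one array indexed by code point.
struct WidthTable {
    classes: Vec<WidthClass>,
}

impl WidthTable {
    /// Pre-compute the class of every code point in U+0000..=U+FFFF.
    fn build() -> Self {
        let mut classes = vec![WidthClass::One; 0x1_0000];
        for &(lo, hi) in WIDE {
            for cp in lo..=hi {
                classes[cp as usize] = WidthClass::Two;
            }
        }
        for &(lo, hi) in COMBINING {
            for cp in lo..=hi {
                classes[cp as usize] = WidthClass::Combining;
            }
        }
        Self { classes }
    }

    /// One indexed load in the common case; code points beyond the
    /// table fall back to the slower search.
    fn classify(&self, c: char) -> WidthClass {
        match self.classes.get(c as usize) {
            Some(&class) => class,
            None => classify_search(c),
        }
    }
}
```

With this shape, `WidthTable::build().classify('a')` and a lookup of a
wide or combining character cost the same single load, which matches
the near-constant timings for the non-ASCII cases below.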
Here are some benchmark results comparing the prior implementation
(grapheme_column_width) against this new pre-computed table
implementation (grapheme_column_width_tbl).
The ASCII case is roughly 7x faster than before at a reasonably snappy
~3.4ns, with the more complex cases landing at a near-constant ~21-24ns,
down from 83-120ns.
There are changes here to widechar_width.rs that should get
upstreamed.
```
column_width ASCII/grapheme_column_width
time: [23.413 ns 23.432 ns 23.451 ns]
column_width ASCII/grapheme_column_width_tbl
time: [3.4066 ns 3.4092 ns 3.4121 ns]
column_width variation selector/grapheme_column_width
time: [119.99 ns 120.13 ns 120.28 ns]
column_width variation selector/grapheme_column_width_tbl
time: [21.185 ns 21.253 ns 21.346 ns]
column_width variation selector unicode 14/grapheme_column_width
time: [119.44 ns 119.56 ns 119.69 ns]
column_width variation selector unicode 14/grapheme_column_width_tbl
time: [21.214 ns 21.236 ns 21.264 ns]
column_width WidenedIn9/grapheme_column_width
time: [99.652 ns 99.905 ns 100.18 ns]
column_width WidenedIn9/grapheme_column_width_tbl
time: [21.394 ns 21.419 ns 21.446 ns]
column_width Unassigned/grapheme_column_width
time: [82.767 ns 82.843 ns 82.926 ns]
column_width Unassigned/grapheme_column_width_tbl
time: [24.230 ns 24.319 ns 24.428 ns]
```
Here's the benchmark summary after cleaning this diff up ready
to commit; it shows a ~70-85% improvement in these cases:
```
; cargo criterion -- column_width
column_width ASCII/grapheme_column_width
time: [3.4237 ns 3.4347 ns 3.4463 ns]
change: [-85.401% -85.353% -85.302%] (p = 0.00 < 0.05)
Performance has improved.
column_width variation selector/grapheme_column_width
time: [20.918 ns 20.935 ns 20.957 ns]
change: [-82.562% -82.384% -82.152%] (p = 0.00 < 0.05)
Performance has improved.
column_width variation selector unicode 14/grapheme_column_width
time: [21.190 ns 21.210 ns 21.233 ns]
change: [-82.294% -82.261% -82.224%] (p = 0.00 < 0.05)
Performance has improved.
column_width WidenedIn9/grapheme_column_width
time: [21.603 ns 21.630 ns 21.662 ns]
change: [-78.429% -78.375% -78.322%] (p = 0.00 < 0.05)
Performance has improved.
column_width Unassigned/grapheme_column_width
time: [23.283 ns 23.355 ns 23.435 ns]
change: [-71.826% -71.734% -71.641%] (p = 0.00 < 0.05)
Performance has improved.
```