ladybird/Tests/LibUnicode
Timothy Flynn 0652cc48c0 LibUnicode: Perform code point property lookups in constant time
We currently produce a single table for all categories of code point
properties (GeneralCategory, Script, etc.). Each row contains a field
indicating the range of code points to which that property applies. At
runtime, we then do a binary search through that table to decide if a
code point has a property.
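
For illustration, the range-table lookup described above can be sketched roughly as follows. This is a minimal sketch, not the LibUnicode code: `PropertyRange` and `has_property_binary_search` are hypothetical names, and the table is assumed to hold the sorted, non-overlapping ranges of a single property.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical row type: one inclusive range of code points that has the property.
struct PropertyRange {
    uint32_t first;
    uint32_t last;
};

// Assumes `ranges` is sorted by `first` and that the ranges do not overlap.
// Cost is O(log n) in the number of ranges.
inline bool has_property_binary_search(std::vector<PropertyRange> const& ranges, uint32_t code_point)
{
    // Find the first range that starts strictly after the code point.
    auto it = std::upper_bound(ranges.begin(), ranges.end(), code_point,
        [](uint32_t cp, PropertyRange const& range) { return cp < range.first; });

    // No range starts at or before the code point.
    if (it == ranges.begin())
        return false;

    // The preceding range is the only one that can contain the code point.
    --it;
    return code_point <= it->last;
}
```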

This changes our approach to generate a 2-stage lookup table for each of
those categories. There is an in-depth explanation of these tables above
the new `create_code_point_tables` method. The end effect is that code
point property lookup is reduced from a binary search to constant-time
array lookups.
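
As a rough sketch of the general 2-stage technique (not the generated LibUnicode tables themselves), the code below splits the code point space into fixed-size blocks, stores each distinct block once, and indexes those blocks through a first-stage array. All names (`TwoStageTable`, `build_two_stage_table`, `CodePointRange`) and the 256-entry block size are assumptions made for illustration; the real generator may choose different sizes and element types.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical constants; the block size is an assumption, not LibUnicode's choice.
constexpr uint32_t code_point_limit = 0x110000; // one past U+10FFFF
constexpr uint32_t block_shift = 8;
constexpr uint32_t block_size = 1u << block_shift; // 256 code points per block
constexpr uint32_t block_mask = block_size - 1;

struct CodePointRange {
    uint32_t first;
    uint32_t last; // inclusive
};

struct TwoStageTable {
    std::vector<uint16_t> stage1;                     // block number -> index into stage2
    std::vector<std::array<bool, block_size>> stage2; // each distinct block stored once

    // Two array indexing operations, independent of how many ranges exist.
    bool contains(uint32_t code_point) const
    {
        if (code_point >= code_point_limit)
            return false;
        auto const& block = stage2[stage1[code_point >> block_shift]];
        return block[code_point & block_mask];
    }
};

inline TwoStageTable build_two_stage_table(std::vector<CodePointRange> const& ranges)
{
    // Expand the ranges into one flag per code point (build-time only).
    std::vector<bool> flat(code_point_limit, false);
    for (auto const& range : ranges) {
        assert(range.first <= range.last);
        for (uint32_t cp = range.first; cp <= range.last && cp < code_point_limit; ++cp)
            flat[cp] = true;
    }

    TwoStageTable table;
    std::map<std::array<bool, block_size>, uint16_t> seen; // block contents -> stage2 index

    for (uint32_t start = 0; start < code_point_limit; start += block_size) {
        std::array<bool, block_size> block {};
        for (uint32_t i = 0; i < block_size; ++i)
            block[i] = flat[start + i];

        // Reuse an identical block if one was already emitted.
        auto [it, inserted] = seen.try_emplace(block, static_cast<uint16_t>(table.stage2.size()));
        if (inserted)
            table.stage2.push_back(block);
        table.stage1.push_back(it->second);
    }
    return table;
}
```

Because most blocks of the code point space are identical for a given property (typically all false), deduplicating them in the second stage keeps the tables far smaller than one flag per code point, while a lookup stays a fixed pair of array reads.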

In total, this change:

    * Increases the size of libunicode.so from 2.7 MB to 2.9 MB.

    * Reduces the runtime of the new benchmark test case added here from
      3.576s to 1.020s (a 3.5x speedup).

    * In a profile of resizing a TextEditor window with a 3 MB file open,
      the share of runtime spent checking whether a code point has a word
      break property drops from ~81% to ~56%.
2023-07-26 08:36:20 +02:00
CMakeLists.txt LibUnicode: Add a unit test for Unicode grapheme and word segmentation 2023-02-25 22:23:39 +01:00
TestEmoji.cpp LibUnicode: Detect ZWJ sequences when filtering by emoji presentation 2023-03-05 20:21:57 +01:00
TestSegmentation.cpp LibUnicode: Add a unit test for Unicode grapheme and word segmentation 2023-02-25 22:23:39 +01:00
TestUnicodeCharacterTypes.cpp LibUnicode: Perform code point property lookups in constant time 2023-07-26 08:36:20 +02:00
TestUnicodeNormalization.cpp LibUnicode+LibJS: Propagate OOM from Unicode normalization 2023-01-09 22:48:15 +00:00