mirror of
https://github.com/wez/wezterm.git
synced 2024-12-27 15:37:29 +03:00
601a85e12b
In order to support RTL/BIDI, wezterm needs a bidi implementation. I don't think a well-conforming Rust implementation exists today; what I found were implementations that didn't pass 100% of the conformance tests.

So I decided to port "bidiref", the reference implementation of the UBA described in http://unicode.org/reports/tr9/, to Rust.

This implementation focuses on conformance: no special measures have been taken to optimize it so far, with my focus having been to ensure that all of the approximately 780,000 test cases in the Unicode data for Unicode 14 pass. Having the tests passing 100% allows for making performance improvements with confidence in the future.

The API isn't completely designed/fully baked. Until I get to hooking it up to wezterm's shaper, I'm not 100% sure exactly what I'll need.

There's a good discussion on API in https://github.com/open-i18n/rust-unic/issues/273 that suggests omitting "legacy" operations such as reordering. I suspect that wezterm may actually need that function to support monospace text layout in some terminal scenarios, but regardless: reordering is part of the conformance test suite so it remains a part of the API.

That said: the API does model the major operations as separate phases, so you should be able to pay for just what you use:

* Resolving the embedding levels from a paragraph
* Returning paragraph runs of those levels (and their directions)
* Returning the whitespace-level-reset runs for a line-slice within the paragraph
* Returning the reordered indices + levels for a line-slice within the paragraph.

refs: https://github.com/wez/wezterm/issues/784
refs: https://github.com/kas-gui/kas-text/issues/20
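To illustrate the last phase in that list, here is a minimal sketch of reordering by resolved levels (rule L2 of UAX #9). It is not the API of this commit: the function name `reorder_indices` and its signature are assumptions made for illustration, and it deliberately skips the whitespace-level reset (rule L1), which the commit treats as a separate phase.

```rust
/// Hypothetical helper, not wezterm's actual API: given the resolved
/// embedding level of each character in a line, return the logical
/// indices in visual (display) order, following rule L2 of UAX #9.
fn reorder_indices(levels: &[u8]) -> Vec<usize> {
    // Start with the identity mapping: visual position i shows logical index i.
    let mut order: Vec<usize> = (0..levels.len()).collect();

    let max_level = match levels.iter().copied().max() {
        Some(m) => m,
        None => return order, // empty line
    };
    // Lowest odd level present; if there is none, the text is entirely
    // left-to-right and nothing needs to be reversed.
    let min_odd = match levels.iter().copied().filter(|l| l % 2 == 1).min() {
        Some(m) => m,
        None => return order,
    };

    // From the highest level down to the lowest odd level, reverse every
    // maximal run of characters whose level is at least that level.
    for level in (min_odd..=max_level).rev() {
        let mut i = 0;
        while i < order.len() {
            if levels[order[i]] >= level {
                let start = i;
                while i < order.len() && levels[order[i]] >= level {
                    i += 1;
                }
                order[start..i].reverse();
            } else {
                i += 1;
            }
        }
    }
    order
}
```

For levels `[0, 0, 1, 1, 0]` only the two level-1 characters swap places, giving `[0, 1, 3, 2, 4]`.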
194 lines
8.7 KiB
Plaintext
# BidiBrackets-14.0.0.txt
# Date: 2021-06-30, 23:59:00 GMT [AG, LI, KW]
# © 2021 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use, see https://www.unicode.org/terms_of_use.html
#
# Unicode Character Database
# For documentation, see https://www.unicode.org/reports/tr44/
#
# Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type Properties
#
# This file is a normative contributory data file in the Unicode
# Character Database.
#
# Bidi_Paired_Bracket is a normative property of type Miscellaneous,
# which establishes a mapping between characters that are treated as
# bracket pairs by the Unicode Bidirectional Algorithm.
#
# Bidi_Paired_Bracket_Type is a normative property of type Enumeration,
# which classifies characters into opening and closing paired brackets
# for the purposes of the Unicode Bidirectional Algorithm.
#
# This file lists the set of code points with Bidi_Paired_Bracket_Type
# property values Open and Close. The set is derived from the character
# properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
# and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
# form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
# Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
# vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
# Open (o) and Close (c), respectively.
#
# The brackets with ticks U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
# through U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER are paired the
# same way their glyphs form mirror pairs, according to their bmg property
# values. They are not paired on the basis of a diagonal or antidiagonal
# matching of the corner ticks inferred from code point order.
#
# For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
# U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
# and therefore do not form a bracket pair.
#
# The Unicode property value stability policy guarantees that characters
# which have bpt=o or bpt=c also have bc=ON and Bidi_M=Y. As a result, an
# implementation can optimize the lookup of the Bidi_Paired_Bracket_Type
# property values Open and Close by restricting the processing to characters
# with bc=ON.
#
# The format of the file is three fields separated by a semicolon.
# Field 0: Unicode code point value, represented as a hexadecimal value
# Field 1: Bidi_Paired_Bracket property value, a code point value or <none>
# Field 2: Bidi_Paired_Bracket_Type property value, one of the following:
# o Open
# c Close
# n None
# The names of the characters in field 0 are given in comments at the end
# of each line.
#
# For information on bidirectional paired brackets, see UAX #9: Unicode
# Bidirectional Algorithm, at https://www.unicode.org/reports/tr9/
#
# This file was originally created by Andrew Glass and Laurentiu Iancu
# for Unicode 6.3.

0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
007B; 007D; o # LEFT CURLY BRACKET
007D; 007B; c # RIGHT CURLY BRACKET
0F3A; 0F3B; o # TIBETAN MARK GUG RTAGS GYON
0F3B; 0F3A; c # TIBETAN MARK GUG RTAGS GYAS
0F3C; 0F3D; o # TIBETAN MARK ANG KHANG GYON
0F3D; 0F3C; c # TIBETAN MARK ANG KHANG GYAS
169B; 169C; o # OGHAM FEATHER MARK
169C; 169B; c # OGHAM REVERSED FEATHER MARK
2045; 2046; o # LEFT SQUARE BRACKET WITH QUILL
2046; 2045; c # RIGHT SQUARE BRACKET WITH QUILL
207D; 207E; o # SUPERSCRIPT LEFT PARENTHESIS
207E; 207D; c # SUPERSCRIPT RIGHT PARENTHESIS
208D; 208E; o # SUBSCRIPT LEFT PARENTHESIS
208E; 208D; c # SUBSCRIPT RIGHT PARENTHESIS
2308; 2309; o # LEFT CEILING
2309; 2308; c # RIGHT CEILING
230A; 230B; o # LEFT FLOOR
230B; 230A; c # RIGHT FLOOR
2329; 232A; o # LEFT-POINTING ANGLE BRACKET
232A; 2329; c # RIGHT-POINTING ANGLE BRACKET
2768; 2769; o # MEDIUM LEFT PARENTHESIS ORNAMENT
2769; 2768; c # MEDIUM RIGHT PARENTHESIS ORNAMENT
276A; 276B; o # MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276B; 276A; c # MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
276C; 276D; o # MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
276D; 276C; c # MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
276E; 276F; o # HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
276F; 276E; c # HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
2770; 2771; o # HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2771; 2770; c # HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
2772; 2773; o # LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
2773; 2772; c # LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
2774; 2775; o # MEDIUM LEFT CURLY BRACKET ORNAMENT
2775; 2774; c # MEDIUM RIGHT CURLY BRACKET ORNAMENT
27C5; 27C6; o # LEFT S-SHAPED BAG DELIMITER
27C6; 27C5; c # RIGHT S-SHAPED BAG DELIMITER
27E6; 27E7; o # MATHEMATICAL LEFT WHITE SQUARE BRACKET
27E7; 27E6; c # MATHEMATICAL RIGHT WHITE SQUARE BRACKET
27E8; 27E9; o # MATHEMATICAL LEFT ANGLE BRACKET
27E9; 27E8; c # MATHEMATICAL RIGHT ANGLE BRACKET
27EA; 27EB; o # MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
27EB; 27EA; c # MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
27EC; 27ED; o # MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
27ED; 27EC; c # MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
27EE; 27EF; o # MATHEMATICAL LEFT FLATTENED PARENTHESIS
27EF; 27EE; c # MATHEMATICAL RIGHT FLATTENED PARENTHESIS
2983; 2984; o # LEFT WHITE CURLY BRACKET
2984; 2983; c # RIGHT WHITE CURLY BRACKET
2985; 2986; o # LEFT WHITE PARENTHESIS
2986; 2985; c # RIGHT WHITE PARENTHESIS
2987; 2988; o # Z NOTATION LEFT IMAGE BRACKET
2988; 2987; c # Z NOTATION RIGHT IMAGE BRACKET
2989; 298A; o # Z NOTATION LEFT BINDING BRACKET
298A; 2989; c # Z NOTATION RIGHT BINDING BRACKET
298B; 298C; o # LEFT SQUARE BRACKET WITH UNDERBAR
298C; 298B; c # RIGHT SQUARE BRACKET WITH UNDERBAR
298D; 2990; o # LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298E; 298F; c # RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F; 298E; o # LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990; 298D; c # RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
2991; 2992; o # LEFT ANGLE BRACKET WITH DOT
2992; 2991; c # RIGHT ANGLE BRACKET WITH DOT
2993; 2994; o # LEFT ARC LESS-THAN BRACKET
2994; 2993; c # RIGHT ARC GREATER-THAN BRACKET
2995; 2996; o # DOUBLE LEFT ARC GREATER-THAN BRACKET
2996; 2995; c # DOUBLE RIGHT ARC LESS-THAN BRACKET
2997; 2998; o # LEFT BLACK TORTOISE SHELL BRACKET
2998; 2997; c # RIGHT BLACK TORTOISE SHELL BRACKET
29D8; 29D9; o # LEFT WIGGLY FENCE
29D9; 29D8; c # RIGHT WIGGLY FENCE
29DA; 29DB; o # LEFT DOUBLE WIGGLY FENCE
29DB; 29DA; c # RIGHT DOUBLE WIGGLY FENCE
29FC; 29FD; o # LEFT-POINTING CURVED ANGLE BRACKET
29FD; 29FC; c # RIGHT-POINTING CURVED ANGLE BRACKET
2E22; 2E23; o # TOP LEFT HALF BRACKET
2E23; 2E22; c # TOP RIGHT HALF BRACKET
2E24; 2E25; o # BOTTOM LEFT HALF BRACKET
2E25; 2E24; c # BOTTOM RIGHT HALF BRACKET
2E26; 2E27; o # LEFT SIDEWAYS U BRACKET
2E27; 2E26; c # RIGHT SIDEWAYS U BRACKET
2E28; 2E29; o # LEFT DOUBLE PARENTHESIS
2E29; 2E28; c # RIGHT DOUBLE PARENTHESIS
2E55; 2E56; o # LEFT SQUARE BRACKET WITH STROKE
2E56; 2E55; c # RIGHT SQUARE BRACKET WITH STROKE
2E57; 2E58; o # LEFT SQUARE BRACKET WITH DOUBLE STROKE
2E58; 2E57; c # RIGHT SQUARE BRACKET WITH DOUBLE STROKE
2E59; 2E5A; o # TOP HALF LEFT PARENTHESIS
2E5A; 2E59; c # TOP HALF RIGHT PARENTHESIS
2E5B; 2E5C; o # BOTTOM HALF LEFT PARENTHESIS
2E5C; 2E5B; c # BOTTOM HALF RIGHT PARENTHESIS
3008; 3009; o # LEFT ANGLE BRACKET
3009; 3008; c # RIGHT ANGLE BRACKET
300A; 300B; o # LEFT DOUBLE ANGLE BRACKET
300B; 300A; c # RIGHT DOUBLE ANGLE BRACKET
300C; 300D; o # LEFT CORNER BRACKET
300D; 300C; c # RIGHT CORNER BRACKET
300E; 300F; o # LEFT WHITE CORNER BRACKET
300F; 300E; c # RIGHT WHITE CORNER BRACKET
3010; 3011; o # LEFT BLACK LENTICULAR BRACKET
3011; 3010; c # RIGHT BLACK LENTICULAR BRACKET
3014; 3015; o # LEFT TORTOISE SHELL BRACKET
3015; 3014; c # RIGHT TORTOISE SHELL BRACKET
3016; 3017; o # LEFT WHITE LENTICULAR BRACKET
3017; 3016; c # RIGHT WHITE LENTICULAR BRACKET
3018; 3019; o # LEFT WHITE TORTOISE SHELL BRACKET
3019; 3018; c # RIGHT WHITE TORTOISE SHELL BRACKET
301A; 301B; o # LEFT WHITE SQUARE BRACKET
301B; 301A; c # RIGHT WHITE SQUARE BRACKET
FE59; FE5A; o # SMALL LEFT PARENTHESIS
FE5A; FE59; c # SMALL RIGHT PARENTHESIS
FE5B; FE5C; o # SMALL LEFT CURLY BRACKET
FE5C; FE5B; c # SMALL RIGHT CURLY BRACKET
FE5D; FE5E; o # SMALL LEFT TORTOISE SHELL BRACKET
FE5E; FE5D; c # SMALL RIGHT TORTOISE SHELL BRACKET
FF08; FF09; o # FULLWIDTH LEFT PARENTHESIS
FF09; FF08; c # FULLWIDTH RIGHT PARENTHESIS
FF3B; FF3D; o # FULLWIDTH LEFT SQUARE BRACKET
FF3D; FF3B; c # FULLWIDTH RIGHT SQUARE BRACKET
FF5B; FF5D; o # FULLWIDTH LEFT CURLY BRACKET
FF5D; FF5B; c # FULLWIDTH RIGHT CURLY BRACKET
FF5F; FF60; o # FULLWIDTH LEFT WHITE PARENTHESIS
FF60; FF5F; c # FULLWIDTH RIGHT WHITE PARENTHESIS
FF62; FF63; o # HALFWIDTH LEFT CORNER BRACKET
FF63; FF62; c # HALFWIDTH RIGHT CORNER BRACKET

# EOF
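The three-field format described in the header comments of this file can be read with a few lines of Rust. The following is a minimal sketch, not the parser used by wezterm: the names `BracketType`, `BracketEntry`, `parse_line`, `parse_table` and `lookup` are all hypothetical, and malformed lines are simply skipped rather than reported.

```rust
/// Bidi_Paired_Bracket_Type values (field 2 of the file).
/// Hypothetical type, named here for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BracketType {
    Open,
    Close,
    None,
}

/// One data line of the file: code point, its paired bracket, and its type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BracketEntry {
    code_point: u32,
    paired: Option<u32>,
    bracket_type: BracketType,
}

/// Parse a single line. Returns `None` for blank lines, comment-only
/// lines, and (as a simplification) for lines that fail to parse.
fn parse_line(line: &str) -> Option<BracketEntry> {
    // Everything after '#' is a comment (the character name).
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut fields = data.split(';').map(str::trim);
    // Field 0: code point in hexadecimal.
    let code_point = u32::from_str_radix(fields.next()?, 16).ok()?;
    // Field 1: paired code point in hexadecimal, or the literal "<none>".
    let paired = match fields.next()? {
        "<none>" => None,
        hex => Some(u32::from_str_radix(hex, 16).ok()?),
    };
    // Field 2: o, c or n.
    let bracket_type = match fields.next()? {
        "o" => BracketType::Open,
        "c" => BracketType::Close,
        "n" => BracketType::None,
        _ => return None,
    };
    Some(BracketEntry { code_point, paired, bracket_type })
}

/// Parse the whole file contents, keeping only the data lines.
fn parse_table(text: &str) -> Vec<BracketEntry> {
    text.lines().filter_map(parse_line).collect()
}

/// Look up a character by binary search. This assumes the entries are in
/// ascending code point order, as they are in the file above.
fn lookup(table: &[BracketEntry], c: char) -> Option<BracketEntry> {
    table
        .binary_search_by_key(&(c as u32), |e| e.code_point)
        .ok()
        .map(|i| table[i])
}
```

As the header notes, every character with bpt=o or bpt=c also has bc=ON, so a caller that already knows a character's Bidi_Class can skip `lookup` entirely for anything that is not ON.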