mirror of
https://github.com/urbit/shrub.git
synced 2024-12-14 20:02:51 +03:00
131 lines
4.2 KiB
Plaintext
131 lines
4.2 KiB
Plaintext
|
|%
|
||
|
++ line
|
||
|
:> an individual codepoint definition
|
||
|
:>
|
||
|
:> code: the codepoint in hexadecimal format
|
||
|
:> name: the character name
|
||
|
:> gen: the type of character this is
|
||
|
:> can: the canonical combining class for ordering algorithms
|
||
|
:> bidi: the bidirectional category of this character
|
||
|
:> de: the character decomposition mapping
|
||
|
:> decimal: the decimal digit value (or ~)
|
||
|
:> digit: the digit value, covering non decimal radix forms
|
||
|
:> numeric: the numeric value, including fractions
|
||
|
:> mirrored: whether char is mirrored in bidirectional text
|
||
|
:> old-name: unicode 1.0 compatibility name
|
||
|
:> iso: iso 10646 comment field
|
||
|
:> up: uppercase mapping codepoint
|
||
|
:> low: lowercase mapping codepoint
|
||
|
:> title: titlecase mapping codepoint
|
||
|
:>
|
||
|
$: code/@c
|
||
|
name/tape
|
||
|
gen/general
|
||
|
can/@ud
|
||
|
bi/bidi
|
||
|
de/decomp
|
||
|
:: todo: decimal/digit/numeric need to be parsed.
|
||
|
decimal/tape
|
||
|
digit/tape
|
||
|
numeric/tape
|
||
|
mirrored/?
|
||
|
old-name/tape
|
||
|
iso/tape
|
||
|
up/(unit @c)
|
||
|
low/(unit @c)
|
||
|
title/(unit @c)
|
||
|
==
|
||
|
::
|
||
|
++ general
|
||
|
:> one of the normative or informative unicode general categories
|
||
|
:>
|
||
|
:> these abbreviations are as found in the unicode standard, except
|
||
|
:> lowercased as to be valid symbols.
|
||
|
$? $lu :< letter, uppercase
|
||
|
$ll :< letter, lowercase
|
||
|
$lt :< letter, titlecase
|
||
|
$mn :< mark, non-spacing
|
||
|
$mc :< mark, spacing combining
|
||
|
$me :< mark, enclosing
|
||
|
$nd :< number, decimal digit
|
||
|
$nl :< number, letter
|
||
|
$no :< number, other
|
||
|
$zs :< separator, space
|
||
|
$zl :< separator, line
|
||
|
$zp :< separator, paragraph
|
||
|
$cc :< other, control
|
||
|
$cf :< other, format
|
||
|
$cs :< other, surrogate
|
||
|
$co :< other, private use
|
||
|
$cn :< other, not assigned
|
||
|
::
|
||
|
$lm :< letter, modifier
|
||
|
$lo :< letter, other
|
||
|
$pc :< punctuation, connector
|
||
|
$pd :< punctuation, dash
|
||
|
$ps :< punctuation, open
|
||
|
$pe :< punctuation, close
|
||
|
$pi :< punctuation, initial quote
|
||
|
$pf :< punctuation, final quote
|
||
|
$po :< punctuation, other
|
||
|
$sm :< symbol, math
|
||
|
$sc :< symbol, currency
|
||
|
$sk :< symbol, modifier
|
||
|
$so :< symbol, other
|
||
|
==
|
||
|
::
|
||
|
++ bidi
|
||
|
:> bidirectional category of a unicode character
|
||
|
$? $l :< left-to-right
|
||
|
$lre :< left-to-right embedding
|
||
|
$lri :< left-to-right isolate
|
||
|
$lro :< left-to-right override
|
||
|
$fsi :< first strong isolate
|
||
|
$r :< right-to-left
|
||
|
$al :< right-to-left arabic
|
||
|
$rle :< right-to-left embedding
|
||
|
$rli :< right-to-left isolate
|
||
|
$rlo :< right-to-left override
|
||
|
$pdf :< pop directional format
|
||
|
$pdi :< pop directional isolate
|
||
|
$en :< european number
|
||
|
$es :< european number separator
|
||
|
$et :< european number terminator
|
||
|
$an :< arabic number
|
||
|
$cs :< common number separator
|
||
|
$nsm :< non-spacing mark
|
||
|
$bn :< boundary neutral
|
||
|
$b :< paragraph separator
|
||
|
$s :< segment separator
|
||
|
$ws :< whitespace
|
||
|
$on :< other neutrals
|
||
|
==
|
||
|
::
|
||
|
++ decomp
|
||
|
:> character decomposition mapping.
|
||
|
:>
|
||
|
:> tag: type of decomposition.
|
||
|
:> c: a list of codepoints this decomposes into.
|
||
|
(unit {tag/(unit decomp-tag) c/(list @c)})
|
||
|
::
|
||
|
++ decomp-tag
|
||
|
:> tag that describes the type of a character decomposition.
|
||
|
$? $font :< a font variant
|
||
|
$no-break :< a no-break version of a space or hyphen
|
||
|
$initial :< an initial presentation form (arabic)
|
||
|
$medial :< a medial presentation form (arabic)
|
||
|
$final :< a final presentation form (arabic)
|
||
|
$isolated :< an isolated presentation form (arabic)
|
||
|
$circle :< an encircled form
|
||
|
$super :< a superscript form
|
||
|
$sub :< a subscript form
|
||
|
$vertical :< a vertical layout presentation form
|
||
|
$wide :< a wide (or zenkaku) compatibility character
|
||
|
$narrow :< a narrow (or hankaku) compatibility character
|
||
|
$small :< a small variant form (cns compatibility)
|
||
|
$square :< a cjk squared font variant
|
||
|
$fraction :< a vulgar fraction form
|
||
|
$compat :< otherwise unspecified compatibility character
|
||
|
==
|
||
|
--
|