urbit/mar/unicode-data.hoon

/-  unicode-data
=,  eyre
=,  format
::
|_  all/(list line:unicode-data)
++  grab
  ::  converts from mark to unicode-data.
  |%
  ++  mime  |=([* a=octs] (txt (to-wain q.a)))       ::  XX mark translation
  ++  txt
    |^  |=  a=wain
        ^+  all
        %+  murn  a
        |=  b=cord
        ^-  (unit line:unicode-data)
        ?~  b  ~
        `(rash b line)
    ::
    ::  parses a single character information line of the unicode data file.
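    ::
    ::  illustrative input, not taken from this file: the U+0041 record of
    ::  UnicodeData.txt, with 15 fields separated by ';':
    ::
    ::    0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
    ::
    ::  read against the field list below: code=0041,
    ::  name="LATIN CAPITAL LETTER A", gen=Lu, can=0, bi=L; de, decimal,
    ::  digit and numeric are empty; mirrored=N; old-name, iso and up are
    ::  empty; low=0061; title is empty.
    ::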
    ++  line
      ;~  (glue sem)
        hex                       ::  code/@c         codepoint in hex format
        name-string               ::  name/tape       character name
        general-category          ::  gen/general     type of character
        (bass 10 (plus dit))      ::  can/@ud         canonical combining class
        bidi-category             ::  bi/bidi         bidirectional category
        decomposition-mapping     ::  de/decomp       decomposition mapping
      ::
      ::  todo: decimal/digit/numeric need to be parsed.
      ::
        string-number             ::  decimal/tape    decimal digit value (or ~)
        string-number             ::  digit/tape      digit value, even if non-decimal
        string-number             ::  numeric/tape    numeric value, including fractions
      ::
        (fuss 'Y' 'N')            ::  mirrored/?      is char mirrored in bidi text?
        name-string               ::  old-name/tape   unicode 1.0 compatibility name
        name-string               ::  iso/tape        iso 10646 comment field
        (punt hex)                ::  up/(unit @c)    uppercase mapping codepoint
        (punt hex)                ::  low/(unit @c)   lowercase mapping codepoint
        (punt hex)                ::  title/(unit @c) titlecase mapping codepoint
      ==
    ::
    ::  parses a single name or comment string.
    ++  name-string
      %+  cook
        |=(a=tape a)
      (star ;~(less sem prn))
    ::
    ::  parses a unicode general category abbreviation to symbol.
    ++  general-category
      %+  sear  (soft general:unicode-data)
      :(cook crip cass ;~(plug hig low (easy ~)))
    ::
    ::  parses a bidirectional category abbreviation to symbol.
    ++  bidi-category
      %+  sear  (soft bidi:unicode-data)
      :(cook crip cass (star hig))
    ::
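    ::  parses an optional decomposition mapping: an optional <tag>
    ::  followed by space-separated hex codepoints.
    ++  decomposition-mapping
      %-  punt  ::  optional
      ::  a tag and a list of characters to decompose to
      ;~  plug
        (punt (ifix [gal ;~(plug gar ace)] decomp-tag))
        (cook |=(a=(list @c) a) (most ace hex))
      ==
    ::
    ::  parses a decomposition tag name to symbol.
    ++  decomp-tag
      %+  sear  (soft decomp-tag:unicode-data)
      :(cook crip cass (star alf))
    ::
    ::  parses a numeric field, kept as a raw tape of digits, '/' and '-'.
    ++  string-number
      %+  cook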
        |=(a=tape a)
      (star ;~(pose nud fas hep))
    ::
    --
  --
++  grad  %txt
--