feat: allow other kinds of Unicode space as digit group separators

Based on feedback in chat, I added support for several more kinds of
Unicode space character for separating digit groups, both when reading
and when displaying numbers. These are the spaces currently supported,
which are just my best guess at the ones that might show up in CSV files
now and then:

space,
no-break space,
en space,
em space,
punctuation space,
thin space,
narrow no-break space,
medium mathematical space
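
For reference, the eight characters above can be identified unambiguously by code point. The following is a minimal standalone sketch, not the code from this commit: the names `isGroupSpace` and `stripGroupSpaces` are hypothetical and exist only for illustration. It assumes the list above maps to U+0020, U+00A0, U+2002, U+2003, U+2008, U+2009, U+202F and U+205F.

```haskell
module Main where

-- Hypothetical predicate (illustration only): True for the eight space
-- characters listed in this commit message, written as code points so
-- that none of them can be lost or confused in transit.
isGroupSpace :: Char -> Bool
isGroupSpace c = c `elem`
  [ '\x0020'  -- space
  , '\x00A0'  -- no-break space
  , '\x2002'  -- en space
  , '\x2003'  -- em space
  , '\x2008'  -- punctuation space
  , '\x2009'  -- thin space
  , '\x202F'  -- narrow no-break space
  , '\x205F'  -- medium mathematical space
  ]

-- Hypothetical helper (illustration only): remove group spaces from the
-- text of a number, leaving digits and any decimal mark untouched.
stripGroupSpaces :: String -> String
stripGroupSpaces = filter (not . isGroupSpace)

main :: IO ()
main = mapM_ (putStrLn . stripGroupSpaces)
  [ "1 000 000"      -- ordinary spaces
  , "1\x00A0\&000"   -- no-break space; \& ends the escape before the digits
  , "1\x2009\&000.5" -- thin space, with a decimal part
  ]
```

Note the `\&` after each escape inside the string literals: without it, Haskell would read the following digits as part of the hexadecimal escape.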
Simon Michael 2024-02-28 09:12:17 -10:00
parent eb0f736899
commit fac3ee89af
3 changed files with 68 additions and 11 deletions


@@ -1169,7 +1169,20 @@ rawnumberp = label "number" $ do
     pure $ NoSeparators grp1 (Just (decPt, mempty))
 isDigitSeparatorChar :: Char -> Bool
-isDigitSeparatorChar c = isDecimalMark c || c == ' '
+isDigitSeparatorChar c = isDecimalMark c || isDigitSeparatorSpaceChar c
+-- | Kinds of unicode space character we accept as digit group marks.
+-- See also https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping .
+isDigitSeparatorSpaceChar :: Char -> Bool
+isDigitSeparatorSpaceChar c =
+     c == ' '       -- space
+  || c == '\x00A0'  -- no-break space
+  || c == '\x2002'  -- en space
+  || c == '\x2003'  -- em space
+  || c == '\x2008'  -- punctuation space
+  || c == '\x2009'  -- thin space
+  || c == '\x202F'  -- narrow no-break space
+  || c == '\x205F'  -- medium mathematical space
 -- | Some kinds of number literal we might parse.
 data RawNumber


@@ -1200,13 +1200,23 @@ A *decimal mark* can be written as a period or a comma:
     1,23
 In the integer part of the quantity (left of the decimal mark), groups
-of digits can optionally be separated by a *digit group mark* - a
-space, comma, or period (different from the decimal mark):
+of digits can optionally be separated by a *digit group mark* -
+a comma or period (whichever is not used as decimal mark),
+or a space (any of these Unicode space characters:
+space,
+no-break space,
+en space,
+em space,
+punctuation space,
+thin space,
+narrow no-break space,
+medium mathematical space).
     $1,000,000.00
     EUR 2.000.000,00
     INR 9,99,99,999.00
-    1 000 000.9455
+    1 000 000.00   <- ordinary space
+    1 000 000.00   <- no-break space
 hledger is not biased towards [period or comma decimal marks][international number formats],
 so a number containing just one period or comma, like `1,000` or `1.000`, is ambiguous.


@@ -36,14 +36,27 @@ $ hledger bal -f -
 >2 //
 >= 1
-# ** 5. Space between digits groups
+# ** 5. Spaces between digit groups, any of the 8 unicode space characters we support.
 <
-2017/1/1
-  a   1 000.00 EUR
-  b  -1 000.00 EUR
-$ hledger bal -f - --no-total
-  1 000.00 EUR  a
- -1 000.00 EUR  b
+2000-01-01
+  (a)  1 000. A  ; space
+  (b)  1 000. B  ; no-break space
+  (c)  1 000. C  ; en space
+  (d)  1 000. D  ; em space
+  (e)  1 000. E  ; punctuation space
+  (f)  1 000. F  ; thin space
+  (g)  1 000. G  ; narrow no-break space
+  (h)  1 000. H  ; medium mathematical space
+$ hledger bal -f - -N
+  1 000 A  a
+  1 000 B  b
+  1 000 C  c
+  1 000 D  d
+  1 000 E  e
+  1 000 F  f
+  1 000 G  g
+  1 000 H  h
 # ** 6. Space between digits groups in commodity directive
 <
@@ -203,3 +216,24 @@ Balance changes in 2021:
 ===++===================================
  a || -0.12345678901234567890123456 EUR
+# ** 21. When spaces are used inconsistently, what happens ? The usual, first one seen sets commodity style.
+<
+2000-01-01
+  (a)  1 000. A  ; no-break space
+  (a)  1 000. A  ; space
+  (a)  1 000. A  ; en space
+$ hledger bal -f - -N
+  3 000 A  a
+# ** 22. And a commodity directive can enforce consistency as usual.
+<
+commodity 1 000. A  ; narrow no-break space
+2000-01-01
+  (a)  1 000. A  ; space
+  (a)  1 000. A  ; no-break space
+  (a)  1 000. A  ; en space
+$ hledger bal -f - -N
+  3 000 A  a