Tests for the new `Text` <-> `Bytes` conversions, which explicitly use UTF-8 as the encoding.

Unison has functions for converting between `Text` and its UTF-8 encoding as `Bytes`.

```ucm
scratch/main> find Utf8
```

ASCII characters are encoded as single bytes, each equal to the character's code point (in the range 0-127).

```unison
ascii : Text
ascii = "ABCDE"

> toUtf8 ascii
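-- A minimal illustrative check, assuming the `Bytes.toList` builtin to
-- inspect the raw byte values: every ASCII character becomes exactly one
-- byte equal to its code point, so "ABCDE" encodes as 65 through 69.
test> asciiBytesTest =
  expected = [65, 66, 67, 68, 69]
  if Bytes.toList (toUtf8 ascii) == expected then [Result.Ok "Passed"] else [Result.Fail "unexpected bytes"]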
```

Non-ASCII characters are encoded as multiple bytes: UTF-8 uses two to four bytes for each code point above 127.

```unison
greek : Text
greek = "ΑΒΓΔΕ"

> toUtf8 greek
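-- An illustrative check, again assuming `Bytes.toList` for the raw values:
-- each of these Greek capitals (U+0391 to U+0395) takes two bytes, a lead
-- byte 206 (0xCE) followed by 145..149 (0x91..0x95), so the 5 characters
-- become 10 bytes.
test> greekBytesTest =
  expected = [206, 145, 206, 146, 206, 147, 206, 148, 206, 149]
  if Bytes.toList (toUtf8 greek) == expected then [Result.Ok "Passed"] else [Result.Fail "unexpected bytes"]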
```

We can check that encoding and then decoding gives us back the same `Text` we started with:

```unison
-- Encode to UTF-8, decode again, and compare the result with the input.
checkRoundTrip : Text -> [Result]
checkRoundTrip t =
  bytes = toUtf8 t
  match fromUtf8.impl bytes with
    Left _ -> [Result.Fail "could not decode"]
    Right t' -> if t == t' then [Result.Ok "Passed"] else [Result.Fail ("Got: " ++ t' ++ " Expected: " ++ t)]

greek = "ΑΒΓΔΕ"

test> greekTest = checkRoundTrip greek
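
-- A few more round trips, as an illustrative sketch reusing checkRoundTrip:
-- ASCII-only text, the empty text, and text mixing one-byte (A, space),
-- two-byte (Ω), three-byte (€) and four-byte (😀) characters.
test> asciiTest = checkRoundTrip "ABCDE"
test> emptyTest = checkRoundTrip ""
test> mixedTest = checkRoundTrip "A Ω € 😀"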
```

If we try to decode an invalid sequence of bytes, `fromUtf8.impl` returns a `Left` whose `Failure` carries `Text` explaining the decoding error:

```unison
-- The UTF-8 encoding of "ΑΒΓΔΕ" with its final byte (149) missing.
greek_bytes = Bytes.fromList [206, 145, 206, 146, 206, 147, 206, 148, 206]

-- It's an error if we drop the first byte: the remaining bytes then start
-- with a continuation byte (145).
> match fromUtf8.impl (drop 1 greek_bytes) with
  Left (Failure _ t _) -> t
  _ -> bug "expected a left"
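
-- Two illustrative checks, assuming the `Bytes.take` builtin: the untouched
-- greek_bytes also fails to decode, because it ends in a lone lead byte
-- (206), while its first 8 bytes form complete characters and decode to "ΑΒΓΔ".
test> truncatedTest =
  match fromUtf8.impl greek_bytes with
    Left _ -> [Result.Ok "Passed"]
    Right _ -> [Result.Fail "a lone lead byte should not decode"]

test> validPrefixTest =
  match fromUtf8.impl (Bytes.take 8 greek_bytes) with
    Right t -> if t == "ΑΒΓΔ" then [Result.Ok "Passed"] else [Result.Fail "unexpected text"]
    Left _ -> [Result.Fail "expected the valid prefix to decode"]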
```