unison/unison-src/transcripts-using-base/utf8.md

Test for new Text -> Bytes conversions explicitly using UTF-8 as the encoding

Unison has functions for converting between `Text` and a UTF-8 `Bytes` encoding of that `Text`.

```ucm
scratch/main> find Utf8
```

ASCII characters are encoded as single bytes (in the range 0-127).

```unison
ascii: Text
ascii = "ABCDE"

> toUtf8 ascii

```

Non-ASCII characters are encoded as multiple bytes.

```unison
greek: Text
greek = "ΑΒΓΔΕ"

> toUtf8 greek
```
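
To make the size difference concrete, here is a small sketch that compares the encoded lengths of the two strings. It assumes the `Bytes.size` builtin is in scope alongside `toUtf8`. The ASCII string should come out at five bytes, and the Greek string at ten, since each of these letters takes two bytes in UTF-8.

```unison
-- Sketch: count the bytes produced by the encoder (assumes Bytes.size is available)
> Bytes.size (toUtf8 "ABCDE")
> Bytes.size (toUtf8 "ΑΒΓΔΕ")
```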

We can check that encoding and then decoding gives us back the same `Text` we started with:

```unison
checkRoundTrip: Text -> [Result]
checkRoundTrip t =
  bytes = toUtf8 t
  match fromUtf8.impl bytes with
    Left e -> [Result.Fail "could not decode"]
    Right t' -> if t == t' then [Result.Ok "Passed"] else [Result.Fail ("Got: " ++ t' ++ " Expected: " ++ t)]

greek = "ΑΒΓΔΕ"

test> greekTest = checkRoundTrip greek
```

If we try to decode an invalid sequence of bytes, we get back a `Failure` whose `Text` message explains the decoding error:

```unison
greek_bytes = Bytes.fromList [206, 145, 206, 146, 206, 147, 206, 148, 206]


-- It's an error if we drop the first byte
> match fromUtf8.impl (drop 1 greek_bytes) with
  Left (Failure _ t _) -> t
  _ -> bug "expected a left"

```
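
For contrast, a well-formed byte sequence should decode successfully. The sketch below reuses the first four bytes of the list above, which encode the two letters "ΑΒ". The name `valid_bytes` is hypothetical and used only for illustration.

```unison
-- Hypothetical name: a complete two-character prefix of the bytes used above
valid_bytes = Bytes.fromList [206, 145, 206, 146]

-- Both branches yield Text: the decoded value on success, the message on failure
> match fromUtf8.impl valid_bytes with
  Right t -> t
  Left (Failure _ msg _) -> msg
```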