Most changes in this PR rename the Anoma Nock
StandardLibrary to AnomaLibrary. This is because the Anoma library now
consists of a standard library from
[anoma.hoon](a20b5e7838/hoon/anoma.hoon)
and the resource machine library
[resource-machine.hoon](a20b5e7838/hoon/anoma.hoon).
This PR also adds the Anoma RM functions and value references. Their
integration into the compiler pipeline will happen in a separate PR.
The Anoma Library RM functions and stdlib functions are enumerated
separately. Anoma Library values have their own type because they are
compiled differently from functions.
Part of:
* https://github.com/anoma/juvix/issues/3084
This PR:
* Adds a new implementation of {decode, encode}ByteString functions,
used by anomaEncode and anomaDecode in the Core evaluator
* Adds property tests for roundtripping and benchmarks for the new
functions.
The old implementation used
[bitvec](https://hackage.haskell.org/package/bitvec) to manipulate the
ByteString. This was far too slow. The new implementation uses bit
operations directly on the input integer and ByteArray.
It's now possible to run
[anoma-app-patterns:`Tests/Swap.juvix`](https://github.com/anoma/anoma-app-patterns/blob/feature/tests/Tests/Swap.juvix)
to completion.
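As a rough illustration of decoding with bit operations applied directly to the integer, here is a minimal sketch. It is not the implementation from this PR: the name `integerToBytesLE` is hypothetical, it assumes a non-negative input and little-endian byte order, and it does none of the performance work described here.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Hypothetical helper: the little-endian bytes of a non-negative Integer,
-- least significant byte first. A negative input would not terminate.
integerToBytesLE :: Integer -> ByteString
integerToBytesLE = BS.unfoldr step
  where
    step :: Integer -> Maybe (Word8, Integer)
    step 0 = Nothing
    -- Mask off the low byte, then drop it from the remaining value.
    step n = Just (fromIntegral (n .&. 0xff), n `shiftR` 8)
```

For example, `integerToBytesLE 513` yields the bytes `[1, 2]`, since 513 is `0x0201`.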
For encoding, if the output integer exceeds 64 bits (and therefore a
BigInt must be used), a straightforward `ByteString -> Integer`
implementation such as either of the following has quadratic time
complexity in the number of input bytes:
```
byteStringToIntegerLE :: ByteString -> Integer
byteStringToIntegerLE = BS.foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0
```
```
byteStringToInteger' :: ByteString -> Integer
byteStringToInteger' = BS.foldl' (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0
```
I think this is because `shiftL` is expensive for large Integers (each
shift has to copy the whole accumulator). To mitigate this, I split the
input ByteString into 1024-byte chunks and process each chunk
separately. This gives a 100x speedup at ~0.25 MB of input over the
non-chunked approach, and linear time complexity thereafter.
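The chunking idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the names `naiveLE`, `chunksOf` and `chunkedLE` are hypothetical, and the way the per-chunk results are combined here (one large shift per chunk instead of one per byte) is only one plausible reading of "processing each separately".

```haskell
import Data.Bits (shiftL, (.|.))
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- Naive little-endian conversion: one shift of the accumulator per byte.
naiveLE :: ByteString -> Integer
naiveLE = BS.foldr (\b acc -> acc `shiftL` 8 .|. fromIntegral b) 0

-- Split a ByteString into chunks of at most n bytes (n assumed positive).
chunksOf :: Int -> ByteString -> [ByteString]
chunksOf n bs
  | BS.null bs = []
  | otherwise = let (h, t) = BS.splitAt n bs in h : chunksOf n t

-- Convert each 1024-byte chunk on its own (small accumulators), then
-- combine the chunk values with one large shift per chunk.
chunkedLE :: ByteString -> Integer
chunkedLE = foldr combine 0 . chunksOf 1024
  where
    combine :: ByteString -> Integer -> Integer
    combine chunk acc = acc `shiftL` (8 * BS.length chunk) .|. naiveLE chunk
```

Both functions should return the same value for any input; the chunked version only reduces how often the large accumulator is shifted.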
## Benchmarks
The benchmarks for encoding and decoding 250000 bytes:
```
ByteString Encoding to/from integer
encode bytes to integer: OK
59.1 ms ± 5.3 ms
decode bytes from integer: OK
338 ms ± 16 ms
```
The previous implementation would never complete for this input.
Benchmarks for encoding and decoding 2 * 250000 bytes:
```
ByteString Encoding to/from integer
encode bytes to integer: OK
121 ms ± 8.3 ms
decode bytes from integer: OK
651 ms ± 27 ms
```
Benchmarks for encoding and decoding 4 * 250000 bytes:
```
ByteString Encoding to/from integer
encode bytes to integer: OK
249 ms ± 17 ms
decode bytes from integer: OK
1.317 s ± 16 ms
```
---------
Co-authored-by: Lukasz Czajka <lukasz@heliax.dev>
This PR adds support for the `anoma-decode` builtin:
```
builtin anoma-decode
axiom anomaDecode : {A : Type} -> Nat -> A
```
Adds:
* An implementation of the `cue` function in Haskell
* Unit tests for `cue`
* A benchmark for `cue` applied to the Anoma / nockma stdlib
Benchmark results:
```
cue (jam stdlib): OK
36.0 ms ± 2.0 ms
```
Closes:
* https://github.com/anoma/juvix/issues/2764
This PR adds support for the `anoma-encode` builtin:
```
builtin anoma-encode
axiom anomaEncode : {A : Type} -> A -> Nat
```
In the backend this is compiled to a call to the Anoma / nockma stdlib
`jam` function.
This PR also contains:
* An implementation of the `jam` function in Haskell. This is used in
the Nockma evaluator.
* Unit tests for `jam`
* A benchmark for `jam` applied to the Anoma / nockma stdlib.
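For readers unfamiliar with `jam`, the sketch below shows the general shape of the encoding on a toy noun type. It is not the implementation added in this PR: the `Noun` type and all function names are hypothetical, the bit layout follows the commonly documented Nock `jam` format, and back-references to repeated subterms are deliberately left out, so the output matches a full `jam` only for nouns where no back-reference would be emitted.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.|.))

-- Toy noun type: an atom (natural number) or a cell of two nouns.
data Noun = Atom Integer | Cell Noun Noun

-- Number of bits needed to represent a non-negative Integer (0 for 0).
bitLength :: Integer -> Int
bitLength 0 = 0
bitLength n = 1 + bitLength (n `shiftR` 1)

-- The lowest k bits of n, least significant bit first.
lowBits :: Int -> Integer -> [Bool]
lowBits k n = [testBit n i | i <- [0 .. k - 1]]

-- Length-prefixed atom encoding, least significant bit first:
-- c zero bits, a one bit, the low c-1 bits of b, then the b bits of a,
-- where b is the bit length of a and c is the bit length of b.
mat :: Integer -> [Bool]
mat 0 = [True]
mat a =
  replicate c False
    ++ [True]
    ++ lowBits (c - 1) (toInteger b)
    ++ lowBits b a
  where
    b = bitLength a
    c = bitLength (toInteger b)

-- Atoms are tagged with a 0 bit; cells with the bits 1, 0.
jamBits :: Noun -> [Bool]
jamBits (Atom a) = False : mat a
jamBits (Cell h t) = [True, False] ++ jamBits h ++ jamBits t

-- Pack the bit list (least significant bit first) into an Integer.
jamNoBackrefs :: Noun -> Integer
jamNoBackrefs =
  foldr (\bit acc -> acc `shiftL` 1 .|. (if bit then 1 else 0)) 0 . jamBits
```

With this sketch, `jamNoBackrefs (Atom 0)` is `2`, `jamNoBackrefs (Atom 1)` is `12`, and `jamNoBackrefs (Cell (Atom 0) (Atom 0))` is `41`.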
Benchmark results:
```
$ juvixbench -p 'Jam'
All
Nockma
Jam
jam stdlib: OK
109 ms ± 6.2 ms
```
The following benchmark compares Juvix 0.6.0, which uses polysemy, with
a new version (implemented in this PR) that replaces polysemy with
effectful.
# Typecheck standard library without caching
```
hyperfine --warmup 2 --prepare 'juvix-polysemy clean' 'juvix-polysemy typecheck Stdlib/Prelude.juvix' 'juvix-effectful typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix-polysemy typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 3.924 s ± 0.143 s [User: 3.787 s, System: 0.084 s]
Range (min … max): 3.649 s … 4.142 s 10 runs
Benchmark 2: juvix-effectful typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 2.558 s ± 0.074 s [User: 2.430 s, System: 0.084 s]
Range (min … max): 2.403 s … 2.646 s 10 runs
Summary
juvix-effectful typecheck Stdlib/Prelude.juvix ran
1.53 ± 0.07 times faster than juvix-polysemy typecheck Stdlib/Prelude.juvix
```
# Typecheck standard library with caching
```
hyperfine --warmup 1 'juvix-effectful typecheck Stdlib/Prelude.juvix' 'juvix-polysemy typecheck Stdlib/Prelude.juvix' --min-runs 20
Benchmark 1: juvix-effectful typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 1.194 s ± 0.068 s [User: 0.979 s, System: 0.211 s]
Range (min … max): 1.113 s … 1.307 s 20 runs
Benchmark 2: juvix-polysemy typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 1.237 s ± 0.083 s [User: 0.997 s, System: 0.231 s]
Range (min … max): 1.061 s … 1.476 s 20 runs
Summary
juvix-effectful typecheck Stdlib/Prelude.juvix ran
1.04 ± 0.09 times faster than juvix-polysemy typecheck Stdlib/Prelude.juvix
```
# Overview
This PR implements a simple benchmark suite to compare the efficiency of
[`effectful-core`](https://hackage.haskell.org/package/effectful-core)
and [`polysemy`](https://hackage.haskell.org/package/polysemy).
I've implemented the suite with the help of
[`tasty-bench`](https://hackage.haskell.org/package/tasty-bench), a
simple benchmarking library with minimal dependencies. It can be run
with a default main using the same CLI options as our
[`tasty`](https://hackage.haskell.org/package/tasty) test suite.
# How to run
```
stack run juvixbench
```
If you only want to run a particular benchmark:
```
stack run juvixbench -- -p "/Output/"
```
# Results
The results show that `effectful` is the clear winner; in some cases it
is extremely close to the raw version.
## State
This benchmark adds the first 2 ^ 22 naturals:
```
countRaw :: Natural -> Natural
countRaw = go 0
  where
    go :: Natural -> Natural -> Natural
    go acc = \case
      0 -> acc
      m -> go (acc + m) (pred m)
```
Results:
```
State
Eff State (Static): OK
25.2 ms ± 2.4 ms
Sem State: OK
2.526 s ± 5.1 ms
Raw State: OK
22.3 ms ± 1.5 ms
```
## Output
This benchmark collects the first 2 ^ 21 naturals in a list and adds
them.
```
countdownRaw :: Natural -> Natural
countdownRaw = sum' . reverse . go []
  where
    go :: [Natural] -> Natural -> [Natural]
    go acc = \case
      0 -> acc
      m -> go (m : acc) (pred m)
```
Results:
```
Eff Output (Dynamic): OK
693 ms ± 61 ms
Eff Accum (Static): OK
553 ms ± 36 ms
Sem Output: OK
2.606 s ± 91 ms
Raw Output: OK
604 ms ± 26 ms
```
## Reader (First Order)
Repeats a constant in a list and adds up the list. The effect-based
version asks for the constant value in each iteration.
```
countRaw :: Natural -> Natural
countRaw = sum' . go []
  where
    go :: [Natural] -> Natural -> [Natural]
    go acc = \case
      0 -> acc
      m -> go (c : acc) (pred m)
```
Results:
```
Reader (First order)
Eff Reader (Static): OK
103 ms ± 6.9 ms
Sem Reader: OK
328 ms ± 31 ms
Raw Reader: OK
106 ms ± 1.9 ms
```
## Reader (Higher Order)
Adds the first 2 ^ 21 naturals. The effect-based version uses `local`
(from the `Reader` effect) to pass down the argument that counts the
iterations.
```
countRaw :: Natural -> Natural
countRaw = sum' . go []
  where
    go :: [Natural] -> Natural -> [Natural]
    go acc = \case
      0 -> acc
      m -> go (m : acc) (pred m)
```
Results:
```
Reader (Higher order)
Eff Reader (Static): OK
720 ms ± 56 ms
Sem Reader: OK
2.094 s ± 182 ms
Raw Reader: OK
154 ms ± 2.2 ms
```
## Embed IO
Opens a temporary file and appends a character to it a number of times.
```
countRaw :: Natural -> IO ()
countRaw n =
  withSystemTempFile "tmp" $ \_ h -> go h n
  where
    go :: Handle -> Natural -> IO ()
    go h = \case
      0 -> return ()
      a -> hPutChar h c >> go h (pred a)
```
Results:
```
Embed IO
Raw IO: OK
464 ms ± 12 ms
Eff RIO: OK
487 ms ± 3.5 ms
Sem Embed IO: OK
582 ms ± 33 ms
```