name: juvix
version: 0.6.6
license: GPL-3.0-only
license-file: LICENSE.md
copyright: (c) 2022- Heliax AG.
maintainer: The Compilers Team at Heliax AG <hello@heliax.dev>
author:
  [
    Jonathan Prieto-Cubides,
    Jan Mas Rovira,
    Paul Cadman,
    Lukasz Czajka,
    Github's contributors,
  ]
tested-with: ghc == 9.8.2
homepage: https://juvix.org
bug-reports: https://github.com/anoma/juvix/issues
description: The Juvix compiler
category: Compilers/Interpreters
github: anoma/juvix

flags:
  # This flag can only be used in an environment that contains static libraries
  # for all dependencies, including libc. We use this when doing a static build
  # using the ghc-musl alpine container.
  static:
    description: Build static executable
    default: false
    manual: true

extra-source-files:
  - README.md
  - assets/css/*.css
  - assets/js/*.js
  - assets/images/*.svg
  - juvix-stdlib/**/*.juvix
  - include/package/**/*.juvix
  - include/package-base/**/*.juvix
  - runtime/c/include/**/*.h
  - runtime/c/**/*.a
  - runtime/rust/juvix/target/**/*.rlib
  - runtime/tree/*.jvt
  - runtime/vampir/*.pir
  - runtime/casm/*.casm
  - runtime/nockma/*.nockma
  - config/config.json
  - config/configure.sh

dependencies:
  - aeson-better-errors == 0.9.*
  - aeson == 2.2.*
  - ansi-terminal == 1.1.*
  - base == 4.19.*
  - base16-bytestring == 1.0.*
  - base64-bytestring == 1.2.*
  - bitvec == 1.1.*
  - blaze-html == 0.9.*
  - bytestring == 0.12.*
  - cereal == 0.5.*
  - containers == 0.6.*
  - cryptohash-sha256 == 0.11.*
  - deepseq == 1.5.*
  - directory == 1.3.*
  - dlist == 1.0.*
  - ed25519 == 0.0.*
  - edit-distance == 0.2.*
Effectful Juvix tree evaluator (#2623)
This PR implements two additional versions of the Juvix Tree evaluator.
Now we have:
1. The raw implementation, which does not use effects. It throws Haskell
exceptions for errors. It uses `unsafePerformIO` for traces. It relies
on bang patterns to force strictness and guarantee the expected order of
execution (for traces). Avoiding effects allows for improved execution
efficiency (a minimal sketch of this style follows below).
2. A [`polysemy`](https://hackage.haskell.org/package/polysemy-1.9.1.3)-based
implementation.
3. An [`effectful-core`](https://hackage.haskell.org/package/effectful-core)-based
implementation.
One can specify which evaluator to use as follows:
```
juvix dev tree eval --evaluator XXX test.jvt
```
where XXX is one of `raw`, `polysemy`, `effectful`.
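As a rough illustration of that first, effect-free style (not the actual Juvix evaluator; the `Node` type and its operations below are made up for the example), bang patterns pin down evaluation order, errors are plain Haskell exceptions, and traces go through `unsafePerformIO`:
```haskell
{-# LANGUAGE BangPatterns #-}
module RawEvalSketch where

import Control.Exception (Exception, throw)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical, minimal expression type standing in for JuvixTree nodes.
data Node
  = ConstInt Integer
  | Add Node Node
  | Div Node Node
  | Trace Node -- print the evaluated value, then return it

newtype EvalError = EvalError String
  deriving (Show)

instance Exception EvalError

-- No effect system: errors are thrown as Haskell exceptions and bang
-- patterns guarantee that traces fire in the expected order.
eval :: Node -> Integer
eval node = case node of
  ConstInt i -> i
  Add l r ->
    let !lv = eval l
        !rv = eval r
     in lv + rv
  Div l r ->
    let !lv = eval l
        !rv = eval r
     in if rv == 0
          then throw (EvalError "division by zero")
          else lv `div` rv
  Trace n ->
    let !v = eval n
     in unsafePerformIO (print v) `seq` v
```
The polysemy and effectful-core variants express the same evaluator with error and trace effects instead, which is what the benchmarks below compare.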
# Preliminary benchmarks
More thorough benchmarks should be run, but here are some preliminary
results:
## Test032 (Fibonacci 20)
I've adapted test032 so that its main is a single call to fibonacci of
20.
Command:
```
hyperfine --warmup 2 --runs 10 'juvix dev tree eval test032.jvt --evaluator polysemy' 'juvix dev tree eval test032.jvt --evaluator raw' 'juvix dev tree eval test032.jvt --evaluator effectful'
```
Output:
```
Benchmark 1: juvix dev tree eval test032.jvt --evaluator polysemy
Time (mean ± σ): 2.133 s ± 0.040 s [User: 2.113 s, System: 0.016 s]
Range (min … max): 2.088 s … 2.227 s 10 runs
Benchmark 2: juvix dev tree eval test032.jvt --evaluator raw
Time (mean ± σ): 308.7 ms ± 13.8 ms [User: 293.6 ms, System: 14.1 ms]
Range (min … max): 286.5 ms … 330.1 ms 10 runs
Benchmark 3: juvix dev tree eval test032.jvt --evaluator effectful
Time (mean ± σ): 366.0 ms ± 2.8 ms [User: 345.4 ms, System: 19.4 ms]
Range (min … max): 362.5 ms … 372.6 ms 10 runs
Summary
juvix dev tree eval test032.jvt --evaluator raw ran
1.19 ± 0.05 times faster than juvix dev tree eval test032.jvt --evaluator effectful
6.91 ± 0.34 times faster than juvix dev tree eval test032.jvt --evaluator polysemy
```
## Test034 (Higher-order function composition)
A modified version of test034 where main is defined as `call[exp](3, 12)`.
Command:
```
hyperfine --warmup 2 --runs 10 'juvix dev tree eval test034.jvt --evaluator polysemy' 'juvix dev tree eval test034.jvt --evaluator raw' 'juvix dev tree eval test034.jvt --evaluator effectful'
```
Output:
```
Benchmark 1: juvix dev tree eval test034.jvt --evaluator polysemy
Time (mean ± σ): 7.025 s ± 0.184 s [User: 6.518 s, System: 0.469 s]
Range (min … max): 6.866 s … 7.327 s 10 runs
Benchmark 2: juvix dev tree eval test034.jvt --evaluator raw
Time (mean ± σ): 835.6 ms ± 7.4 ms [User: 757.2 ms, System: 75.9 ms]
Range (min … max): 824.7 ms … 847.4 ms 10 runs
Benchmark 3: juvix dev tree eval test034.jvt --evaluator effectful
Time (mean ± σ): 1.578 s ± 0.010 s [User: 1.427 s, System: 0.143 s]
Range (min … max): 1.563 s … 1.595 s 10 runs
Summary
juvix dev tree eval test034.jvt --evaluator raw ran
1.89 ± 0.02 times faster than juvix dev tree eval test034.jvt --evaluator effectful
8.41 ± 0.23 times faster than juvix dev tree eval test034.jvt --evaluator polysemy
```
## Test036 (Streams without memoization)
A modified version of test036 where main is defined as `call[nth](700,
call[primes]())`.
Command:
```
hyperfine --warmup 2 --runs 5 'juvix dev tree eval test036.jvt --evaluator polysemy' 'juvix dev tree eval test036.jvt --evaluator raw' 'juvix dev tree eval test036.jvt --evaluator effectful'
```
Output:
```
Benchmark 1: juvix dev tree eval test036.jvt --evaluator polysemy
Time (mean ± σ): 1.993 s ± 0.026 s [User: 1.946 s, System: 0.043 s]
Range (min … max): 1.969 s … 2.023 s 5 runs
Benchmark 2: juvix dev tree eval test036.jvt --evaluator raw
Time (mean ± σ): 137.5 ms ± 7.1 ms [User: 127.5 ms, System: 8.9 ms]
Range (min … max): 132.8 ms … 149.8 ms 5 runs
Benchmark 3: juvix dev tree eval test036.jvt --evaluator effectful
Time (mean ± σ): 329.0 ms ± 7.3 ms [User: 289.3 ms, System: 37.4 ms]
Range (min … max): 319.9 ms … 336.0 ms 5 runs
Summary
juvix dev tree eval test036.jvt --evaluator raw ran
2.39 ± 0.13 times faster than juvix dev tree eval test036.jvt --evaluator effectful
14.50 ± 0.77 times faster than juvix dev tree eval test036.jvt --evaluator polysemy
```
  - effectful == 2.3.*
  - effectful-core == 2.3.*
  - effectful-th == 1.0.*
  - exceptions == 0.10.*
  - extra == 1.7.*
  - file-embed == 0.0.*
  - filelock == 0.1.*
  - filepath == 1.4.*
Import tree (#2751)
- Contributes to #2750
# New commands:
1. `dev import-tree scan FILE`. Scans a single file and lists all the
imports in it.
2. `dev import-tree print`. Scans all files in the package and its
dependencies. Builds an import dependency tree and prints it to stdout.
If the `--stats` flag is given, it reports the number of scanned
modules, the number of unique imports, and the length of the longest
import chain.
Example: this is the truncated output of `juvix dev import-tree print
--stats` in the `juvix-stdlib` directory.
```
[...]
Stdlib/Trait/Partial.juvix imports Stdlib/Data/String/Base.juvix
Stdlib/Trait/Partial.juvix imports Stdlib/Debug/Fail.juvix
Stdlib/Trait/Show.juvix imports Stdlib/Data/String/Base.juvix
index.juvix imports Stdlib/Cairo/Poseidon.juvix
index.juvix imports Stdlib/Data/Int/Ord.juvix
index.juvix imports Stdlib/Data/Nat/Ord.juvix
index.juvix imports Stdlib/Data/String/Ord.juvix
index.juvix imports Stdlib/Prelude.juvix
Import Tree Statistics:
=======================
• Total number of modules: 56
• Total number of edges: 193
• Height (longest chain of imports): 15
```
Both commands support the `--scan-strategy` flag, which determines which
parser we use to scan the imports. The possible values are:
1. `flatparse`. It uses the low-level
[FlatParse](https://hackage.haskell.org/package/flatparse-0.5.1.0/docs/FlatParse-Basic.html)
parsing library. This parser is made specifically to parse only the imports
and ignore the rest, so we expect it to have much better performance
(see the sketch after this list). It does not produce error messages.
2. `megaparsec`. It uses the normal Juvix parser and we simply collect
the imports from it.
3. `flatparse-megaparsec` (default). It uses the flatparse backend and
falls back to megaparsec if it fails.
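For intuition about why a dedicated import scanner can be so much faster, here is a deliberately naive stand-in (plain `text` functions, no FlatParse, no error reporting) that extracts only `import` lines and ignores everything else; it is not the scanner implemented in this PR:
```haskell
module ImportScanSketch where

import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text as T

-- Naive import-only scan: keep lines that start with the `import` keyword
-- and return the dotted module name that follows it. Everything else in the
-- file is ignored, which is what makes this kind of scan cheap.
scanImports :: Text -> [Text]
scanImports = mapMaybe importName . T.lines
  where
    importName :: Text -> Maybe Text
    importName line = do
      rest <- T.stripPrefix "import " (T.stripStart line)
      let name = T.takeWhile (\c -> c /= ' ' && c /= ';') (T.stripStart rest)
      if T.null name then Nothing else Just name
```
Running something of this shape over each module's source yields the names that the import tree is built from, without ever constructing a full syntax tree.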
# Internal changes
## Megaparsec Parser (`Concrete.FromSource`)
In order to be able to run the parser during the scanning phase, I've
adjusted some of the effects used in the parser:
1. I've removed the `NameIdGen` and `Files` constraints, which were
unused.
2. I've removed `Reader EntryPoint`. It was used to get the `ModuleId`.
Now the `ModuleId` is generated during scoping.
3. I've replaced `PathResolver` by the `TopModuleNameChecker` effect.
This new effect, as the name suggests, only checks the name of the
module (same rules as we had in the `PathResolver` before). It is also
possible to ignore the effect, which is needed if we want to use this
parser without an entrypoint.
## `PathResolver` effect refactor
1. The `WithPath` command has been removed.
2. New command `ResolvePath :: ImportScan -> PathResolver m
(PackageInfo, FileExt)`. Useful for resolving imports during the scanning
phase.
3. New command `WithResolverRoot :: Path Abs Dir -> m a -> PathResolver
m a`. Useful for switching package context.
4. New command `GetPackageInfos :: PathResolver m (HashMap (Path Abs
Dir) PackageInfo)`, which returns a table with all packages. Useful for
scanning all dependencies (see the effect sketch below).
The `Package.PathResolver` has been refactored to be more like the normal
`PathResolver`. We've discussed with @paulcadman the possibility of
unifying both implementations in the near future.
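As a rough sketch of how such commands can be expressed with `effectful`'s dynamic dispatch (the payload types below are placeholders, not the compiler's real definitions):
```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module PathResolverSketch where

import Data.HashMap.Strict (HashMap)
import Effectful (Dispatch (Dynamic), DispatchOf, Eff, Effect, (:>))
import Effectful.Dispatch.Dynamic (send)
import Path (Abs, Dir, Path)

-- Placeholder payload types; the compiler has its own real definitions.
data ImportScan = ImportScan
data PackageInfo = PackageInfo
data FileExt = FileExtJuvix | FileExtJuvixMarkdown

-- The refactored effect, with the commands listed above.
data PathResolver :: Effect where
  ResolvePath :: ImportScan -> PathResolver m (PackageInfo, FileExt)
  WithResolverRoot :: Path Abs Dir -> m a -> PathResolver m a
  GetPackageInfos :: PathResolver m (HashMap (Path Abs Dir) PackageInfo)

type instance DispatchOf PathResolver = Dynamic

-- Thin wrappers that send the commands to whatever handler is installed.
resolvePath :: (PathResolver :> es) => ImportScan -> Eff es (PackageInfo, FileExt)
resolvePath = send . ResolvePath

withResolverRoot :: (PathResolver :> es) => Path Abs Dir -> Eff es a -> Eff es a
withResolverRoot root action = send (WithResolverRoot root action)

getPackageInfos :: (PathResolver :> es) => Eff es (HashMap (Path Abs Dir) PackageInfo)
getPackageInfos = send GetPackageInfos
```
Because `WithResolverRoot` carries an `m a` argument, it is a higher-order operation, which is what lets a handler re-run the wrapped action under a different package root.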
## Misc
1. `Package.juvix` no longer ends up in
`PackageInfo.packageRelativeFiles`.
2. I've introduced string definitions for `--`, `{-` and `-}`.
3. I've fixed a bug where `.juvix.md` was detected as an invalid
extension.
4. I've added `LazyHashMap` to the prelude. I've also added `ordSet` to
create ordered sets, `ordMap` for ordered maps, etc.
# Benchmarks
I've profiled `juvix dev import-tree --scan-strategy [megaparsec |
flatparse] --stats` with optimization enabled.
In the images below we see that in the megaparsec case, the scanning
takes 54.8% of the total time, whereas in the flatparse case it only
takes 9.6% of the total time.
- **Megaparsec**
![image](https://github.com/anoma/juvix/assets/5511599/05ec42cf-d79d-4bbf-b462-c0e48593fe51)
- **Flatparse**
![image](https://github.com/anoma/juvix/assets/5511599/1d7b363c-a915-463c-8dc4-613ab4b7d473)
## Hyperfine
```
hyperfine --warmup 1 'juvix dev import-tree print --scan-strategy flatparse --stats' 'juvix dev import-tree print --scan-strategy megaparsec --stats' --min-runs 20
Benchmark 1: juvix dev import-tree print --scan-strategy flatparse --stats
Time (mean ± σ): 82.0 ms ± 4.5 ms [User: 64.8 ms, System: 17.3 ms]
Range (min … max): 77.0 ms … 102.4 ms 37 runs
Benchmark 2: juvix dev import-tree print --scan-strategy megaparsec --stats
Time (mean ± σ): 174.1 ms ± 2.7 ms [User: 157.5 ms, System: 16.8 ms]
Range (min … max): 169.7 ms … 181.5 ms 20 runs
Summary
juvix dev import-tree print --scan-strategy flatparse --stats ran
2.12 ± 0.12 times faster than juvix dev import-tree print --scan-strategy megaparsec --stats
```
In order to compare (almost) only the parsing, I've forced the scanning
of each file to be performed 50 times (so that the cost of the other parts
gets swallowed). Here are the results:
```
hyperfine --warmup 1 'juvix dev import-tree print --scan-strategy flatparse --stats' 'juvix dev import-tree print --scan-strategy megaparsec --stats' --min-runs 10
Benchmark 1: juvix dev import-tree print --scan-strategy flatparse --stats
Time (mean ± σ): 189.5 ms ± 3.6 ms [User: 161.7 ms, System: 27.6 ms]
Range (min … max): 185.1 ms … 197.1 ms 15 runs
Benchmark 2: juvix dev import-tree print --scan-strategy megaparsec --stats
Time (mean ± σ): 5.113 s ± 0.023 s [User: 5.084 s, System: 0.035 s]
Range (min … max): 5.085 s … 5.148 s 10 runs
Summary
juvix dev import-tree print --scan-strategy flatparse --stats ran
26.99 ± 0.52 times faster than juvix dev import-tree print --scan-strategy megaparsec --stats
```
  - flatparse == 0.5.*
  - ghc == 9.8.2
  - githash == 0.1.*
  - hashable == 1.4.*
  - language-c == 0.9.*
  - libyaml == 0.1.*
  - megaparsec == 9.6.*
  - commonmark == 0.2.*
  - parsec == 3.1.*
  - lens == 5.2.*
  - parser-combinators == 1.3.*
  - path == 0.9.*
  - path-io == 1.8.*
  - pretty == 1.1.*
  - prettyprinter == 1.7.*
  - prettyprinter-ansi-terminal == 1.1.*
  - primitive == 0.9.*
  - process == 1.6.*
  - random == 1.2.*
  - safe == 0.3.*
  - scientific == 0.3.*
  - singletons == 3.0.*
  - singletons-base == 3.3.*
  - singletons-th == 3.3.*
  - splitmix == 0.1.*
Parallel pipeline (#2779)
This PR introduces parallelism in the pipeline to gain performance. I've
included benchmarks at the end.
- Closes #2750.
# Flags:
There are two new global flags:
1. `-N / --threads`. It is used to set the number of capabilities.
According to [GHC
documentation](https://hackage.haskell.org/package/base-4.20.0.0/docs/GHC-Conc.html#v:setNumCapabilities):
_Set the number of Haskell threads that can run truly simultaneously (on
separate physical processors) at any given time_. When compiling in
parallel, we create this many worker threads. The default value is
`-N auto`, which sets `-N` to half the number of logical cores, capped at 8
(see the sketch after this list).
2. `--dev-show-thread-ids`. When given, the thread id is printed in the
compilation progress log. E.g.
![image](https://github.com/anoma/juvix/assets/5511599/9359fae2-0be1-43e5-8d74-faa82cba4034)
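A minimal sketch of how the `-N auto` default could be resolved and applied; the `NumThreads` type is hypothetical, while `getNumProcessors` and `setNumCapabilities` come from `GHC.Conc`:
```haskell
module ThreadsFlagSketch where

import GHC.Conc (getNumProcessors, setNumCapabilities)

-- Hypothetical mirror of the `-N / --threads` flag.
data NumThreads
  = NumThreadsAuto -- `-N auto`: half the logical cores, capped at 8
  | NumThreadsExplicit Int

-- Resolve the flag to a concrete capability count and apply it.
applyNumThreads :: NumThreads -> IO Int
applyNumThreads nt = do
  n <- case nt of
    NumThreadsExplicit k -> pure (max 1 k)
    NumThreadsAuto -> do
      cores <- getNumProcessors
      pure (max 1 (min 8 (cores `div` 2)))
  setNumCapabilities n
  pure n
```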
# Parallel compilation
1. I've added `src/Parallel/ParallelTemplate.hs` which contains all the
concurrency related code. I think it is good to keep this code separated
from the actual compiler code.
2. I've added a progress log (only for the parallel driver) that outputs
a log of the compilation progress, similar to what stack/cabal do.
# Code changes:
1. I've removed the `setup` stage where we were registering
dependencies. Instead, the dependencies are registered when the
`pathResolver` is run for the first time. This way it is safer.
2. Now the `ImportTree` is needed to run the pipeline. Cycles are
detected during the construction of this tree, so I've removed `Reader
ImportParents` from the pipeline.
3. For the package pathresolver, we do not support parallelism yet (we
could add support for it in the future, but the gains will be small).
4. When `-N1` is given, the pipeline remains unchanged, so performance should
be the same as in the main branch (except for a small performance
degradation due to adding the `-threaded` flag).
5. I've introduced `PipelineOptions`, which are used to pass options to
the effects in the pipeline.
6. The `PathResolver` constraint has been removed from the `upTo*` functions
in the pipeline due to being redundant.
7. I've added a lot of `NFData` instances. They are needed to force the
full evaluation of `Stored.ModuleInfo` in each of the threads.
8. The `Cache` effect uses
[`SharedState`](https://hackage.haskell.org/package/effectful-core-2.3.0.1/docs/Effectful-State-Static-Shared.html)
as opposed to
[`LocalState`](https://hackage.haskell.org/package/effectful-core-2.3.0.1/docs/Effectful-Writer-Static-Local.html).
Perhaps we should provide different versions.
9. I've added a `Cache` handler that accepts a setup function. The setup
is triggered when a miss is detected. It is used to lazily compile the
modules in parallel (a rough sketch of this idea follows below).
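The following is a minimal sketch of that last idea: a keyed cache effect whose handler runs a setup action on a miss, written against `effectful`'s dynamic dispatch and shared state. All names are assumptions for the example, not the compiler's actual `Cache` effect.
```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
module CacheSetupSketch where

import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HashMap
import Data.Hashable (Hashable)
import Effectful (Dispatch (Dynamic), DispatchOf, Eff, Effect, raise, (:>))
import Effectful.Dispatch.Dynamic (reinterpret, send)
import Effectful.State.Static.Shared (evalState, gets, modify)

-- A hypothetical keyed cache effect (not the compiler's actual Cache effect).
data Cache k v :: Effect where
  CacheGet :: k -> Cache k v m v

type instance DispatchOf (Cache k v) = Dynamic

cacheGet :: (Cache k v :> es) => k -> Eff es v
cacheGet = send . CacheGet

-- Handler backed by *shared* state, so concurrent threads see one table.
-- On a miss it runs the supplied setup action (think "compile this module")
-- and stores the result for later lookups.
runCacheWithSetup ::
  forall k v es a.
  (Eq k, Hashable k) =>
  (k -> Eff es v) ->
  Eff (Cache k v : es) a ->
  Eff es a
runCacheWithSetup setup =
  reinterpret (evalState (HashMap.empty :: HashMap k v)) $ \_ -> \case
    CacheGet key ->
      gets (HashMap.lookup key) >>= \case
        Just v -> pure v
        Nothing -> do
          v <- raise (setup key)
          modify (HashMap.insert key v)
          pure v
```
A parallel driver could then install the handler with a "compile this module" setup action, so each module is compiled at most once even when several worker threads ask for it.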
# Tests
1. I've adapted the smoke test suite to ignore the progress log in the
stderr.
2. I've had to adapt `tests/positive/Internal/Lambda.juvix`. Due to
laziness, a crash happening in this file was not being caught. The
problem is that in this file we have a lambda function with different
numbers of patterns in its clauses, which we currently do not support
(https://github.com/anoma/juvix/issues/1706).
3. I've had to comment out the definition
```
x : Box ((A : Type) → A → A) := box λ {A a := a};
```
from the test as it was causing a crash
(https://github.com/anoma/juvix/issues/2247).
# Future Work
1. It should be investigated how much performance we lose by fully
evaluating the `Stored.ModuleInfo`, since some information in it will be
discarded. It may be possible to be more fine-grained when forcing
evaluation.
2. The scanning of imports to build the import tree is sequential. Now,
we build the import tree from the entry point module and only the
modules that are imported from it are in the tree. However, we have
discussed that at some point we should make a distinction between
`juvix` _the compiler_ and `juvix` _the build tool_. When using `juvix`
as a build tool it makes sense to typecheck/compile (to stored core) all
modules in the project. When/if we do this, scanning imports in all
modules in parallel becomes trivial.
3. The implementation of the `ParallelTemplate` uses low-level
primitives such as
[forkIO](https://hackage.haskell.org/package/base-4.20.0.0/docs/Control-Concurrent.html#v:forkIO)
(a rough worker-pool sketch built on such primitives follows below).
At some point it should be refactored to use safer functions from the
[`Effectful.Concurrent.Async`](https://hackage.haskell.org/package/effectful-2.3.0.0/docs/Effectful-Concurrent-Async.html)
module.
4. The number of cores and worker threads that we spawn is determined
by the command line. Ideally, we could use the import tree to compute an
upper bound on the ideal number of cores to use.
5. We could add an animation that displays which modules are being
compiled in parallel and which have finished being compiled.
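For reference, the kind of low-level worker pattern mentioned in point 3 looks roughly like the sketch below (deliberately simplified: real compilation jobs would also have to respect import-tree dependencies, which this ignores):
```haskell
module WorkerPoolSketch where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TQueue (newTQueueIO, tryReadTQueue, writeTQueue)
import Control.Monad (replicateM_)

-- Hypothetical job type standing in for "compile one module".
type Job = IO ()

-- Spawn n workers that drain a shared queue of jobs, then wait for all
-- jobs to finish. All jobs are enqueued up front, so an empty queue means
-- a worker can stop.
runJobs :: Int -> [Job] -> IO ()
runJobs nWorkers jobs = do
  queue <- newTQueueIO
  done <- newEmptyMVar
  mapM_ (atomically . writeTQueue queue) jobs
  let worker = do
        mjob <- atomically (tryReadTQueue queue)
        case mjob of
          Nothing -> pure ()
          Just job -> do
            job
            putMVar done ()
            worker
  replicateM_ nWorkers (forkIO worker)
  replicateM_ (length jobs) (takeMVar done)
```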
# Benchmarks
On some benchmarks, I include the GHC runtime option
[`-A`](https://downloads.haskell.org/ghc/latest/docs/users_guide/runtime_control.html#rts-flag--A%20%E2%9F%A8size%E2%9F%A9),
which sometimes makes a good impact on performance. Thanks to
@paulcadman for pointing this out. I've figured out a good combination of
`-N` and `-A` through trial and error (but this obviously depends on the
CPU and the Juvix project).
## Typecheck the standard library
### Clean run (88% faster than main):
```
hyperfine --warmup 1 --prepare 'juvix clean' 'juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432' 'juvix -N 4 typecheck Stdlib/Prelude.juvix' 'juvix-main typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
Time (mean ± σ): 444.1 ms ± 6.5 ms [User: 1018.0 ms, System: 77.7 ms]
Range (min … max): 432.6 ms … 455.9 ms 10 runs
Benchmark 2: juvix -N 4 typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 628.3 ms ± 23.9 ms [User: 1227.6 ms, System: 69.5 ms]
Range (min … max): 584.7 ms … 670.6 ms 10 runs
Benchmark 3: juvix-main typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 835.9 ms ± 12.3 ms [User: 788.5 ms, System: 31.9 ms]
Range (min … max): 816.0 ms … 853.6 ms 10 runs
Summary
juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432 ran
1.41 ± 0.06 times faster than juvix -N 4 typecheck Stdlib/Prelude.juvix
1.88 ± 0.04 times faster than juvix-main typecheck Stdlib/Prelude.juvix
```
### Cached run (43% faster than main):
```
hyperfine --warmup 1 'juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432' 'juvix -N 4 typecheck Stdlib/Prelude.juvix' 'juvix-main typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
Time (mean ± σ): 241.3 ms ± 7.3 ms [User: 538.6 ms, System: 101.3 ms]
Range (min … max): 231.5 ms … 251.3 ms 11 runs
Benchmark 2: juvix -N 4 typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 235.1 ms ± 12.0 ms [User: 405.3 ms, System: 87.7 ms]
Range (min … max): 216.1 ms … 253.1 ms 12 runs
Benchmark 3: juvix-main typecheck Stdlib/Prelude.juvix
Time (mean ± σ): 336.7 ms ± 13.3 ms [User: 269.5 ms, System: 67.1 ms]
Range (min … max): 316.9 ms … 351.8 ms 10 runs
Summary
juvix -N 4 typecheck Stdlib/Prelude.juvix ran
1.03 ± 0.06 times faster than juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
1.43 ± 0.09 times faster than juvix-main typecheck Stdlib/Prelude.juvix
```
## Typecheck the test suite of the containers library
At the moment this is the biggest juvix project that we have.
### Clean run (105% faster than main)
```
hyperfine --warmup 1 --prepare 'juvix clean' 'juvix -N 6 typecheck Main.juvix +RTS -A67108864' 'juvix -N 4 typecheck Main.juvix' 'juvix-main typecheck Main.juvix'
Benchmark 1: juvix -N 6 typecheck Main.juvix +RTS -A67108864
Time (mean ± σ): 1.006 s ± 0.011 s [User: 2.171 s, System: 0.162 s]
Range (min … max): 0.991 s … 1.023 s 10 runs
Benchmark 2: juvix -N 4 typecheck Main.juvix
Time (mean ± σ): 1.584 s ± 0.046 s [User: 2.934 s, System: 0.149 s]
Range (min … max): 1.535 s … 1.660 s 10 runs
Benchmark 3: juvix-main typecheck Main.juvix
Time (mean ± σ): 2.066 s ± 0.010 s [User: 1.939 s, System: 0.089 s]
Range (min … max): 2.048 s … 2.077 s 10 runs
Summary
juvix -N 6 typecheck Main.juvix +RTS -A67108864 ran
1.57 ± 0.05 times faster than juvix -N 4 typecheck Main.juvix
2.05 ± 0.03 times faster than juvix-main typecheck Main.juvix
```
### Cached run (54% faster than main)
```
hyperfine --warmup 1 'juvix -N 6 typecheck Main.juvix +RTS -A33554432' 'juvix -N 4 typecheck Main.juvix' 'juvix-main typecheck Main.juvix'
Benchmark 1: juvix -N 6 typecheck Main.juvix +RTS -A33554432
Time (mean ± σ): 551.8 ms ± 13.2 ms [User: 1419.8 ms, System: 199.4 ms]
Range (min … max): 535.2 ms … 570.6 ms 10 runs
Benchmark 2: juvix -N 4 typecheck Main.juvix
Time (mean ± σ): 636.7 ms ± 17.3 ms [User: 1006.3 ms, System: 196.3 ms]
Range (min … max): 601.6 ms … 655.3 ms 10 runs
Benchmark 3: juvix-main typecheck Main.juvix
Time (mean ± σ): 847.2 ms ± 58.9 ms [User: 710.1 ms, System: 126.5 ms]
Range (min … max): 731.1 ms … 890.0 ms 10 runs
Summary
juvix -N 6 typecheck Main.juvix +RTS -A33554432 ran
1.15 ± 0.04 times faster than juvix -N 4 typecheck Main.juvix
1.54 ± 0.11 times faster than juvix-main typecheck Main.juvix
```
  - stm == 2.5.*
  - Stream == 0.4.*
  - string-interpolate == 0.3.*
  - template-haskell == 2.21.*
  - temporary == 1.3.*
  - text == 2.1.*
  - th-utilities == 0.2.*
  - time == 1.12.*
  - transformers == 0.6.*
  - typed-process == 0.2.*
  - unicode-show == 0.1.*
  - uniplate == 1.6.*
  - unix-compat == 0.7.*
  - unordered-containers == 0.2.*
  - utf8-string == 1.0.*
  - vector == 0.13.*
  - vector-builder == 0.3.*
  - versions == 6.0.*
  - xdg-basedir == 0.2.*
  - yaml == 0.11.*
  # the tasty dependencies are here to avoid having to recompile
  # juvix when running the tests.
  - tasty
  - tasty-hunit
  - Diff == 0.5.*
  - pretty-show == 1.10.*
  - hedgehog == 1.4.*
  - tasty-hedgehog == 1.4.*
  # benchmarks
  - criterion == 1.6.*
  - statistics == 0.16.*
  - shake == 0.19.*
  - colour == 2.3.*
  - palette == 0.3.*

ghc-options:
  # Warnings
  - -Weverything
  - -Wno-all-missed-specialisations
  - -Wno-missed-specialisations
  - -Wno-missing-export-lists
  - -Wno-missing-import-lists
  - -Wno-missing-kind-signatures
  - -Wno-missing-safe-haskell-mode
  - -Wno-missing-role-annotations
  - -Wno-missing-poly-kind-signatures
  - -Wno-safe
  - -Wno-unsafe
  - -Wno-unused-packages
  # HIE Support
  - -fhide-source-paths
  - -fwrite-ide-info -hiedir=.hie
  # Polysemy Support
  - -O2 -flate-specialise -fspecialise-aggressively

default-extensions:
  - ApplicativeDo
  - DataKinds
  - DerivingStrategies
  - GADTs
  - GeneralizedNewtypeDeriving
  - ImportQualifiedPost
  - LambdaCase
  - MultiWayIf
  - NoFieldSelectors
  - NoImplicitPrelude
  - OverloadedStrings
  - PatternSynonyms
  - QuasiQuotes
  - RecordWildCards
  - TemplateHaskell
  - TypeFamilyDependencies

library:
  source-dirs: src
  verbatim:
    default-language: GHC2021

executables:
  juvixbench:
    main: Main.hs
    source-dirs: bench2
    dependencies:
      - juvix
      - tasty-bench == 0.3.*
      - polysemy == 1.9.*
      - random
    verbatim:
      default-language: GHC2021

  juvix:
    main: Main.hs
    source-dirs: app
    dependencies:
      - juvix
      - haskeline == 0.8.*
      - http-conduit == 2.3.*
      - mtl == 2.3.*
      - optparse-applicative == 0.18.*
      - repline == 0.4.*
      - string-interpolate == 0.3.*
    verbatim:
      default-language: GHC2021
    ghc-options:
      - -threaded
      # We enable rtsopts because we've found that tweaking the -A flag can lead
      # to great performance gains. However, GHC's documentation warns that
      # enabling this may cause security problems: "...can be used to write logging
      # data to arbitrary files under the security context of the running
      # program..."
      - -rtsopts
      # We set -N1 to avoid spending time in thread initialization. We manually
      # set the number of cores we want to use through the juvix -N global flag.
      - -with-rtsopts=-N1
    when:
      - condition: flag(static)
        ld-options:
          - -static
          - -pthread

tests:
  juvix-test:
    main: Main.hs
    source-dirs: test
    dependencies:
      - juvix
    verbatim:
      default-language: GHC2021
    ghc-options:
      - -threaded

benchmarks:
  juvix-bench:
    main: Main.hs
    source-dirs: bench
    dependencies:
      - juvix
    verbatim:
      default-language: GHC2021