juvix

mirror of https://github.com/anoma/juvix.git synced 2024-12-04 17:07:28 +03:00

Author	SHA1	Message	Date
Jan Mas Rovira	6fcc9f21d2	Improve performance of formatting a project (#2863)

Currently, formatting a project is equivalent to running `juvix format` on each individual file. Hence, the performance is quadratic with respect to the number of modules in the project. This PR fixes that, and we now only process each module once.

# Benchmark (1236% faster 🚀)

Checking the standard library:

```
hyperfine --warmup 1 'juvix format --check' 'juvix-main format --check'
Benchmark 1: juvix format --check
  Time (mean ± σ):     450.6 ms ±  33.7 ms    [User: 707.2 ms, System: 178.7 ms]
  Range (min … max):   396.0 ms … 497.0 ms    10 runs

Benchmark 2: juvix-main format --check
  Time (mean ± σ):      6.019 s ±  0.267 s    [User: 9.333 s, System: 1.512 s]
  Range (min … max):    5.598 s …  6.524 s    10 runs

Summary
  juvix format --check ran
   13.36 ± 1.16 times faster than juvix-main format --check
```

# Other changes

1. The `EntryPoint` field `entryPointModulePath` is now optional.
2. I've introduced a new type `TopModulePathKey`, which is analogous to `TopModulePath` but without location information. It is used as a hashmap key in places where the location is never used. This is useful because we can now get a `TopModulePathKey` from a `Path Rel File`.
3. I've refactored the `_formatInput` field in `FormatOptions` so that it no longer needs to be a special case.
4. I've introduced a new effect, `Forcing`, that allows forcing individual fields of a record type with a convenient syntax.
5. I've refactored some of the constraints in scoping so that they only require `Reader Package` instead of `Reader EntryPoint`.
6. I've introduced a new type family so that local modules are no longer required to carry a `ModuleId` in their type. Previously, they were assigned one, but it was never used.

# Future work

1. For project-wide formatting, the compilation is done in parallel, but the formatting is still done sequentially. That should be improved.	2024-07-01 18:05:24 +02:00
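The quadratic cost described in #2863 can be illustrated with a toy cost model. This Haskell sketch is not Juvix's implementation: it assumes that formatting one file processes that file plus every module it transitively imports, and the names `Graph`, `reachable`, `perFileCost`, `projectCost` and `chain` are hypothetical. Running the per-file procedure on every module repeats shared work, whereas a single project-wide pass visits each module once.

```haskell
import Data.List (nub)
import Data.Maybe (fromMaybe)

-- Hypothetical project model: each module name is paired with the
-- modules it imports directly.
type Graph = [(String, [String])]

imports :: Graph -> String -> [String]
imports g m = fromMaybe [] (lookup m g)

-- Every module reachable from m (m included), each listed once.
-- The visited list keeps the walk finite even if the graph had a cycle.
reachable :: Graph -> String -> [String]
reachable g m = go [] [m]
  where
    go seen [] = reverse seen
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise = go (x : seen) (imports g x ++ xs)

-- Cost of formatting each file independently: every run re-processes
-- the file's transitive imports.
perFileCost :: Graph -> Int
perFileCost g = sum [length (reachable g m) | (m, _) <- g]

-- Cost of a project-wide pass: each module is processed exactly once.
projectCost :: Graph -> Int
projectCost g = length (nub (concatMap (reachable g . fst) g))

-- A chain M1 <- M2 <- ... <- Mn, where Mk imports M(k-1).
chain :: Int -> Graph
chain n = [("M" ++ show k, ["M" ++ show (k - 1) | k > 1]) | k <- [1 .. n]]
```

For a chain of 10 modules, `perFileCost (chain 10)` is 1 + 2 + ... + 10 = 55 while `projectCost (chain 10)` is 10; in general the first grows as n(n+1)/2 and the second as n.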
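Item 2 of #2863 derives a location-free key from a relative file path. The sketch below only illustrates that idea: the PR does not list `TopModulePathKey`'s fields, so the record here (`keyDirs`, `keyName`), the helpers `splitOn` and `relPathToKey`, and the use of plain `FilePath` strings instead of the `path` library's `Path Rel File` are all assumptions. Because the key carries no source location, the derived `Eq` and `Ord` instances treat two references to the same module as equal, which is what a map key needs.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical location-free module key: directory components plus the
-- module name, with no source position attached.
data TopModulePathKey = TopModulePathKey
  { keyDirs :: [String]
  , keyName :: String
  }
  deriving (Eq, Ord, Show)

-- Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Turn a path relative to the package root, e.g. "Stdlib/Data/Nat.juvix",
-- into a key. Assumes '/' separators; returns Nothing for other extensions.
relPathToKey :: FilePath -> Maybe TopModulePathKey
relPathToKey p
  | ext `isSuffixOf` p && not (null stem) = Just (TopModulePathKey (init parts) (last parts))
  | otherwise = Nothing
  where
    ext = ".juvix"
    stem = take (length p - length ext) p
    parts = splitOn '/' stem
```

For example, `relPathToKey "Stdlib/Data/Nat.juvix"` yields `Just (TopModulePathKey ["Stdlib","Data"] "Nat")`, which can then be used directly as a key in a `Map` or association list.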
Jan Mas Rovira	e9afdad82a	Parallel pipeline (#2779)

This PR introduces parallelism in the pipeline to gain performance. I've included benchmarks at the end.

- Closes #2750.

# Flags

There are two new global flags:

1. `-N / --threads`. It is used to set the number of capabilities. According to the [GHC documentation](https://hackage.haskell.org/package/base-4.20.0.0/docs/GHC-Conc.html#v:setNumCapabilities): _Set the number of Haskell threads that can run truly simultaneously (on separate physical processors) at any given time_. When compiling in parallel, we create this many worker threads. The default value is `-N auto`, which sets `-N` to half the number of logical cores, capped at 8.
2. `--dev-show-thread-ids`. When given, the thread id is printed in the compilation progress log. E.g. ![image](https://github.com/anoma/juvix/assets/5511599/9359fae2-0be1-43e5-8d74-faa82cba4034)

# Parallel compilation

1. I've added `src/Parallel/ParallelTemplate.hs`, which contains all the concurrency-related code. I think it is good to keep this code separate from the actual compiler code.
2. I've added a progress log (only for the parallel driver) that reports compilation progress, similar to what stack/cabal do.

# Code changes

1. I've removed the `setup` stage where we were registering dependencies. Instead, the dependencies are registered when the `pathResolver` is run for the first time. This is safer.
2. The `ImportTree` is now needed to run the pipeline. Cycles are detected during the construction of this tree, so I've removed `Reader ImportParents` from the pipeline.
3. The package path resolver does not support parallelism yet (we could add support for it in the future, but the gains would be small).
4. With `-N1`, the pipeline remains unchanged, so performance should be the same as in the main branch (except for a small performance degradation due to adding the `-threaded` flag).
5. I've introduced `PipelineOptions`, which are used to pass options to the effects in the pipeline.
6. The `PathResolver` constraint has been removed from the `upTo*` functions in the pipeline because it was redundant.
7. I've added a lot of `NFData` instances. They are needed to force the full evaluation of `Stored.ModuleInfo` in each of the threads.
8. The `Cache` effect uses [`SharedState`](https://hackage.haskell.org/package/effectful-core-2.3.0.1/docs/Effectful-State-Static-Shared.html) as opposed to [`LocalState`](https://hackage.haskell.org/package/effectful-core-2.3.0.1/docs/Effectful-Writer-Static-Local.html). Perhaps we should provide different versions.
9. I've added a `Cache` handler that accepts a setup function. The setup is triggered when a miss is detected. It is used to lazily compile the modules in parallel.

# Tests

1. I've adapted the smoke test suite to ignore the progress log in stderr.
2. I've had to adapt `tests/positive/Internal/Lambda.juvix`. Due to laziness, a crash in this file was not being caught. The problem is that this file has a lambda function whose clauses have different numbers of patterns, which we currently do not support (https://github.com/anoma/juvix/issues/1706).
3. I've had to comment out the definition

   ```
   x : Box ((A : Type) → A → A) := box λ {A a := a};
   ```

   from the test, as it was causing a crash (https://github.com/anoma/juvix/issues/2247).

# Future Work

1. It should be investigated how much performance we lose by fully evaluating the `Stored.ModuleInfo`, since some information in it will be discarded. It may be possible to be more fine-grained when forcing evaluation.
2. The scanning of imports to build the import tree is sequential. Currently, we build the import tree from the entry point module, and only the modules imported from it are in the tree. However, we have discussed that at some point we should make a distinction between `juvix` _the compiler_ and `juvix` _the build tool_. When using `juvix` as a build tool, it makes sense to typecheck/compile (to stored core) all modules in the project. When/if we do this, scanning imports in all modules in parallel becomes trivial.
3. The implementation of the `ParallelTemplate` uses low-level primitives such as [forkIO](https://hackage.haskell.org/package/base-4.20.0.0/docs/Control-Concurrent.html#v:forkIO). At some point it should be refactored to use safer functions from the [`Effectful.Concurrent.Async`](https://hackage.haskell.org/package/effectful-2.3.0.0/docs/Effectful-Concurrent-Async.html) module.
4. The number of cores and worker threads that we spawn is determined by the command line. Ideally, we could use the import tree to compute an upper bound on the ideal number of cores to use.
5. We could add an animation that displays which modules are being compiled in parallel and which have finished compiling.

# Benchmarks

On some benchmarks, I include the GHC runtime option [`-A`](https://downloads.haskell.org/ghc/latest/docs/users_guide/runtime_control.html#rts-flag--A%20%E2%9F%A8size%E2%9F%A9) (the allocation area size; `-A33554432` is 32 MiB and `-A67108864` is 64 MiB), which sometimes has a significant impact on performance. Thanks to @paulcadman for pointing this out. I found a good combination of `-N` and `-A` through trial and error (but this obviously depends on the CPU and the Juvix project).
## Typecheck the standard library

### Clean run (88% faster than main)

```
hyperfine --warmup 1 --prepare 'juvix clean' 'juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432' 'juvix -N 4 typecheck Stdlib/Prelude.juvix' 'juvix-main typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
  Time (mean ± σ):     444.1 ms ±   6.5 ms    [User: 1018.0 ms, System: 77.7 ms]
  Range (min … max):   432.6 ms … 455.9 ms    10 runs

Benchmark 2: juvix -N 4 typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):     628.3 ms ±  23.9 ms    [User: 1227.6 ms, System: 69.5 ms]
  Range (min … max):   584.7 ms … 670.6 ms    10 runs

Benchmark 3: juvix-main typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):     835.9 ms ±  12.3 ms    [User: 788.5 ms, System: 31.9 ms]
  Range (min … max):   816.0 ms … 853.6 ms    10 runs

Summary
  juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432 ran
    1.41 ± 0.06 times faster than juvix -N 4 typecheck Stdlib/Prelude.juvix
    1.88 ± 0.04 times faster than juvix-main typecheck Stdlib/Prelude.juvix
```

### Cached run (43% faster than main)

```
hyperfine --warmup 1 'juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432' 'juvix -N 4 typecheck Stdlib/Prelude.juvix' 'juvix-main typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
  Time (mean ± σ):     241.3 ms ±   7.3 ms    [User: 538.6 ms, System: 101.3 ms]
  Range (min … max):   231.5 ms … 251.3 ms    11 runs

Benchmark 2: juvix -N 4 typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):     235.1 ms ±  12.0 ms    [User: 405.3 ms, System: 87.7 ms]
  Range (min … max):   216.1 ms … 253.1 ms    12 runs

Benchmark 3: juvix-main typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):     336.7 ms ±  13.3 ms    [User: 269.5 ms, System: 67.1 ms]
  Range (min … max):   316.9 ms … 351.8 ms    10 runs

Summary
  juvix -N 4 typecheck Stdlib/Prelude.juvix ran
    1.03 ± 0.06 times faster than juvix -N 4 typecheck Stdlib/Prelude.juvix +RTS -A33554432
    1.43 ± 0.09 times faster than juvix-main typecheck Stdlib/Prelude.juvix
```

## Typecheck the test suite of the containers library

At the moment, this is the biggest Juvix project that we have.

### Clean run (105% faster than main)

```
hyperfine --warmup 1 --prepare 'juvix clean' 'juvix -N 6 typecheck Main.juvix +RTS -A67108864' 'juvix -N 4 typecheck Main.juvix' 'juvix-main typecheck Main.juvix'
Benchmark 1: juvix -N 6 typecheck Main.juvix +RTS -A67108864
  Time (mean ± σ):      1.006 s ±  0.011 s    [User: 2.171 s, System: 0.162 s]
  Range (min … max):    0.991 s …  1.023 s    10 runs

Benchmark 2: juvix -N 4 typecheck Main.juvix
  Time (mean ± σ):      1.584 s ±  0.046 s    [User: 2.934 s, System: 0.149 s]
  Range (min … max):    1.535 s …  1.660 s    10 runs

Benchmark 3: juvix-main typecheck Main.juvix
  Time (mean ± σ):      2.066 s ±  0.010 s    [User: 1.939 s, System: 0.089 s]
  Range (min … max):    2.048 s …  2.077 s    10 runs

Summary
  juvix -N 6 typecheck Main.juvix +RTS -A67108864 ran
    1.57 ± 0.05 times faster than juvix -N 4 typecheck Main.juvix
    2.05 ± 0.03 times faster than juvix-main typecheck Main.juvix
```

### Cached run (54% faster than main)

```
hyperfine --warmup 1 'juvix -N 6 typecheck Main.juvix +RTS -A33554432' 'juvix -N 4 typecheck Main.juvix' 'juvix-main typecheck Main.juvix'
Benchmark 1: juvix -N 6 typecheck Main.juvix +RTS -A33554432
  Time (mean ± σ):     551.8 ms ±  13.2 ms    [User: 1419.8 ms, System: 199.4 ms]
  Range (min … max):   535.2 ms … 570.6 ms    10 runs

Benchmark 2: juvix -N 4 typecheck Main.juvix
  Time (mean ± σ):     636.7 ms ±  17.3 ms    [User: 1006.3 ms, System: 196.3 ms]
  Range (min … max):   601.6 ms … 655.3 ms    10 runs

Benchmark 3: juvix-main typecheck Main.juvix
  Time (mean ± σ):     847.2 ms ±  58.9 ms    [User: 710.1 ms, System: 126.5 ms]
  Range (min … max):   731.1 ms … 890.0 ms    10 runs

Summary
  juvix -N 6 typecheck Main.juvix +RTS -A33554432 ran
    1.15 ± 0.04 times faster than juvix -N 4 typecheck Main.juvix
    1.54 ± 0.11 times faster than juvix-main typecheck Main.juvix
```
2024-05-31 12:41:30 +01:00
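The `-N auto` default from #2779 (half the logical cores, capped at 8) can be written as a small function. In this sketch, `autoThreads` and `applyAutoThreads` are hypothetical names, and clamping the result to at least 1 is an assumption of the sketch, since half of a single core rounds down to zero. `getNumProcessors` and `setNumCapabilities` are the `GHC.Conc` functions from `base`.

```haskell
import GHC.Conc (getNumProcessors, setNumCapabilities)

-- Worker count for "-N auto": half the logical cores, capped at 8.
-- Clamping to at least 1 is an assumption of this sketch (1-core machines).
autoThreads :: Int -> Int
autoThreads cores = max 1 (min 8 (cores `div` 2))

-- Apply the setting to the running program. Using more than one
-- capability requires a binary built with -threaded.
applyAutoThreads :: IO Int
applyAutoThreads = do
  cores <- getNumProcessors
  let n = autoThreads cores
  setNumCapabilities n
  pure n
```

On a machine with 16 logical cores `autoThreads 16` gives 8, with 6 cores it gives 3, and with 32 cores the cap keeps it at 8.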
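#2779 detects import cycles while building the `ImportTree`. As a rough sketch (not the Juvix code; `Imports` and `buildImportOrder` are hypothetical), a depth-first walk from the entry module can keep the chain of modules currently being visited: meeting one of them again means there is a cycle, and otherwise the modules finish in an order where every import precedes its importers.

```haskell
import Control.Monad (foldM)
import Data.Maybe (fromMaybe)

-- Hypothetical input: the direct imports of each module.
type Imports = [(String, [String])]

-- Depth-first walk from the entry module. Returns Left with the offending
-- chain (e.g. ["A", "B", "A"]) if an import cycle is found; otherwise
-- Right with every reachable module, each import listed before its importers.
buildImportOrder :: Imports -> String -> Either [String] [String]
buildImportOrder g entry = reverse <$> go [] [] entry
  where
    deps m = fromMaybe [] (lookup m g)
    -- path: modules on the current walk, innermost first
    -- done: finished modules, most recently finished first
    go path done m
      | m `elem` path = Left (m : reverse (takeWhile (/= m) path) ++ [m])
      | m `elem` done = Right done
      | otherwise = do
          done' <- foldM (go (m : path)) done (deps m)
          pure (m : done')
```

Only modules reachable from the entry point are visited, matching the PR's note that the tree contains just the modules imported from the entry point module.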
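The parallel driver in #2779 compiles a module once all of its imports are available. A minimal sketch of that dependency-driven scheduling with `forkIO` and `MVar`s from `base` follows; it is not the `ParallelTemplate` code. `compileAll` is a hypothetical name, it spawns one lightweight thread per module rather than a fixed pool of worker threads, and it assumes an acyclic graph. How many threads run truly simultaneously is still bounded by the capabilities set with `-N`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import Control.Exception (evaluate)
import Control.Monad (forM, forM_)
import Data.Maybe (fromMaybe)

-- Compile every module of an acyclic import graph concurrently.
-- compile receives the module name and the results of its imports.
compileAll :: [(String, [String])] -> (String -> [r] -> IO r) -> IO [(String, r)]
compileAll graph compile = do
  -- One empty MVar per module; it is filled once the module is compiled.
  vars <- forM graph $ \(m, _) -> do
    v <- newEmptyMVar
    pure (m, v)
  let varOf m = fromMaybe (error ("unknown module: " ++ m)) (lookup m vars)
  forM_ graph $ \(m, deps) -> forkIO $ do
    -- Block until every import has been compiled.
    depResults <- mapM (readMVar . varOf) deps
    -- Force the result in this thread (weak head normal form only);
    -- otherwise the work could be deferred to whichever thread reads it.
    r <- compile m depResults >>= evaluate
    putMVar (varOf m) r
  -- Collect the results in the order of the input list.
  forM vars $ \(m, v) -> do
    r <- readMVar v
    pure (m, r)
```

The PR forces full evaluation with `NFData`; `evaluate` here only reaches weak head normal form, so `force` from `deepseq` would be needed for the same effect. Error handling is omitted: if a compile action throws, threads waiting on that module stay blocked (the runtime typically raises `BlockedIndefinitelyOnMVar`), which is the kind of issue the safer `Effectful.Concurrent.Async` functions mentioned in the PR's future work help avoid.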
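#2779 adds a `Cache` handler that runs a setup function when a miss is detected. The following single-threaded sketch in plain `IO` shows the shape of that idea; it does not use effectful's `Cache` effect, and `Cache`, `newCache` and `cachedWithSetup` here are hypothetical. A version shared between threads would need synchronized state (the PR uses effectful's `SharedState`), since in this sketch two threads could both miss on the same key.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical memo cache backed by an association list in an IORef.
newtype Cache k v = Cache (IORef [(k, v)])

newCache :: IO (Cache k v)
newCache = Cache <$> newIORef []

-- On a hit, return the stored value. On a miss, run the setup hook first,
-- then compute the value and store it so later lookups hit.
cachedWithSetup :: Eq k => Cache k v -> (k -> IO ()) -> (k -> IO v) -> k -> IO v
cachedWithSetup (Cache ref) setup compute k = do
  entries <- readIORef ref
  case lookup k entries of
    Just v -> pure v
    Nothing -> do
      setup k
      v <- compute k
      modifyIORef' ref ((k, v) :)
      pure v
```

Looking up `"Prelude"` twice and `"Nat"` once runs the setup hook only twice, once per distinct key.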
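The percentages in the benchmark headings follow from hyperfine's ratios: a run taking t_new instead of t_main is (t_main / t_new - 1) × 100% faster, e.g. 835.9 ms / 444.1 ms ≈ 1.88, hence "88% faster". A small Haskell helper (hypothetical names `speedup`, `percentFaster`, `reported`) reproduces the headings from the reported means.

```haskell
-- How many times faster the new run is than the baseline (ratio of means).
speedup :: Double -> Double -> Double
speedup baseline new = baseline / new

-- The "N% faster" figure: (ratio - 1) * 100.
percentFaster :: Double -> Double -> Double
percentFaster baseline new = (speedup baseline new - 1) * 100

-- Means in ms from the typecheck benchmarks: (label, juvix-main, fastest new run).
reported :: [(String, Double, Double)]
reported =
  [ ("stdlib, clean", 835.9, 444.1)
  , ("stdlib, cached", 336.7, 235.1)
  , ("containers, clean", 2066, 1006)
  , ("containers, cached", 847.2, 551.8)
  ]
```

Rounded, these give 88, 43, 105 and 54. The same formula yields the "1236% faster" figure for #2863 (6019 ms / 450.6 ms ≈ 13.36).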
Jan Mas Rovira	3a4cbc742d	Replace `polysemy` with `effectful` (#2663)

The following benchmark compares Juvix 0.6.0, which uses polysemy, with a new version (implemented in this PR) that replaces polysemy with effectful.

# Typecheck standard library without caching

```
hyperfine --warmup 2 --prepare 'juvix-polysemy clean' 'juvix-polysemy typecheck Stdlib/Prelude.juvix' 'juvix-effectful typecheck Stdlib/Prelude.juvix'
Benchmark 1: juvix-polysemy typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):      3.924 s ±  0.143 s    [User: 3.787 s, System: 0.084 s]
  Range (min … max):    3.649 s …  4.142 s    10 runs

Benchmark 2: juvix-effectful typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):      2.558 s ±  0.074 s    [User: 2.430 s, System: 0.084 s]
  Range (min … max):    2.403 s …  2.646 s    10 runs

Summary
  juvix-effectful typecheck Stdlib/Prelude.juvix ran
    1.53 ± 0.07 times faster than juvix-polysemy typecheck Stdlib/Prelude.juvix
```

# Typecheck standard library with caching

```
hyperfine --warmup 1 'juvix-effectful typecheck Stdlib/Prelude.juvix' 'juvix-polysemy typecheck Stdlib/Prelude.juvix' --min-runs 20
Benchmark 1: juvix-effectful typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):      1.194 s ±  0.068 s    [User: 0.979 s, System: 0.211 s]
  Range (min … max):    1.113 s …  1.307 s    20 runs

Benchmark 2: juvix-polysemy typecheck Stdlib/Prelude.juvix
  Time (mean ± σ):      1.237 s ±  0.083 s    [User: 0.997 s, System: 0.231 s]
  Range (min … max):    1.061 s …  1.476 s    20 runs

Summary
  juvix-effectful typecheck Stdlib/Prelude.juvix ran
    1.04 ± 0.09 times faster than juvix-polysemy typecheck Stdlib/Prelude.juvix
```
2024-03-21 12:09:34 +00:00
Jan Mas Rovira	b615fde186	Promote use of `MonadIO` to minimize `embed` occurrences (#2694)	2024-03-20 09:56:00 +01:00
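#2694 prefers `MonadIO` over explicit `embed` calls. The general idea, sketched in plain Haskell with only `base` (the `App` monad and the `recordLine` and `greet` helpers are hypothetical stand-ins for Juvix's effect stacks): a helper written against `MonadIO m` calls `liftIO` once internally, so callers in `IO` or in any monad with a `MonadIO` instance use it without wrapping each call site.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical application monad standing in for an effect stack;
-- MonadIO is derived from the underlying IO.
newtype App a = App {runApp :: IO a}
  deriving (Functor, Applicative, Monad, MonadIO)

-- Written once against MonadIO: liftIO appears here, not at call sites.
recordLine :: MonadIO m => IORef [String] -> String -> m ()
recordLine ref msg = liftIO (modifyIORef' ref (msg :))

-- A caller in App uses the helper directly, with no explicit lifting.
greet :: IORef [String] -> App ()
greet ref = do
  recordLine ref "hello"
  recordLine ref "world"
```

The same `recordLine` can be called directly in `IO` (for example right after `newIORef []`) and inside `App`, which is the call-site simplification the commit title describes.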
Łukasz Czajka	dcea0bbecb	Add field element type (#2659)

* Closes #2571
* It is reasonable to finish this PR before tackling #2562, because the field element type is the primary data type in Cairo.
* Depends on #2653

Checklist
---------

- [x] Add field type and operations to intermediate representations (JuvixCore, JuvixTree, JuvixAsm, JuvixReg).
- [x] Add CLI option to choose field size.
- [x] Add frontend field builtins.
- [x] Automatic conversion of integer literals to field elements.
- [x] Juvix standard library support for fields.
- [x] Check if field size matches when loading a stored module.
- [x] Update the Cairo Assembly (CASM) interpreter to use the field type instead of integer type.
- [x] Add field type to VampIR backend.
- [x] Tests

---------

Co-authored-by: Jan Mas Rovira <janmasrovira@gmail.com>	2024-02-27 14:54:43 +01:00
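#2659 adds a field element type with automatic conversion of integer literals. The sketch below illustrates the idea in Haskell and is not Juvix's implementation: `Field`, `mkField` and the type-level modulus are assumptions of this example (Juvix selects the field size with a CLI option instead). Integer literals become field elements through `fromInteger`, and division uses Fermat's little theorem, which assumes the modulus is prime.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import Data.Ratio (denominator, numerator)
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical field element: an integer modulo p, with p fixed at the
-- type level. Build values via literals or mkField so they stay in [0, p).
newtype Field (p :: Nat) = Field Integer
  deriving (Eq, Show)

-- Reduce an arbitrary integer into the range [0, p).
mkField :: forall p. KnownNat p => Integer -> Field p
mkField n = Field (n `mod` natVal (Proxy :: Proxy p))

instance KnownNat p => Num (Field p) where
  Field a + Field b = mkField (a + b)
  Field a - Field b = mkField (a - b)
  Field a * Field b = mkField (a * b)
  negate (Field a) = mkField (negate a)
  abs = id
  signum (Field a) = mkField (signum a)
  -- Integer literals are converted to field elements automatically.
  fromInteger = mkField

instance KnownNat p => Fractional (Field p) where
  -- Inverse via Fermat's little theorem: a^(p-2), assuming p is prime.
  -- There is no zero check: recip 0 does not raise an error.
  recip a = a ^ (natVal (Proxy :: Proxy p) - 2)
  fromRational r = fromInteger (numerator r) / fromInteger (denominator r)
```

In `Field 7`, `5 + 4` is `2`, `5 * 4` is `6` and `5 / 4` is `3`. With the modulus in the type, mixing elements of different fields is rejected at compile time; since Juvix picks the field size on the command line, it instead checks that a stored module's field size matches when loading it.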
Paul Cadman	a091a7f63d	Update REPL artifacts with builtins from stored modules (#2639)

Builtin information needs to be propagated from stored modules to REPL artifacts to avoid "The builtin _ has not been defined" errors.

This PR adds a test suite for the REPL in the Haskell test code. This means some of the slow smoke tests can be moved to fast Haskell unit tests. In the future, we should refactor the REPL code by putting it in the main src target and unit testing more features (e.g. :doc, :def).

* Closes https://github.com/anoma/juvix/issues/2638	2024-02-26 16:19:04 +00:00
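The fix in #2639 amounts to carrying builtin information along when stored modules are loaded into the REPL. A toy model of that idea follows; it is not the REPL code, and `Builtins`, `StoredModule`, `ReplArtifacts`, `loadStored`, `lookupBuiltin` and the example builtin entry are all hypothetical. It shows why a lookup fails unless the stored module's builtins are merged into the artifacts.

```haskell
-- Hypothetical model of a builtin table: builtin name -> defining symbol.
type Builtins = [(String, String)]

data StoredModule = StoredModule
  { storedName :: String
  , storedBuiltins :: Builtins
  }

newtype ReplArtifacts = ReplArtifacts {artifactBuiltins :: Builtins}

-- Loading a stored module into the REPL copies its builtins into the
-- artifacts; without this step, later lookups fail.
loadStored :: StoredModule -> ReplArtifacts -> ReplArtifacts
loadStored m (ReplArtifacts bs) = ReplArtifacts (storedBuiltins m ++ bs)

lookupBuiltin :: String -> ReplArtifacts -> Either String String
lookupBuiltin b (ReplArtifacts bs) =
  maybe (Left ("The builtin " ++ b ++ " has not been defined")) Right (lookup b bs)
```

Looking up a builtin in empty artifacts produces the error message, while the same lookup after `loadStored` succeeds.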

6 Commits