-- juvix/app/CommonOptions.hs (from https://github.com/anoma/juvix)

-- | Contains common options reused in several commands.
module CommonOptions
  ( module CommonOptions,
    module Juvix.Prelude,
    module Parallel.ProgressLog,
    module Options.Applicative,
  )
where

import Control.Exception qualified as GHC
import Data.List.NonEmpty qualified as NonEmpty
import GHC.Conc
import Juvix.Compiler.Casm.Data.TransformationId.Parser qualified as Casm
import Juvix.Compiler.Concrete.Translation.ImportScanner
import Juvix.Compiler.Core.Data.TransformationId.Parser qualified as Core
import Juvix.Compiler.Pipeline.EntryPoint
import Juvix.Compiler.Reg.Data.TransformationId.Parser qualified as Reg
import Juvix.Compiler.Tree.Data.TransformationId.Parser qualified as Tree
import Juvix.Data.Field
import Juvix.Prelude
import Juvix.Prelude as Juvix
import Options.Applicative
import Parallel.ProgressLog
import System.Process
import Text.Read (readMaybe)
import Prelude (show)

-- | Paths flagged as input are used to detect the root of the project.
data AppPath f = AppPath
  { _pathPath :: Prepath f,
    _pathIsInput :: Bool
  }
  deriving stock (Data, Eq)

makeLenses ''AppPath

instance Show (AppPath f) where
  show = Prelude.show . (^. pathPath)

parseInputFiles :: NonEmpty FileExt -> Parser (AppPath File)
parseInputFiles exts' = do
  let exts = NonEmpty.toList exts'
      mvars = intercalate "|" (map toMetavar exts)
      dotExts = intercalate ", " (map Prelude.show exts)
      helpMsg = "Path to a " <> dotExts <> " file"
      completers = foldMap (completer . extCompleter) exts
  _pathPath <-
    argument
      somePreFileOpt
      ( metavar mvars
          <> help helpMsg
          <> completers
          <> action "file"
      )
  pure AppPath {_pathIsInput = True, ..}

parseInputFile :: FileExt -> Parser (AppPath File)
parseInputFile = parseInputFiles . NonEmpty.singleton
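
-- Illustrative use of the single-extension variant (the parser built here is
-- a sketch, not one of this module's own definitions; 'FileExtJson' is a
-- constructor used elsewhere in this file):
--
-- > jsonInput :: Parser (AppPath File)
-- > jsonInput = parseInputFile FileExtJson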
numThreadsOpt :: ReadM NumThreads
numThreadsOpt = eitherReader readNumThreads

parseNumThreads :: Parser NumThreads
parseNumThreads = do
  option
    numThreadsOpt
    ( long "threads"
        <> short 'N'
        <> metavar "THREADS"
        <> value defaultNumThreads
        <> showDefault
        <> help "Number of physical threads to run"
        <> completer (listCompleter (Juvix.show NumThreadsAuto : [Juvix.show j | j <- [1 .. numCapabilities]]))
    )
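
-- Example invocations (illustrative; "auto" is how 'NumThreadsAuto' is
-- rendered in the completion list above):
--
-- > juvix -N auto typecheck Main.juvix
-- > juvix -N 4 typecheck Main.juvix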

parseProgramInputFile :: Parser (AppPath File)
parseProgramInputFile = do
  _pathPath <-
    option
      somePreFileOpt
      ( long "program_input"
          <> metavar "JSON_FILE"
          <> help "Path to the program input JSON file"
          <> completer (extCompleter FileExtJson)
          <> action "file"
      )
  pure AppPath {_pathIsInput = True, ..}

parseGenericInputFile :: Parser (AppPath File)
parseGenericInputFile = do
  _pathPath <-
    argument
      somePreFileOpt
      ( metavar "INPUT_FILE"
          <> help "Path to input file"
          <> action "file"
      )
  pure AppPath {_pathIsInput = True, ..}

parseGenericOutputFile :: Parser (AppPath File)
parseGenericOutputFile = do
  _pathPath <-
    option
      somePreFileOpt
      ( long "output"
          <> short 'o'
          <> metavar "OUTPUT_FILE"
          <> help "Path to output file"
          <> action "file"
      )
  pure AppPath {_pathIsInput = False, ..}

parseGenericOutputDir :: Mod OptionFields (Prepath Dir) -> Parser (AppPath Dir)
parseGenericOutputDir m = do
  _pathPath <-
    option
      somePreDirOpt
      ( long "output-dir"
          <> metavar "OUTPUT_DIR"
          <> help "Path to output directory"
          <> action "directory"
          <> m
      )
  pure AppPath {_pathIsInput = False, ..}
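
-- A sketch of a call site (the default directory value is hypothetical):
--
-- > outputDirOpt :: Parser (AppPath Dir)
-- > outputDirOpt = parseGenericOutputDir (value (mkPrepath "out"))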

somePreDirOpt :: ReadM (Prepath Dir)
somePreDirOpt = mkPrepath <$> str

somePreFileOpt :: ReadM (Prepath File)
somePreFileOpt = mkPrepath <$> str

someFileOpt :: ReadM (SomeBase File)
someFileOpt = eitherReader aux
  where
    aux :: String -> Either String (SomeBase File)
    aux s = maybe (Left $ s <> " is not a file path") Right (parseSomeFile s)

someDirOpt :: ReadM (SomeBase Dir)
someDirOpt = eitherReader aux
  where
    aux :: String -> Either String (SomeBase Dir)
    aux s = maybe (Left $ s <> " is not a directory path") Right (parseSomeDir s)

naturalNumberOpt :: ReadM Word
naturalNumberOpt = eitherReader aux
  where
    aux :: String -> Either String Word
    aux s = maybe (Left $ s <> " is not a nonnegative number") Right (readMaybe s :: Maybe Word)

fieldSizeOpt :: ReadM (Maybe Natural)
fieldSizeOpt = eitherReader aux
  where
    aux :: String -> Either String (Maybe Natural)
    aux s = case s of
      "cairo" -> Right $ Just cairoFieldSize
      "small" -> Right $ Just smallFieldSize
      _ ->
        mapRight Just
          . either Left checkAllowed
          $ maybe (Left $ s <> " is not a valid field size") Right (readMaybe s :: Maybe Natural)
    checkAllowed :: Natural -> Either String Natural
    checkAllowed n
      | n `elem` allowedFieldSizes = Right n
      | otherwise = Left $ Prelude.show n <> " is not a recognized field size"
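
-- Accepted inputs, for reference (the flag this reader is attached to is
-- defined elsewhere): "cairo" and "small" select the predefined field sizes,
-- and a numeric literal is accepted only if it occurs in 'allowedFieldSizes'.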
enumReader :: forall a. (Bounded a, Enum a, Show a) => Proxy a -> ReadM a
enumReader _ = eitherReader $ \val ->
case lookup val assocs of
Just x -> return x
Nothing -> Left ("Invalid value " <> val <> ". Valid values are: " <> (Juvix.show (allElements @a)))
where
assocs :: [(String, a)]
assocs = [(Prelude.show x, x) | x <- allElements @a]
enumCompleter :: forall a. (Bounded a, Enum a, Show a) => Proxy a -> Completer
enumCompleter _ = listCompleter [Juvix.show e | e <- allElements @a]
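-- 'enumCompleter' is meant to be paired with 'enumReader' so that shell
-- completion offers exactly the values the reader accepts; see
-- 'optImportScanStrategy' below for one such pairing.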
extCompleter :: FileExt -> Completer
extCompleter ext = mkCompleter $ \word -> do
  let cmd = unwords ["compgen", "-o", "plusdirs", "-f", "-X", "!*" <> Prelude.show ext, "--", requote word]
  result <- GHC.try @GHC.SomeException $ readProcess "bash" ["-c", cmd] ""
  return . lines . fromRight [] $ result
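-- The completer shells out to bash's @compgen@: @-o plusdirs -f@ offers
-- files plus directories, and @-X '!*<ext>'@ filters out files whose
-- names do not match the extension. If invoking bash throws, the
-- 'fromRight' fallback yields no completions rather than an error.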
requote :: String -> String
requote s =
  let -- Bash doesn't appear to allow "mixed" escaping
      -- in bash completions, so we don't have to
      -- worry about people swapping between strong and
      -- weak quotes.
      unescaped =
        case s of
          -- It's already strongly quoted, so we
          -- can use it mostly as is, but we must
          -- ensure it's closed off at the end and
          -- that there are no single quotes in the
          -- middle which might confuse bash.
          ('\'' : rs) -> unescapeN rs
          -- We're weakly quoted.
          ('"' : rs) -> unescapeD rs
          -- We're not quoted at all.
          -- We need to unescape some characters like
          -- spaces and quotation marks.
          elsewise -> unescapeU elsewise
   in strong unescaped
  where
    strong :: String -> String
    strong ss = '\'' : foldr go "'" ss
      where
        -- If there's a single quote inside the
        -- command: exit from the strong quote,
        -- emit the quote escaped, then resume.
        go '\'' t = "'\\''" ++ t
        go h t = h : t

    -- Unescape a strongly quoted string.
    -- We have two mutually recursive functions, as we
    -- can enter and exit the strong escaping.
    unescapeN = goX
      where
        goX ('\'' : xs) = goN xs
        goX (x : xs) = x : goX xs
        goX [] = []

        goN ('\\' : '\'' : xs) = '\'' : goN xs
        goN ('\'' : xs) = goX xs
        goN (x : xs) = x : goN xs
        goN [] = []

    -- Unescape an unquoted string.
    unescapeU = goX
      where
        goX [] = []
        goX ('\\' : x : xs) = x : goX xs
        goX (x : xs) = x : goX xs

    -- Unescape a weakly quoted string.
    unescapeD = goX
      where
        -- Reached an escape character.
        goX ('\\' : x : xs)
          -- If it's truly escapable, strip the
          -- slashes, as we're going to strong
          -- escape instead.
          | x `elem` ("$`\"\\\n" :: String) = x : goX xs
          | otherwise = '\\' : x : goX xs
        -- We've ended the quoted section, so we
        -- don't recurse on goX; it's done.
        goX ('"' : xs) = xs
        -- Not done and not a special character:
        -- just continue the fold.
        goX (x : xs) = x : goX xs
        goX [] = []
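-- Worked example: requote applied to the unquoted word @don't@ yields
-- @'don'\''t'@: the result is strongly quoted as a whole, and the inner
-- single quote is emitted as @'\''@ by 'strong', so bash reads back
-- exactly the original word.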
optDeBruijn :: Parser Bool
optDeBruijn =
  switch
    ( long "show-de-bruijn"
        <> help "Show variable de Bruijn indices"
    )

optIdentIds :: Parser Bool
optIdentIds =
  switch
    ( long "show-ident-ids"
        <> help "Show identifier IDs"
    )
optArgsNum :: Parser Bool
optArgsNum =
  switch
    ( long "show-args-num"
        <> help "Show the number of arguments of each identifier"
    )
optNoDisambiguate :: Parser Bool
optNoDisambiguate =
  switch
    ( long "no-disambiguate"
        <> help "Don't disambiguate the names of bound variables"
    )

optReadRun :: Parser Bool
optReadRun =
  switch
    ( long "run"
        <> help "Run the code after the transformation"
    )

optReadNoPrint :: Parser Bool
optReadNoPrint =
  switch
    ( long "no-print"
        <> help "Do not print the transformed code"
    )
optTransformationIds :: forall a. (Text -> Either Text [a]) -> (String -> [String]) -> Parser [a]
optTransformationIds parseIds completions =
  option
    (eitherReader parseTransf)
    ( long "transforms"
        <> short 't'
        <> value []
        <> metavar "[Transform]"
        <> completer (mkCompleter (return . completions))
        <> help "hint: use autocomplete"
    )
  where
    parseTransf :: String -> Either String [a]
    parseTransf = mapLeft unpack . parseIds . pack
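-- 'optTransformationIds' is instantiated below once per intermediate
-- representation (Core, Tree, Reg, Casm), each supplying its own parser
-- and completion function for transformation names.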
optImportScanStrategy :: Parser ImportScanStrategy
optImportScanStrategy =
  option
    (enumReader Proxy)
    ( long "scan-strategy"
        <> metavar "SCAN_STRAT"
        <> completer (enumCompleter @ImportScanStrategy Proxy)
        <> value defaultImportScanStrategy
        <> help "Import scanning strategy"
    )
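-- This option demonstrates the 'enumReader'/'enumCompleter' pairing:
-- the strings accepted on the command line (e.g. @--scan-strategy
-- flatparse@; exact spellings are determined by the 'Show' instance of
-- 'ImportScanStrategy') are exactly the ones offered by shell completion.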
optCoreTransformationIds :: Parser [Core.TransformationId]
optCoreTransformationIds = optTransformationIds Core.parseTransformations Core.completionsString

optTreeTransformationIds :: Parser [Tree.TransformationId]
optTreeTransformationIds = optTransformationIds Tree.parseTransformations Tree.completionsString

optRegTransformationIds :: Parser [Reg.TransformationId]
optRegTransformationIds = optTransformationIds Reg.parseTransformations Reg.completionsString

optCasmTransformationIds :: Parser [Casm.TransformationId]
optCasmTransformationIds = optTransformationIds Casm.parseTransformations Casm.completionsString
class EntryPointOptions a where
  applyOptions :: a -> EntryPoint -> EntryPoint
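-- Instances interpret a value as an update of the 'EntryPoint'; the two
-- trivial instances below lift a plain function and the unit value.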
instance EntryPointOptions (EntryPoint -> EntryPoint) where
  applyOptions = id

instance EntryPointOptions () where
  applyOptions () = id
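-- A hypothetical further instance (a sketch, not part of this module):
-- option values could compose pairwise, applying the left component's
-- update first:
--
-- instance (EntryPointOptions a, EntryPointOptions b) => EntryPointOptions (a, b) where
--   applyOptions (a, b) = applyOptions b . applyOptions a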