mirror of https://github.com/anoma/juvix.git synced 2024-12-29 18:43:42 +03:00
juvix/app/Commands/Dev/ImportTree.hs
Jan Mas Rovira 7c59e2aa10
Import tree (#2751)
- Contributes to #2750 

# New commands:
1. `dev import-tree scan FILE`. Scans a single file and lists all the
imports in it.
2. `dev import-tree print`. Scans all files in the package and its
dependencies, builds an import dependency tree and prints it to stdout.
If the `--stats` flag is given, it reports the number of scanned
modules, the number of unique imports, and the length of the longest
import chain.

Example: this is the truncated output of `juvix dev import-tree print
--stats` in the `juvix-stdlib` directory.
```
[...]
Stdlib/Trait/Partial.juvix imports Stdlib/Data/String/Base.juvix
Stdlib/Trait/Partial.juvix imports Stdlib/Debug/Fail.juvix
Stdlib/Trait/Show.juvix imports Stdlib/Data/String/Base.juvix
index.juvix imports Stdlib/Cairo/Poseidon.juvix
index.juvix imports Stdlib/Data/Int/Ord.juvix
index.juvix imports Stdlib/Data/Nat/Ord.juvix
index.juvix imports Stdlib/Data/String/Ord.juvix
index.juvix imports Stdlib/Prelude.juvix

Import Tree Statistics:
=======================
• Total number of modules: 56
• Total number of edges: 193
• Height (longest chain of imports): 15
```
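The three statistics can be illustrated with a small sketch. It assumes a hypothetical `ImportTree` type (a `Map` from each module path to the set of paths it imports directly) and the `containers` package. It is not the Juvix implementation. The text does not say whether the reported height counts modules or import edges along the longest chain, so `height` below counts edges; treat that as an assumption.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical representation of the import tree: each module path maps
-- to the set of module paths it imports directly.
type ImportTree = Map FilePath (Set FilePath)

-- Every module that appears as an importer or as an import target.
allModules :: ImportTree -> Set FilePath
allModules tree = Set.union (Map.keysSet tree) (Set.unions (Map.elems tree))

-- Number of distinct (importer, imported) pairs; duplicates are already
-- collapsed because each module's imports are stored in a set.
numEdges :: ImportTree -> Int
numEdges = sum . map Set.size . Map.elems

-- Length, counted in import edges, of the longest chain of imports.
-- Heights are memoised in a lazy map, so shared sub-chains are not re-walked.
-- Assumes the graph is acyclic; a cycle would make this diverge.
height :: ImportTree -> Int
height tree = maximum (0 : Map.elems heights)
  where
    heights :: Map FilePath Int
    heights = Map.fromSet heightOf (allModules tree)

    heightOf :: FilePath -> Int
    heightOf m =
      case Set.toList (Map.findWithDefault Set.empty m tree) of
        [] -> 0
        deps -> 1 + maximum [heights Map.! d | d <- deps]
```

For example, if `index.juvix` imports `Stdlib/Prelude.juvix`, which in turn imports `Stdlib/Data/Nat/Ord.juvix`, then `height` is 2. The lazy `Data.Map` (rather than `Data.Map.Strict`) is what makes the self-referential memo table work.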

Both commands support the `--scan-strategy` flag, which determines which
parser is used to scan the imports. The possible values are:
1. `flatparse`. It uses the low-level
[FlatParse](https://hackage.haskell.org/package/flatparse-0.5.1.0/docs/FlatParse-Basic.html)
parsing library. This parser is designed to parse only the imports and
ignore the rest, so we expect it to be much faster. It does not produce
error messages.
2. `megaparsec`. It uses the regular Juvix parser and simply collects
the imports from its output.
3. `flatparse-megaparsec` (default). It uses the flatparse backend and
falls back to megaparsec if it fails.
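The control flow of the `flatparse-megaparsec` strategy can be sketched without either library. The scanner below is a hypothetical, dependency-free stand-in for the flatparse scanner. It strips comments, looks only at the first words of each line and gives up, with no error message, on input it cannot handle. The complete parser is passed in as a function. The `import M ...` and `open import M ...` forms it recognises are assumptions about Juvix syntax, and the sketch illustrates the fallback logic, not the performance.

```haskell
import Data.Maybe (mapMaybe)

-- Remove `--` line comments and `{- ... -}` block comments. This sketch does
-- not handle nested block comments or comment markers inside string literals.
-- Returns Nothing on an unterminated block comment, mimicking a scanner that
-- simply gives up without an error message.
stripComments :: String -> Maybe String
stripComments s = case s of
  [] -> Just []
  '-' : '-' : rest -> stripComments (dropWhile (/= '\n') rest)
  '{' : '-' : rest -> skipBlock rest
  c : rest -> (c :) <$> stripComments rest
  where
    skipBlock ('-' : '}' : rest) = stripComments rest
    skipBlock (_ : rest) = skipBlock rest
    skipBlock [] = Nothing

-- Recognise `import M ...` and `open import M ...` (assumed syntax)
-- and return the module name `M` without a trailing `;`.
importedModule :: [String] -> Maybe String
importedModule ("open" : rest) = importedModule rest
importedModule ("import" : name : _) = Just (takeWhile (/= ';') name)
importedModule _ = Nothing

-- Fast path: only inspects the first words of each comment-free line.
scanImports :: String -> Maybe [String]
scanImports src = mapMaybe (importedModule . words) . lines <$> stripComments src

-- Try the fast scanner first and fall back to the complete (slower) parser,
-- supplied by the caller, only if the fast scanner fails.
scanWithFallback :: (String -> Either String [String]) -> String -> Either String [String]
scanWithFallback fullParser src = maybe (fullParser src) Right (scanImports src)
```

Because the fallback parser is only invoked on failure, its error messages are still available for files the fast scanner cannot handle, which mirrors why the default combines both.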

# Internal changes
## Megaparsec Parser (`Concrete.FromSource`)
In order to be able to run the parser during the scanning phase, I've
adjusted some of the effects used in the parser:
1. I've removed the `NameIdGen` and `Files` constraints, which were
unused.
2. I've removed `Reader EntryPoint`. It was used to get the `ModuleId`.
Now the `ModuleId` is generated during scoping.
3. I've replaced `PathResolver` with the `TopModuleNameChecker` effect.
This new effect, as the name suggests, only checks the name of the
module (using the same rules as the `PathResolver` did before). It is
also possible to ignore the effect, which is needed to use this parser
without an entry point.
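The two behaviours described for `TopModuleNameChecker`, one interpretation that checks the name and one that ignores it, can be sketched with a plain record of functions instead of a polysemy effect. The naming rule used here (the module name must match the file's path relative to the package root, with `/` replaced by `.` and the extension dropped) is an assumption, and every name in the block is hypothetical.

```haskell
import Data.List (intercalate, isSuffixOf)

-- Hypothetical plain-function stand-in for the `TopModuleNameChecker` effect.
newtype TopModuleNameChecker = TopModuleNameChecker
  { checkModuleName :: String -> Either String ()
  }

-- Split a path on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep str = case break (== sep) str of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Assumed rule: `Stdlib/Data/Nat.juvix` must declare `module Stdlib.Data.Nat`.
expectedModuleName :: FilePath -> String
expectedModuleName relPath = intercalate "." (splitOn '/' (dropJuvixExt relPath))
  where
    dropJuvixExt p
      | ".juvix.md" `isSuffixOf` p = take (length p - length ".juvix.md") p
      | ".juvix" `isSuffixOf` p = take (length p - length ".juvix") p
      | otherwise = p

-- Checking interpretation: compare the declared name with the expected one.
checkingNames :: FilePath -> TopModuleNameChecker
checkingNames relPath = TopModuleNameChecker $ \name ->
  let expected = expectedModuleName relPath
   in if name == expected
        then Right ()
        else Left ("expected module name " ++ expected ++ ", got " ++ name)

-- Ignoring interpretation: accept any name, so the parser can run without
-- an entry point (e.g. while scanning imports).
ignoringNames :: TopModuleNameChecker
ignoringNames = TopModuleNameChecker (const (Right ()))
```

A scanning pass would parse with `ignoringNames`, while a normal compilation from an entry point would use `checkingNames` with that file's relative path.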

## `PathResolver` effect refactor
1. The `WithPath` command has been removed.
2. New command `ResolvePath :: ImportScan -> PathResolver m
(PackageInfo, FileExt)`. Useful for resolving imports during scanning
phase.
3. New command `WithResolverRoot :: Path Abs Dir -> m a -> PathResolver
m a`. Useful for switching package context.
4. New command `GetPackageInfos :: PathResolver m (HashMap (Path Abs
Dir) PackageInfo)`, which returns a table with all packages. Useful to
scan all dependencies.
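A pure sketch of what resolving an import during scanning involves: given the table of all packages (the role `GetPackageInfos` plays), find the package that provides an imported module and the extension of its file. `FileExt`, its constructors, `PackageTable` and `resolvePath` are hypothetical stand-ins, not the Juvix types. The sketch simply takes the first match in root order; a real resolver would have to decide what to do when several packages, or none, provide the module.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (listToMaybe)
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical stand-in for the Juvix file extensions.
data FileExt = FileExtJuvix | FileExtJuvixMarkdown
  deriving (Eq, Show)

-- Package root -> module files relative to that root
-- (playing the role of the table returned by `GetPackageInfos`).
type PackageTable = Map FilePath (Set FilePath)

-- `Stdlib.Data.Nat` -> `Stdlib/Data/Nat` (no extension yet).
importToPath :: String -> FilePath
importToPath = map (\c -> if c == '.' then '/' else c)

-- Sketch of resolving an import: the first package (in root order) that
-- contains the module, trying both Juvix extensions.
resolvePath :: PackageTable -> String -> Maybe (FilePath, FileExt)
resolvePath pkgs imp =
  listToMaybe
    [ (root, ext)
    | (root, files) <- Map.toList pkgs
    , (suffix, ext) <- [(".juvix", FileExtJuvix), (".juvix.md", FileExtJuvixMarkdown)]
    , Set.member (base ++ suffix) files
    ]
  where
    base = importToPath imp
```

Returning the package root alongside the extension is what lets a scanner continue in the right package context, which is the kind of switch `WithResolverRoot` is described as enabling.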

The `Package.PathResolver` has been refactored to be more like the
normal `PathResolver`. We've discussed with @paulcadman trying to unify
both implementations in the near future.

## Misc
1. `Package.juvix` no longer ends up in
`PackageInfo.packageRelativeFiles`.
2. I've introduced string definitions for `--`, `{-` and `-}`.
3. I've fixed a bug where `.juvix.md` was detected as an invalid
extension.
4. I've added `LazyHashMap` to the prelude. I've also added `ordSet` to
create ordered sets, `ordMap` for ordered maps, etc.
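Two of the items above can be illustrated together. Looking only at the last extension of `Intro.juvix.md` yields `.md` (this is what `takeExtension` from the `filepath` package returns), which is one plausible way a valid `.juvix.md` file gets flagged as invalid; matching the whole double suffix avoids it. The filter then drops the root `Package.juvix` from the module files. `FileExt`, `fileExt` and `moduleFiles` are hypothetical names, not the Juvix code.

```haskell
import Data.List (isSuffixOf)
import Data.Maybe (isJust)

-- Hypothetical stand-in for the two Juvix source extensions.
data FileExt = FileExtJuvix | FileExtJuvixMarkdown
  deriving (Eq, Show)

-- Match the full double suffix first: looking only at the last extension
-- of "Intro.juvix.md" would see ".md" and reject a valid file.
fileExt :: FilePath -> Maybe FileExt
fileExt p
  | ".juvix.md" `isSuffixOf` p = Just FileExtJuvixMarkdown
  | ".juvix" `isSuffixOf` p = Just FileExtJuvix
  | otherwise = Nothing

-- Module files of a package: Juvix sources only, without the root
-- `Package.juvix`, which holds the package description.
moduleFiles :: [FilePath] -> [FilePath]
moduleFiles = filter (\p -> p /= "Package.juvix" && isJust (fileExt p))
```

Checking `.juvix.md` before `.juvix` matters only for readability here, since a path ending in `.juvix.md` never ends in `.juvix`; the essential point is comparing the full suffix rather than the last extension.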

# Benchmarks
I've profiled `juvix dev import-tree print --scan-strategy [megaparsec |
flatparse] --stats` with optimization enabled.
In the images below we see that in the megaparsec case, the scanning
takes 54.8% of the total time, whereas in the flatparse case it only
takes 9.6% of the total time.

- **Megaparsec**

![image](https://github.com/anoma/juvix/assets/5511599/05ec42cf-d79d-4bbf-b462-c0e48593fe51)

- **Flatparse**

![image](https://github.com/anoma/juvix/assets/5511599/1d7b363c-a915-463c-8dc4-613ab4b7d473)

## Hyperfine
```
hyperfine --warmup 1 'juvix dev import-tree print --scan-strategy flatparse --stats' 'juvix dev import-tree print --scan-strategy megaparsec --stats' --min-runs 20
Benchmark 1: juvix dev import-tree print --scan-strategy flatparse --stats
  Time (mean ± σ):      82.0 ms ±   4.5 ms    [User: 64.8 ms, System: 17.3 ms]
  Range (min … max):    77.0 ms … 102.4 ms    37 runs

Benchmark 2: juvix dev import-tree print --scan-strategy megaparsec --stats
  Time (mean ± σ):     174.1 ms ±   2.7 ms    [User: 157.5 ms, System: 16.8 ms]
  Range (min … max):   169.7 ms … 181.5 ms    20 runs

Summary
  juvix dev import-tree print --scan-strategy flatparse --stats ran
    2.12 ± 0.12 times faster than juvix dev import-tree print --scan-strategy megaparsec --stats
```

In order to compare (almost) only the parsing, I've forced each file to
be scanned 50 times, so that the cost of the other phases becomes
negligible. Here are the results:
```
hyperfine --warmup 1 'juvix dev import-tree print --scan-strategy flatparse --stats' 'juvix dev import-tree print --scan-strategy megaparsec --stats' --min-runs 10
Benchmark 1: juvix dev import-tree print --scan-strategy flatparse --stats
  Time (mean ± σ):     189.5 ms ±   3.6 ms    [User: 161.7 ms, System: 27.6 ms]
  Range (min … max):   185.1 ms … 197.1 ms    15 runs

Benchmark 2: juvix dev import-tree print --scan-strategy megaparsec --stats
  Time (mean ± σ):      5.113 s ±  0.023 s    [User: 5.084 s, System: 0.035 s]
  Range (min … max):    5.085 s …  5.148 s    10 runs

Summary
  juvix dev import-tree print --scan-strategy flatparse --stats ran
   26.99 ± 0.52 times faster than juvix dev import-tree print --scan-strategy megaparsec --stats
```
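The `±` figures in the hyperfine summaries can be cross-checked from the reported means and standard deviations. The sketch below assumes the standard first-order propagation of uncertainty for a quotient: for a ratio r = b / a, the standard deviation is r · sqrt((σa/a)² + (σb/b)²). hyperfine's exact formula is not given in this text, and `speedup` is a hypothetical helper.

```haskell
-- Speedup of a fast run over a slow one, given (mean, standard deviation)
-- for each, with the uncertainty propagated to first order:
--   ratio = slowMean / fastMean
--   sd    = ratio * sqrt ((fastSd / fastMean)^2 + (slowSd / slowMean)^2)
speedup :: (Double, Double) -> (Double, Double) -> (Double, Double)
speedup (fastMean, fastSd) (slowMean, slowSd) =
  (ratio, ratio * sqrt (relFast * relFast + relSlow * relSlow))
  where
    ratio = slowMean / fastMean
    relFast = fastSd / fastMean
    relSlow = slowSd / slowMean
```

With the first benchmark's numbers, `speedup (82.0, 4.5) (174.1, 2.7)` gives about `(2.123, 0.121)`, matching the reported 2.12 ± 0.12. The second benchmark's numbers (189.5 ± 3.6 ms and 5113 ± 23 ms) give about 26.98 ± 0.53 rather than 26.99 ± 0.52, presumably because the displayed inputs are rounded.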
2024-05-14 10:53:33 +02:00


module Commands.Dev.ImportTree where

import Commands.Base
import Commands.Dev.ImportTree.Options
import Commands.Dev.ImportTree.Print qualified as Print
import Commands.Dev.ImportTree.ScanFile qualified as ScanFile

runCommand :: (Members '[EmbedIO, App, TaggedLock] r) => ImportTreeCommand -> Sem r ()
runCommand = \case
  Print opts -> Print.runCommand opts
  ScanFile opts -> ScanFile.runCommand opts