Merge pull request #2059 from rtfeldman/readme-code-starting-points

Add a list of starting points to get familiar with compiler code
Richard Feldman 2021-11-25 00:12:46 -05:00 committed by GitHub
commit e63727fee4
2 changed files with 35 additions and 9 deletions


@ -18,9 +18,8 @@ For example, parsing would translate this string...
This `Expr` representation of the expression is useful for things like:
- Checking that all variables are declared before they're used
- Type checking
* Running Roc code in Interpreted Mode (that is, without having to compile it to Rust first - useful for development, since it's a faster feedback loop, but there's a runtime performance penalty compared to doing a full compile to Rust).
> As of this writing, the compiler doesn't do any of those things yet. They'll be added later!
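For orientation only, here is a minimal sketch of what an `Expr` tree along these lines could look like. The variant names below are illustrative assumptions, not the compiler's actual definitions:

```rust
// Illustrative sketch of an expression tree in the spirit described above.
// Variant names are assumptions for this example, not the real compiler's.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int(i64),
    Char(char),
    Str(String),
    Var(String),
    Minus(Box<Expr>, Box<Expr>),
    CallByName(String, Vec<Expr>),
}
```

With a representation like this, "checking that all variables are declared before they're used" amounts to walking the tree and verifying that every `Var` was bound earlier, and type checking is another traversal of the same tree.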
@ -28,7 +27,7 @@ Since the parser is only concerned with translating String values into Expr valu
For example, parsing will translate this string:
not "foo", "bar"
...into this `Expr`:
@ -68,7 +67,7 @@ The `eval` function will take this `Expr` and translate it into this much simple
Int(6)
At this point it's become so simple that we can display it to the end user as the number `6`. So running `parse` and then `eval` on the original Roc string of `1 + 8 - 3` will result in displaying `6` as the final output.
> The `expr` module includes an `impl fmt::Display for Expr` that takes care of translating `Int(6)` into `6`, `Char('x')` as `'x'`, and so on.
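As a rough sketch of that idea (reusing the illustrative `Expr` from the earlier sketch, not the module's real definition):

```rust
use std::fmt;

// Sketch only: the general shape of an `impl fmt::Display for Expr`,
// rendering Int(6) as 6, Char('x') as 'x', and so on.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Int(n) => write!(f, "{}", n),
            Expr::Char(c) => write!(f, "'{}'", c),
            Expr::Str(s) => write!(f, "\"{}\"", s),
            // Fallback for brevity; the real impl would cover every variant.
            other => write!(f, "{:?}", other),
        }
    }
}
```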
@ -105,7 +104,6 @@ That concludes our original recursive call to `eval`, after which point we'll be
This will work the same way as `Minus` did, and will reduce down to `Int(6)`.
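Here is a rough sketch of that reduction step, again using the illustrative `Expr` from above rather than the compiler's real `eval`:

```rust
// Sketch only: recursively reduce an expression, handling just the
// integer Minus case from the `1 + 8 - 3` walkthrough.
fn eval(expr: Expr) -> Expr {
    match expr {
        Expr::Minus(lhs, rhs) => match (eval(*lhs), eval(*rhs)) {
            // Both sides reduced to integers, so we can do the subtraction.
            (Expr::Int(a), Expr::Int(b)) => Expr::Int(a - b),
            // Otherwise, rebuild the node with the reduced children.
            (l, r) => Expr::Minus(Box::new(l), Box::new(r)),
        },
        // Anything already fully reduced passes through unchanged.
        other => other,
    }
}
```

Calling this on an `Expr` for `9 - 3` yields `Int(6)`, which the `Display` impl above then renders as `6`.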
## Optimization philosophy
Focus on optimizations which are only safe in the absence of side effects, and leave the rest to LLVM.
@ -142,3 +140,27 @@ Express operations like map and filter in terms of toStream and fromStream, to u
More info here:
https://wiki.haskell.org/GHC_optimisations#Fusion
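To make the fusion idea concrete, here is a small standalone Rust sketch. The `to_stream` and `from_stream` helpers are hypothetical stand-ins, not anything in the Roc compiler:

```rust
// Hypothetical helpers: every list operation round-trips through a lazy "stream".
fn to_stream<T>(list: Vec<T>) -> impl Iterator<Item = T> {
    list.into_iter()
}

fn from_stream<T>(stream: impl Iterator<Item = T>) -> Vec<T> {
    stream.collect()
}

// `map` and `filter` expressed in terms of to_stream / from_stream.
fn map<T, U>(f: impl Fn(T) -> U, xs: Vec<T>) -> Vec<U> {
    from_stream(to_stream(xs).map(f))
}

fn filter<T>(p: impl Fn(&T) -> bool, xs: Vec<T>) -> Vec<T> {
    from_stream(to_stream(xs).filter(p))
}

fn main() {
    let xs = vec![1i64, 2, 3, 4];

    // Written with the helpers above, each step materializes an intermediate Vec.
    let naive = map(|n| n * 10, filter(|n| n % 2 == 0, xs.clone()));

    // A fusion rewrite (to_stream(from_stream(s)) => s) collapses the pipeline
    // into a single pass over one stream, with no intermediate list:
    let fused: Vec<i64> = to_stream(xs)
        .filter(|n| n % 2 == 0)
        .map(|n| n * 10)
        .collect();

    assert_eq!(naive, fused);
    assert_eq!(fused, vec![20, 40]);
}
```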
# Getting started with the code
The compiler contains a lot of code! If you're new to the project, it can be hard to know where to start. It's useful to have some sort of "main entry point", or at least a "good place to start", for each of the main phases.
After you get into the details, you'll discover that some parts of the compiler have more than one entry point, and that things can be interwoven in subtle and complex ways, for reasons to do with performance, edge case handling, etc. But if this is "day one" for you, and you're just trying to get familiar with things, this should be "good enough".
The compiler is invoked from the CLI via `build_file` in `cli/src/build.rs`.
| Phase | Entry point / main functions |
| ------------------------------------- | ------------------------------------------------ |
| Compiler entry point | load/src/file.rs: load, load_and_monomorphize |
| Parse header | parse/src/module.rs: parse_header |
| Parse definitions | parse/src/module.rs: module_defs |
| Canonicalize | can/src/def.rs: canonicalize_defs |
| Type check | solve/src/module.rs: run_solve |
| Gather types to specialize | mono/src/ir.rs: PartialProc::from_named_function |
| Solve specialized types | mono/src/ir.rs: from_can, with_hole |
| Insert reference counting | mono/src/ir.rs: Proc::insert_refcount_operations |
| Code gen (optimized but slow) | gen_llvm/src/llvm/build.rs: build_procedures |
| Code gen (unoptimized but fast, CPU) | gen_dev/src/object_builder.rs: build_module |
| Code gen (unoptimized but fast, Wasm) | gen_wasm/src/lib.rs: build_module |
For a more detailed understanding of the compilation phases, see the `Phase`, `BuildTask`, and `Msg` enums in `load/src/file.rs`.
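To get a feel for how `BuildTask` and `Msg` fit together, here is a deliberately simplified, hypothetical sketch of the coordinator/worker pattern they describe. The variant names and channel setup below are made up for illustration; the real enums in `load/src/file.rs` have many more variants and carry actual compiler data:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in: a message sent *to* a worker thread, describing work to do.
#[derive(Debug)]
enum BuildTask {
    ParseModule(String),
}

// Hypothetical stand-in: a message sent *from* a worker thread, reporting a result.
#[derive(Debug)]
enum Msg {
    ModuleParsed(String),
}

fn main() {
    let (task_tx, task_rx) = mpsc::channel::<BuildTask>();
    let (msg_tx, msg_rx) = mpsc::channel::<Msg>();

    // Worker: pull tasks off the queue, do the work, report results back.
    let worker = thread::spawn(move || {
        for task in task_rx {
            match task {
                BuildTask::ParseModule(name) => {
                    // ...parsing would happen here...
                    msg_tx.send(Msg::ModuleParsed(name)).unwrap();
                }
            }
        }
        // The task channel closed, so the worker shuts down (dropping msg_tx).
    });

    // Coordinator: hand out work, then collect results until the worker is done.
    task_tx.send(BuildTask::ParseModule("Main".to_string())).unwrap();
    drop(task_tx); // no more tasks; lets the worker's loop end

    for msg in msg_rx {
        println!("coordinator received: {:?}", msg);
    }
    worker.join().unwrap();
}
```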


@ -783,6 +783,8 @@ struct ParsedModule<'a> {
    parsed_defs: &'a [Located<roc_parse::ast::Def<'a>>],
}
/// A message sent out _from_ a worker thread,
/// representing a result of work done, or a request for further work
#[derive(Debug)]
enum Msg<'a> {
    Many(Vec<Msg<'a>>),
@ -1004,6 +1006,7 @@ impl ModuleTiming {
}
}
/// A message sent _to_ a worker thread, describing the work to be done
#[derive(Debug)]
#[allow(dead_code)]
enum BuildTask<'a> {
@ -1134,6 +1137,7 @@ where
}
}
/// Main entry point to the compiler from the CLI and tests
pub fn load_and_monomorphize<'a, F>(
    arena: &'a Bump,
    filename: PathBuf,
@ -1300,7 +1304,7 @@ enum LoadResult<'a> {
/// 5. Parse the module's defs.
/// 6. Canonicalize the module.
/// 7. Before type checking, block on waiting for type checking to complete on all imports.
/// (Since Roc doesn't allow cyclic dependencies, this cannot deadlock.)
/// 8. Type check the module and create type annotations for its top-level declarations.
/// 9. Report the completed type annotation to the coordinator thread, so other modules
/// that are waiting in step 7 can unblock.
@ -1324,9 +1328,9 @@ enum LoadResult<'a> {
/// in requests for others; these are added to the queue and worked through as normal.
/// This process continues until *both* all modules have reported that they've finished
/// adding specialization requests to the queue, *and* the queue is empty (including
/// of any requests that were added in the course of completing other requests). Now
/// we have a map of specializations, and everything was assembled in parallel with
/// no unique specialization ever getting assembled twice (meaning no wasted effort).
/// 12. Now that we have our final map of specializations, we can proceed to code gen!
/// As long as the specializations are stored in a per-ModuleId map, we can also
/// parallelize this code gen. (e.g. in dev builds, building separate LLVM modules