---
layout: developer-doc
title: Macro Resolution
category: parser
tags: [parser, macro, resolution]
order: 4
---

# Macro Resolution

The Enso macro system is responsible for translating the structured token stream
from the [lexer](./lexer.md) and resolving it into the output [ast](./ast.md)
that represents the Enso language.

The macro system is of _key_ importance to the operation of Enso, providing the
IDE with detailed information on how users can interact with each and every
portion of the syntax. This has myriad benefits over a traditional parser,
allowing the IDE to handle all portions of the language's syntax in a seamless
way; the [macro matchers](#macro-matchers) describe the Enso syntax in a way
that allows the IDE to extract information about extension points and other
crucial metadata.

Nevertheless, the macro system _also_ serves as the language's parser,
implementing a sophisticated resolution algorithm that is able to turn the token
stream into the language AST.

Furthermore, the macro system can be extended in future to allow _users_ to
define their own hygienic macros in Enso itself. This will allow them to
manipulate their programs syntactically using _exactly_ the same mechanism as
used in the Enso parser itself.

<!-- MarkdownTOC levels="2,3" autolink="true" indent="  " -->

- [Resolving Macros](#resolving-macros)
  - [Macro Segments](#macro-segments)
  - [Matching](#matching)
- [Macro Matchers](#macro-matchers)
  - [Relative Precedence Matchers](#relative-precedence-matchers)
  - [Named Macros](#named-macros)
- [Macro Resolution Errors](#macro-resolution-errors)
  - [Macro Errors as Parser Errors](#macro-errors-as-parser-errors)
- [User-Defined Macros](#user-defined-macros)
- [Benchmarking](#benchmarking)
  - [Running a Subset of the Benchmarks](#running-a-subset-of-the-benchmarks)
  - [Changing the Macro Resolver](#changing-the-macro-resolver)

<!-- /MarkdownTOC -->

## Resolving Macros

Conceptually, the macro resolution process builds a tree out of the user-defined
macros, segmenting the token-space based on the types of tokens expected by the
various definitions. Resolution then proceeds to walk the tree, pruning the
possible cases until either:

1. An unambiguous resolution is found.
2. An ambiguous resolution is found and an error is emitted.

Resolution then recurses, continuing to apply the macro resolution process
against the non-matchable segments in the macro.

Consider, for example, the macros defined as follows:

```rust
let if_then      = "if" >> Matcher::Expr >> "then" >> Matcher::Expr;
let if_then_else = "if" >> Matcher::Expr >> "then" >> Matcher::Expr >> "else" >> Matcher::Expr;
```

These generate a tree as follows:

```
                  ┌──────┐
                  │ "if" │
                  └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
                     │                 │
                     ▼                 ▼
                  ┌──────┐          ┌──────┐
                  │"then"│          │ Fail │
                  └──────┘          └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
     ┌───────────────┤                 │
     ▼               ▼                 ▼
┌─────────┐       ┌──────┐          ┌──────┐
│ Succeed │       │"else"│          │ Fail │
└─────────┘       └──────┘          └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
                     │                 │
                     ▼                 ▼
                ┌─────────┐         ┌──────┐
                │ Succeed │         │ Fail │
                └─────────┘         └──────┘
```

Within each of the patterns, the macro resolution continues recursively, thus
allowing the macros to match nested uses of themselves (e.g.
`if a then if b then c`).

In reality, this is not how resolution _actually_ takes place. Applying one
macro to completion and then starting again on all remaining segments would
have abysmal complexity. As a result, the process instead operates on a _stack_
of resolution contexts, performing its resolution in a linear scan over the
token stream.
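
The following sketch illustrates the shape of that stack-based approach. The
names (`DefinitionTree`, `ResolutionContext`, `Resolver`) are illustrative only
and are not the resolver's real types; they exist to show how a single pass can
keep track of several partially matched, nested macros at once.

```rust
use std::collections::HashMap;

/// Stand-in for the lexer's token type.
type Token = String;

/// A node in the tree built from the macro definitions: the literals that may
/// come next, and whether stopping here constitutes a complete macro match.
/// (The real tree also branches on matcher kinds, not only literals.)
struct DefinitionTree {
    children: HashMap<String, DefinitionTree>,
    is_complete: bool,
}

/// A partially matched macro: its position in the definition tree, plus the
/// tokens collected so far for the segment currently being built.
struct ResolutionContext<'a> {
    position: &'a DefinitionTree,
    segment_tokens: Vec<Token>,
}

/// The resolver pushes a new context whenever a macro opens inside another
/// macro's segment, and pops it when that macro completes, so the whole token
/// stream is consumed in one left-to-right scan.
struct Resolver<'a> {
    contexts: Vec<ResolutionContext<'a>>,
}
```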

### Macro Segments

Macros are described in terms of _segments_. A segment consists of three main
components:

1. An optional _preceding section_ that uses [matchers](#macro-matchers) to
   describe what it matches before encountering the _literal_ in item 2. This
   can only occur on the first segment.
2. A _literal_ that must be matched for the segment to apply.
3. An optional associated [matcher](#macro-matchers) that consumes certain
   _types_ of token.

Segments determine how a given macro matches on the input token stream, and are
used to generate the tree for performing resolution.
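
As a rough illustration (the names below are hypothetical, not the parser's
real types), a segment definition can be thought of as having the following
shape:

```rust
/// Placeholder for the matcher type described under
/// [Macro Matchers](#macro-matchers) below.
enum Matcher {
    Expr,
    // ...
}

/// Illustrative shape of a single macro segment.
struct SegmentDefinition {
    /// Matches the tokens before the literal; only valid on the first segment.
    prefix: Option<Matcher>,
    /// The literal (e.g. `if`, `then`, `else`) that anchors the segment.
    literal: String,
    /// Matches the tokens that follow the literal.
    body: Option<Matcher>,
}
```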

### Matching

When a match occurs, the resultant segments are passed to a function that can
perform some action based on those segments. This function is responsible for
manipulating the output of the macro resolver, and hence manipulates the AST.

The function may also assert _additional_ properties on the match that cannot
be ascertained from the segment definitions themselves.
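
A hypothetical sketch of the handler's shape follows; the type names are
placeholders rather than the parser's real API.

```rust
/// Placeholder types standing in for the real parser data structures.
struct MatchedSegment {
    literal: String,
    tokens: Vec<String>,
}
struct Ast;
struct MacroError {
    message: String,
}

/// The handler receives the matched segments and either builds the resulting
/// AST node or rejects the match with an error, which is how checks beyond the
/// segment definitions themselves are expressed.
type MacroHandler = fn(Vec<MatchedSegment>) -> Result<Ast, MacroError>;
```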

## Macro Matchers

The matchers describe the _kinds_ of tokens that can be reasoned about within a
macro definition. These fall across a set of categories as follows.

The boundary matchers are:

- `Start`: Matches the start of the line.
- `End`: Matches the end of the line.

Structural matchers:

- `Nothing`: Never matches on any token.
- `Any`: Matches on any token. This may become a family of matchers (e.g.
  `AnyExpr`) for performance.
- `Seq`: Matches on the contained matchers in sequence.
- `Or`: Matches on one of the contained matchers.
- `Many`: Matches on multiple occurrences of the given matcher.
- `Except`: Matches on everything but the given pattern.

Meta matchers:

- `Build`: Performs AST resolution on the match, but this must become implicit.
- `Err`: Allows ascribing a user-defined error to the failure of the contained
  matcher.
- `Tag`: Does nothing, but attaches string metadata to the result. This is very
  useful for syntax highlighting.
- `Exact`: Matches on an explicit AST.

Identifier matchers:

- `Referent`: Matches on a referent identifier.
- `Variable`: Matches on a variable identifier.
- `External`: Matches on an external identifier.
- `Blank`: Matches the blank identifier (`_`).
- `Operator`: Matches on an operator identifier.
- `Modifier`: Matches on a modifier identifier.
- `Annotation`: Matches on an annotation identifier.

Literal matchers:

- `Number`: Matches on a literal number.
- `Text`: Matches on any text literal.
- `TextLine`: Matches a single-line text literal.
- `TextInlineBlock`: Matches an inline-block text literal.
- `TextBlock`: Matches a block text literal.

Comment matchers:

- `DisableComment`: Matches a standard disable comment.
- `DocComment`: Matches an Enso doc comment.

Block matchers:

- `Block`: Matches an Enso block.

Error matchers:

- `DanglingBase`: Matches an erroneous base for a number literal.
- `TextError`: Matches tokens representing error states during textual lexing.
- `InvalidSuffix`: Matches an erroneous suffix token.
- `Unrecognized`: Matches an unrecognised token.

In addition, it is highly likely that some of these will be combined into
aggregate matchers that match one of many kinds of token.
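
A condensed sketch of how these categories might be represented is shown below;
it is illustrative only and differs from the real definition in the parser.

```rust
/// Illustrative only: a condensed view of the matcher categories above.
enum Matcher {
    // Boundary matchers.
    Start,
    End,
    // Structural matchers.
    Nothing,
    Any,
    Seq(Vec<Matcher>),
    Or(Box<Matcher>, Box<Matcher>),
    Many(Box<Matcher>),
    Except(Box<Matcher>),
    // Meta matchers; `Err` and `Tag` carry the user-supplied message/metadata.
    Build(Box<Matcher>),
    Err(Box<Matcher>, String),
    Tag(Box<Matcher>, String),
    // Identifier, literal, comment, block, and error matchers (abridged).
    Variable,
    Operator,
    Number,
    TextLine,
    Block,
    Unrecognized,
}
```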

### Relative Precedence Matchers

Where relevant, the matchers above may take an optional "maximum precedence" of
operator that they may match. This effectively allows differing precedences on
either side of a literal operator that is implemented via a macro, and is
important for correctly resolving the function arrow `->` (for example, so that
`a -> a -> a` groups as `a -> (a -> a)`).

### Named Macros

In addition to the basic matchers described above, we also need a `Named`
matcher. The intent behind this matcher is that it allows you to refer to
existing macro definitions by name, and it also provides for better error
messages.

By convention, we aim for these names to be treated as qualified, based on the
package in which they're declared. This convention should be followed for the
built-in macros as well, even if they can't be overridden, as they may appear in
error messages.

## Macro Resolution Errors

Due to the highly structured nature of the resolver, the macro system can
produce very detailed errors when resolution fails. This is because it knows the
path it took to get to the failure, and the expected tokens or kinds of tokens
at the failure site.

In addition, the resolver allows users to define their own errors through the
use of the `Err` matcher mentioned [above](#macro-matchers).
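
For example, in the pseudocode style of the earlier macro definitions, a macro
author might attach a message to a matcher that is likely to fail; the surface
syntax here is illustrative rather than the real API.

```rust
let if_then = "if" >> Matcher::Expr
    >> "then" >> Err(Matcher::Expr, "expected an expression after `then`");
```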

### Macro Errors as Parser Errors

The wealth of information in a failed macro resolution gives the compiler the
necessary tools to produce informative and detailed parsing errors.

> The actionables for this section are:
>
> - Determine how we ensure provision of failure information to the engine in a
>   useful format.

## User-Defined Macros

In the future we want to provide an Enso-level syntax that allows the user to
define syntactic macros for their programs, similar to how Rust exposes the
`macro_rules!` construct.

> The actionables for this section are:
>
> - Determine what this should look like in surface Enso.
> - Determine exactly how the round-trip mechanic between the runtime and parser
>   should function.

## Benchmarking

All components of the macro resolver are accompanied by comprehensive
benchmarks in order to protect the performance-critical code against
regressions. These benchmarks are written using
[criterion.rs](https://github.com/bheisler/criterion.rs), and cover all of the
performance-critical functionality of the macro resolver.

**Baseline Commit:** TBC (use the latest for now)

The benchmarking process for the macro resolver is as follows:

1. Check out the current _baseline commit_, listed above.
2. In each of the benchmark files, change the configuration line reading
   `.retain_baseline` to `.save_baseline` (see the sketch after this list).
   This will save the current baseline (taken on your machine).
3. In `lexer_bench_sources.rs` change the line that reads `.retain_baseline` to
   instead read `.save_baseline`. This will save the current baseline (taken on
   your machine).
4. Run the benchmarks using `cargo bench`.
5. Once the baseline run has completed, change the above-mentioned lines back to
   `.retain_baseline`. This will disable overwriting the saved baseline, and the
   benchmarks will report regressions against it.
6. Make your changes.
7. Run the benchmark suite again. It will report any performance regressions in
   the benchmark report, measured against your saved baseline.
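
For reference, the baseline switch described in steps 2 and 3 looks roughly
like the following. The exact method signatures depend on the version of
`criterion` in use, so treat this as a sketch rather than the literal contents
of the benchmark files.

```rust
use criterion::Criterion;

/// Sketch of a benchmark configuration with the baseline switch.
fn bench_config() -> Criterion {
    Criterion::default()
        // While recording a new baseline (steps 2-4), use `.save_baseline`:
        // .save_baseline("baseline".to_owned())
        // Once saved, switch back so later runs compare against it (step 5):
        .retain_baseline("baseline".to_owned())
}
```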

Unfortunately, the use of time-based benchmarks means that we can't commit the
baseline to the repository. There is far too much variance between machines for
this to be useful.

### Running a Subset of the Benchmarks

The benchmarks are very comprehensive, testing all performance-critical paths of
the macro resolver. However, it can often be useful to run a subset of these.

There are two main tuning points for this:

1. The _sizes_ of inputs being executed on, where relevant.
2. The benchmarks being executed.

While it is _possible_ to tune the benchmarking config to decrease benchmarking
time, this is not recommended. The current settings are tuned to provide
reliable results.
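
Criterion also accepts a name filter on the command line, so a subset of the
benchmarks can be run without touching the configuration. For example, to run
only the benchmarks whose names contain a given substring (the name below is
illustrative):

```
cargo bench -- if_then
```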

### Changing the Macro Resolver

When changing the macro resolver, the _full_ benchmark suite must be run against
the current baseline before the changes can be merged. This suite run must use
the provided settings for the benchmarking library, and should be performed
using the process described above.