---
layout: developer-doc
title: Macro Resolution
category: parser
tags: [parser, macro, resolution]
order: 4
---

# Macro Resolution

The Enso macro system is responsible for translating the structured token stream
from the [lexer](./lexer.md) and resolving it into the output [ast](./ast.md)
that represents the Enso language.

The macro system is of _key_ importance to the operation of Enso, providing the
IDE with detailed information on how users can interact with each and every
portion of the syntax. This has myriad benefits over a traditional parser,
allowing the IDE to handle all portions of the language's syntax in a seamless
way; the [macro matchers](#macro-matchers) describe the Enso syntax in a way
that allows the IDE to extract information about extension points and other
crucial metadata.

Nevertheless, the macro system _also_ serves as the language's parser,
implementing a sophisticated resolution algorithm that is able to turn the token
stream into the language AST.

Furthermore, the macro system can be extended in future to allow _users_ to
define their own hygienic macros in Enso itself. This will allow them to
manipulate their programs syntactically using _exactly_ the same mechanism as
used in the Enso parser itself.

<!-- MarkdownTOC levels="2,3" autolink="true" indent="  " -->

- [Resolving Macros](#resolving-macros)
  - [Macro Segments](#macro-segments)
  - [Matching](#matching)
- [Macro Matchers](#macro-matchers)
  - [Relative Precedence Matchers](#relative-precedence-matchers)
  - [Named Macros](#named-macros)
- [Macro Resolution Errors](#macro-resolution-errors)
  - [Macro Errors as Parser Errors](#macro-errors-as-parser-errors)
- [User-Defined Macros](#user-defined-macros)
- [Benchmarking](#benchmarking)
  - [Running a Subset of the Benchmarks](#running-a-subset-of-the-benchmarks)
  - [Changing the Macro Resolver](#changing-the-macro-resolver)

<!-- /MarkdownTOC -->

## Resolving Macros

Conceptually, the macro resolution process builds a tree out of the user-defined
macros, segmenting the token-space based on the types of tokens expected by the
various definitions. Resolution then proceeds to walk the tree, pruning the
possible cases until either:

1. An unambiguous resolution is found.
2. An ambiguous resolution is found and an error is emitted.

Resolution then recurses, continuing to apply the macro resolution process
against the non-matchable segments in the macro.

Consider, for example, the macros defined as follows:

```rust
let if_then      = "if" >> Matcher::Expr >> "then" >> Matcher::Expr;
let if_then_else = "if" >> Matcher::Expr >> "then" >> Matcher::Expr >> "else" >> Matcher::Expr;
```

These generate a tree as follows:

```
                  ┌──────┐
                  │ "if" │
                  └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
                     │                 │
                     ▼                 ▼
                  ┌──────┐          ┌──────┐
                  │"then"│          │ Fail │
                  └──────┘          └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
     ┌───────────────┤                 │
     ▼               ▼                 ▼
┌─────────┐       ┌──────┐          ┌──────┐
│ Succeed │       │"else"│          │ Fail │
└─────────┘       └──────┘          └──────┘
                     │
                     ├─────────────────┐
                     ▼                 ▼
             ┌───────────────┐     ┌───────┐
             │ Matcher::Expr │     │ Other │
             └───────────────┘     └───────┘
                     │                 │
                     │                 │
                     ▼                 ▼
                ┌─────────┐         ┌──────┐
                │ Succeed │         │ Fail │
                └─────────┘         └──────┘
```

Within each of the patterns, the macro resolution continues recursively, thus
allowing the macros to match nested uses of themselves (e.g.
`if a then if b then c`).

In reality, this is not how resolution _actually_ takes place. Applying one
macro to completion and then starting again on all remaining segments would
have abysmal complexity. As a result, the process instead operates on a _stack_
of resolution contexts, performing its resolution in a linear scan over the
token stream.
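
The following sketch illustrates the shape of that stack-based approach. The
names (`DefinitionTree`, `ResolutionContext`, `Resolver`) are illustrative only
and are not the resolver's real types; they exist to show how a single pass can
keep track of several partially matched, nested macros at once.

```rust
use std::collections::HashMap;

/// Stand-in for the lexer's token type.
type Token = String;

/// A node in the tree built from the macro definitions: the literals that may
/// come next, and whether stopping here constitutes a complete macro match.
/// (The real tree also branches on matcher kinds, not only literals.)
struct DefinitionTree {
    children: HashMap<String, DefinitionTree>,
    is_complete: bool,
}

/// A partially matched macro: its position in the definition tree, plus the
/// tokens collected so far for the segment currently being built.
struct ResolutionContext<'a> {
    position: &'a DefinitionTree,
    segment_tokens: Vec<Token>,
}

/// The resolver pushes a new context whenever a macro opens inside another
/// macro's segment, and pops it when that macro completes, so the whole token
/// stream is consumed in one left-to-right scan.
struct Resolver<'a> {
    contexts: Vec<ResolutionContext<'a>>,
}
```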

### Macro Segments

Macros are described in terms of _segments_. A segment consists of three main
components:

1. An optional _preceding section_ that uses [matchers](#macro-matchers) to
   describe what it matches before encountering the _literal_ in item 2. This
   can only occur on the first segment.
2. A _literal_ that must be matched for the segment to apply.
3. An optional associated [matcher](#macro-matchers) that consumes certain
   _types_ of token.

Segments determine how a given macro matches on the input token stream, and are
used to generate the tree for performing resolution.
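
As a rough illustration (the names below are hypothetical, not the parser's
real types), a segment definition can be thought of as having the following
shape:

```rust
/// Placeholder for the matcher type described under
/// [Macro Matchers](#macro-matchers) below.
enum Matcher {
    Expr,
    // ...
}

/// Illustrative shape of a single macro segment.
struct SegmentDefinition {
    /// Matches the tokens before the literal; only valid on the first segment.
    prefix: Option<Matcher>,
    /// The literal (e.g. `if`, `then`, `else`) that anchors the segment.
    literal: String,
    /// Matches the tokens that follow the literal.
    body: Option<Matcher>,
}
```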

### Matching

When a match occurs, the resultant segments are passed to a function that can
perform some action based on those segments. This function is responsible for
manipulating the output of the macro resolver, and hence manipulates the AST.

The function may also assert _additional_ properties on the match that cannot
be ascertained from the segment definitions themselves.
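
A hypothetical sketch of the handler's shape follows; the type names are
placeholders rather than the parser's real API.

```rust
/// Placeholder types standing in for the real parser data structures.
struct MatchedSegment {
    literal: String,
    tokens: Vec<String>,
}
struct Ast;
struct MacroError {
    message: String,
}

/// The handler receives the matched segments and either builds the resulting
/// AST node or rejects the match with an error, which is how checks beyond the
/// segment definitions themselves are expressed.
type MacroHandler = fn(Vec<MatchedSegment>) -> Result<Ast, MacroError>;
```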

## Macro Matchers

The matchers describe the _kinds_ of tokens that can be reasoned about within a
macro definition. These fall across a set of categories as follows.

The boundary matchers are:

- `Start`: Matches the start of the line.
- `End`: Matches the end of the line.

Structural matchers:

- `Nothing`: Never matches on any token.
- `Any`: Matches on any token. This may become a family of matchers (e.g.
  `AnyExpr`) for performance.
- `Seq`: Matches on the contained matchers in sequence.
- `Or`: Matches on one of the contained matchers.
- `Many`: Matches on multiple occurrences of the given matcher.
- `Except`: Matches on everything but the given pattern.

Meta matchers:

- `Build`: Performs AST resolution on the match, but this must become implicit.
- `Err`: Allows ascribing a user-defined error to the failure of the contained
  matcher.
- `Tag`: Does nothing, but attaches string metadata to the result. This is very
  useful for syntax highlighting.
- `Exact`: Matches on an explicit AST.

Identifier matchers:

- `Referent`: Matches on a referent identifier.
- `Variable`: Matches on a variable identifier.
- `External`: Matches on an external identifier.
- `Blank`: Matches the blank identifier (`_`).
- `Operator`: Matches on an operator identifier.
- `Modifier`: Matches on a modifier identifier.
- `Annotation`: Matches on an annotation identifier.

Literal matchers:

- `Number`: Matches on a literal number.
- `Text`: Matches on any text literal.
- `TextLine`: Matches a single-line text literal.
- `TextInlineBlock`: Matches an inline-block text literal.
- `TextBlock`: Matches a block text literal.

Comment matchers:

- `DisableComment`: Matches a standard disable comment.
- `DocComment`: Matches an Enso doc comment.

Block matchers:

- `Block`: Matches an Enso block.

Error matchers:

- `DanglingBase`: Matches an erroneous base for a number literal.
- `TextError`: Matches tokens representing error states during textual lexing.
- `InvalidSuffix`: Matches an erroneous suffix token.
- `Unrecognized`: Matches an unrecognised token.

In addition, it is highly likely that some of these will be combined into
aggregate matchers that match one of many kinds of token.
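
A condensed sketch of how these categories might be represented is shown below;
it is illustrative only and differs from the real definition in the parser.

```rust
/// Illustrative only: a condensed view of the matcher categories above.
enum Matcher {
    // Boundary matchers.
    Start,
    End,
    // Structural matchers.
    Nothing,
    Any,
    Seq(Vec<Matcher>),
    Or(Box<Matcher>, Box<Matcher>),
    Many(Box<Matcher>),
    Except(Box<Matcher>),
    // Meta matchers; `Err` and `Tag` carry the user-supplied message/metadata.
    Build(Box<Matcher>),
    Err(Box<Matcher>, String),
    Tag(Box<Matcher>, String),
    // Identifier, literal, comment, block, and error matchers (abridged).
    Variable,
    Operator,
    Number,
    TextLine,
    Block,
    Unrecognized,
}
```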

### Relative Precedence Matchers

Where relevant, the matchers above may take an optional "maximum precedence" of
operator that they may match. This effectively allows differing precedences on
either side of a literal operator that is implemented via a macro, and is
important for correctly resolving the function arrow `->` (for example, so that
`a -> a -> a` groups as `a -> (a -> a)`).

### Named Macros

In addition to the basic matchers described above, we also need a `Named`
matcher. The intent behind this matcher is that it allows you to refer to
existing macro definitions by name, and it also provides for better error
messages.

By convention, we aim for these names to be treated as qualified, based on the
package in which they're declared. This convention should be followed for the
built-in macros as well, even if they can't be overridden, as they may appear in
error messages.

## Macro Resolution Errors

Due to the highly structured nature of the resolver, the macro system can
produce very detailed errors when resolution fails. This is because it knows the
path it took to get to the failure, and the expected tokens or kinds of tokens
at the failure site.

In addition, the resolver allows users to define their own errors through the
use of the `Err` matcher mentioned [above](#macro-matchers).
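
For example, in the pseudocode style of the earlier macro definitions, a macro
author might attach a message to a matcher that is likely to fail; the surface
syntax here is illustrative rather than the real API.

```rust
let if_then = "if" >> Matcher::Expr
    >> "then" >> Err(Matcher::Expr, "expected an expression after `then`");
```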

### Macro Errors as Parser Errors

The wealth of information in a failed macro resolution gives the compiler the
necessary tools to produce informative and detailed parsing errors.

> The actionables for this section are:
>
> - Determine how we ensure provision of failure information to the engine in a
>   useful format.

## User-Defined Macros

In the future we want to provide an Enso-level syntax that allows the user to
define syntactic macros for their programs, similar to how Rust exposes the
`macro_rules!` construct.

> The actionables for this section are:
>
> - Determine what this should look like in surface Enso.
> - Determine exactly how the round-trip mechanic between the runtime and parser
>   should function.

## Benchmarking

All components of the macro resolver are accompanied by comprehensive
benchmarks in order to protect the performance-critical code against
regressions. These benchmarks are written using
[criterion.rs](https://github.com/bheisler/criterion.rs), and cover all of the
performance-critical functionality of the macro resolver.

**Baseline Commit:** TBC (use the latest for now)

The benchmarking process for the macro resolver is as follows:

1. Check out the current _baseline commit_, listed above.
2. In each of the benchmark files, change the configuration line reading
   `.retain_baseline` to `.save_baseline` (see the sketch after this list).
   This will save the current baseline (taken on your machine).
3. In `lexer_bench_sources.rs` change the line that reads `.retain_baseline` to
   instead read `.save_baseline`. This will save the current baseline (taken on
   your machine).
4. Run the benchmarks using `cargo bench`.
5. Once the baseline run has completed, change the above-mentioned lines back to
   `.retain_baseline`. This will disable overwriting the saved baseline, and the
   benchmarks will report regressions against it.
6. Make your changes.
7. Run the benchmark suite again. It will report any performance regressions in
   the benchmark report, measured against your saved baseline.
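
For reference, the baseline switch described in steps 2 and 3 looks roughly
like the following. The exact method signatures depend on the version of
`criterion` in use, so treat this as a sketch rather than the literal contents
of the benchmark files.

```rust
use criterion::Criterion;

/// Sketch of a benchmark configuration with the baseline switch.
fn bench_config() -> Criterion {
    Criterion::default()
        // While recording a new baseline (steps 2-4), use `.save_baseline`:
        // .save_baseline("baseline".to_owned())
        // Once saved, switch back so later runs compare against it (step 5):
        .retain_baseline("baseline".to_owned())
}
```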

Unfortunately, the use of time-based benchmarks means that we can't commit the
baseline to the repository. There is far too much variance between machines for
this to be useful.

### Running a Subset of the Benchmarks

The benchmarks are very comprehensive, testing all performance-critical paths of
the macro resolver. However, it can often be useful to run a subset of these.

There are two main tuning points for this:

1. The _sizes_ of inputs being executed on, where relevant.
2. The benchmarks being executed.

While it is _possible_ to tune the benchmarking config to decrease benchmarking
time, this is not recommended. The current settings are tuned to provide
reliable results.
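
Criterion also accepts a name filter on the command line, so a subset of the
benchmarks can be run without touching the configuration. For example, to run
only the benchmarks whose names contain a given substring (the name below is
illustrative):

```
cargo bench -- if_then
```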

### Changing the Macro Resolver

When changing the macro resolver, the _full_ benchmark suite must be run against
the current baseline before the changes can be merged. This suite run must use
the provided settings for the benchmarking library, and should be performed
using the process described above.