---
layout: developer-doc
title: Lexer
category: syntax
tags: [parser, lexer]
order: 4
---

# Lexer

The lexer is the code generated by the [flexer](./flexer.md) that is actually
responsible for lexing Enso source code. It chunks the character stream into a
(structured) token stream in order to make later processing faster, and to
identify blocks.

<!-- MarkdownTOC levels="2,3" autolink="true" -->

- [Lexer Functionality](#lexer-functionality)
- [The Lexer AST](#the-lexer-ast)

<!-- /MarkdownTOC -->

## Lexer Functionality

The lexer needs to provide the following functionality as part of the parser.

- It consumes the source lazily, character by character, and produces a
  structured token stream consisting of the lexer [AST](#the-lexer-ast).
- It must succeed on _any_ input. Constructs that cannot be lexed are
  represented in the token stream by `Invalid` tokens rather than causing the
  lexer to fail.

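The two requirements above can be illustrated with a minimal sketch in Python.
It is not the generated Enso lexer: the `Token` type, the `lex` function, and
the tiny grammar (only `Var`, `Number` and `Invalid`) are assumptions made for
illustration. The point is that input is pulled one character at a time and that
unrecognised characters become `Invalid` tokens instead of raising an error.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Token:
    kind: str  # "Var", "Number" or "Invalid" in this sketch (hypothetical names)
    text: str


def lex(chars: Iterable[str]) -> Iterator[Token]:
    """Lazily turn a character stream into tokens; never raises on bad input."""
    it = iter(chars)
    ch = next(it, None)
    while ch is not None:
        if ch.isspace():
            ch = next(it, None)
            continue
        if ch.isalpha():
            kind, accept = "Var", str.isalnum
        elif ch.isdigit():
            kind, accept = "Number", str.isdigit
        else:
            # Anything this sketch does not recognise becomes an Invalid token.
            yield Token("Invalid", ch)
            ch = next(it, None)
            continue
        buf = []
        while ch is not None and accept(ch):
            buf.append(ch)
            ch = next(it, None)
        yield Token(kind, "".join(buf))
```

Because `lex` is a generator, calling `next` on it reads only as many characters
as are needed to complete the next token, so the source never has to be fully
available up front.
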
## The Lexer AST

In contrast to the full parser [AST](./ast.md), the lexer operates on a
simplified AST that we call a 'structured token stream'. While most lexers
output a linear token stream, it is very important in Enso that we encode the
nature of _blocks_ into the token stream, hence giving it structure.

This encoding of blocks is _crucial_ to the functionality of Enso because it
ensures that no later stage of the parser can ignore blocks, and hence
preserves them for use by the GUI.

The structured token stream contains the following constructs:

- `Var`: Variable identifiers.
- `Ref`: Referent identifiers.
- `Opr`: Operator identifiers.
- `Number`: Numbers.
- `Text`: Text.
- `Invalid`: Invalid constructs that cannot be lexed.
- `Block`: Syntactic blocks in the language.

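To make the idea of a _structured_ token stream concrete, the following Python
sketch groups source lines into nested `Block` values based on indentation. The
`Block` class, its fields, and the `structure` function are hypothetical and
only illustrate the shape of the result; lines are kept as plain strings here,
whereas a real lexer would tokenise them, and only space indentation is
handled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Block:
    indent: int
    lines: List[Union[str, "Block"]] = field(default_factory=list)


def structure(source: str) -> Block:
    """Group lines into nested Blocks by indentation (spaces only)."""
    root = Block(indent=0)
    stack = [root]
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines neither open nor close blocks in this sketch
        indent = len(raw) - len(raw.lstrip(" "))
        # Close every block that is indented deeper than this line.
        while indent < stack[-1].indent:
            stack.pop()
        # Open a new nested block when the line is indented further.
        if indent > stack[-1].indent:
            block = Block(indent=indent)
            stack[-1].lines.append(block)
            stack.append(block)
        stack[-1].lines.append(raw.strip())
    return root
```

For the input `"main =\n    x = 1\n    y = 2\nfoo"`, the root block holds
`"main ="`, then a nested `Block` containing `"x = 1"` and `"y = 2"`, then
`"foo"`. Because the nesting is part of the data itself, a later stage cannot
drop block boundaries by accident.
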
The lexer distinguishes between the various kinds of identifiers both to keep
lexing fast and to allow macros to switch on the kind of identifier.

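As a rough illustration of switching on identifier kinds, the Python sketch
below classifies an identifier once and then dispatches on the resulting type.
The classification rule used here (lowercase first letter for `Var`, uppercase
first letter for `Ref`, purely symbolic for `Opr`) is an assumption for the
sake of the example and is not specified in this document; `classify`,
`describe` and `OPERATOR_CHARS` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Ref:
    name: str


@dataclass(frozen=True)
class Opr:
    name: str


@dataclass(frozen=True)
class Invalid:
    text: str


# Assumed operator alphabet, chosen only for this sketch.
OPERATOR_CHARS = set("+-*/<>=!&|^%~.")


def classify(ident: str):
    """Decide the identifier kind once, so later stages need not re-inspect it."""
    if ident and all(c in OPERATOR_CHARS for c in ident):
        return Opr(ident)
    if ident and all(c.isalnum() or c == "_" for c in ident):
        if ident[0].islower():
            return Var(ident)
        if ident[0].isupper():
            return Ref(ident)
    return Invalid(ident)


def describe(token) -> str:
    """Stand-in for a macro that switches on the identifier kind."""
    if isinstance(token, Var):
        return f"variable {token.name}"
    if isinstance(token, Ref):
        return f"referent {token.name}"
    if isinstance(token, Opr):
        return f"operator {token.name}"
    return f"invalid {token.text}"
```

Since the kind is recorded in the token's type, a consumer such as `describe`
only performs a cheap type check and never has to look at the identifier's
characters again.
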
> The actionables for this section are:
>
> - Determine if we want to have separate ASTs for the lexer and the parser, or
>   not.