Do initial design for the lexer (#947)

Ara Adkins 2020-06-26 14:54:20 +01:00 committed by GitHub
parent 0e139ee42a
commit f0551f7693
12 changed files with 462 additions and 190 deletions


@@ -31,6 +31,8 @@ below:
  stream of source code.
- [**Macro Resolution:**](./macro-resolution.md) The system for defining and
  resolving macros on the token stream.
- [**Operator Resolution:**](./operator-resolution.md) The system for resolving
  operator applications properly.
- [**Construct Resolution:**](./construct-resolution.md) The system for
  resolving higher-level language constructs in the AST to produce a useful
  output.


@@ -8,43 +8,72 @@ order: 2
# Parser Architecture Overview
The Enso parser is designed in a highly modular fashion, with separate crates
responsible for the component's various responsibilities. The overall
architecture for the parser is described in this document.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Overall Architecture](#overall-architecture)
- [Reader](#reader)
- [Flexer](#flexer)
- [Lexer](#lexer)
- [Macro Resolution](#macro-resolution)
- [Operator Resolution](#operator-resolution)
- [Construct Resolution](#construct-resolution)
- [Parser Driver](#parser-driver)
- [AST](#ast)
- [JVM Object Generation](#jvm-object-generation)
<!-- /MarkdownTOC -->
## Overall Architecture
The overall architecture of the parser subsystem can be visualised as follows.
```
                            ┌───────────────┐
                            │  Source Code  │
                            └───────────────┘
┌────────────────────────────────────────────────────────────────────────┐
│  ┌──────────────┐                                               Parser │
│  │ UTF-X Reader │                                                      │
│  └──────────────┘                                                      │
│          │                                                             │
│          │ Character                                                   │
│          │ Stream                                                      │
│          ▽                                                             │
│      ┌────────┐                                                        │
│      │ Lexer  │                                                        │
│      │┌──────┐│                                                        │
│      ││Flexer││                                                        │
│      │└──────┘│                                                        │
│      └────────┘                                                        │
│          │                                                             │
│          │ Structured                                                  │
│          │ Token Stream                                                │
│          ▽                                                             │
│ ┌────────────┐              ┌────────────┐              ┌────────────┐ │
│ │            │              │            │              │            │ │
│ │   Macro    │   Rust AST   │  Operator  │   Rust AST   │ Construct  │ │
│ │ Resolution │─────────────▷│ Resolution │─────────────▷│ Resolution │ │
│ │            │              │            │              │            │ │
│ └────────────┘              └────────────┘              └────────────┘ │
│                                                               │        │
│                                                      Rust AST │        │
│                                                               ▽        │
│                                                         ┌────────────┐ │
│                                                         │ AST Output │ │
│                                                         └────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
                                                                │
                                                ┌───────────────┤  Rust AST
                                                ▽               │
                                          ┌────────────┐        │
                                          │            │        │
                                          │ JVM Object │        └─────────────────┐
                                          │ Generator  │                          │
                                          │            │                          │
                                          └────────────┘                          │
                                                │                                 │
                                        JVM AST │                                 │
                                                ▽                                 ▽
                                          ┌────────────┐                    ┌────────────┐
                                          │            │                    │            │
                                          │ Use in JVM │                    │ Direct Use │
                                          │    Code    │                    │in Rust Code│
                                          │            │                    │            │
                                          └────────────┘                    └────────────┘
```


@@ -1,13 +1,40 @@
---
layout: developer-doc
title: AST
category: parser
tags: [parser, ast]
order: 9
---
# AST
The parser AST describes the high-level syntactic structure of Enso, and it
also contains robust, descriptive parser errors directly in the AST.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Functionality](#functionality)
<!-- /MarkdownTOC -->
## Functionality
The parser AST needs to account for the following:
- A single `Name` type, removing the distinction between different names found
in the [lexer](./lexer.md). This should provide functions `is_var`, `is_opr`,
and `is_ref`.
- It should contain all of the language constructs that may appear in Enso's
source.
- It should contain `Invalid` nodes, but these should be given a descriptive
error as to _why_ the construct is invalid.
- It should also contain `Ambiguous` nodes, where a macro cannot be resolved in
an unambiguous fashion.
Each node should contain the following (sketched in code after the list):
- An identifier, attributed to it from the ID map.
- The start source position of the node, and the length (span) of the node.
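The following is a minimal sketch of node shapes that would satisfy the
requirements above. All names and representations here are hypothetical,
intended only to make the requirements concrete:
```rust
/// A sketch of the single `Name` type described above. The classification
/// logic is illustrative only.
pub struct Name {
    text: String,
}

impl Name {
    pub fn is_var(&self) -> bool {
        self.text.chars().next().map_or(false, |c| c.is_lowercase())
    }
    pub fn is_ref(&self) -> bool {
        self.text.chars().next().map_or(false, |c| c.is_uppercase())
    }
    pub fn is_opr(&self) -> bool {
        self.text.chars().next().map_or(false, |c| !c.is_alphanumeric())
    }
}

/// The data every node carries, per the list above.
pub struct Node {
    id:     u64,   // the identifier attributed from the ID map
    start:  usize, // the start source position of the node
    length: usize, // the length (span) of the node
    shape:  Shape,
}

/// The node shapes, including descriptive `Invalid` errors and `Ambiguous`
/// macro resolutions.
pub enum Shape {
    Name(Name),
    Invalid { error: String },           // _why_ the construct is invalid
    Ambiguous { candidates: Vec<Node> }, // the resolutions still possible
    // ... all other constructs that may appear in Enso's source
}
```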
> The actionables for this section are:
>
> - Flesh out the design for the AST based on the requirements of the various
> parser phases.


@@ -3,11 +3,37 @@ layout: developer-doc
title: Construct Resolution
category: parser
tags: [parser, construct, resolution]
order: 7
---
# Construct Resolution
Construct resolution is the process of turning the low-level AST format into
the full high-level AST format: one that both represents all of Enso's
language constructs and contains rich error nodes.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Syntax Errors](#syntax-errors)
<!-- /MarkdownTOC -->
> The actionables for this section are:
>
> - Produce a detailed design for this resolution functionality, accounting for
> all known current use cases.
## Syntax Errors
It is very important that Enso is able to provide descriptive and useful syntax
errors to its users. Doing so requires that the parser has a full understanding
of the language's syntax, and also that it is designed to always succeed,
regardless of any errors it encounters. Errors must be:
- Highly descriptive, so that it is easy for the runtime to explain to the user
  what went wrong.
- Highly localised, so that an error impacts the parse of the surrounding code
  as little as possible.
> The actionables for this section are:
>
> - Determine how to design this parsing phase to obtain very accurate syntax
> errors.


@@ -7,7 +7,191 @@ order: 3
---
# Flexer
The flexer is a finite-automata-based engine for generating lexers. Akin to
`flex` and other lexer generators, it is given a definition as a series of rules
from which it then generates code for a highly-optimised lexer.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Pattern Description](#pattern-description)
- [State Management](#state-management)
- [Code Generation](#code-generation)
- [Notes on Code Generation](#notes-on-code-generation)
- [An Example](#an-example)
<!-- /MarkdownTOC -->
## Pattern Description
The definition of a lexer using the flexer library consists of a set of rules
for how to behave when matching portions of syntax. These rules behave as
follows:
- A rule describes a regex-like pattern.
- It also describes the code to be executed when the pattern is matched.
```rust
pub fn lexer_definition() -> String {
    ...
    let chr   = alphaNum | '_';
    let blank = Pattern::from('_');
    lexer.rule(lexer.root, blank, "self.on_ident(token::blank(self.start_location))");
}
```
A pattern, such as `chr` or `blank`, is a description of the characters that
should be matched for that pattern to match. The flexer library provides a set
of basic matchers for constructing these patterns.
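For intuition, the following self-contained sketch shows what such basic
matchers amount to. This is not the flexer's actual API; it only illustrates
the combinator idea behind `|`, `>>`, and `many`:
```rust
/// A toy pattern type: each variant either consumes a prefix of the input or
/// fails to match.
#[derive(Clone)]
enum Pattern {
    Char(char),                      // match exactly this character
    Or(Box<Pattern>, Box<Pattern>),  // match either alternative (`|`)
    Seq(Box<Pattern>, Box<Pattern>), // match one then the other (`>>`)
    Many(Box<Pattern>),              // match zero or more repetitions
}

impl Pattern {
    /// Returns how many bytes of `input` this pattern consumes, if it matches.
    fn matches(&self, input: &str) -> Option<usize> {
        match self {
            Pattern::Char(c)   => input.starts_with(*c).then(|| c.len_utf8()),
            Pattern::Or(a, b)  => a.matches(input).or_else(|| b.matches(input)),
            Pattern::Seq(a, b) => {
                let n = a.matches(input)?;
                Some(n + b.matches(&input[n..])?)
            }
            Pattern::Many(p) => {
                let mut n = 0;
                while let Some(k) = p.matches(&input[n..]) {
                    if k == 0 { break; }
                    n += k;
                }
                Some(n)
            }
        }
    }
}
```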
A `lexer.rule(...)` definition consists of the following parts:
- A state, used for grouping rules and named for debugging (see the section on
[state management](#state-management) below).
- A pattern, as described above.
- The code that is executed when the pattern matches.
## State Management
The flexer engine provides a mechanism for grouping sets of rules together,
known as a `State`. At any given time, only rules from the _active_ state are
considered by the lexer.
- States are named for purposes of debugging.
- You can activate another state from within the flexer instance by using
`state.push(new_state)`.
- You can deactivate the topmost state by using `state.pop()`.
## Code Generation
The patterns in a lexer definition are used to generate a highly-efficient and
specialised lexer. This translation process works as follows:
1. All rules are taken and used to generate an NFA.
2. A DFA is generated from the NFA using the standard
[subset construction](https://en.wikipedia.org/wiki/Powerset_construction)
algorithm, but with some additional optimisations that ensure the following
properties hold:
- Patterns are matched in the order that they are defined.
- The associated code chunks are maintained properly.
- Lexing is `O(n)`, where `n` is the size of the input.
3. The DFA is used to generate the code for a lexer `Engine` struct, containing
the `Lexer` definition.
The `Engine` generated through this process contains a main loop that consumes
the input stream character-by-character, evaluating a big switch generated from
the DFA using functions from the `Lexer`.
Lexing proceeds through the rules from top to bottom, and the first expression
that _matches fully_ is chosen. This differs from most common lexer generators,
which choose the _longest_ match instead. Once a pattern has matched, the
associated code is executed, and the process starts over again until the input
stream has been consumed.
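For intuition, the following is a simplified, self-contained sketch of the
shape that such generated code could take. The real generated lexer will
differ: the states, transitions, and callback names here are invented, and
end-of-input handling is elided.
```rust
/// A stand-in for the user-defined lexer (the callbacks are hypothetical).
struct Lexer;
impl Lexer {
    fn on_blank(&mut self)   { /* code associated with the `blank` rule */ }
    fn on_var(&mut self)     { /* code associated with the `var` rule */ }
    fn on_invalid(&mut self) { /* code associated with no rule matching */ }
}

const ERROR: usize = usize::MAX;

/// The DFA, flattened into a single match over (state, input character).
fn step(state: usize, input: char) -> usize {
    match (state, input) {
        (0, '_')                         => 1, // root: a blank
        (0, 'a'..='z')                   => 2, // root: start of a variable
        (2, 'a'..='z' | '0'..='9' | '_') => 2, // the variable body continues
        _                                => ERROR,
    }
}

/// The main loop: step the DFA until no transition exists, then run the code
/// associated with the rule that matched fully, and start over.
fn run(lexer: &mut Lexer, source: &str) {
    let mut state = 0;
    for c in source.chars() {
        let next = step(state, c);
        if next == ERROR {
            match state {
                1 => lexer.on_blank(),
                2 => lexer.on_var(),
                _ => lexer.on_invalid(),
            }
            state = step(0, c); // restart matching from the root state
        } else {
            state = next;
        }
    }
}
```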
### Notes on Code Generation
The following properties are likely to hold for the code generation machinery.
- The vast majority of the code generated by the flexer is going to be the same
for all lexers.
- The primary generation is in `consume_next_character`, which takes a `Lexer`
as an argument.
## An Example
The following code provides a sketch of the intended API for the flexer code
generation, using the definition of a simple lexer.
```rust
use crate::prelude::*;
use flexer;
use flexer::Flexer;
// =============
// === Token ===
// =============
pub struct Token {
    location : flexer::Location,
    ast      : TokenAst,
}

enum TokenAst {
    Var(ImString),
    Cons(ImString),
    Blank,
    ...
}

impl Token {
    pub fn new(location:Location, ast:TokenAst) -> Self {
        Self {location,ast}
    }

    pub fn var(location:Location, name:impl Into<ImString>) -> Self {
        let ast = TokenAst::Var(name.into());
        Self::new(location,ast)
    }

    ...
}
// =============
// === Lexer ===
// =============
#[derive(Debug,Default)]
struct Lexer<T:flexer::State> {
    current : Option<Token>,
    tokens  : Vec<Token>,
    state   : T
}
impl<T:flexer::State> Lexer<T> {
    fn on_ident(&mut self, tok:Token) {
        self.current = Some(tok);
        self.state.push(self.ident_sfx_check);
    }

    fn on_ident_err_sfx(&mut self) {
        println!("OH NO!")
    }

    fn on_no_ident_err_sfx(&mut self) {
        let current = std::mem::take(&mut self.current).unwrap();
        self.tokens.push(current);
    }
}

impl<T:flexer::State> flexer::Definition for Lexer<T> {
    fn state(&self) -> &T { &self.state }
    fn state_mut(&mut self) -> &mut T { &mut self.state }
}
pub fn lexer_source_code() -> String {
    let lexer = Flexer::<Lexer<_>>::new();

    let chr     = alphaNum | '_';
    let blank   = Pattern::from('_');
    let body    = chr.many() >> '\''.many();
    let var     = lowerLetter >> body;
    let cons    = upperLetter >> body;
    let breaker = "^`!@#$%^&*()-=+[]{}|;:<>,./ \t\r\n\\";

    let sfx_check = lexer.add(State("Identifier Suffix Check"));

    lexer.rule(lexer.root, var,   "self.on_ident(Token::var(self.start_location,self.current_match()))");
    lexer.rule(lexer.root, cons,  "self.on_ident(Token::cons(self.start_location,self.current_match()))");
    lexer.rule(lexer.root, blank, "self.on_ident(Token::blank(self.start_location))");
    lexer.rule(sfx_check,  err_sfx,        "self.on_ident_err_sfx()");
    lexer.rule(sfx_check,  Flexer::always, "self.on_no_ident_err_sfx()");
    ...
    lexer.generate_specialized_code() // This code needs to become a source file, probably via build.rs
}
```
Some things to note:
- The function definitions in `Lexer` take `self` as their first argument
  because `Engine` implements `Deref` and `DerefMut` to `Lexer`, as sketched
  below.
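To illustrate the `Deref` point, the following sketch (using hypothetical
stand-in types) shows the shape of the generated impls:
```rust
use std::ops::{Deref, DerefMut};

/// Stand-ins: the generated engine embeds the user-defined lexer.
struct Lexer { /* user-defined fields and callbacks */ }
struct Engine { lexer: Lexer /* plus the generated DFA-driving state */ }

impl Deref for Engine {
    type Target = Lexer;
    fn deref(&self) -> &Lexer { &self.lexer }
}

impl DerefMut for Engine {
    fn deref_mut(&mut self) -> &mut Lexer { &mut self.lexer }
}
```
With impls of this shape, a generated method body such as `self.on_ident(...)`
resolves even when `self` is the `Engine`, because method calls auto-deref
through to the `Lexer`.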


@@ -3,11 +3,19 @@ layout: developer-doc
title: JVM Object Generation
category: parser
tags: [parser, jvm, object-generation]
order: 10
---
# JVM Object Generation
The JVM object generation phase is responsible for creating JVM-native objects
representing the parser AST from the rust-native AST. This is required to allow
the compiler and runtime to work with the AST.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
<!-- /MarkdownTOC -->
> The actionables for this section are:
>
> - Work out how on earth this is going to work.
> - Produce a detailed design for this functionality.


@@ -7,7 +7,51 @@ order: 4
---
# Lexer
The lexer is the code generated by the [flexer](./flexer.md) that is actually
responsible for lexing Enso source code. It chunks the character stream into a
(structured) token stream in order to make later processing faster, and to
identify blocks.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Lexer Functionality](#lexer-functionality)
- [The Lexer AST](#the-lexer-ast)
<!-- /MarkdownTOC -->
## Lexer Functionality
The lexer needs to provide the following functionality as part of the parser.
- It consumes the source lazily, character by character, and produces a
structured token stream consisting of the lexer [ast](#the-lexer-ast).
- It must succeed on _any_ input, even if the input contains invalid
  constructs; these are represented in the token stream by `Invalid` tokens.
## The Lexer AST
In contrast to the full parser [ast](./ast.md), the lexer operates on a
simplified AST that we call a 'structured token stream'. While most lexers
output a linear token stream, it is very important in Enso that we encode the
nature of _blocks_ into the token stream, hence giving it structure.
This encoding of blocks is _crucial_ to the functionality of Enso as it ensures
that no later stages of the parser can ignore blocks, and hence maintains them
for use by the GUI.
It contains the following constructs, sketched in code after the list:
- `Var`: Variable identifiers.
- `Ref`: Referrent identifiers.
- `Opr`: Operator identifiers.
- `Number`: Numbers.
- `Text`: Text.
- `Invalid`: Invalid constructs that cannot be lexed.
- `Block`: Syntactic blocks in the language.
The distinction between the various kinds of identifiers is made here both to
keep lexing fast and to allow macros to switch on the kind of identifier.
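A minimal sketch of these constructs as a Rust enum follows. The payload types
are assumptions, chosen to show how `Block` gives the stream its structure
rather than to fix a design:
```rust
pub enum Token {
    Var(String),     // variable identifiers, e.g. `foo`
    Ref(String),     // referrent identifiers, e.g. `Foo`
    Opr(String),     // operator identifiers, e.g. `+`
    Number(String),  // numbers, kept as text at this stage
    Text(String),    // text literals
    Invalid(String), // anything that cannot be lexed
    Block {
        indent: usize,           // the indentation of the block
        lines:  Vec<Vec<Token>>, // nested tokens: a tree, not a linear stream
    },
}
```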
> The actionables for this section are:
>
> - Determine if we want to have separate ASTs for the lexer and the parser, or
> not.


@@ -7,7 +7,39 @@ order: 5
---
# Macro Resolution
Macro resolution is the process of taking the structured token stream from the
[lexer](./lexer.md) and resolving the macros within it to produce the
[ast](./ast.md). This process produces a chunked AST stream, including
spacing-unaware elements.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Functionality](#functionality)
- [Errors During Macro Resolution](#errors-during-macro-resolution)
<!-- /MarkdownTOC -->
## Functionality
The current functionality of the macro resolver is as follows:
- TBC
An overview of the current macro resolution process can be found in the Scala
[implementation](../../lib/syntax/specialization/shared/src/main/scala/org/enso/syntax/text/Parser.scala).
> The actionables for this section are:
>
> - Discuss how the space-unaware AST should be handled as it is produced by
> macros.
> - Handle precedence for operators properly within macro resolution (e.g.
> `x : a -> b : a -> c` should parse with the correct precedence).
> - Create a detailed design for how macro resolution should work.
## Errors During Macro Resolution
It is very important that, during macro resolution, the resolver produces
descriptive errors whenever it encounters an error condition.
> The actionables for this section are:
>
> - Determine how best to provide detailed and specific errors from within the
> macro resolution engine.


@@ -0,0 +1,28 @@
---
layout: developer-doc
title: Operator Resolution
category: parser
tags: [parser, operator, resolution]
order: 6
---
# Operator Resolution
Operator resolution is the process of resolving applications of operators into
specific nodes on the AST.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Resolution Algorithm](#resolution-algorithm)
<!-- /MarkdownTOC -->
## Resolution Algorithm
The operator resolution process uses a version of the classic
[shunting-yard algorithm](https://en.wikipedia.org/wiki/Shunting-yard_algorithm)
with modifications to support operator sections.
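To make the approach concrete, the following is a compact, self-contained
sketch of a shunting-yard pass extended with an explicit `Missing` operand, so
that an absent argument produces a section rather than an error. The token
shapes, precedence table, and node names are all illustrative assumptions, not
the final design:
```rust
#[derive(Debug)]
enum Ast {
    Missing,                         // a hole: this side makes it a section
    Name(String),
    App(Box<Ast>, String, Box<Ast>), // lhs, operator, rhs
}

enum Tok { Operand(String), Opr(String) }

/// An illustrative precedence table.
fn prec(op: &str) -> u8 {
    match op { "*" | "/" => 7, "+" | "-" => 6, _ => 1 }
}

/// Pop one operator and build an application from the top two operands.
fn reduce(out: &mut Vec<Ast>, ops: &mut Vec<String>) {
    let opr = ops.pop().unwrap();
    let rhs = out.pop().unwrap();
    let lhs = out.pop().unwrap();
    out.push(Ast::App(Box::new(lhs), opr, Box::new(rhs)));
}

fn resolve(tokens: Vec<Tok>) -> Ast {
    let mut out: Vec<Ast>    = Vec::new(); // operand stack
    let mut ops: Vec<String> = Vec::new(); // operator stack
    let mut expect_operand   = true;
    for tok in tokens {
        match tok {
            Tok::Operand(name) => {
                out.push(Ast::Name(name));
                expect_operand = false;
            }
            Tok::Opr(opr) => {
                if expect_operand {
                    out.push(Ast::Missing); // no left operand, as in `+ 1`
                }
                // Left associativity: reduce while the stack top binds at
                // least as tightly as the incoming operator.
                while ops.last().map_or(false, |top| prec(top) >= prec(&opr)) {
                    reduce(&mut out, &mut ops);
                }
                ops.push(opr);
                expect_operand = true;
            }
        }
    }
    if expect_operand && !ops.is_empty() {
        out.push(Ast::Missing); // trailing operator, as in `1 +`
    }
    while !ops.is_empty() {
        reduce(&mut out, &mut ops);
    }
    out.pop().unwrap_or(Ast::Missing)
}
```
For `1 + 2 * 3` this yields `App(1, +, App(2, *, 3))`, while a lone `+ 1`
yields `App(Missing, +, 1)`, which later phases can treat as an operator
section.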
> The actionables for this section are:
>
> - Work out how to formulate this functionality efficiently in rust. The scala
> implementation can be found
> [here](../../lib/syntax/definition/src/main/scala/org/enso/syntax/text/prec/Operator.scala).


@@ -3,11 +3,26 @@ layout: developer-doc
title: Parser Driver
category: parser
tags: [parser, driver]
order: 8
---
# Parser Driver
The parser driver component is responsible for orchestrating the entire action
of the parser. It handles the following duties:
1. Consuming input text using a provided [reader](./reader.md) in a lazy
fashion.
2. Lexing and then parsing the input text.
3. Writing the output AST to the client of the parser.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Driver Clients](#driver-clients)
<!-- /MarkdownTOC -->
## Driver Clients
The parser is going to be employed in two contexts, both running in-process:
1. In the IDE codebase as a rust dependency.
2. In the engine as a native code dependency used via JNI.


@@ -1,153 +0,0 @@
# Parser Design
## 1. Lexer (Code -> Token Stream)
- Lexer needs to be generic over the input stream encoding to support utf-16
coming from the JVM.
- Is there any use case that requires the lexer to read an actual file?
- The prelude needs to be released to crates.io; otherwise we're going to
  rapidly get out of sync.
- I don't think it makes sense to have separate `Var` and `Cons` identifiers. We
should instead have `Name`, with functions `is_referrent` and `is_variable`.
This better mirrors how the language actually treats names.
- What actually is the flexer?
- What should the AST look like?
The lexer reads the source file (lazily, line by line) or uses an in-memory `&str`, and produces a token stream of `Var`, `Cons`, `Opr`, `Number`, `Text`, `Invalid`, and `Block`. Please note that `Block` is part of the token stream on purpose. It is important that the source code is easy to parse visually, so if you see a block, it should be a block. Discovering blocks in the lexer allows us to prevent all other parts of the parser, like macros, from breaking this assumption. Moreover, it makes the design of the following stages a lot simpler. The Enso lexer should always succeed, on any input stream (the token stream could contain `Invalid` tokens).
The lexer is defined using the Rust procedural macro system. We are using procedural macros because the lexer definition produces Rust code (pasted "in-place" of the macro usage). Let's consider a very simple lexer definition:
```rust
use crate::prelude::*; // Needs to be a released crate
use flexer;
use flexer::Flexer;
// =============
// === Token ===
// =============
pub struct Token {
    location : flexer::Location,
    ast      : TokenAst,
}

enum TokenAst {
    Var(ImString),
    Cons(ImString),
    Blank,
    ...
}

impl Token {
    pub fn new(location:Location, ast:TokenAst) -> Self {
        Self {location,ast}
    }

    pub fn var(location:Location, name:impl Into<ImString>) -> Self {
        let ast = TokenAst::Var(name.into());
        Self::new(location,ast)
    }

    ...
}
// =============
// === Lexer ===
// =============
#[derive(Debug,Default)]
struct Lexer {
    current : Option<Token>,
    tokens  : Vec<Token>,
    state   : Flexer::State
}
impl Lexer {
    fn on_ident(&mut self, tok:Token) {
        self.current = Some(tok);
        self.state.push(self.ident_sfx_check);
    }

    fn on_ident_err_sfx(&mut self) {
        println!("OH NO!")
    }

    fn on_no_ident_err_sfx(&mut self) {
        let current = std::mem::take(&mut self.current).unwrap();
        self.tokens.push(current);
    }
}

impl Flexer::Definition for Lexer {
    fn state(&self) -> &flexer::State { &self.state }
    fn state_mut(&mut self) -> &mut flexer::State { &mut self.state }
}
pub fn lexer_source_code() -> String {
    let lexer = Flexer::<Lexer>::new();

    let chr     = alphaNum | '_';
    let blank   = Pattern::from('_');
    let body    = chr.many() >> '\''.many();
    let var     = lowerLetter >> body;
    let cons    = upperLetter >> body;
    let breaker = "^`!@#$%^&*()-=+[]{}|;:<>,./ \t\r\n\\";

    let sfx_check = lexer.add(State("Identifier Suffix Check"));

    lexer.rule(lexer.root, var,   "self.on_ident(Token::var(self.start_location,self.current_match()))");
    lexer.rule(lexer.root, cons,  "self.on_ident(Token::cons(self.start_location,self.current_match()))");
    lexer.rule(lexer.root, blank, "self.on_ident(Token::blank(self.start_location))");
    lexer.rule(sfx_check,  err_sfx,        "self.on_ident_err_sfx()");
    lexer.rule(sfx_check,  Flexer::always, "self.on_no_ident_err_sfx()");
    ...
    lexer.generate_specialized_code()
}
```
The idea here is that we are describing regexp-like patterns and telling the lexer what should happen when a pattern is matched. For example, after matching the `var` pattern, the code `self.on_ident(ast::Var)` should be evaluated. The code is passed as a string, because it will become part of the generated, highly specialized, very fast lexer.
Technically, the patterns are first translated to a state machine, and then to a bunch of if-then-else statements, in such a way that parsing is always `O(n)`, where `n` is the input size. Logically, the regular expressions are matched top to bottom and the first fully-matched expression is chosen (unlike in the popular lexer generator flex, which uses the longest match instead). After an expression is chosen, the associated function is executed and the process starts over again until the end of the input stream. Only the rules from the currently active state are considered. A state is just a named (for debug purposes only) set of rules. The lexer always starts with the `lexer.root` state. You can make another state active by running (from within the Flexer instance) `state.push(new_state)`, and pop it using `state.pop()`.
The `lexer.generate_specialized_code` function works in a few steps:
1. It takes all rules and states and generates an NFA state machine.
2. It generates a DFA state machine using some custom optimizations to make sure that the regexps are matched in order and the associated code chunks are not lost.
3. It generates a highly tailored lexer `Engine` struct. One of the fields of the engine is the `Lexer` struct we defined above. The engine contains a main "loop" which consumes the input char by char, evaluates a big if-then-else machinery generated from the DFA, and evaluates functions from the `Lexer`. Please note that the functions take `self`; that's because `Engine` implements `Deref` and `DerefMut` to `Lexer`.
The generation of the if-then-else code block is not defined in this document, but can be observed by:
1. Inspecting the current code in Scala.
2. Printing the Java code generated by current Scala Flexer implementation.
3. Talking with @wdanilo about it.
## 2. Macro Resolution (Token Stream -> Chunked AST Stream incl. space-unaware AST)
To be described in detail taking into consideration all current use cases. For the current documentation of macro resolution, take a look here: https://github.com/luna/enso/blob/main/lib/syntax/specialization/shared/src/main/scala/org/enso/syntax/text/Parser.scala
Before implementing this step, we need to talk about handling of space-unaware AST (the AST produced by user-macros).
## 3. Operator Resolution (Chunked AST Stream -> Chunked AST Stream with Opr Apps)
Using a modified [Shunting-yard algorithm](https://en.wikipedia.org/wiki/Shunting-yard_algorithm). The algorithm is modified to support sections. The Scala implementation is here: https://github.com/luna/enso/blob/main/lib/syntax/definition/src/main/scala/org/enso/syntax/text/prec/Operator.scala . Unfortunately, we cannot use recursion in Rust, so it needs to be re-worked.
## 4. Finalization and Special Rules Discovery (Chunked AST Stream with Opr Apps -> AST)
To be described in detail taking into consideration all current use cases.


@@ -3,11 +3,41 @@ layout: developer-doc
title: Reading Source Code
category: parser
tags: [parser, reader]
order: 11
---
# Reading Source Code
The reader is responsible for abstracting the interface to reading a character
from a stream. This handles abstracting away the various encodings that the
project is going to use, as well as backing formats for the stream.
<!-- MarkdownTOC levels="2,3" autolink="true" -->
- [Reader Functionality](#reader-functionality)
- [Provided Readers](#provided-readers)
- [UTF-8 Reader](#utf-8-reader)
- [UTF-16 Reader](#utf-16-reader)
<!-- /MarkdownTOC -->
## Reader Functionality
The reader trait needs to have the following functionality (a sketch follows
the list):
- It must read its input _lazily_, not requiring the entire input to be in
memory.
- It should provide the interface to `next_character`, returning rust-native
UTF-8, and hence abstract away the various underlying encodings.
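A minimal sketch of such a trait is shown below; the exact names are
assumptions rather than a settled API:
```rust
/// A sketch of the reader interface: implementations decode their underlying
/// stream lazily, yielding rust-native `char`s one at a time.
pub trait Reader {
    /// Returns the next character of the input, or `None` at end of input.
    fn next_character(&mut self) -> Option<char>;
}
```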
## Provided Readers
The parser implementation currently provides the following reader utilities to
clients.
### UTF-8 Reader
Rust natively uses UTF-8 encoding for its strings, so in order for the IDE to
make use of the parser, the parser must provide a simple rust-native reader.
### UTF-16 Reader
As the JVM platform makes use of UTF-16 for encoding its strings, we need to
provide a reader that lets JVM clients of the parser provide the source code in
a streaming fashion, without needing to re-encode it prior to passing it to the
parser.
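As a sketch of how this could look using only the Rust standard library (the
type and its interface here are assumptions), `std::char::decode_utf16` already
performs incremental surrogate-pair decoding:
```rust
use std::char::{decode_utf16, DecodeUtf16, REPLACEMENT_CHARACTER};

/// A sketch of a UTF-16 reader over a lazily-produced stream of code units.
pub struct Utf16Reader<I: Iterator<Item = u16>> {
    decoder: DecodeUtf16<I>,
}

impl<I: Iterator<Item = u16>> Utf16Reader<I> {
    pub fn new(units: impl IntoIterator<Item = u16, IntoIter = I>) -> Self {
        Self { decoder: decode_utf16(units) }
    }

    /// The next character, with unpaired surrogates replaced by U+FFFD.
    pub fn next_character(&mut self) -> Option<char> {
        self.decoder.next().map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
    }
}
```
A JVM client could feed this with code units streamed over JNI, for example
`Utf16Reader::new(vec![0x0068, 0xD83D, 0xDE00])`.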