Merge pull request #4805 from martinbaker/parseLibDoc2

Extend and Fix Parser Library Documentation
2024-09-11 08:55:23 +03:00 · 2020-02-01 01:12:19 +01:00 · 2020-02-01 01:12:19 +01:00 · fc0acd9f00
commit fc0acd9f00
parent 9074a76cb3 161573f75e
4 changed files with 354 additions and 174 deletions
--- a/docs/parserLibrary/introduction.rst
+++ b/docs/parserLibrary/introduction.rst
@ -25,21 +25,21 @@ complicated parser can be made out of many smaller ones.
        - Lexer - This takes the input string and turns it into a list of Tokens.
        - Parser - This takes the list of tokens and outputs the code.

-.. image:: ../image/parserTopLevel.png
-   :width: 307px
-   :height: 76px
-   :alt: diagram illustrating these stages of lexer and parser
+    - .. image:: ../image/parserTopLevel.png
+         :width: 307px
+         :height: 76px
+         :alt: diagram illustrating these stages of lexer and parser

 The advantage of using two stages, like this, is that things like whitespace and
 comments don't need to be considered in every parser rule.

-The  Idris parser library differs from Parsec in that you need to say in the
-Recogniser whether a rule is guaranteed to consume input. This means that the
+The  Idris parser library differs from Parsec in that you need to say, in the
+Recogniser, whether a rule is guaranteed to consume input. This means that the
 Idris type system prevents the construction of recognisers/rules that can't
 consume the input.

 The Lexer is similar but works at the level of characters rather than tokens, and
-you need to provide a TokenMap which says how to build a token representation from
+you need to provide a TokenMap which defines how to build a token from
 a string when a rule is matched.

 The following pages contain a tutorial for the :ref:`parserLibraryLexer` and a
--- a/docs/parserLibrary/lexer.rst
+++ b/docs/parserLibrary/lexer.rst
@ -10,13 +10,16 @@ module:

 lex : TokenMap a -> String -> (List (TokenData a), (Int, Int, String))

-This function takes a String and returns a list of tokens. With the tokens we
-have indexes back to the original string which can be used in error messages:
+.. list-table::

-.. image:: ../image/tokenise.png
-   :width: 330px
-   :height: 73px
-   :alt: diagram illustrating these stages of lexer
+  * - This function takes a String and returns a list of tokens. With the
+      tokens we have indexes back to the original string which can be used
+      in error messages:
+
+    - .. image:: ../image/tokenise.png
+         :width: 330px
+         :height: 73px
+         :alt: diagram illustrating these stages of lexer

 TokenMap
 --------
@ -34,21 +37,24 @@ this information:
  (Lexer, String -> tokenType)

 So for each Lexer in the list, if a substring in the input matches, run
-the associated function to produce a token of type `tokenType`
+the associated function to produce a token of type ``tokenType``

-from Text.Lexer.Core:
+.. list-table::

-.. code-block:: idris
+  * - from Text.Lexer.Core
+      : https://github.com/idris-lang/Idris-dev/blob/master/libs/contrib/Text/Lexer/Core.idr

-  TokenMap : (tokenType : Type) -> Type
-  TokenMap tokenType = List (Lexer, String -> tokenType)
+    - .. code-block:: idris

-We can create a tokenMap by using a function like this:
+        TokenMap : (tokenType : Type) -> Type
+        TokenMap tokenType = List (Lexer, String -> tokenType)

-.. code-block:: idris
+  * - We can create a tokenMap by using a function like this:

-  myTokenMap : TokenMap Token
-  myTokenMap = [(is 'a',CharLit)]
+    - .. code-block:: idris
+
+        myTokenMap : TokenMap Token
+        myTokenMap = [(is 'a',CharLit)]

 Once we have a TokenMap we can use it to lex many strings.

@ -74,7 +80,7 @@ combined, for example,

 .. list-table::

-  * - <+> means sequence two recognisers. If either consumes a character,
+  * - ``<+>`` means sequence two recognisers. If either consumes a character,
      the sequence is guaranteed to consume a character.

    -  .. code-block:: idris
@ -83,7 +89,7 @@ combined, for example,
         SeqEat (Pred (\ARG => intToBool (prim__eqChar ARG 'a')))
               (Delay (is 'b')) : Recognise True

-  * - <|> means if both consume, the combination is guaranteed
+  * - ``<|>`` means if both consume, the combination is guaranteed
      to consumer a character:

    - .. code-block:: idris
@ -161,7 +167,7 @@ able to parse simple arithmetic expressions, like this:
 so we need:

 - Numbers (for now integer literals are good enough).
- Some operators (for now '+', '-' and '*' will do.
+- Some operators (for now ``+``, ``-`` and ``*`` will do.
 - Opening and closing Parentheses.

 We can define these, as tokens, like this:
@ -178,7 +184,7 @@ We can define these, as tokens, like this:
           | EndInput

 It may help with debugging and to implement error messages to
-implement 'show' for these tokens:
+implement ``show`` for these tokens:

 .. code-block:: idris

--- a/docs/parserLibrary/parser.rst
+++ b/docs/parserLibrary/parser.rst
@ -3,8 +3,8 @@
 Parser
 ======

-To run our parser we call 'parse'. This requires a Grammar and the output
-from the lexer (a list of tokens).
+To run the parser we call ``parse``. This requires a grammar and the output
+from the lexer (which is a list of tokens).

 .. code-block:: idris

@ -18,13 +18,13 @@ If successful this returns 'Right' with a pair of

 otherwise it returns 'Left' with the error message.

-So we need to define the Grammar for our parser, this is done using the following
+So we need to define the grammar for our parser, this is done using the following
 'Grammar' data structure. This is a combinator structure, similar in principle
 to the recogniser combinator for the lexer, which was discussed on the
 previous page.

-As with the Recogniser the Grammar type is dependent on a boolean 'consumes'
-value which allows us to ensure that complicated Grammar structures will always
+As with the Recogniser the ``Grammar`` type is dependent on a boolean 'consumes'
+value which allows us to ensure that complicated ``Grammar`` structures will always
 consume input.

 .. code-block:: idris
@ -53,13 +53,13 @@ consume input.
           Grammar tok c1 ty -> Grammar tok c2 ty ->
           Grammar tok (c1 && c2) ty

-So an example of a Grammer type may look something like this:
-'Grammar (TokenData ExpressionToken) True Integer'.
-
+So an example of a grammer type may look something like this:
+``Grammar (TokenData ExpressionToken) True Integer``.
 This is a complicated type name and a given parser will need to use it a lot.
 So to reduce the amount of typing we can use the following type synonym (similar
 to what is done in Idris 2
 : https://github.com/edwinb/Idris2/blob/master/src/Parser/Support.idr
+)

 .. code-block:: idris

@ -71,12 +71,12 @@ Parser Example
 --------------

 On the previous page we implemented a lexer to 'lex' a very simple expression, on
-this page we will go on to implement a parser for this running example.
+this page we will go on to add a parser for this running example.

 .. list-table::

-  * - Expressed in Backus–Naur form (BNF) the syntax we are aiming at is something
-      like this:
+  * - The syntax we are aiming at is something
+      like this (expressed in Backus–Naur form (BNF)):
    - .. code-block:: idris

         <expr> ::= <integer literal>
@ -89,8 +89,8 @@ this page we will go on to implement a parser for this running example.

 .. list-table::

-  * - To start, here is a Grammar to parse an integer literal (that is, a
-      sequence of numbers).
+  * - To start, here is a grammar to parse an integer literal (that is, a
+      sequence of numeric characters).

    - .. code-block:: idris

@ -144,7 +144,7 @@ successful, this is because we are not specifically checking for end-of-file.

 .. list-table::

-  * - The 'intLiteral' function above uses the 'terminal' function to
+  * - The ``intLiteral`` function above uses the ``terminal`` function to
      construct the grammar
    - .. code-block:: idris

@ -174,10 +174,10 @@ Idris 2 uses a slightly different version which stores an error message like
                           OParen => Just 0
                           _ => Nothing)

-Integer value is not really relevant for parenthesis so '0' is used as
+Integer value is not really relevant for parenthesis so ``0`` (zero) is used as
 a default value.

-As before, we can test this out with a function like this:
+As before, this can be tested with a function like this:

 .. code-block:: idris

@ -202,10 +202,10 @@ error for anything else:
                                              List (TokenData ExpressionToken))

 Now we have two Grammars we can try combining them. The following test looks
-for 'openParen' followed by 'intLiteral', the two Grammars are combined using
-'<*>'. The 'map const' part uses the integer value from the first.
+for ``openParen`` followed by ``intLiteral``, the two Grammars are combined using
+``<*>``. The ``map const`` part uses the integer value from the first.

-The following test is looking for '(' followed by a number:
+The following test is looking for ``(`` followed by a number:

 .. code-block:: idris

@ -213,8 +213,8 @@ The following test is looking for '(' followed by a number:
                        (Integer, List (TokenData ExpressionToken))
  test3 s = parse (map const openParen <*> intLiteral) (fst (lex expressionTokens s))

-We can see below that '(' followed by a number is successfully parsed but other
-token lists are not:
+We can see below that ``(`` followed by a number is successfully parsed but, as
+required, other token lists are not:

 .. code-block:: idris

@ -270,21 +270,87 @@ token lists are not:
        paren : Rule Integer -> Rule Integer
        paren exp = openParen *> exp <* closeParen

-The use of '*>' and '<*' instead of '<*>' is an easy way to use the integer
+The use of ``*>`` and ``<*`` instead of ``<*>`` is an easy way to use the integer
 value from the inner expression.

+So lets try to use this to define a grammar which recognises either:
+
+- An integer literal
+- An integer literal inside parenthesis
+- An integer literal inside parenthesis inside parenthesis
+- ... and so on.
+
 .. list-table::

-  * - Now for the operations, in this case: '+', '-' and '*'.
-      The syntax we require is that operators like '+' are infix operators, which
-      would require a definition like this:
+  * - This requires a recursively defined structure like this:

    - .. code-block:: idris

-        expr = (add expr (op "+") expr)
+        partial
+        expr : Rule Integer
+        expr = intLiteral <|> (paren expr)

-This is a potentially infinite structure which is not total.
-In order to work up to this gradually I will start with prefix operators
+This is a valid grammar because every time it is called it is guaranteed to
+consume a token. However, as an Idris structure, it is problematic due to
+the recursion. Defining it as partial allows it to compile but at runtime
+we are likely to get a crash with an (unhelpful) error message like
+``killed by signal 11``.
+
+So a better method is to use ``do`` notation as described in the following
+section.
+
+Monadic Combinator
+------------------
+
+In addition to ``<|>`` and ``<*>`` there is a ``>>=`` combinator, which is
+defined like this:
+
+.. code-block:: idris
+
+  ||| Sequence two grammars. If either consumes some input, the sequence is
+  ||| guaranteed to consume some input. If the first one consumes input, the
+  ||| second is allowed to be recursive (because it means some input has been
+  ||| consumed and therefore the input is smaller)
+  export %inline
+  (>>=) : {c1 : Bool} ->
+        Grammar tok c1 a ->
+        inf c1 (a -> Grammar tok c2 b) ->
+        Grammar tok (c1 || c2) b
+  (>>=) {c1 = False} = SeqEmpty
+  (>>=) {c1 = True} = SeqEat
+
+.. list-table::
+
+  * - This allows us to use ``do`` notation for the previous parenthesis example:
+
+    - .. code-block:: idris
+
+         expr = intLiteral <|> do
+                openParen
+                r <- expr
+                closeParen
+                pure r
+
+This can be more flexible than using the ``<*>`` operator. Also it is defined
+using ``Inf`` so we can implement recursively defined grammars as above.
+
+Implementing the Arithmetic Operators
+-------------------------------------
+
+.. list-table::
+
+  * - Now for the operations, in this case: ``+``, ``-`` and ``*``.
+      The syntax we require for these infix operators
+      would require recursive grammars like this:
+
+    - .. code-block:: idris
+
+        expr = expr (op "+") expr
+
+As already explained, ``do`` notation can allow us to use recursive
+definitions but not necessarily left-recursion like this.
+
+In order to work up to this gradually lets start with prefix operators
 (sometimes known as Polish notation) then modify later for infix operators.

 .. list-table::
@ -295,7 +361,7 @@ In order to work up to this gradually I will start with prefix operators

        expr = (add (op "+") expr expr)

-where 'op' is defined like this:
+where ``op`` is defined like this:

 .. code-block:: idris

@ -308,7 +374,13 @@ where 'op' is defined like this:

 .. list-table::

-  * - and 'add' is defined like this:
+  * - and ``add`` is defined like this:
+
+      Where:
+
+      - x is the add operator.
+      - y is the first operand.
+      - z is the second operand.

    - .. code-block:: idris

@ -322,40 +394,37 @@ where 'op' is defined like this:
            Grammar tok ((c1 || c2) || c3) Integer
        add x y z = map addInt (x *> y) <*> z

-Where:

- x is the add operator.
- y is the first operand.
- z is the second operand.

 The resulting integer will be the sum of the two operands.

-The other operators are defined in a similar way:
+.. list-table::

-.. code-block:: idris
+  * - The other operators are defined in a similar way:

-  subInt : Integer -> Integer -> Integer
-  subInt a b = a-b
+    - .. code-block:: idris

-  export
-  sub : Grammar tok c1 Integer ->
-      Grammar tok c2 Integer ->
-      Grammar tok c3 Integer ->
-      Grammar tok ((c1 || c2) || c3) Integer
-  sub x y z = map subInt (x *> y) <*> z
+        subInt : Integer -> Integer -> Integer
+        subInt a b = a-b

+        export
+        sub : Grammar tok c1 Integer ->
+            Grammar tok c2 Integer ->
+            Grammar tok c3 Integer ->
+            Grammar tok ((c1 || c2) || c3) Integer
+        sub x y z = map subInt (x *> y) <*> z

-  multInt : Integer -> Integer -> Integer
-  multInt a b = a*b
+        multInt : Integer -> Integer -> Integer
+        multInt a b = a*b

-  export
-  mult : Grammar tok c1 Integer ->
-      Grammar tok c2 Integer ->
-      Grammar tok c3 Integer ->
-      Grammar tok ((c1 || c2) || c3) Integer
-  mult x y z = map multInt (x *> y) <*> z
+        export
+        mult : Grammar tok c1 Integer ->
+            Grammar tok c2 Integer ->
+            Grammar tok c3 Integer ->
+            Grammar tok ((c1 || c2) || c3) Integer
+        mult x y z = map multInt (x *> y) <*> z

-So the top level Grammar can now be defined as follows. Note that this is
+So the top level ``Grammar`` can now be defined as follows. Note that this is
 partial as it is a potentially infinite structure and so not total.

 .. code-block:: idris
@ -408,7 +477,7 @@ Then an invalid syntax:
                                        List (TokenData ExpressionToken))

 However if we try something that is invalid, but starts with a valid token,
-then it will return 'Right' (to indicate success)
+then it will return ``Right`` (to indicate success)

 .. code-block:: idris

@ -435,8 +504,8 @@ Infix Notation
 --------------

 So far we have implemented a prefix notation for operators (like this:
-'+ expr expr') but the aim is to implemented an infix notation (like this:
-'expr + expr'). To do this we must be able to deal with potentially
+``+ expr expr``) but the aim is to implemented an infix notation (like this:
+``expr + expr``). To do this we must be able to deal with potentially
 infinite data structures (see Codata Types here :ref:`sect-typefuns`).

 First alter the grammar to have infix operations:
@ -494,10 +563,10 @@ If we have a rule like this:

  A -> (x<*>y) <|> (x<*>z)

-If 'x<*>y' fails but 'x<*>z' would succeed a problem is that, 'x<*>y' has
-already consumed 'x', so now 'x<*>z' will fail.
+If ``x<*>y`` fails but ``x<*>z`` would succeed a problem is that, ``x<*>y`` has
+already consumed ``x``, so now ``x<*>z`` will fail.

-so we could write code to backtrack. That is 'try' 'x<*>y' without consuming
+so we could write code to backtrack. That is ``try`` ``x<*>y`` without consuming
 so that, if the first token succeeds but the following tokens fail, then the
 first tokens would not be consumed.

@ -524,52 +593,64 @@ That is we convert a general context-free grammar to a LL(1) grammar. Although
 not every context-free grammar can be converted to a LL(1) grammar.

 This still does not solve the infinite recursion issue and there is another
-problem: the precedence of the operators +, - and * is not explicit.
+problem: the precedence of the operators ``+``, ``-`` and ``*`` is not explicit.
+
+.. list-table::
+
+  * - To resolve this we can alter the example to have a BNF like this:
+      Where braces ``{ ... }`` mean the contents can occur 1 or more times.
+
+    - .. code-block:: idris
+
+        'expression' ::=  ('+'|'-') 'term' {('+'|'-') 'term'}
+
+        'term' ::=  'factor' {'*' 'factor'}
+
+        'factor' ::=
+           number
+           | '(' 'expression' ')'

-To resolve this we can alter the example like this:

 .. code-block:: idris

-  paren : Rule Integer -> Rule Integer
-  paren exp = openParen *> exp <* closeParen
-
-  addInt : Integer -> Integer -> Integer
-  addInt a b = a+b
-
-  subInt : Integer -> Integer -> Integer
-  subInt a b = a-b
-
-  multInt : Integer -> Integer -> Integer
-  multInt a b = a*b
-
-  partial
  expr : Rule Integer

-  partial
-  exprAtom : Rule Integer
-  exprAtom = intLiteral <|> (paren expr)
+  factor : Rule Integer
+  factor = intLiteral <|> do
+                openParen
+                r <- expr
+                closeParen
+                pure r

-  partial
-  expr1 : Rule Integer
-  expr1 = map multInt exprAtom <*> ((op "*") *> exprAtom)
+  term : Rule Integer
+  term = map multInt factor <*> (
+          (op "*")
+          *> factor)
+       <|> factor

-  partial
-  exprMult : Rule Integer
-  exprMult = expr1 <|> exprAtom
+  expr = map addInt term <*> (
+          (op "+")
+          *> term)
+       <|> map subInt term <*> (
+          (op "-")
+          *> term)
+       <|> term

-  partial
-  expr2 : Rule Integer
-  expr2 = map addInt exprMult <*> ((op "+") *> exprMult)
+  calc : String -> Either (ParseError (TokenData ExpressionToken))
+                        (Integer, List (TokenData ExpressionToken))
+  calc s = parse expr (fst (lex expressionTokens s))

-  partial
-  exprAdd : Rule Integer
-  exprAdd = expr2 <|> exprMult
+  lft : (ParseError (TokenData ExpressionToken)) -> IO ()
+  lft (Error s lst) = putStrLn ("error:"++s)

-  partial
-  expr3 : Rule Integer
-  expr3 = map subInt exprAdd <*> ((op "-") *> expr)
+  rht : (Integer, List (TokenData ExpressionToken)) -> IO ()
+  rht i = putStrLn ("right " ++ (show i))

-  expr = expr3 <|> exprAdd
+  main : IO ()
+  main = do
+    putStr "alg>"
+    x <- getLine
+    either lft rht (calc x) -- eliminator for Either

 As the following tests show, this now handles infix operators and precedence.

@ -579,44 +660,41 @@ As the following tests show, this now handles infix operators and precedence.

    - .. code-block:: idris

-         *parserExInfix> test "2*3"
-         Right (6,
-             []) : Either (ParseError (TokenData ExpressionToken))
-                    (Integer, List (TokenData ExpressionToken))
+         *algebraREPL> :exec
+         alg>2*3
+         right (6, [])

  * - Check that atomic number literals work:

    - .. code-block:: idris

-         *parserExInfix> test "2"
-         Right (2,
-             []) : Either (ParseError (TokenData ExpressionToken))
-                    (Integer, List (TokenData ExpressionToken))
+         *algebraREPL> :exec
+         alg>2
+         right (2, [])

  * - Check that '*' has a higher precedence than '+'.

    - .. code-block:: idris

-         *parserExInfix> test "2+3*4"
-         Right (14,
-             []) : Either (ParseError (TokenData ExpressionToken))
-                    (Integer, List (TokenData ExpressionToken))
+         *algebraREPL> :exec
+         alg>2+3*4
+         right (14, [])

  * - Also '*' has a higher precedence than '+' when the order is reversed.

    - .. code-block:: idris

-         *parserExInfix> test "3*4+2"
-         Right (14,
-             []) : Either (ParseError (TokenData ExpressionToken))
-                    (Integer, List (TokenData ExpressionToken))
+         *algebraREPL> :exec
+         alg>3*4+2
+         right (14, [])

  * - Also precedence can be overridden by parenthesis.

    - .. code-block:: idris

-         *parserExInfix> test "(2+3)*4"
-         Right (20,
-             []) : Either (ParseError (TokenData ExpressionToken))
-                    (Integer, List (TokenData ExpressionToken))
+         *algebraREPL> :exec
+         alg>(2+3)*4
+         right (20, [])

+This is still not correct because it does not correctly parse multiple sums or terms
+like ``1+2+3`` or ``1*2*3``.
--- a/docs/parserLibrary/whitespace.rst
+++ b/docs/parserLibrary/whitespace.rst
@ -8,14 +8,16 @@ discussed how to handle whitespace and comments.

 In some grammars whitespace and comments would not be considered significant and
 so we might be tempted not to generate any tokens for them. However, in the
-running example, we may want to make spaces significant. For example, we might
-want to implement function application expressed by juxtaposition as in
-Haskell and Idris like this: ``f x``.
+running example, we may want to make spaces significant in the future. For
+example, we might want to implement function application expressed by
+juxtaposition as in Haskell and Idris like this: ``f x``.

-Also, I'm not sure if it is an problem for error messages and so on if some
-text does not have corresponding tokens.
+So, on this page, we will add a token for whitespace and comments. We will
+then consider two ways to process this:

-So, on this page, we will add a token for whitespace and comments.
+- Filter all whitespace tokens from the lexer results before passing to
+  the parser.
+- Process the whitespace tokens in the parser.

 Whitespace and Comments in Lexer
 --------------------------------
@ -70,10 +72,31 @@ like this:
  spaces : Lexer
  spaces = some space

+If these whitespace tokens are not significant, that is, they can appear
+anywhere and they are optional then we can filter them out before the parser
+gets them like this:
+
+.. code-block:: idris
+
+  processWhitespace : (List (TokenData ExpressionToken), Int, Int, String)
+                  -> (List (TokenData ExpressionToken), Int, Int, String)
+  processWhitespace (x,l,c,s) = ((filter notComment x),l,c,s) where
+      notComment : TokenData ExpressionToken -> Bool
+      notComment t = case tok t of
+                          Comment _ => False
+                          _ => True
+
+  calc : String -> Either (ParseError (TokenData ExpressionToken))
+                        (Integer, List (TokenData ExpressionToken))
+  calc s = parse expr (fst (processWhitespace (lex expressionTokens s)))
+
+
 Whitespace and Comments in Parser
 ---------------------------------

-Now that there are ``Comment`` tokens the parser needs to be able to handle them.
+If we sometimes require whitespace to be significant then we can't filter
+them out as above. In this case the ``Comment`` tokens are sent to the parser
+which now needs to be able to handle them.

 .. code-block:: idris

@ -85,42 +108,68 @@ Now that there are ``Comment`` tokens the parser needs to be able to handle them
 So far we don't have any syntax that requires spaces to be significant so we
 need to define the grammar so that it will parse with, or without, spaces.
 This needs to be done in a systematic way, here I have defined the grammar so
-that there is an optional space to the right of every atom or operator:
+that there is an optional space to the right of every atom or operator.
+First add versions of ``intLiteral`` , ``openParen`` , ``closeParen`` 
+and ``op`` that allow optional spaces/comments to the right of them:

 .. code-block:: idris

-  partial
-  exprAtom : Rule Integer
-  exprAtom = (intLiteral <* commentSpace)
-           <|> intLiteral <|> (paren expr)
+  intLiteralC : Rule Integer
+  intLiteralC = (intLiteral <* commentSpace) <|> intLiteral

-  partial
-  expr1 : Rule Integer
-  expr1 = map multInt exprAtom <*> (
-          (((op "*") <* commentSpace) <|> (op "*"))
-          *> exprAtom)
+  openParenC : Rule Integer
+  openParenC = (openParen <* commentSpace) <|> openParen

-  partial
-  exprMult : Rule Integer
-  exprMult = expr1 <|> exprAtom
+  closeParenC : Rule Integer
+  closeParenC = (closeParen <* commentSpace) <|> closeParen

-  partial
-  expr2 : Rule Integer
-  expr2 = map addInt exprMult <*> (
-          (((op "+") <* commentSpace) <|> (op "+"))
-          *> exprMult)
+  ||| like op but followed by optional comment or space
+  opC : String -> Rule Integer
+  opC s = ((op s) <* commentSpace) <|> (op s)

-  partial
-  exprAdd : Rule Integer
-  exprAdd = expr2 <|> exprMult
+Then just use these functions instead of the original functions:

-  partial
-  expr3 : Rule Integer
-  expr3 = map subInt exprAdd <*> (
-          (((op "-") <* commentSpace) <|> (op "-"))
-          *> expr)
+.. code-block:: idris
+
+  expr : Rule Integer
+
+  factor : Rule Integer
+  factor = intLiteralC <|> do
+                openParenC
+                r <- expr
+                closeParenC
+                pure r
+
+  term : Rule Integer
+  term = map multInt factor <*> (
+          (opC "*")
+          *> factor)
+       <|> factor
+
+  expr = map addInt term <*> (
+          (opC "+")
+          *> term)
+       <|> map subInt term <*> (
+          (opC "-")
+          *> term)
+       <|> term
+
+  calc : String -> Either (ParseError (TokenData ExpressionToken))
+                        (Integer, List (TokenData ExpressionToken))
+  calc s = parse expr (fst (lex expressionTokens s))
+
+  lft : (ParseError (TokenData ExpressionToken)) -> IO ()
+  lft (Error s lst) = putStrLn ("error:"++s)
+
+  rht : (Integer, List (TokenData ExpressionToken)) -> IO ()
+  rht i = putStrLn ("right " ++ (show i))
+
+  main : IO ()
+  main = do
+    putStr "alg>"
+    x <- getLine
+    either lft rht (calc x) -- eliminator for Either

-  expr = expr3 <|> exprAdd

 Defining Block Structure using Indents
 --------------------------------------
@ -141,14 +190,61 @@ We can see how Idris2 does it here
  init : IndentInfo
  init = 0

+EndInput Token
+--------------

+So far, the parser will return a successful result even if the full input
+is not consumed. To ensure that the top level syntax is fully matched we
+add a ``EndInput`` token to indicate the last token.

+``EndInput`` is added to the other tokens like this:

+.. code-block:: idris

+  data ExpressionToken = Number Integer
+           | Operator String
+           | OParen
+           | CParen
+           | Comment String
+           | EndInput

+A rule to consume this token is added:

+.. code-block:: idris

+  eoi : Rule Integer
+  eoi = terminal (\x => case tok x of
+                           EndInput => Just 0
+                           _ => Nothing)

+Instead of using ``expr`` at the top level of the syntax we can now define
+``exprFull`` as shown here. This will make sure that only when ``EndInput``
+is consumed will the parse be successful:

+.. code-block:: idris
+
+  exprFull : Rule Integer
+  exprFull = expr <* eoi
+
+The following code makes sure ``EndInput`` is added to the end of the token
+list:
+
+.. code-block:: idris
+
+  processWhitespace : (List (TokenData ExpressionToken), Int, Int, String)
+                  -> (List (TokenData ExpressionToken), Int, Int, String)
+  processWhitespace (x,l,c,s) = ((filter notComment x)++
+                                      [MkToken l c EndInput],l,c,s) where
+      notComment : TokenData ExpressionToken -> Bool
+      notComment t = case tok t of
+                          Comment _ => False
+                          _ => True
+
+All we have to do now is use ``exprFull`` instead of ``expr``:
+
+.. code-block:: idris
+
+  calc : String -> Either (ParseError (TokenData ExpressionToken))
+                        (Integer, List (TokenData ExpressionToken))
+  calc s = parse exprFull (fst (processWhitespace (lex expressionTokens s)))

-