mirror of
https://github.com/idris-lang/Idris2.git
synced 2024-12-19 09:12:34 +03:00
395 lines
12 KiB
ReStructuredText
395 lines
12 KiB
ReStructuredText
Parsing
|
|
=======
|
|
|
|
Idris 2 comes with a Lexing and a Parsing library built into the ``contrib`` package.
|
|
|
|
For this cookbook, we will write a very simple parser for a lambda calculus parser
|
|
that will accept the following language:
|
|
|
|
.. code-block:: text
|
|
|
|
let name = world in (\x.hello x) name
|
|
|
|
Once we write a lambda calculus parser, we will also see how we can take advantage of a
|
|
powerful built in expression parser in Idris 2 to write a small calculator that should be
|
|
smart enough to parse the following expression:
|
|
|
|
.. code-block:: text
|
|
|
|
1 + 2 - 3 * 4 / 5
|
|
|
|
Lexer
|
|
-----
|
|
|
|
The main lexer module is under ``Text.Lexer``. This module contains ``toTokenMap`` which is a function
|
|
that converts a ``List (Lexer, k) -> TokenMap (Token k)`` where ``k`` is a token kind. This function
|
|
could be used for simple lexer to token mappings. The module also includes high level lexers for
|
|
specifying quantity and common programming primitives like ``alphas``, ``intLit``,
|
|
``lineComment`` and ``blockComment``.
|
|
|
|
The ``Text.Lexer`` module also reexports ``Text.Lexer.Core``, ``Text.Quantity`` and ``Text.Token``.
|
|
|
|
``Text.Lexer.Core`` provides the building blocks of the lexer, including a type called
|
|
``Recognise`` which is the underlying data type for the lexer. The other important function that this
|
|
module provides is a ``lex`` which takes in a lexer and returns the tokens.
|
|
|
|
``Text.Quantity`` provides a data type ``Quantity`` which can be used with certain lexers to specify
|
|
how many times something is expected to appear.
|
|
|
|
``Text.Token`` provides a data type ``Token`` that represents a parsed token, its kind and the text.
|
|
This module also provides an important interface called ``TokenKind`` which tells the lexer how to map
|
|
token kinds to Idris 2 types and how to convert each kind from a string to a value.
|
|
|
|
Parser
|
|
------
|
|
|
|
The main parser module is under ``Text.Parser``. This module contains different grammar parsers, the main
|
|
one being ``match`` which takes a ``TokenKind`` and returns the value as defined in the ``TokenKind``
|
|
interface. There are other grammar parsers as well, but for our example, we will only be using ``match``.
|
|
|
|
The ``Text.Parser`` module reexports ``Text.Parser.Core``, ``Text.Quantity`` and ``Text.Token``.
|
|
|
|
``Text.Parser.Core`` provides the building blocks of the parser, including a type called ``Grammar``
|
|
which is the underlying data type for the parser. The other important function that this module provides
|
|
is ``parse`` which takes in a ``Grammar`` and returns the parsed expression.
|
|
|
|
We covered ``Text.Quantity`` and ``Text.Token`` in the Lexer section so we're not going to
|
|
repeat what they do here.
|
|
|
|
Lambda Calculus Lexer & Parser
|
|
------------------------------
|
|
|
|
.. code-block:: idris
|
|
:caption: LambdaCalculus.idr
|
|
:linenos:
|
|
|
|
import Data.List
|
|
import Data.List1
|
|
import Text.Lexer
|
|
import Text.Parser
|
|
|
|
%default total
|
|
|
|
data Expr = App Expr Expr | Abs String Expr | Var String | Let String Expr Expr
|
|
|
|
Show Expr where
|
|
showPrec d (App e1 e2) = showParens (d == App) (showPrec (User 0) e1 ++ " " ++ showPrec App e2)
|
|
showPrec d (Abs v e) = showParens (d > Open) ("\\" ++ v ++ "." ++ show e)
|
|
showPrec d (Var v) = v
|
|
showPrec d (Let v e1 e2) = showParens (d > Open) ("let " ++ v ++ " = " ++ show e1 ++ " in " ++ show e2)
|
|
|
|
data LambdaTokenKind
|
|
= LTLambda
|
|
| LTIdentifier
|
|
| LTDot
|
|
| LTOParen
|
|
| LTCParen
|
|
| LTIgnore
|
|
| LTLet
|
|
| LTEqual
|
|
| LTIn
|
|
|
|
Eq LambdaTokenKind where
|
|
(==) LTLambda LTLambda = True
|
|
(==) LTDot LTDot = True
|
|
(==) LTIdentifier LTIdentifier = True
|
|
(==) LTOParen LTOParen = True
|
|
(==) LTCParen LTCParen = True
|
|
(==) LTLet LTLet = True
|
|
(==) LTEqual LTEqual = True
|
|
(==) LTIn LTIn = True
|
|
(==) _ _ = False
|
|
|
|
Show LambdaTokenKind where
|
|
show LTLambda = "LTLambda"
|
|
show LTDot = "LTDot"
|
|
show LTIdentifier = "LTIdentifier"
|
|
show LTOParen = "LTOParen"
|
|
show LTCParen = "LTCParen"
|
|
show LTIgnore = "LTIgnore"
|
|
show LTLet = "LTLet"
|
|
show LTEqual = "LTEqual"
|
|
show LTIn = "LTIn"
|
|
|
|
LambdaToken : Type
|
|
LambdaToken = Token LambdaTokenKind
|
|
|
|
Show LambdaToken where
|
|
show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text
|
|
|
|
TokenKind LambdaTokenKind where
|
|
TokType LTIdentifier = String
|
|
TokType _ = ()
|
|
|
|
tokValue LTLambda _ = ()
|
|
tokValue LTIdentifier s = s
|
|
tokValue LTDot _ = ()
|
|
tokValue LTOParen _ = ()
|
|
tokValue LTCParen _ = ()
|
|
tokValue LTIgnore _ = ()
|
|
tokValue LTLet _ = ()
|
|
tokValue LTEqual _ = ()
|
|
tokValue LTIn _ = ()
|
|
|
|
ignored : WithBounds LambdaToken -> Bool
|
|
ignored (MkBounded (Tok LTIgnore _) _ _) = True
|
|
ignored _ = False
|
|
|
|
identifier : Lexer
|
|
identifier = alpha <+> many alphaNum
|
|
|
|
keywords : List (String, LambdaTokenKind)
|
|
keywords = [
|
|
("let", LTLet),
|
|
("in", LTIn)
|
|
]
|
|
|
|
lambdaTokenMap : TokenMap LambdaToken
|
|
lambdaTokenMap = toTokenMap [(spaces, LTIgnore)] ++
|
|
[(identifier, \s =>
|
|
case lookup s keywords of
|
|
(Just kind) => Tok kind s
|
|
Nothing => Tok LTIdentifier s
|
|
)
|
|
] ++ toTokenMap [
|
|
(exact "\\", LTLambda),
|
|
(exact ".", LTDot),
|
|
(exact "(", LTOParen),
|
|
(exact ")", LTCParen),
|
|
(exact "=", LTEqual)
|
|
]
|
|
|
|
lexLambda : String -> Maybe (List (WithBounds LambdaToken))
|
|
lexLambda str =
|
|
case lex lambdaTokenMap str of
|
|
(tokens, _, _, "") => Just tokens
|
|
_ => Nothing
|
|
|
|
mutual
|
|
expr : Grammar state LambdaToken True Expr
|
|
expr = do
|
|
t <- term
|
|
app t <|> pure t
|
|
|
|
term : Grammar state LambdaToken True Expr
|
|
term = abs
|
|
<|> var
|
|
<|> paren
|
|
<|> letE
|
|
|
|
app : Expr -> Grammar state LambdaToken True Expr
|
|
app e1 = do
|
|
e2 <- term
|
|
app1 $ App e1 e2
|
|
|
|
app1 : Expr -> Grammar state LambdaToken False Expr
|
|
app1 e = app e <|> pure e
|
|
|
|
abs : Grammar state LambdaToken True Expr
|
|
abs = do
|
|
match LTLambda
|
|
commit
|
|
argument <- match LTIdentifier
|
|
match LTDot
|
|
e <- expr
|
|
pure $ Abs argument e
|
|
|
|
var : Grammar state LambdaToken True Expr
|
|
var = map Var $ match LTIdentifier
|
|
|
|
paren : Grammar state LambdaToken True Expr
|
|
paren = do
|
|
match LTOParen
|
|
e <- expr
|
|
match LTCParen
|
|
pure e
|
|
|
|
letE : Grammar state LambdaToken True Expr
|
|
letE = do
|
|
match LTLet
|
|
commit
|
|
argument <- match LTIdentifier
|
|
match LTEqual
|
|
e1 <- expr
|
|
match LTIn
|
|
e2 <- expr
|
|
pure $ Let argument e1 e2
|
|
|
|
parseLambda : List (WithBounds LambdaToken) -> Either String Expr
|
|
parseLambda toks =
|
|
case parse expr $ filter (not . ignored) toks of
|
|
Right (l, []) => Right l
|
|
Right e => Left "contains tokens that were not consumed"
|
|
Left e => Left (show e)
|
|
|
|
parse : String -> Either String Expr
|
|
parse x =
|
|
case lexLambda x of
|
|
Just toks => parseLambda toks
|
|
Nothing => Left "Failed to lex."
|
|
|
|
Testing out our parser gives us back the following output:
|
|
|
|
.. code-block:: text
|
|
|
|
$ idris2 -p contrib LambdaCalculus.idr
|
|
Main> :exec printLn $ parse "let name = world in (\\x.hello x) name"
|
|
Right (let name = world in (\x.hello x) name)
|
|
|
|
Expression Parser
|
|
-----------------
|
|
|
|
Idris 2 also comes with a very convenient expression parser that is
|
|
aware of precedence and associativity in ``Text.Parser.Expression``.
|
|
|
|
The main function called ``buildExpressionParser`` takes in an ``OperatorTable`` and a
|
|
``Grammar`` that represents the terms, and returns a parsed expression. The magic comes from
|
|
the ``OperatorTable`` since this table defines all the operators, the grammars for those operators,
|
|
the precedence, and the associativity.
|
|
|
|
An ``OperatorTable`` is a list of lists containing the ``Op`` type. The ``Op`` type allows you to specify
|
|
``Prefix``, ``Postfix``, and ``Infix`` operators along with their grammars. ``Infix`` also contains the
|
|
associativity called ``Assoc`` which can specify left associativity or ``AssocLeft``, right
|
|
associativity assoc or ``AssocRight`` and as being non-associative or ``AssocNone``.
|
|
|
|
An example of an operator table we'll be using for the calculator is:
|
|
|
|
.. code-block:: idris
|
|
|
|
[
|
|
[ Infix (match CTMultiply >> pure (*)) AssocLeft
|
|
, Infix (match CTDivide >> pure (/)) AssocLeft
|
|
],
|
|
[ Infix (match CTPlus >> pure (+)) AssocLeft
|
|
, Infix (match CTMinus >> pure (-)) AssocLeft
|
|
]
|
|
]
|
|
|
|
This table defines 4 operators for mulitiplication, division, addition and subtraction.
|
|
Mulitiplication and division show up in the first table because they have higher precedence than
|
|
addition and subtraction, which show up in the second table. We're also defining them as infix operators,
|
|
with a specific grammar and all being left associative via ``AssocLeft``.
|
|
|
|
Building a Calculator
|
|
---------------------
|
|
|
|
.. code-block:: idris
|
|
:caption: Calculator.idr
|
|
:linenos:
|
|
|
|
import Data.List1
|
|
import Text.Lexer
|
|
import Text.Parser
|
|
import Text.Parser.Expression
|
|
|
|
%default total
|
|
|
|
data CalculatorTokenKind
|
|
= CTNum
|
|
| CTPlus
|
|
| CTMinus
|
|
| CTMultiply
|
|
| CTDivide
|
|
| CTOParen
|
|
| CTCParen
|
|
| CTIgnore
|
|
|
|
Eq CalculatorTokenKind where
|
|
(==) CTNum CTNum = True
|
|
(==) CTPlus CTPlus = True
|
|
(==) CTMinus CTMinus = True
|
|
(==) CTMultiply CTMultiply = True
|
|
(==) CTDivide CTDivide = True
|
|
(==) CTOParen CTOParen = True
|
|
(==) CTCParen CTCParen = True
|
|
(==) _ _ = False
|
|
|
|
Show CalculatorTokenKind where
|
|
show CTNum = "CTNum"
|
|
show CTPlus = "CTPlus"
|
|
show CTMinus = "CTMinus"
|
|
show CTMultiply = "CTMultiply"
|
|
show CTDivide = "CTDivide"
|
|
show CTOParen = "CTOParen"
|
|
show CTCParen = "CTCParen"
|
|
show CTIgnore = "CTIgnore"
|
|
|
|
CalculatorToken : Type
|
|
CalculatorToken = Token CalculatorTokenKind
|
|
|
|
Show CalculatorToken where
|
|
show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text
|
|
|
|
TokenKind CalculatorTokenKind where
|
|
TokType CTNum = Double
|
|
TokType _ = ()
|
|
|
|
tokValue CTNum s = cast s
|
|
tokValue CTPlus _ = ()
|
|
tokValue CTMinus _ = ()
|
|
tokValue CTMultiply _ = ()
|
|
tokValue CTDivide _ = ()
|
|
tokValue CTOParen _ = ()
|
|
tokValue CTCParen _ = ()
|
|
tokValue CTIgnore _ = ()
|
|
|
|
ignored : WithBounds CalculatorToken -> Bool
|
|
ignored (MkBounded (Tok CTIgnore _) _ _) = True
|
|
ignored _ = False
|
|
|
|
number : Lexer
|
|
number = digits
|
|
|
|
calculatorTokenMap : TokenMap CalculatorToken
|
|
calculatorTokenMap = toTokenMap [
|
|
(spaces, CTIgnore),
|
|
(digits, CTNum),
|
|
(exact "+", CTPlus),
|
|
(exact "-", CTMinus),
|
|
(exact "*", CTMultiply),
|
|
(exact "/", CTDivide)
|
|
]
|
|
|
|
lexCalculator : String -> Maybe (List (WithBounds CalculatorToken))
|
|
lexCalculator str =
|
|
case lex calculatorTokenMap str of
|
|
(tokens, _, _, "") => Just tokens
|
|
_ => Nothing
|
|
|
|
mutual
|
|
term : Grammar state CalculatorToken True Double
|
|
term = do
|
|
num <- match CTNum
|
|
pure num
|
|
|
|
expr : Grammar state CalculatorToken True Double
|
|
expr = buildExpressionParser [
|
|
[ Infix ((*) <$ match CTMultiply) AssocLeft
|
|
, Infix ((/) <$ match CTDivide) AssocLeft
|
|
],
|
|
[ Infix ((+) <$ match CTPlus) AssocLeft
|
|
, Infix ((-) <$ match CTMinus) AssocLeft
|
|
]
|
|
] term
|
|
|
|
parseCalculator : List (WithBounds CalculatorToken) -> Either String Double
|
|
parseCalculator toks =
|
|
case parse expr $ filter (not . ignored) toks of
|
|
Right (l, []) => Right l
|
|
Right e => Left "contains tokens that were not consumed"
|
|
Left e => Left (show e)
|
|
|
|
parse1 : String -> Either String Double
|
|
parse1 x =
|
|
case lexCalculator x of
|
|
Just toks => parseCalculator toks
|
|
Nothing => Left "Failed to lex."
|
|
|
|
Testing out our calculator gives us back the following output:
|
|
|
|
.. code-block:: text
|
|
|
|
$ idris2 -p contrib Calculator.idr
|
|
Main> :exec printLn $ parse1 "1 + 2 - 3 * 4 / 5"
|
|
Right 0.6000000000000001
|