design-patterns-for-parser-combinators

j-mie6/design-patterns-for-parser-combinators

mirror of https://github.com/j-mie6/design-patterns-for-parser-combinators.git synced 2024-10-03 17:29:44 +03:00

History

Nicolas Wu c8c135f182 Added parser, but tests timeout (#6 ) This is the trace for our Haskell Symposium talk, consisting of the following "steps". Check out the original PR to see each of the individual commits and how it all came together! * Added parser, but tests timeout * Added chainl1 and chainr1 combinators (Pattern 1a: Homogeneous Chains) ``` {---------------------------------------------} {- Pattern 1a: Homogenous Chains -} {---------------------------------------------} {- For binary operators where the -} {- associativity is not specified, use -} {- `chainl1` or `chainr1` to combine -} {- operands with the operators. -} {---------------------------------------------} ``` The combinators `chainl1` and `chainr1` are commonly found in many parser combinator libraries, and are the canonical way of handling the left-recursion problem _without_ resorting to grammar refactoring. In `megaparsec`, however, the chains are replaced by precedence parsers. * Fixed parser using chainl1 ``` {---------------------------------------------} {- Pattern 1a: Homogenous Chains -} {---------------------------------------------} {- For binary operators where the -} {- associativity is not specified, use -} {- `chainl1` or `chainr1` to combine -} {- operands with the operators. -} {---------------------------------------------} ``` * Fixed whitespace parsing issue (Patterns 2a and 2bi) ``` {---------------------------------------------} {- Pattern 2a: Whitespace Combinators -} {---------------------------------------------} {- Build a `lexeme` combinator dedicated to -} {- consuming trailing whitespace after a -} {- given lexeme. Build a `fully` combinator -} {- to consume initial whitespace and the end -} {- of input. -} {---------------------------------------------} ``` After factoring the tokens into `Lexer.hs`, building the new combinators helps separate individual responsibilities. ``` {---------------------------------------------} {- Pattern 2bi: Tokenising Combinators -} {---------------------------------------------} {- Annotate terminals with a `token` -} {- combinator, built on `lexeme`, to -} {- atomically parse them with whitespace -} {- consumed. -} {---------------------------------------------} ``` Building a `token` combinator helps keep things atomic. * Added keyword combinator to fix "negatex" -> "negate x" (Pattern 2bii: Keyword Combinators) ``` {---------------------------------------------} {- Pattern 2bii: Keyword Combinators -} {---------------------------------------------} {- Avoid using `string` with `token` for -} {- keywords, use a `keyword` combinator that -} {- enforces that the keyword does not form a -} {- valid prefix of another token. -} {---------------------------------------------} ``` Using a `keyword` combinator helps to distingish keywords from identifiers and other tokens. * Fixed pretty printer Pretty printer was missing its parentheses, and the tests are comparing on pretty printed terms for some reason? All that is fixed now! * Added heterogeneous chains: infixl1, infixr1, prefix, and postfix (Pattern 1b: Heterogeneous Chains) ``` {---------------------------------------------} {- Pattern 1b: Heterogeneous Chains -} {---------------------------------------------} {- For associative binary operators where -} {- operand types may differ, use `infixl1` -} {- or `infixr1` to combine operands with -} {- their operators, in conjunction with -} {- strongly typed semantic actions -} {---------------------------------------------} ``` The problem with the traditional `chainl1` and `chainr1` is that they are indistinguishable in their types, which can cause minor typos to cause bugs. We offer _refined_ versions called `infixl1` and infixr1` that encode the associativity in the types. To use these, the AST needs to more strongly adhere to the grammar, though the original chains can be recovered by using `id :: a -> a`. * Switched to strong ast and infixes ``` {---------------------------------------------} {- Pattern 1b: Heterogeneous Chains -} {---------------------------------------------} {- For associative binary operators where -} {- operand types may differ, use `infixl1` -} {- or `infixr1` to combine operands with -} {- their operators, in conjunction with -} {- strongly typed semantic actions -} {---------------------------------------------} ``` Switching over to the `StrongAST`, broken into `Expr`, `Term`, `Negate`, and `Atom` allows us to use the heterogeneous chains, and improve the type safety of the parser. * Used OverloadedStrings to encapsulate lexing behaviour, tests working! (Pattern 2c: Overloaded Strings) ``` {---------------------------------------------} {- Pattern 2c: Overloaded Strings -} {---------------------------------------------} {- Hide tokenising logic by allowing string -} {- literals to serve as parsers. -} {---------------------------------------------} ``` By using `OverloadedStrings` we can hide all the tokenising logic behind the `IsString` instance for `Parser ()`. This means that we can add and remove keywords without needing to change anything in the parser, whilst keeping it looking clean. If the extension is undesirable, a combinator performing the job of `fromString` is also a great idea. Co-authored-by: Jamie Willis <j.willis19@imperial.ac.uk>		2021-08-26 16:08:23 +01:00
..
common	Almost working parser, enough to check the infrastructure is working!	2021-08-23 09:04:30 +01:00
convert	Added tests for converters	2021-08-20 17:19:58 +01:00
parser	Added parser, but tests timeout (#6 )	2021-08-26 16:08:23 +01:00