{0 Catala surface representation }
This representation is the first in the compilation chain
(see {{: index.html#architecture} Architecture}). Its purpose is to
host the output of the Catala parser, before any transformations have been made.
The module describing the abstract syntax tree is:
{!modules: Surface.Ast}
This representation can also be woven into literate programming outputs
using the {{:literate.html} literate programming modules}.
{1 Lexing }
Lexing in the Catala compiler is done using
{{: https://github.com/ocaml-community/sedlex} sedlex}, a modern OCaml lexer
generator with full support for UTF-8. This support lets users of non-English
languages use their favorite diacritics and symbols in their code.
Catala has a single parser, but three different lexers can be used to
produce the parser tokens.
{ul
{li {!module: Surface.Lexer_common} corresponds to a concise and programming-language-like
syntax for Catala. Examples of this syntax can be found in the test suite
of the compiler.}
{li {!module: Surface.Lexer_en} is the adaptation of {!module: Surface.Lexer_common}
with verbose English keywords matching legal concepts.}
{li {!module: Surface.Lexer_fr} is the adaptation of {!module: Surface.Lexer_common}
with verbose French keywords matching legal concepts.}
}
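
The relationship between several lexers and one parser can be sketched as
follows. This is a minimal illustration and not the compiler's actual code: the
[token] type, the keyword tables and [token_of_lexeme] are hypothetical names,
the keyword spellings are only examples, and the real lexers are sedlex rules
rather than association lists.

```ocaml
(* Hypothetical token type shared by every lexer. In the compiler the real
   tokens are declared alongside the Menhir grammar. *)
type token = SCOPE | DEFINITION | IDENT of string

(* One keyword table per surface syntax; the spellings are illustrative. *)
let keywords_en = [ ("scope", SCOPE); ("definition", DEFINITION) ]
let keywords_fr = [ ("champ d'application", SCOPE); ("définition", DEFINITION) ]

(* Classify a lexeme: a keyword if the table knows it, an identifier
   otherwise. Every table produces the same [token] type, so a single
   parser can consume the output of any of them. *)
let token_of_lexeme keywords lexeme =
  match List.assoc_opt lexeme keywords with
  | Some tok -> tok
  | None -> IDENT lexeme
```

With these definitions, [token_of_lexeme keywords_en "scope"] and
[token_of_lexeme keywords_fr "champ d'application"] both yield [SCOPE], which
is what allows the grammar to stay language-agnostic.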
Relevant modules:
{!modules: Surface.Lexer_common Surface.Lexer_fr Surface.Lexer_en}
{1 Parsing }
The Catala compiler uses {{: http://cambium.inria.fr/~fpottier/menhir/} Menhir}
to perform its parsing.
{!module: Surface.Parser} is the main file where the parser tokens and the
grammar are declared. Menhir automatically translates it into the equivalent
parsing automaton.
In order to provide decent syntax error messages, the Catala compiler uses the
novel error handling provided by Menhir and detailed in Section 11 of the
{{: http://cambium.inria.fr/~fpottier/menhir/manual.pdf} Menhir manual}.
A [parser.messages] source file has been manually annotated with a custom
error message for every potential erroneous state of the parser, and Menhir
automatically generates the {!module: Surface.Parser_errors} module containing
the function that maps each erroneous parser state to its custom error message.
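
As a rough sketch of what such a generated module looks like and how a driver
might consume it: Menhir's [--compile-errors] mode emits a [message] function
from automaton state numbers to strings that raises [Not_found] for states
without an entry. The state numbers and message texts below are made up, and
[syntax_error_message] is a hypothetical helper, not part of the compiler.

```ocaml
(* Stand-in for the generated function: maps an automaton state number to
   its hand-written message. The states and texts here are invented. *)
let message (state : int) : string =
  match state with
  | 1 -> "expected a scope name after the scope keyword\n"
  | 7 -> "expected an expression\n"
  | _ -> raise Not_found

(* Hypothetical driver-side helper: use the custom message when there is
   one, and fall back to a generic message otherwise. *)
let syntax_error_message (state : int) : string =
  match message state with
  | msg -> String.trim msg
  | exception Not_found -> "syntax error"
```

The fallback branch matters in practice because a [.messages] file can lag
behind the grammar, leaving some states without a custom message.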
Finally, {!module: Surface.Parser_driver} glues the lexing and parsing
together to translate source code into an abstract syntax tree, with
meaningful error messages.
Relevant modules:
{!modules: Surface.Parser Surface.Parser_driver Surface.Parser_errors}
{1 Name resolution and translation }
The desugaring consists of translating {!module: Surface.Ast} to
{!module: Desugared.Ast} of the {{: desugared.html} desugared representation}.
The translation is implemented in
{!module: Surface.Desugaring}, but it relies on a helper module to perform the
name resolution: {!module: Surface.Name_resolution}. Indeed, in
{!module: Surface.Ast}, variable identifiers are plain [string]s, whereas in
{!module: Desugared.Ast} they have been turned into well-categorized types
with a unique identifier, like {!type: Scopelang.Ast.ScopeName.t}.
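
The move from strings to unique identifiers can be illustrated with a small
sketch. Everything below is hypothetical: this [ScopeName] module only imitates
the idea of a uniquely identified name and is not the compiler's
implementation, and [context], [declare] and [resolve] are illustrative names.

```ocaml
module StringMap = Map.Make (String)

(* Hypothetical unique-identifier module: a fresh integer paired with the
   original name, hidden behind an abstract type. *)
module ScopeName : sig
  type t
  val fresh : string -> t
  val to_string : t -> string
  val equal : t -> t -> bool
end = struct
  type t = { id : int; name : string }
  let counter = ref 0
  let fresh name =
    incr counter;
    { id = !counter; name }
  let to_string t = t.name
  let equal a b = a.id = b.id
end

(* The resolution context maps surface strings to unique identifiers. *)
type context = ScopeName.t StringMap.t

(* Record a declaration by binding its name to a fresh identifier. *)
let declare (ctx : context) (name : string) : context =
  StringMap.add name (ScopeName.fresh name) ctx

(* Look up a use site; [None] signals an unbound name. *)
let resolve (ctx : context) (name : string) : ScopeName.t option =
  StringMap.find_opt name ctx
```

After resolution, later passes compare identifiers with [ScopeName.equal]
instead of comparing strings, so two distinct declarations can never be
confused even if they share a spelling.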
Relevant modules:
{!modules: Surface.Name_resolution Surface.Desugaring}