{0 Catala surface representation }

This representation is the first in the compilation chain
(see {{: index.html#architecture} Architecture}). Its purpose is to
host the output of the Catala parser, before any transformations have been made.

The module describing the abstract syntax tree is:

{!modules: Surface.Ast}

This representation can also be woven into literate programming outputs
using the {{:literate.html} literate programming modules}.
{1 Lexing }

The lexing in the Catala compiler is done using
{{: https://github.com/ocaml-community/sedlex} sedlex}, a modern OCaml lexer
generator that offers full support for UTF-8. This support enables users of
non-English languages to use their favorite diacritics and symbols in their code.

While Catala has a single parser, three different lexers can be used to
produce the parser tokens.

{ul
{li {!module: Surface.Lexer_common} corresponds to a concise and programming-language-like
syntax for Catala. Examples of this syntax can be found in the test suite
of the compiler.}
{li {!module: Surface.Lexer_en} is the adaptation of {!module: Surface.Lexer_common}
with verbose English keywords matching legal concepts.}
{li {!module: Surface.Lexer_fr} is the adaptation of {!module: Surface.Lexer_common}
with verbose French keywords matching legal concepts.}
}

Relevant modules:

{!modules: Surface.Lexer_common Surface.Lexer_fr Surface.Lexer_en}
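The idea of several lexers feeding a single parser can be sketched with one
shared token type and one keyword table per surface syntax. This is a minimal
illustration using only the standard library, not the sedlex-based code of the
Catala lexers: the [token] type, the keyword spellings and [token_of_lexeme]
are all assumptions made for the example.

{[
(* Hypothetical token type shared by every lexer. In Catala the real tokens
   are declared in Surface.Parser. *)
type token = SCOPE | DEFINITION | IDENT of string

(* One keyword table per surface syntax. The spellings are illustrative and
   are not copied from the Catala lexers. *)
let keywords_en = [ ("scope", SCOPE); ("definition", DEFINITION) ]

let keywords_fr =
  [ ("champ d'application", SCOPE); ("définition", DEFINITION) ]

(* Classify a lexeme with the given table, falling back to an identifier. *)
let token_of_lexeme keywords lexeme =
  match List.assoc_opt lexeme keywords with
  | Some tok -> tok
  | None -> IDENT lexeme

let () =
  (* Both tables produce the same token, so one grammar can serve both. *)
  assert (token_of_lexeme keywords_en "scope" = SCOPE);
  assert (token_of_lexeme keywords_fr "champ d'application" = SCOPE);
  assert (token_of_lexeme keywords_en "income" = IDENT "income")
]}

Because every table maps into the same [token] type, the grammar never needs
to know which language the source program was written in.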
{1 Parsing }

The Catala compiler uses {{: http://cambium.inria.fr/~fpottier/menhir/} Menhir}
to perform its parsing.

{!module: Surface.Parser} is the main file where the parser tokens and the
grammar are declared. Menhir automatically translates it into the equivalent
parsing automaton.

In order to provide decent syntax error messages, the Catala compiler uses the
novel error handling provided by Menhir and detailed in Section 11 of the
{{: http://cambium.inria.fr/~fpottier/menhir/manual.pdf} Menhir manual}.

A [parser.messages] source file has been manually annotated with a custom
error message for every potential erroneous state of the parser, and Menhir
automatically generates the {!module: Surface.Parser_errors} module containing
the function linking the erroneous parser states to the custom error messages.

To wrap it up, {!module: Surface.Parser_driver} glues all the parsing and
lexing together to perform the translation from source code to abstract syntax
tree, with meaningful error messages.

Relevant modules:

{!modules: Surface.Parser Surface.Parser_driver Surface.Parser_errors}
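The link between erroneous parser states and custom messages described in this
section can be pictured as a function from state numbers to strings. The sketch
below is a hand-written stand-in for the module that Menhir generates: the
state numbers, the messages and the [describe_error] helper are invented for
illustration. Raising [Not_found] for a state without a message follows the
convention of Menhir's generated code.

{[
(* Hand-written stand-in for the generated function. State numbers and
   messages are invented for this example. *)
let message (state : int) : string =
  match state with
  | 3 -> "expected a scope name after the scope keyword\n"
  | 17 -> "expected an expression after the definition\n"
  | _ -> raise Not_found

(* Hypothetical driver-side helper: use the custom message when there is
   one, and fall back to a generic message otherwise. *)
let describe_error (state : int) : string =
  match message state with
  | msg -> String.trim msg
  | exception Not_found -> "syntax error"

let () =
  assert (describe_error 3 = "expected a scope name after the scope keyword");
  assert (describe_error 42 = "syntax error")
]}

A driver would typically obtain the state number from the parser at the point
of failure and pass it to such a function before reporting the error.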
{1 Name resolution and translation }

The desugaring consists of translating {!module: Surface.Ast} to
{!module: Desugared.Ast} of the {{: desugared.html} desugared representation}.
The translation is implemented in
{!module: Surface.Desugaring}, but it relies on a helper module to perform the
name resolution: {!module: Surface.Name_resolution}. Indeed, in
{!module: Surface.Ast}, variable identifiers are just [string], whereas in
{!module: Desugared.Ast} they have been turned into well-categorized types
with a unique identifier like {!type: Scopelang.Ast.ScopeName.t}.

Relevant modules:

{!modules: Surface.Name_resolution Surface.Desugaring}
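The move from plain strings to uniquely identified names can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
actual resolution code: the local [ScopeName] module only imitates the shape
of an identifier type, and [context], [declare] and [resolve] are hypothetical
names.

{[
(* Hypothetical identifier type: a name paired with a unique integer. *)
module ScopeName : sig
  type t
  val fresh : string -> t
  val name : t -> string
  val equal : t -> t -> bool
end = struct
  type t = { name : string; uid : int }
  let counter = ref 0
  let fresh name = incr counter; { name; uid = !counter }
  let name t = t.name
  let equal a b = a.uid = b.uid
end

module StringMap = Map.Make (String)

(* Resolution context: maps each surface name to its identifier. *)
type context = ScopeName.t StringMap.t

(* Declare a scope, allocating a fresh identifier for its name. *)
let declare (ctx : context) (name : string) : context * ScopeName.t =
  let id = ScopeName.fresh name in
  (StringMap.add name id ctx, id)

(* Resolve a surface string. [None] signals an unknown name. *)
let resolve (ctx : context) (name : string) : ScopeName.t option =
  StringMap.find_opt name ctx

let () =
  let ctx, income = declare StringMap.empty "Income" in
  let ctx, taxes = declare ctx "Taxes" in
  (match resolve ctx "Income" with
   | Some id -> assert (ScopeName.equal id income)
   | None -> assert false);
  assert (not (ScopeName.equal income taxes));
  assert (resolve ctx "Unknown" = None)
]}

Keeping the identifier type abstract means that later passes can only compare
or print identifiers, and cannot confuse a scope name with an arbitrary string.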