Document adding new languages

Louis Gesbert 2021-08-20 12:26:45 +02:00
parent dfb358993c
commit e7ad186bd7


@@ -104,9 +104,10 @@ need more, here is how one can be added:
for scope parameters, variables or structure fields, since it won't compile
anymore.
- Add an element to the `builtin_expression` type in `surface/ast.ml(i)`
- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with proper
translations in all of the language-specific modules `surface/lexer_en.cppo.ml`,
`surface/lexer_fr.cppo.ml`, etc.
- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with
proper translations in all of the language-specific modules
`surface/lexer_en.cppo.ml`, `surface/lexer_fr.cppo.ml`, etc. Don't forget the
macro at the beginning of `lexer.cppo.ml`.
- The rest can all be done by following the type errors downstream:
- Add a corresponding element to the lower-level AST in `dcalc/ast.ml(i)`, type `unop`
- Extend the translation accordingly in `surface/desugaring.ml`
@@ -123,11 +124,40 @@ The Catala language should be adapted to any legislative text that follows a
general-to-specifics statutes order. Therefore, there exist multiple versions
of the Catala surface syntax, adapted to the language of the legislative text.
Currently, Catala supports English and French legislative text via the
`--language=en`, `--language=fr` or `--language=pl` option.
Currently, Catala supports English, French and Polish legislative text via the
`--language=en`, `--language=fr` or `--language=pl` options.
Technically, support for new languages can be added via a new lexer. If you want
to add a new language, you can start from
[existing lexer examples](compiler/surface/lexer_fr.ml), tweak and open
a pull request. If you don't feel familiar enough with OCaml to do so, please
leave an issue on this repository.
To add support for a new language:
- the basic syntax localisation is defined in
  `compiler/surface/lexer_xx.cppo.ml` where `xx` is the language code (`en`,
  `fr`...)
- copy the files from another language, e.g.
  [English](compiler/surface/lexer_en.cppo.ml), then replace the strings with your
  translations. Be careful with the following:
  - The file must be encoded in Latin-1.
  - For a given token `FOO`, define `MS_FOO` to be the string version of the
    keyword. Due to the encoding, use `\xNN` [escape
    sequences](https://ocaml.org/manual/lex.html#escape-sequence) for UTF-8
    characters.
  - If the string contains spaces or non-Latin-1 characters, you need to define
    `MR_FOO` as well, with a regular expression in [sedlex
    format](https://github.com/ocaml-community/sedlex#lexer-specifications).
    Replace spaces with `", space_plus, "`, and Unicode characters with `",
    0xNNNN, "` where `NNNN` is the hexadecimal Unicode code point.

  **Hint:** You may get syntax errors with unhelpful locations because of
  `sedlex`. In that case the command `ocamlc
  _build/default/compiler/surface/lexer_xx.ml` may point you to the source of the
  error.
- add your translation to the compilation rules:
  - in `compiler/surface/dune`, copying another `parser_xx.cppo.ml` rule
  - in the `extensions` list in `compiler/driver.ml`
- add a corresponding variant to the `backend_lang` type in
  `compiler/utils/cli.ml`, then run `make build` and follow all type errors and
  `match non exhaustive` warnings to make sure the new language is handled
  everywhere.
- you may want to add syntax highlighting support, see `syntax_highlighting/`
and the rules in `Makefile`
- add examples and documentation!
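
To illustrate the `backend_lang` step above, here is a minimal sketch of why the compiler can guide you once a variant is added. The constructor names (`En`, `Fr`, `Pl`, `Xx`) and both functions are hypothetical stand-ins, not the actual definitions in `compiler/utils/cli.ml`; only the type name `backend_lang` comes from the steps above.

```ocaml
(* Hypothetical stand-in for the language type; the real definition in
   compiler/utils/cli.ml may differ. *)
type backend_lang = En | Fr | Pl | Xx (* Xx: the newly added language *)

(* Every match on backend_lang must now handle Xx. Leaving the last case
   out makes the compiler report a non-exhaustive match at this point. *)
let language_code (l : backend_lang) : string =
  match l with
  | En -> "en"
  | Fr -> "fr"
  | Pl -> "pl"
  | Xx -> "xx"

(* Reverse mapping, e.g. for the value passed to a --language option. *)
let language_of_code (s : string) : backend_lang option =
  match s with
  | "en" -> Some En
  | "fr" -> Some Fr
  | "pl" -> Some Pl
  | "xx" -> Some Xx
  | _ -> None
```

Avoiding a catch-all `_` case in matches over the language type is what lets the warnings list every place that still needs updating.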
Feel free to open a pull request for discussion even if you couldn't go through
all these steps; the `lexer_xx.cppo.ml` file is the important part.
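
The `MS_FOO` / `MR_FOO` conventions described above can be derived mechanically from a keyword. The following is a minimal sketch and not part of the Catala sources: the helper names `ms_literal` and `mr_regexp` are made up for illustration, and it assumes that `MS_` strings hold the UTF-8 bytes of the keyword written as `\xNN` escapes, while `MR_` uses one `0xNNNN` code point per non-ASCII character. It needs OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* Hypothetical helpers, not part of the Catala sources. *)

(* Decode a UTF-8 string into its list of Unicode code points.
   Requires OCaml >= 4.14 (String.get_utf_8_uchar). *)
let code_points (s : string) : int list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go
        (i + Uchar.utf_decode_length d)
        (Uchar.to_int (Uchar.utf_decode_uchar d) :: acc)
  in
  go 0 []

(* Assumed MS_ form: printable ASCII is kept as-is, every other byte
   (including quotes and backslashes) is written as a \xNN escape. *)
let ms_literal (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      let n = Char.code c in
      if n >= 0x20 && n < 0x7f && c <> '"' && c <> '\\' then
        Buffer.add_char b c
      else Buffer.add_string b (Printf.sprintf "\\x%02x" n))
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Assumed MR_ form: ASCII runs become string literals, each space becomes
   space_plus, each non-ASCII character becomes its 0xNNNN code point. *)
let mr_regexp (s : string) : string =
  let parts = ref [] and run = Buffer.create 16 in
  let flush () =
    if Buffer.length run > 0 then begin
      parts := Printf.sprintf "%S" (Buffer.contents run) :: !parts;
      Buffer.clear run
    end
  in
  List.iter
    (fun cp ->
      if cp = 0x20 then (flush (); parts := "space_plus" :: !parts)
      else if cp < 0x80 then Buffer.add_char run (Char.chr cp)
      else (flush (); parts := Printf.sprintf "0x%04X" cp :: !parts))
    (code_points s);
  flush ();
  String.concat ", " (List.rev !parts)
```

For the UTF-8 string `donnée`, `ms_literal` yields `"donn\xc3\xa9e"` and `mr_regexp` yields `"donn", 0x00E9, "e"`. Compare the output with an existing `lexer_xx.cppo.ml` before relying on it, since the exact conventions used there take precedence over this sketch.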