From e7ad186bd70dacd78986d819b6f3dea0b600acd7 Mon Sep 17 00:00:00 2001
From: Louis Gesbert
Date: Fri, 20 Aug 2021 12:26:45 +0200
Subject: [PATCH] Document adding new languages

---
 CONTRIBUTING.md | 50 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index b289b030..a46bbdf7 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -104,9 +104,10 @@ need more, here is how one can be added:
   for scope parameters, variables or structure fields, since it won't compile
   anymore.
 - Add an element to the `builtin_expression` type in `surface/ast.ml(i)`
-- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with proper
-  translations in all of the language-specific modules `surface/lexer_en.cppo.ml`,
-  `surface/lexer_fr.cppo.ml`, etc.
+- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with
+  proper translations in all of the language-specific modules
+  `surface/lexer_en.cppo.ml`, `surface/lexer_fr.cppo.ml`, etc. Don't forget the
+  macro at the beginning of `lexer.cppo.ml`.
 - The rest can all be done by following the type errors downstream:
   - Add a corresponding element to the lower-level AST in `dcalc/ast.ml(i)`, type `unop`
   - Extend the translation accordingly in `surface/desugaring.ml`
@@ -123,11 +124,40 @@ The Catala language should be adapted to any legislative text that follows a
 general-to-specifics statutes order. Therefore, there exists multiple versions
 of the Catala surface syntax, adapted to the language of the legislative text.
 
-Currently, Catala supports English and French legislative text via the
-`--language=en`, `--language=fr` or `--language=pl` option.
+Currently, Catala supports English, French and Polish legislative text via the
+`--language=en`, `--language=fr` or `--language=pl` options.
 
-Technically, support for new languages can be added via a new lexer. If you want
-to add a new language, you can start from
-[existing lexer examples](compiler/surface/lexer_fr.ml), tweak and open
-a pull request. If you don't feel familiar enough with OCaml to do so, please
-leave an issue on this repository.
+To add support for a new language:
+- the basic syntax localisation is defined in
+  `compiler/surface/lexer_xx.cppo.ml` where `xx` is the language code (`en`,
+  `fr`...)
+- copy the files from another language, e.g.
+  [english](compiler/surface/lexer_en.cppo.ml), then replace the strings with your
+  translations. Be careful with the following:
+  - The file must be encoded in latin-1
+  - For a given token `FOO`, define `MS_FOO` to be the string version of the
+    keyword. Due to the encoding, use `\xNN` [escape
+    sequences](https://ocaml.org/manual/lex.html#escape-sequence) for utf8
+    characters.
+  - If the string contains spaces or non-latin1 characters, you need to define
+    `MR_FOO` as well with a regular expression in [sedlex
+    format](https://github.com/ocaml-community/sedlex#lexer-specifications).
+    Replace spaces with `", space_plus, "`, and unicode characters with `",
+    0xNNNN, "` where `NNNN` is the hexadecimal unicode codepoint.
+
+  **Hint:** You may get syntax errors with unhelpful locations because of
+  `sedlex`. In that case the command `ocamlc
+  _build/default/compiler/surface/lexer_xx.ml` may point you to the source of the
+  error.
+- add your translation to the compilation rules:
+  - in `compiler/surface/dune`, copying another `parser_xx.cppo.ml` rule
+  - in the `extensions` list in `compiler/driver.ml`
+  - add a corresponding variant to `compiler/utils/cli.ml` `backend_lang`, try
+    to run `make build` and follow all type errors and `match non exhaustive`
+    warnings to be sure it is well handled everywhere.
+- you may want to add syntax highlighting support, see `syntax_highlighting/`
+  and the rules in `Makefile`
+- add examples and documentation!
+
+Feel free to open a pull request for discussion even if you couldn't go through
+all these steps, the `lexer_xx.cppo.ml` file is the important part.
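
Reviewer note: the `MS_FOO` / `MR_FOO` rules documented in this patch are easy to get wrong by hand, so here is a minimal sketch of how the two forms could be derived from a keyword. It is a hypothetical standalone helper, not part of the Catala code base. It assumes that the `\xNN` escapes in `MS_FOO` spell out the UTF-8 bytes of the keyword, and that `MR_FOO` is a comma-separated sequence of string literals, `space_plus` and `0xNNNN` code points, as the new text describes. The function names and the token name `FOO` are illustrative only, and the code needs OCaml 4.14 or later for `String.get_utf_8_uchar`.

```ocaml
(* Hypothetical helper (not in the Catala repository): print candidate
   MS_/MR_ macro definitions for a keyword given as a UTF-8 string. *)

(* Decode a UTF-8 string into its Unicode scalar values (OCaml >= 4.14). *)
let uchars_of_utf8 (s : string) : Uchar.t list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* String form: ASCII bytes are kept, every other byte becomes \xNN.
   Assumption: the MS_ macro holds the UTF-8 bytes of the keyword. *)
let ms_string (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if Char.code c < 0x80 then Buffer.add_char b c
      else Buffer.add_string b (Printf.sprintf "\\x%02x" (Char.code c)))
    s;
  "\"" ^ Buffer.contents b ^ "\""

(* Regexp form: ASCII runs become string literals, each space becomes
   space_plus, each non-ASCII character becomes its code point 0xNNNN. *)
let mr_regexp (s : string) : string =
  let pieces = ref [] and run = Buffer.create 16 in
  let flush () =
    if Buffer.length run > 0 then begin
      pieces := Printf.sprintf "%S" (Buffer.contents run) :: !pieces;
      Buffer.clear run
    end
  in
  List.iter
    (fun u ->
      let n = Uchar.to_int u in
      if n = Char.code ' ' then (flush (); pieces := "space_plus" :: !pieces)
      else if n < 0x80 then Buffer.add_char run (Char.chr n)
      else (flush (); pieces := Printf.sprintf "0x%04X" n :: !pieces))
    (uchars_of_utf8 s);
  flush ();
  String.concat ", " (List.rev !pieces)

let () =
  (* "conséquence", written with escapes so this file stays ASCII-only. *)
  let kw = "cons\xc3\xa9quence" in
  Printf.printf "#define MS_FOO %s\n" (ms_string kw);
  Printf.printf "#define MR_FOO %s\n" (mr_regexp kw)
```

Under those assumptions, running it prints `#define MS_FOO "cons\xc3\xa9quence"` and `#define MR_FOO "cons", 0x00E9, "quence"`. A keyword with a space, such as `champ d'application`, yields `"champ", space_plus, "d'application"` for the regexp form. The output is only a starting point and should be checked against an existing `lexer_xx.cppo.ml` before use.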