Mirror of https://github.com/CatalaLang/catala.git (synced 2024-11-08 07:51:43 +03:00)
Document adding new languages
parent dfb358993c
commit e7ad186bd7
@@ -104,9 +104,10 @@ need more, here is how one can be added:
   for scope parameters, variables or structure fields, since it won't compile
   anymore.
 - Add an element to the `builtin_expression` type in `surface/ast.ml(i)`
-- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with proper
-  translations in all of the language-specific modules `surface/lexer_en.cppo.ml`,
-  `surface/lexer_fr.cppo.ml`, etc.
+- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with
+  proper translations in all of the language-specific modules
+  `surface/lexer_en.cppo.ml`, `surface/lexer_fr.cppo.ml`, etc. Don't forget the
+  macro at the beginning of `lexer.cppo.ml`.
 - The rest can all be done by following the type errors downstream:
   - Add a corresponding element to the lower-level AST in `dcalc/ast.ml(i)`, type `unop`
   - Extend the translation accordingly in `surface/desugaring.ml`
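As a rough illustration of the steps listed in this hunk, the toy OCaml model below adds one builtin. None of the names are taken from the Catala sources: the constructors, the keyword strings and the shape of the `builtins` list are assumptions made for the example. It is only meant to show why the remaining work can be found by following the compiler's diagnostics: once the new constructor exists, any `match` that does not handle it is reported.

```ocaml
(* Toy model only: constructor names and the list shape are assumptions,
   not the actual definitions in surface/ast.ml or surface/lexer.cppo.ml. *)
type builtin_expression =
  | Cardinal
  | GetDay
  | GetWeekday (* hypothetical builtin being added *)

(* Keyword as written in a program, paired with its AST constructor. *)
let builtins : (string * builtin_expression) list =
  [ ("cardinal", Cardinal); ("get_day", GetDay); ("get_weekday", GetWeekday) ]

(* Stand-in for a downstream pass. Removing the GetWeekday case makes the
   compiler emit a non-exhaustive match warning, which becomes an error in
   builds that treat warnings as errors. *)
let describe (b : builtin_expression) : string =
  match b with
  | Cardinal -> "number of elements"
  | GetDay -> "day of the month"
  | GetWeekday -> "day of the week"

(* Resolve a keyword to its builtin, if any. *)
let lookup (keyword : string) : builtin_expression option =
  List.assoc_opt keyword builtins
```

In this sketch, forgetting the list entry is the one mistake the type checker cannot catch, which may be why the documentation spells that step out explicitly.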
@@ -123,11 +124,40 @@ The Catala language should be adapted to any legislative text that follows a
 general-to-specifics statutes order. Therefore, there exist multiple versions
 of the Catala surface syntax, adapted to the language of the legislative text.
 
-Currently, Catala supports English and French legislative text via the
-`--language=en`, `--language=fr` or `--language=pl` option.
+Currently, Catala supports English, French and Polish legislative text via the
+`--language=en`, `--language=fr` or `--language=pl` options.
 
-Technically, support for new languages can be added via a new lexer. If you want
-to add a new language, you can start from
-[existing lexer examples](compiler/surface/lexer_fr.ml), tweak and open
-a pull request. If you don't feel familiar enough with OCaml to do so, please
-leave an issue on this repository.
+To add support for a new language:
+- the basic syntax localisation is defined in
+  `compiler/surface/lexer_xx.cppo.ml` where `xx` is the language code (`en`,
+  `fr`...)
+- copy the files from another language, e.g.
+  [English](compiler/surface/lexer_en.cppo.ml), then replace the strings with your
+  translations. Be careful with the following:
+  - The file must be encoded in Latin-1
+  - For a given token `FOO`, define `MS_FOO` to be the string version of the
+    keyword. Due to the encoding, use `\xNN` [escape
+    sequences](https://ocaml.org/manual/lex.html#escape-sequence) for UTF-8
+    characters.
+  - If the string contains spaces or non-Latin-1 characters, you need to define
+    `MR_FOO` as well with a regular expression in [sedlex
+    format](https://github.com/ocaml-community/sedlex#lexer-specifications).
+    Replace spaces with `", space_plus, "`, and Unicode characters with `",
+    0xNNNN, "` where `NNNN` is the hexadecimal Unicode codepoint.
+
+  **Hint:** You may get syntax errors with unhelpful locations because of
+  `sedlex`. In that case the command `ocamlc
+  _build/default/compiler/surface/lexer_xx.ml` may point you to the source of the
+  error.
+- add your translation to the compilation rules:
+  - in `compiler/surface/dune`, copying another `parser_xx.cppo.ml` rule
+  - in the `extensions` list in `compiler/driver.ml`
+- add a corresponding variant to `compiler/utils/cli.ml` `backend_lang`, try
+  to run `make build` and follow all type errors and `match non exhaustive`
+  warnings to be sure it is well handled everywhere.
+- you may want to add syntax highlighting support, see `syntax_highlighting/`
+  and the rules in `Makefile`
+- add examples and documentation!
+
+Feel free to open a pull request for discussion even if you couldn't go through
+all these steps; the `lexer_xx.cppo.ml` file is the important part.
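The `MS_FOO` and `MR_FOO` rules described in the added text are mechanical, so they can be sketched as a small standalone OCaml helper. This is a hypothetical aid, not part of the Catala sources: the function names `ms_literal` and `mr_pattern` are invented here, the input is assumed to be a UTF-8 string without double quotes or backslashes, and `String.get_utf_8_uchar` requires OCaml 4.14 or later. The results are meant to be pasted by hand as the bodies of the `MS_FOO` and `MR_FOO` definitions.

```ocaml
(* Hypothetical helper, not part of Catala. Requires OCaml >= 4.14.
   Assumes keywords contain no double quotes or backslashes. *)

(* MS_ form: a string literal in which every non-ASCII byte of the UTF-8
   encoding is written as a \xNN escape, so the file itself stays ASCII. *)
let ms_literal (keyword : string) : string =
  let b = Buffer.create (2 * String.length keyword) in
  Buffer.add_char b '"';
  String.iter
    (fun c ->
      if Char.code c < 0x80 then Buffer.add_char b c
      else Printf.bprintf b "\\x%02x" (Char.code c))
    keyword;
  Buffer.add_char b '"';
  Buffer.contents b

(* MR_ form: ASCII runs become string literals, runs of spaces become
   space_plus, and each non-ASCII character becomes its 0xNNNN code point. *)
let mr_pattern (keyword : string) : string =
  let parts = ref [] in (* emitted pieces, most recent first *)
  let lit = Buffer.create 16 in (* pending run of ASCII characters *)
  let flush () =
    if Buffer.length lit > 0 then begin
      parts := ("\"" ^ Buffer.contents lit ^ "\"") :: !parts;
      Buffer.clear lit
    end
  in
  let rec go i =
    if i < String.length keyword then begin
      let d = String.get_utf_8_uchar keyword i in
      let u = Uchar.to_int (Uchar.utf_decode_uchar d) in
      if u = 0x20 then begin
        flush ();
        (* space_plus already matches several spaces: emit it only once *)
        (match !parts with
         | "space_plus" :: _ -> ()
         | _ -> parts := "space_plus" :: !parts)
      end
      else if u < 0x80 then Buffer.add_char lit (Char.chr u)
      else begin
        flush ();
        parts := Printf.sprintf "0x%04X" u :: !parts
      end;
      go (i + Uchar.utf_decode_length d)
    end
  in
  go 0;
  flush ();
  String.concat ", " (List.rev !parts)

let () =
  (* "d\xc3\xa9claration" is the UTF-8 encoding of the French "déclaration" *)
  print_endline (ms_literal "d\xc3\xa9claration");
  (* prints "d\xc3\xa9claration" (with the quotes and backslashes) *)
  print_endline (mr_pattern "d\xc3\xa9claration");
  (* prints "d", 0x00E9, "claration" *)
  print_endline (mr_pattern "champ d'application")
  (* prints "champ", space_plus, "d'application" *)
```

Keeping the helper's own source in plain ASCII (the sample keyword is written with escapes) sidesteps the Latin-1 encoding constraint mentioned above. Whether the generated pattern is accepted as is by the project's `sedlex` setup has not been checked against the repository.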