# Contributing to Catala

The project is open to external contributions, in the spirit of open source.
If you want to open a pull request, please follow the instructions below.

To ask the Catala team a question, please open an issue on this repository.
You can also join the [Zulip chat](https://zulip.catala-lang.org/) to ask
any questions about the project.

If you want to contribute to the project on a longer-term basis, or if you have
specific expertise as a socio-fiscal lawyer or a programming language specialist,
please [contact the authors](mailto:contact@catala-lang.org).
The Catala team meets by videoconference once a week.

Please note that the copyright of this code is owned by Inria;
by contributing, you disclaim all copyright interests in favor of Inria.
Both the code for the compiler and the examples in this repository are
distributed under the Apache2 license.

## Writing Catala code

Before writing Catala code, please read the
[tutorial](https://catala-lang.org/en/examples/tutorial). You can run the
programs of the tutorial yourself by following the instructions in the
[README of the `examples` directory](examples/README.md). Then, it is suggested
that you create a new example directory, again according to the instructions of
this README.

Let us now present the typical Catala workflow. First, you need to locate
the legislative text that you want to use as a reference. Then, simply
copy-paste the text into your source file.

Next, you will have to format the copy-pasted text using Catala heading
and article markers:

```markdown
## Heading

### Sub-heading (the more '#', the less important)

#### Legislative atom
```

Please look at the code of other examples to see how to format things properly.
While formatting the text, don't forget to regularly try to parse your example
using, for instance,

```
make -C examples/foo foo.tex
make -C examples/foo foo.py
make -C examples/foo foo.ml
```

to see if you've made any syntax errors. Once the text formatting is done, you
can start to annotate each legislative atom (article, provision, etc.) with
some Catala code. To open up a code section in Catala, simply use

````markdown
```catala
# In code sections, comments start with #
scope Foo:
  <your code goes here>
```
````

While all the code sections are equivalent in terms of execution, you can
mark some as "metadata" so that they are printed differently in lawyer-facing
documents. Here's how it works:

````markdown
```catala-metadata
declaration structure FooBar:
  data foo content boolean
  data bar content money

<your structure/enumeration/scope declarations go here>
```
````

Again, make sure to regularly check that your example parses correctly. The
error messages from the compiler should help you debug the syntax if need be.
You can also live-test the programs you wrote by feeding them to the interpreter
(see the [README of the `examples` directory](examples/README.md)); this will
also type-check the programs, which is useful for debugging them.

## Working on the compiler

The Catala compiler is a standard dune-managed OCaml project.
You can look at the
[online OCaml documentation](https://catala-lang.org/ocaml_docs/) for the
different modules' interfaces as well as high-level architecture documentation.

### Example: adding a builtin function

The language provides a limited number of builtin functions, which are sometimes
needed for things that can't easily be expressed in Catala itself; in case you
need more, here is how one can be added:

- Choose a name wisely. Be ready to patch any code that already used the name
  for scope parameters, variables or structure fields, since it won't compile
  anymore.
- Add an element to the `builtin_expression` type in `surface/ast.ml(i)`
- Add your builtin in the `builtins` list in `surface/lexer.cppo.ml`, and with
  proper translations in all of the language-specific modules
  `surface/lexer_en.cppo.ml`, `surface/lexer_fr.cppo.ml`, etc. Don't forget the
  macro at the beginning of `lexer.cppo.ml`.
- The rest can all be done by following the type errors downstream:
  - Add a corresponding element to the lower-level AST in `dcalc/ast.ml(i)`, type `unop`
  - Extend the translation accordingly in `surface/desugaring.ml`
  - Extend the printer (`dcalc/print.ml`) and the typer with correct type
    information (`dcalc/typing.ml`)
  - Finally, provide the implementations:
    - in `lcalc/to_ocaml.ml`, function `format_unop`
    - in `dcalc/interpreter.ml`, function `evaluate_operator`
- Update the syntax guide in `doc/syntax/syntax.tex` with your new builtin

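
To give a feel for the "follow the type errors" approach described above, here is
a minimal, self-contained toy model. It is not the actual compiler code: the
`Round` builtin, the `RoundOp` operator, the `value` type and the function bodies
are all hypothetical, and only the names `builtin_expression`, `builtins`, `unop`,
`format_unop` and `evaluate_operator` are borrowed from the steps above. Once a
new constructor is added to a variant type, every `match` that does not handle it
is reported by the OCaml compiler, which points to the next place to patch.

```ocaml
(* Toy model only: this is NOT the real Catala compiler code. *)

(* Stand-in for the surface AST type; [Round] is the hypothetical new builtin. *)
type builtin_expression =
  | Cardinal
  | Round

(* Stand-in for the lower-level operator type. *)
type unop =
  | Length
  | RoundOp (* hypothetical counterpart of [Round] *)

(* Stand-in for the lexer's keyword table. *)
let builtins : (string * builtin_expression) list =
  [ ("cardinal", Cardinal); ("round", Round) ]

(* Translation step: without the [Round] case, OCaml reports a
   non-exhaustive pattern match here. *)
let desugar_builtin (b : builtin_expression) : unop =
  match b with
  | Cardinal -> Length
  | Round -> RoundOp

(* Printer step. *)
let format_unop (op : unop) : string =
  match op with
  | Length -> "length"
  | RoundOp -> "round"

(* Hypothetical runtime values, just enough for the example. *)
type value =
  | VInt of int
  | VFloat of float
  | VList of value list

(* Interpreter step: returns [None] when the operand has the wrong shape. *)
let evaluate_operator (op : unop) (v : value) : value option =
  match op, v with
  | Length, VList l -> Some (VInt (List.length l))
  | RoundOp, VFloat f -> Some (VFloat (Float.round f))
  | _, _ -> None

let () =
  match evaluate_operator (desugar_builtin Round) (VFloat 2.6) with
  | Some (VFloat f) -> Printf.printf "%s -> %.1f\n" (format_unop RoundOp) f
  | _ -> print_endline "unsupported operand"
```

In this sketch, removing the `Round -> RoundOp` case is enough to make the
compiler complain, which mirrors how the real modules guide you from one file to
the next.
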
### Internationalization of the Catala syntax

The Catala language should be adaptable to any legislative text that follows a
general-to-specific statute order. Therefore, there exist multiple versions
of the Catala surface syntax, adapted to the language of the legislative text.

Currently, Catala supports English, French and Polish legislative text via the
`--language=en`, `--language=fr` or `--language=pl` options.

To add support for a new language:

- the basic syntax localisation is defined in
  `compiler/surface/lexer_xx.cppo.ml` where `xx` is the language code (`en`,
  `fr`...)
- copy the files from another language, e.g.
  [English](compiler/surface/lexer_en.cppo.ml), then replace the strings with your
  translations. Be careful with the following:

  - The file must be encoded in latin-1
  - For a given token `FOO`, define `MS_FOO` to be the string version of the
    keyword. Due to the encoding, use `\xNN` [escape
    sequences](https://ocaml.org/manual/lex.html#escape-sequence) for utf8
    characters.
  - If the string contains spaces or non-latin1 characters, you need to define
    `MR_FOO` as well with a regular expression in [sedlex
    format](https://github.com/ocaml-community/sedlex#lexer-specifications).
    Replace spaces with `", space_plus, "`, and unicode characters with
    `", 0xNNNN, "` where `NNNN` is the hexadecimal unicode codepoint.

  **Hint:** You may get syntax errors with unhelpful locations because of
  `sedlex`. In that case the command
  `ocamlc _build/default/compiler/surface/lexer_xx.ml` may point you to the
  source of the error.

- add your translation to the compilation rules:
  - in `compiler/surface/dune`, copying another `parser_xx.cppo.ml` rule
  - in the `extensions` list in `compiler/driver.ml`
  - add a corresponding variant to `compiler/utils/cli.ml` `backend_lang`, try
    to run `make build` and follow all type errors and `match non exhaustive`
    warnings to be sure it is well handled everywhere.
- you may want to add syntax highlighting support, see `syntax_highlighting/`
  and the rules in `Makefile`
- add examples and documentation!

Feel free to open a pull request for discussion even if you couldn't go through
all these steps; the `lexer_xx.cppo.ml` file is the important part.

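
The string conventions for `MS_FOO` and `MR_FOO` described above can be tedious
to apply by hand. The following is a small helper sketch, not part of the Catala
code base: the function names `escape_non_ascii` and `sedlex_of_spaces` are
hypothetical, and the sample keywords are only illustrative. The first function
rewrites every non-ASCII byte of a UTF-8 keyword as a `\xNN` escape; the second
only covers the "replace spaces" rule and assumes words are separated by single
spaces (it does not handle the `0xNNNN` codepoint rule).

```ocaml
(* Hypothetical helpers, not part of the Catala code base. *)

(* Render a UTF-8 keyword as ASCII-only text, writing each byte >= 0x80
   as a \xNN escape. Backslashes and double quotes are left untouched. *)
let escape_non_ascii (s : string) : string =
  let buf = Buffer.create (String.length s * 4) in
  String.iter
    (fun c ->
      if Char.code c < 0x80 then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "\\x%02x" (Char.code c)))
    s;
  Buffer.contents buf

(* Apply the "replace spaces with `", space_plus, "`" rule to a keyword
   made of words separated by single spaces. *)
let sedlex_of_spaces (s : string) : string =
  String.split_on_char ' ' s
  |> List.map (fun word -> "\"" ^ word ^ "\"")
  |> String.concat ", space_plus, "

let () =
  (* "\xc3\xa9" is the UTF-8 encoding of U+00E9 (e with acute accent). *)
  print_endline (escape_non_ascii "d\xc3\xa9claration");
  print_endline (sedlex_of_spaces "champ d'application")
```

The printed strings are meant to be pasted between the quotes of the
corresponding definitions and then checked by building the lexer.
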
### Automatic formatting

Please make sure to submit commits formatted using the included `ocamlformat`
configuration. The `make build` target should ensure that.

In case the formatting rules or the ocamlformat version changed remotely, you can
use [this script](https://gist.github.com/AltGr/2891a61f721c8fd85b1da71e10c691b6) to
reformat your branch patch by patch before rebasing.