catala/CONTRIBUTING.md
2022-03-08 15:01:18 +01:00

Contributing to Catala

The project is open to external contributions, in the spirit of open source. If you want to open a pull request, please follow the instructions below.

To ask the Catala team a question, please open an issue on this repository. You can also join the Zulip chat to ask any questions about the project.

If you want to contribute to the project on a longer-term basis, or if you have specific expertise as a socio-fiscal lawyer or a programming language specialist, please contact the authors. The Catala team meets by video conference once a week.

Please note that the copyright of this code is owned by Inria; by contributing, you disclaim all copyright interests in favor of Inria. Both the code for the compiler and the examples in this repository are distributed under the Apache 2.0 license.

Writing Catala code

Before writing Catala code, please read the tutorial. You can run the programs of the tutorial yourself by following the instructions in the README of the examples directory. Then create a new example directory, again following the instructions in that README.

Let us now present the typical Catala workflow. First, you need to locate the legislative text that you want to use as a reference. Then, simply copy-paste the text into your source file.

Next, format the copy-pasted text using Catala heading and article markers:

## Heading

### Sub-heading (the more '#', the less important)

#### Legislative atom

Please look at the code of other examples to see how to format things properly. While formatting the text, remember to parse your example regularly, for instance with

make -C examples/foo foo.tex
make -C examples/foo foo.py
make -C examples/foo foo.ml

to see if you've made any syntax errors. Once the text formatting is done, you can start to annotate each legislative atom (article, provision, etc.) with some Catala code. To open up a code section in Catala, simply use

```catala
# In code sections, comments start with #
scope Foo:
  <your code goes here>
```

While all the code sections are equivalent in terms of execution, you can mark some as "metadata" so that they are printed differently on lawyer-facing documents. Here's how it works:

```catala-metadata
declaration structure FooBar:
  data foo content boolean
  data bar content money

<your structure/enumeration/scope declarations go here>
```

Again, make sure to regularly check that your example parses correctly. The compiler's error messages should help you debug the syntax if need be. You can also live-test the programs you wrote by feeding them to the interpreter (see the README of the examples directory); this also type-checks the programs, which is useful for debugging them.
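The regular parse check can be scripted. The following is a minimal sketch, assuming the `examples/foo` layout and the `foo.tex`, `foo.py` and `foo.ml` targets used in the commands above; `build_commands` and `check_example` are hypothetical helper names, not part of the Catala tooling.

```python
import subprocess
from typing import List, Sequence

# Target extensions taken from the make commands shown earlier in this guide.
TARGET_EXTENSIONS = ("tex", "py", "ml")


def build_commands(
    example: str, extensions: Sequence[str] = TARGET_EXTENSIONS
) -> List[List[str]]:
    """Return one `make -C examples/<example> <example>.<ext>` command per extension."""
    return [
        ["make", "-C", f"examples/{example}", f"{example}.{ext}"]
        for ext in extensions
    ]


def check_example(example: str) -> bool:
    """Run each make target in turn and stop at the first one that fails."""
    for command in build_commands(example):
        print("running:", " ".join(command))
        if subprocess.run(command).returncode != 0:
            return False
    return True
```

Calling `check_example("foo")` from the root of the repository runs the three targets in order and returns `False` as soon as one of them fails, so the first error message from the compiler is the last thing printed.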

Working on the compiler

The Catala compiler is a standard dune-managed OCaml project. You can look at the online OCaml documentation for the different modules' interfaces as well as high-level architecture documentation.

Please note that the ocamlformat version this project uses is 0.20.1. Using another version may cause spurious diffs to appear in your pull requests.
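To avoid such diffs, you can compare the installed version with the expected one before formatting. This is a small sketch, assuming that `ocamlformat --version` prints the bare version string on standard output; `version_matches` and `installed_ocamlformat_version` are hypothetical helper names.

```python
import subprocess

# Version stated in this guide.
EXPECTED_VERSION = "0.20.1"


def version_matches(reported: str, expected: str = EXPECTED_VERSION) -> bool:
    """Compare the reported version, ignoring surrounding whitespace."""
    return reported.strip() == expected


def installed_ocamlformat_version() -> str:
    """Ask the ocamlformat found on PATH for its version (assumed output format)."""
    result = subprocess.run(
        ["ocamlformat", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A pre-commit script could then call `version_matches(installed_ocamlformat_version())` and refuse to format when it returns `False`.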

Example: adding a builtin function

The language provides a limited number of builtin functions, which are sometimes needed for things that can't easily be expressed in Catala itself; in case you need more, here is how one can be added:

  • Choose a name wisely. Be ready to patch any code that already used the name for scope parameters, variables or structure fields, since it won't compile anymore.
  • Add an element to the builtin_expression type in surface/ast.ml(i)
  • Add your builtin in the builtins list in surface/lexer.cppo.ml, and with proper translations in all of the language-specific modules surface/lexer_en.cppo.ml, surface/lexer_fr.cppo.ml, etc. Don't forget the macro at the beginning of lexer.cppo.ml.
  • The rest can all be done by following the type errors downstream:
    • Add a corresponding element to the lower-level AST in dcalc/ast.ml(i), type unop
    • Extend the translation accordingly in surface/desugaring.ml
    • Extend the printer (dcalc/print.ml) and the typer with correct type information (dcalc/typing.ml)
    • Finally, provide the implementations:
      • in lcalc/to_ocaml.ml, function format_unop
      • in dcalc/interpreter.ml, function evaluate_operator
  • Update the syntax guide in doc/syntax/syntax.tex with your new builtin

Internationalization of the Catala syntax

Catala should be adaptable to any legislative text whose statutes are ordered from the general to the specific. Therefore, there exist multiple versions of the Catala surface syntax, each adapted to the language of the legislative text.

Currently, Catala supports English, French and Polish legislative text via the --language=en, --language=fr or --language=pl options.

To add support for a new language:

  • the basic syntax localisation is defined in compiler/surface/lexer_xx.cppo.ml where xx is the language code (en, fr...)

  • copy the files from another language, e.g. English, then replace the strings with your translations. Be careful with the following:

    • The file must be encoded in latin-1
    • For a given token FOO, define MS_FOO to be the string version of the keyword. Due to the encoding, use \xNN escape sequences for UTF-8 characters.
    • If the string contains spaces or non-latin1 characters, you need to define MR_FOO as well, with a regular expression in sedlex format. Replace spaces with ", space_plus, ", and Unicode characters with ", 0xNNNN, " where NNNN is the hexadecimal Unicode codepoint.

    Hint: You may get syntax errors with unhelpful locations because of sedlex. In that case the command ocamlc _build/default/compiler/surface/lexer_xx.ml may point you to the source of the error.

  • add your translation to the compilation rules:

    • in compiler/surface/dune, copying another parser_xx.cppo.ml rule
    • in the extensions list in compiler/driver.ml
    • add a corresponding variant to backend_lang in compiler/utils/cli.ml, then run make build and follow the type errors and non-exhaustive match warnings to make sure the new language is handled everywhere.
  • you may want to add syntax highlighting support, see syntax_highlighting/ and the rules in Makefile

  • add examples and documentation!

Feel free to open a pull request for discussion even if you couldn't go through all these steps; the lexer_xx.cppo.ml file is the important part.
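The MS_FOO and MR_FOO rules listed above are mechanical, so the definitions can be derived from the keyword itself. The sketch below is illustrative only and makes several assumptions: it treats every non-ASCII character as one that needs escaping, it emits cppo-style `#define` lines, it does not handle keywords containing double quotes or backslashes, and the token name `DATA` in the usage note is a hypothetical example. The helper names are not part of the Catala code base.

```python
from typing import List


def ms_literal(keyword: str) -> str:
    """Quoted string for MS_FOO: ASCII is kept, other characters become UTF-8 byte escapes."""
    pieces: List[str] = []
    for char in keyword:
        if ord(char) < 128:
            pieces.append(char)
        else:
            pieces.extend(f"\\x{byte:02x}" for byte in char.encode("utf-8"))
    return '"' + "".join(pieces) + '"'


def needs_mr(keyword: str) -> bool:
    """Assumption: MR_FOO is needed for spaces or any non-ASCII character."""
    return any(char == " " or ord(char) >= 128 for char in keyword)


def mr_regexp(keyword: str) -> str:
    """Comma-separated sequence for MR_FOO: literals, space_plus, 0xNNNN codepoints."""
    parts: List[str] = []
    literal = ""
    for char in keyword:
        if char == " " or ord(char) >= 128:
            # Close the ASCII literal collected so far, if any.
            if literal:
                parts.append(f'"{literal}"')
                literal = ""
            if char != " ":
                parts.append(f"0x{ord(char):04X}")
            elif not parts or parts[-1] != "space_plus":
                # Consecutive spaces collapse into a single space_plus.
                parts.append("space_plus")
        else:
            literal += char
    if literal:
        parts.append(f'"{literal}"')
    return ", ".join(parts)


def cppo_defines(token: str, keyword: str) -> List[str]:
    """Lines to paste into lexer_xx.cppo.ml for one token (assumed #define syntax)."""
    lines = [f"#define MS_{token} {ms_literal(keyword)}"]
    if needs_mr(keyword):
        lines.append(f"#define MR_{token} {mr_regexp(keyword)}")
    return lines
```

For instance, `cppo_defines("DATA", "donnée")` returns the two lines `#define MS_DATA "donn\xc3\xa9e"` and `#define MR_DATA "donn", 0x00E9, "e"`. Check the generated lines against an existing lexer_xx.cppo.ml file before relying on them.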