- Use separate functions for successive passes in module `Driver.Passes`
- Use other functions for end results printing in module `Driver.Commands`
As a consequence, it is much more flexible to use by plugins or libs and we no
longer need the complex polymorphic variant parameter.
This patch leverages previous changes to use Cmdliner subcommands and
effectively specialises the flags of each Catala subcommand.
Other changes include:
- an attempt to normalise the generic options and reduce the number of global
references. Some are ok, like `debug` ; some would better be further cleaned up,
e.g. the ones used by Proof backend were moved to a `Proof.globals` module and
need discussion. The printer no longer relies on the global languages and prints
money amounts in an agnostic way.
- the plugin directory is automatically guessed and loaded even in dev setups.
Plugins are shown by the main `catala` command and listed in `catala --help`
- exception catching at the toplevel has been refactored a bit as well; return
codes are normalised to follow the manpage and avoid codes >= 128 that are
generally reserved for shells.
Update tests
The upside of this is that each command can define specific flags ; there is a
small loss of backwards-compatibility in that the command needs to be the first
argument.
`catala --help` will now only show a summary of commands, with more specific
manpages shown on `catala CMD --help`.
Another point is that the plugin interface is extended to allow plugins to be
registered as subcommands and have their own flags (this will be very useful for
adding flags to the lazy/dot/explanation plugin that has many options).
Note that no efforts has yet been made to specialise the options, the previous
type was just made global for all subcommands.
(first working dynload test with compilation done by manual calls to ocaml)
A few pieces of the puzzle:
* Loading of interfaces only from Catala files
* Registration of toplevel values in modules compiled to OCaml, to allow access
using dynlink
* Shady conversion from OCaml runtime values to/from Catala expressions, to
allow interop (ffi) of compiled modules and the interpreter
Two interdependent changes here:
1. Enforce all instances of Shared_ast.gexpr to use the generic type for marks.
This makes the interfaces a tad simpler to manipulate: you now write
`('a, 'm) gexpr` rather than `('a, 'm mark) gexpr`.
2. Define a polymorphic `Custom` mark case for use by pass-specific annotations.
And leverage this in the typing module
The module is renamed to `Mark`, and functions renamed to avoid redundancy:
`Marked.mark` is now `Mark.add`
`Marked.unmark` is now `Mark.remove`
`Marked.map_under_mark` is now simply `Mark.map`
etc.
`Marked.same_mark_as` is replaced by `Mark.copy`, but with the arguments
swapped (which seemed more convenient throughout)
Since a type `Mark.t` would indicate a mark, and to avoid confusion, the type
`Marked.t` is renamed to `Mark.ed` as a shorthand for `Mark.marked` ; this part
can easily be removed if that's too much quirkiness.
A module without mli is ok as long as it only contains types
Here we already stretch it a bit with some functor applications, but having
toplevel values defeats the expectation that you can safely `open` this module.
- Fix the printer for scopes
- Improve the printer for struct types
- Remove `Print.expr'`. Use `Expr.format` as the function with simplified arguments instead.
- `Print.expr` no longer needs the context
- This removes the need for `expr ~debug` + `expr_debug` ;
use `Print.expr` for normal (non-debug) output,
and `Print.expr' ?debug ()` for possibly debug output.
- This improves consistency of debug expr output in many places
- Prints simplified operators (without type suffix) in non-verbose mode
(this patch also fixes some cases of `Expr.skip_wrappers` and leverages the
binder equality provided by Bindlib)
To try it (without installing Catala):
```shell-session
$ make plugins
$ export CATALA_PLUGINS=_build/default/compiler/plugins
$ dune exec -- catala lazy examples/aides_logement/tests/tests_calcul_apl_locatif.catala_fr -s Exemple2
```
Keep in mind that this is a work-in-progress prototype :)
I made some changes in the meantime, and had to factorise e.g. the handling of
the `EEmptyError` case, but this is the simple approach type-wise of making the
function type for `∀ 'a. 'a —> 'a` (with `assert false` match cases), then
restricting its type do `dcalc` or `lcalc` in the `.mli`.
The phantom polymorphic variant qualifying AST nodes is reversed:
- previously, we were explicitely restricting each AST node to the passes where it belonged using a closed type (e.g. `[< dcalc | lcalc]`)
- now, each node instead declares the "feature" it provides using an open type (e.g. `[> 'Exceptions ]`)
- then the AST for a specific pass limits the features it allows with a closed type
The result is that you can mix and match all features if you wish,
even if the result is not a valid AST for any given pass. More
interestingly, it's now easier to write a function that works on
different ASTs at once (it's the inferred default if you don't write a
type restriction).
The opportunity was also taken to simplify the encoding of the
operators, which don't need a second type parameter anymore.
This is a hack, but not a dirty one: a new command `catala pygmentize` is added,
which is just a wrapper around `pygmentize` that calls it with the proper lexers
defined.
The point is that this needs no installation, just a stock `pygmentize`
installation and the `catala` binary.
This leverages the embedded lexer already used for HTML output, and uses the
LaTeX pygments backend to colorise code directly, without the need for `minted`.
Changelog:
---
A lot has been going on, with more than 530 patches and 70 PRs merged since
0.7.0 last summer. In summary:
- Quite a lot of syntax improvements and changes. Checkout the latest
[cheat-sheet](https://catalalang.github.io/catala/syntax.pdf) for an overview
- Allow local `let ... equals ... in ...` definitions
- Better error messages and positions throughout
- Added the ability to directly call a scope and retrieve its outputs, like a
function
- Added disambiguation, allowing to access structure fields without specifying
the structure type each time
- Added automated resolution of operators, allowing e.g. to write just `+` in
place of all the type-specific operators `+.`, `+$`, `+@`, `+^`, etc.
- More consistent priority for operators. It is no longer allowed to write `a
and b or c` without parenthesis.
- Added and changed some operators (`date + duration` now allowed either way,
`int / int` now returns a decimal, added `duration / duration`)
- Added the ability to have variables and functions defined at
top-level (outside of any scope). See annex A of the tutorial for details.
- Added support for functions with multiple arguments
- Some big refactors in the compiler, allowing much better code sharing between
the different passes, and making it much easier to extend. Also added the
possibility to run the type-checker earlier, etc.
- Countless bug-fixes
- Improvements to our proof backend with Z3
- A tool to automatically synchronise with the upstream French law from
Legifrance
Previously the `state` marker for rules was in a weird position:
```catala
rule foo under condition bar
consequence state st fulfilled
```
This patch unifies the syntax with definitions, now using instead:
```catala
rule foo state st
under condition bar
consequence fulfilled.
```
Interstingly enough, it was already implemented in the Python backend.
Required to implement *pro rata temporis*, which the US tax section 121 does
make use of.
Only allowed for durations expressed in days (as returned by `<date> - <date>`),
of course.
This adds a few positions to the parser, and tweaks some others, vastly
improving the reporting of some errors (inconsistent functions definitions, but
also exceptions cycles, etc.)
Changed the typing printing in the pretty printer to:
* () -> unit for empty lists
* a -> b for single elements lists
* (a, b, c, d) -> b for multiple elements lists
Since functions inside catala can now have multiple arguments (while
not yet being user-definable) the invariant is now about partial
evaluation not being possible.
Fixes a regression after the change in #368, which converted all
integer division to return a decimal. The code generation backend
was still using the integer division operand `//`, which is not
overloaded by class `Integer` in the catala runtime.
* temporary and undocumented while waiting for discussion an approval
* previous patches already allowed definition (at toplevel) but there was no
syntax for calls
* no syntax for multi-args _local_ functions yet
... that's one less thing to do
Two notes:
- Updated the syntax errors in
examples/NSW_community_gaming/tests/test_nsw_social_housie.catala_en ; those
probably aren't expected though, but fixing them is outside my purpose here
- There is consensus on keeping the error messages in English; however, here,
the error messages include hints on the syntax to use, which are only valid
for users of the English syntax.
* A possible solution would be to apply cppo on parser.messages, using the
macros already defined in lexer_LANG.cppo.ml. However, we would then need to
tweak (or duplicate!) the parser to use the messages for the correct language.
Furthermore, updating and merging the file on parser updates would need
special care.
* Another, maybe easier solution would be manual processing, using a custom
escape in the parser messages and rewriting that at runtime when printing
the message. We would need to extract a runtime version of the macro
definitions though.
Fixes#378
- the plugins are compiled as libraries rather than with `executable`, so that
dune is able to install them
- they get installed to `lib/catala/plugins/<plugin-name>/<plugin-name>.cmxs`
- the lookup for plugins is now recursive to cope with the plugin subdirectories
in the point above
Using them will lead to "not supported yet" errors soon after, but it's a start
to get to handling separate modules.
The idea is that `foo` can now also be `Bar.foo`, `Bar.Baz.foo`, `foo.Struc.fld`
can be `foo.Bar.Baz.Struc.fld`, etc.
The next steps are to enable the lookups to handle this paths, and to provide
ways to load the external modules to feed these lookups.
Closes#373
This forbids expressions such as `a and b or c`, avoiding the need to set an
implicit priority between `and`, `or` and `xor`, which I find error-prone.
Instead, when that appears, a message asking for explicit parentheses will be
shown to the user.
Implementation note: since that would be extremely tedious to do in the parser
directly, the parser is set to allow right-associativity without discrimination
for the logical operators, and the check is done during desugaring. This
required to explicit parentheses in the surface AST to discriminate the case
where the priority was explicit.
In particular `CONSTRUCTOR` is no longer valid for paths & modules, so let's
switch to the more usual LIDENT / UIDENT for lower- or upper- case idents.
cd compiler/surface
sed -i 's/VERTICAL/BAR/g' *
sed -i 's/BRACKET/BRACE/g' *
sed -i 's/SQUARE/BRACKET/g' *
sed -i 's/IDENT/LIDENT/g' *
sed -i 's/CONSTRUCTOR/UIDENT/g' *
Define a single expression rule with disambiguation using token priorities
instead of the many layers of intermediate rules with explicit sub-terms.
Also replaces `in` for collection operations (`x+1 for foo in [1;2]`) with
`among` which helps a lot.
Overloads are powerful, but let's clearly draw the line right now between
convenience and type safety, for when someone else will want to add new
operators.
it's unlikely to be used in any law, and likely to be cause for confusion.
best of all, the new operator has a different return type, which
ensures no inconsistency with the change can get overlooked.
This uses the same disambiguation mechanism put in place for
structures, calling the typer on individual rules on the desugared AST
to propagate types, in order to resolve ambiguous operators like `+`
to their strongly typed counterparts (`+!`, `+.`, `+$`, `+@`, `+$`) in
the translation to scopelang.
The patch includes some normalisation of the definition of all the
operators, and classifies them based on their typing policy instead of
their arity. It also adds a little more flexibility:
- a couple new operators, like `-` on date and duration
- optional type annotation on some aggregation constructions
The `Shared_ast` lib is also lightly restructured, with the `Expr`
module split into `Type`, `Operator` and `Expr`.
Some typing errors are changed a little, because they get triggered during the
typing of the disambiguation pass, which does not specify the expected return
type (it's an expected invariant that it should not be needed for
disambiguation).
It would be possible to still specify these types during disambiguation just to
get the same errors, but since the newer ones don't appear to be clearly worse
at the moment, it has not been done.
This is just a bunch of `sed` calls:
```shell
sed -i 's/ScopeSet/ScopeName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeMap/ScopeName.Map/g' compiler/**/*.ml*
sed -i 's/StructMap/StructName.Map/g' compiler/**/*.ml*
sed -i 's/StructSet/StructName.Set/g' compiler/**/*.ml*
sed -i 's/EnumMap/EnumName.Map/g' compiler/**/*.ml*
sed -i 's/EnumSet/EnumName.Set/g' compiler/**/*.ml*
sed -i 's/StructFieldName/StructField/g' compiler/**/*.ml*
sed -i 's/StructFieldMap/StructField.Map/g' compiler/**/*.ml*
sed -i 's/StructFieldSet/StructField.Set/g' compiler/**/*.ml*
sed -i 's/EnumConstructorMap/EnumConstructor.Map/g' compiler/**/*.ml*
sed -i 's/EnumConstructorSet/EnumConstructor.Set/g' compiler/**/*.ml*
sed -i 's/RuleMap/RuleName.Map/g' compiler/**/*.ml*
sed -i 's/RuleSet/RuleName.Set/g' compiler/**/*.ml*
sed -i 's/LabelMap/LabelName.Map/g' compiler/**/*.ml*
sed -i 's/LabelSet/LabelName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeVarMap/ScopeVar.Map/g' compiler/**/*.ml*
sed -i 's/ScopeVarSet/ScopeVar.Set/g' compiler/**/*.ml*
sed -i 's/SubScopeNameMap/SubScopeName.Map/g' compiler/**/*.ml*
sed -i 's/SubScopeNameSet/SubScopeName.Set/g' compiler/**/*.ml*
```
... and reformat
Many changes got bundled in here and would be too tedious to separate.
Closes#330
See changes in `shared_ast/definitions.ml` to check the main point.
- the biggest change is a modification of the struct and enum types in
expressions: they are now stored as `Map`s throughout passes, and no longer
converted to indexed lists after scopelang. Their accessors are also changed,
and tuples only exist in Lcalc (they're used for closure conversion).
This implied adding some more information in the contexts, to keep the mapping
between struct fields and scope output variables. It should also be much more
robust (no longer relying on assumptions upon different orderings).
- another very pervasive change is more cosmetic: the rewrite of the main AST to
use inline records, labelling individual subfields.
- moved the checks for correct definitions and accesses of structures from
`Scope_to_dcalc` to `Typing`
- defining some new shallow iterators in module `Shared_ast.Expr`, and
factorising a few same-pass rewriting functions accordingly (closure
conversion, optimisations, etc.)
- some smaller style improvements (ensuring we use the proper compare/equal
functions instead of `=` in a few `when` closes, for example)
Normally I would make sure this is not by default, or at leat disableable; but
here the code we print may contain utf8 anyway, so the terminal really needs to
support it. Anyway, it's just a little fancier, doesn't add much.
a quick fix for now, ideally we want an option for editor-friendly output.
But for now this is a very cheap way to at least have clickable error messages
which are a big time-saver.
Quite a few changes are included here, some of which have some extra
implications visible in the language:
- adds the `Scope of { -- input_v: value; ... }` construct in the language
- handle it down the pipeline:
* `ScopeCall` in the surface AST
* `EScopeCall` in desugared and scopelang
* expressions are now traversed to detect dependencies between scopes
* transformed into a normal function call in dcalc
- defining a scope now implicitely defines a structure with the same name, with
the output variables of the scope defined as fields. This allows us to type
the return value from a scope call and access its fields easily.
* the implications are mostly in surface/name_resolution.ml code-wise
* the `Scope_out` struct that was defined in scope_to_dcalc is no longer
needed/used and the fields are no longer renamed (changes some outputs; the
explicit suffix for variables with multiple states is ignored as well)
* one benefit is that disambiguation works just like for structures when there
are conflicts on field names
* however, it's now a conflict if a scope and a structure have the same
name (side-note: issues with conflicting enum / struct names or scope
variables / subscope names were silent and are now properly reported)
- you can consequently use scope names as types for variables as well. Writing
literals is not allowed though, they can only be obtained by calling the
scope.
Remaining TODOs:
- context variables are not handled properly at the moment
- error handling on invalid calls
- tests show a small error message regression; lots of examples will need
tweaking to avoid scope/struct name or struct fields / output variable
conflicts
- add a `->` syntax to make struct field access distinct from scope output var
access, enforced with typing. This is expected to reduce confusion of users
and add a little typing precision.
- document the new syntax & implications (tutorial, cheat-sheet)
- a consequence of the changes is that subscope variables also can now be typed.
A possible future evolution / simplification would be to rewrite subscopes as
explicit scope calls early in the pipeline. That could also allow to manipulate
them as expressions (bind them in let-ins, return them...)