... and add a custom printer
Since this is a very common bug, this patch should gain us a lot of time when
debugging uncaught Not_found errors, because the element not found can now be
printed straight away without the need for further debugging.
The small cost is that one should remember to catch the correct specialised
`Foo.Map.Not_found _` exception rather than the standard `Not_found` (which
would type-check but not catch the exception). Using `find_opt` should be
preferred anyway.
Note that the other functions from the module `Map` that raise `Not_found` are
not affected ; these functions are `choose`, `min/max_binding`,
`find_first/last` which either take a predicate or fail on the empty map, so it
wouldn't make sense for them (and we probably don't use them much).
- Use separate functions for successive passes in module `Driver.Passes`
- Use other functions for end results printing in module `Driver.Commands`
As a consequence, it is much more flexible to use by plugins or libs and we no
longer need the complex polymorphic variant parameter.
This patch leverages previous changes to use Cmdliner subcommands and
effectively specialises the flags of each Catala subcommand.
Other changes include:
- an attempt to normalise the generic options and reduce the number of global
references. Some are ok, like `debug` ; some would better be further cleaned up,
e.g. the ones used by Proof backend were moved to a `Proof.globals` module and
need discussion. The printer no longer relies on the global languages and prints
money amounts in an agnostic way.
- the plugin directory is automatically guessed and loaded even in dev setups.
Plugins are shown by the main `catala` command and listed in `catala --help`
- exception catching at the toplevel has been refactored a bit as well; return
codes are normalised to follow the manpage and avoid codes >= 128 that are
generally reserved for shells.
Update tests
(first working dynload test with compilation done by manual calls to ocaml)
A few pieces of the puzzle:
* Loading of interfaces only from Catala files
* Registration of toplevel values in modules compiled to OCaml, to allow access
using dynlink
* Shady conversion from OCaml runtime values to/from Catala expressions, to
allow interop (ffi) of compiled modules and the interpreter
Two interdependent changes here:
1. Enforce all instances of Shared_ast.gexpr to use the generic type for marks.
This makes the interfaces a tad simpler to manipulate: you now write
`('a, 'm) gexpr` rather than `('a, 'm mark) gexpr`.
2. Define a polymorphic `Custom` mark case for use by pass-specific annotations.
And leverage this in the typing module
The module is renamed to `Mark`, and functions renamed to avoid redundancy:
`Marked.mark` is now `Mark.add`
`Marked.unmark` is now `Mark.remove`
`Marked.map_under_mark` is now simply `Mark.map`
etc.
`Marked.same_mark_as` is replaced by `Mark.copy`, but with the arguments
swapped (which seemed more convenient throughout)
Since a type `Mark.t` would indicate a mark, and to avoid confusion, the type
`Marked.t` is renamed to `Mark.ed` as a shorthand for `Mark.marked` ; this part
can easily be removed if that's too much quirkiness.
- Fix the printer for scopes
- Improve the printer for struct types
- Remove `Print.expr'`. Use `Expr.format` as the function with simplified arguments instead.
- `Print.expr` no longer needs the context
- This removes the need for `expr ~debug` + `expr_debug` ;
use `Print.expr` for normal (non-debug) output,
and `Print.expr' ?debug ()` for possibly debug output.
- This improves consistency of debug expr output in many places
- Prints simplified operators (without type suffix) in non-verbose mode
(this patch also fixes some cases of `Expr.skip_wrappers` and leverages the
binder equality provided by Bindlib)
This adds a few positions to the parser, and tweaks some others, vastly
improving the reporting of some errors (inconsistent functions definitions, but
also exceptions cycles, etc.)
This uses the same disambiguation mechanism put in place for
structures, calling the typer on individual rules on the desugared AST
to propagate types, in order to resolve ambiguous operators like `+`
to their strongly typed counterparts (`+!`, `+.`, `+$`, `+@`, `+$`) in
the translation to scopelang.
The patch includes some normalisation of the definition of all the
operators, and classifies them based on their typing policy instead of
their arity. It also adds a little more flexibility:
- a couple new operators, like `-` on date and duration
- optional type annotation on some aggregation constructions
The `Shared_ast` lib is also lightly restructured, with the `Expr`
module split into `Type`, `Operator` and `Expr`.
Some typing errors are changed a little, because they get triggered during the
typing of the disambiguation pass, which does not specify the expected return
type (it's an expected invariant that it should not be needed for
disambiguation).
It would be possible to still specify these types during disambiguation just to
get the same errors, but since the newer ones don't appear to be clearly worse
at the moment, it has not been done.
This is just a bunch of `sed` calls:
```shell
sed -i 's/ScopeSet/ScopeName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeMap/ScopeName.Map/g' compiler/**/*.ml*
sed -i 's/StructMap/StructName.Map/g' compiler/**/*.ml*
sed -i 's/StructSet/StructName.Set/g' compiler/**/*.ml*
sed -i 's/EnumMap/EnumName.Map/g' compiler/**/*.ml*
sed -i 's/EnumSet/EnumName.Set/g' compiler/**/*.ml*
sed -i 's/StructFieldName/StructField/g' compiler/**/*.ml*
sed -i 's/StructFieldMap/StructField.Map/g' compiler/**/*.ml*
sed -i 's/StructFieldSet/StructField.Set/g' compiler/**/*.ml*
sed -i 's/EnumConstructorMap/EnumConstructor.Map/g' compiler/**/*.ml*
sed -i 's/EnumConstructorSet/EnumConstructor.Set/g' compiler/**/*.ml*
sed -i 's/RuleMap/RuleName.Map/g' compiler/**/*.ml*
sed -i 's/RuleSet/RuleName.Set/g' compiler/**/*.ml*
sed -i 's/LabelMap/LabelName.Map/g' compiler/**/*.ml*
sed -i 's/LabelSet/LabelName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeVarMap/ScopeVar.Map/g' compiler/**/*.ml*
sed -i 's/ScopeVarSet/ScopeVar.Set/g' compiler/**/*.ml*
sed -i 's/SubScopeNameMap/SubScopeName.Map/g' compiler/**/*.ml*
sed -i 's/SubScopeNameSet/SubScopeName.Set/g' compiler/**/*.ml*
```
... and reformat
Many changes got bundled in here and would be too tedious to separate.
Closes#330
See changes in `shared_ast/definitions.ml` to check the main point.
- the biggest change is a modification of the struct and enum types in
expressions: they are now stored as `Map`s throughout passes, and no longer
converted to indexed lists after scopelang. Their accessors are also changed,
and tuples only exist in Lcalc (they're used for closure conversion).
This implied adding some more information in the contexts, to keep the mapping
between struct fields and scope output variables. It should also be much more
robust (no longer relying on assumptions upon different orderings).
- another very pervasive change is more cosmetic: the rewrite of the main AST to
use inline records, labelling individual subfields.
- moved the checks for correct definitions and accesses of structures from
`Scope_to_dcalc` to `Typing`
- defining some new shallow iterators in module `Shared_ast.Expr`, and
factorising a few same-pass rewriting functions accordingly (closure
conversion, optimisations, etc.)
- some smaller style improvements (ensuring we use the proper compare/equal
functions instead of `=` in a few `when` closes, for example)
Quite a few changes are included here, some of which have some extra
implications visible in the language:
- adds the `Scope of { -- input_v: value; ... }` construct in the language
- handle it down the pipeline:
* `ScopeCall` in the surface AST
* `EScopeCall` in desugared and scopelang
* expressions are now traversed to detect dependencies between scopes
* transformed into a normal function call in dcalc
- defining a scope now implicitely defines a structure with the same name, with
the output variables of the scope defined as fields. This allows us to type
the return value from a scope call and access its fields easily.
* the implications are mostly in surface/name_resolution.ml code-wise
* the `Scope_out` struct that was defined in scope_to_dcalc is no longer
needed/used and the fields are no longer renamed (changes some outputs; the
explicit suffix for variables with multiple states is ignored as well)
* one benefit is that disambiguation works just like for structures when there
are conflicts on field names
* however, it's now a conflict if a scope and a structure have the same
name (side-note: issues with conflicting enum / struct names or scope
variables / subscope names were silent and are now properly reported)
- you can consequently use scope names as types for variables as well. Writing
literals is not allowed though, they can only be obtained by calling the
scope.
Remaining TODOs:
- context variables are not handled properly at the moment
- error handling on invalid calls
- tests show a small error message regression; lots of examples will need
tweaking to avoid scope/struct name or struct fields / output variable
conflicts
- add a `->` syntax to make struct field access distinct from scope output var
access, enforced with typing. This is expected to reduce confusion of users
and add a little typing precision.
- document the new syntax & implications (tutorial, cheat-sheet)
- a consequence of the changes is that subscope variables also can now be typed.
A possible future evolution / simplification would be to rewrite subscopes as
explicit scope calls early in the pipeline. That could also allow to manipulate
them as expressions (bind them in let-ins, return them...)
This was the only reasonable solution I found to the issue raised
[here](https://github.com/CatalaLang/catala/pull/334#discussion_r987175884).
This was a pretty tedious rewrite, but it should now ensure we are doing things
correctly. As a bonus, the "smart" expression constructors are now used
everywhere to build expressions (so another refactoring like this one should be
much easier) and this makes the code overall feel more
straightforward (`Bindlib.box_apply` or `let+` no longer need to be visible!)
---
Basically, we were using values of type `gexpr box = naked_gexpr marked box`
throughout when (re-)building expressions. This was done 99% of the time by
using `Bindlib.box_apply add_mark naked_e` right after building `naked_e`. In
lots of places, we needed to recover the annotation of this expression later on,
typically to build its parent term (to inherit the position, or build the type).
Since it wasn't always possible to wrap these uses within `box_apply` (esp. as
bindlib boxes aren't a monad), here and there we had to call `Bindlib.unbox`,
just to recover the position or type. This had the very unpleasant effect of
forcing the resolution of the whole box (including applying any stored closures)
to reach the top-level annotation which isn't even dependant on specific
variable bindings. Then, generally, throwing away the result.
Therefore, the change proposed here transforms
- `naked_gexpr marked Bindlib.box` into
- `naked_gexpr Bindlib.box marked` (aliased to `boxed_gexpr` or `gexpr boxed` for
convenience)
This means only
1. not fitting the mark into the box right away when building, and
2. accessing the top-level mark directly without unboxing
The functions for building terms from module `Shared_ast.Expr` could be changed
easily. But then they needed to be consistently used throughout, without
manually building terms through `Bindlib.apply_box` -- which covers most of the
changes in this patch.
`Expr.Box.inj` is provided to swap back to a box, before binding for example.
Additionally, this gives a 40% speedup on `make -C examples pass_all_tests`,
which hints at the amount of unnecessary work we were doing --'
it's now done explicitely from the driver, which allows to do it before typing
and is more consistent; the information was already forwarded to the later
compilation stages separately from the program AST anyway.
Note that this is incomplete in the case of desugared/scopelang because we only
have typing for expressions yet, and the scope/program structure is different.
The code allows passing an environment of types for scope/subscope variables in
order to resolve `ELocation` terms, but that's unused until we implement
scopelang typing at the scope level.
Also add some safeguards against bad propagation of types (e.g. checking the
arrow type of functions upon application); partly disabled at the moment since
they don't pass yet but that'll be further work.
This gives further uniformity in their interfaces and allows more common
handling.
The next step will be for all the `Expr.make_*` functions to work on expressions
annotated with the `'a mark` type, correctly propagating type information when
it is present. Then we could even imagine early propagation of type
information (without complete inference), which could for example be used for
overloaded operator disambiguation.
This will allow to unify with types used earlier in the
pipeline (`Scopelang.Ast.typ`).
It seems cleaner! But some areas may warrant a later clean-up, in particular
handling of options and their types in the backends, or possible name conflicts
of structs/enums with built-in types when printing.
Note that there were significant differences between the two printers (see the test diff!). Overall the `dcalc` one seemed newer so that's what I took, with only the required additions from `lcalc` (exceptions, raise and catch)
Handling code should now be reasonably well sorted between `Shared_ast.{Var,Expr,Scope,Program}`
The function parameters (e.g. `make_let_in`) could be removed from the
scope handling functions since now the types are compatible, which
makes them much easier to read.