Since functions inside catala can now have multiple arguments (while
not yet being user-definable) the invariant is now about partial
evaluation not being possible.
it's unlikely to be used in any law, and likely to be cause for confusion.
best of all, the new operator has a different return type, which
ensures no inconsistency with the change can get overlooked.
This uses the same disambiguation mechanism put in place for
structures, calling the typer on individual rules on the desugared AST
to propagate types, in order to resolve ambiguous operators like `+`
to their strongly typed counterparts (`+!`, `+.`, `+$`, `+@`, `+$`) in
the translation to scopelang.
The patch includes some normalisation of the definition of all the
operators, and classifies them based on their typing policy instead of
their arity. It also adds a little more flexibility:
- a couple new operators, like `-` on date and duration
- optional type annotation on some aggregation constructions
The `Shared_ast` lib is also lightly restructured, with the `Expr`
module split into `Type`, `Operator` and `Expr`.
This is just a bunch of `sed` calls:
```shell
sed -i 's/ScopeSet/ScopeName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeMap/ScopeName.Map/g' compiler/**/*.ml*
sed -i 's/StructMap/StructName.Map/g' compiler/**/*.ml*
sed -i 's/StructSet/StructName.Set/g' compiler/**/*.ml*
sed -i 's/EnumMap/EnumName.Map/g' compiler/**/*.ml*
sed -i 's/EnumSet/EnumName.Set/g' compiler/**/*.ml*
sed -i 's/StructFieldName/StructField/g' compiler/**/*.ml*
sed -i 's/StructFieldMap/StructField.Map/g' compiler/**/*.ml*
sed -i 's/StructFieldSet/StructField.Set/g' compiler/**/*.ml*
sed -i 's/EnumConstructorMap/EnumConstructor.Map/g' compiler/**/*.ml*
sed -i 's/EnumConstructorSet/EnumConstructor.Set/g' compiler/**/*.ml*
sed -i 's/RuleMap/RuleName.Map/g' compiler/**/*.ml*
sed -i 's/RuleSet/RuleName.Set/g' compiler/**/*.ml*
sed -i 's/LabelMap/LabelName.Map/g' compiler/**/*.ml*
sed -i 's/LabelSet/LabelName.Set/g' compiler/**/*.ml*
sed -i 's/ScopeVarMap/ScopeVar.Map/g' compiler/**/*.ml*
sed -i 's/ScopeVarSet/ScopeVar.Set/g' compiler/**/*.ml*
sed -i 's/SubScopeNameMap/SubScopeName.Map/g' compiler/**/*.ml*
sed -i 's/SubScopeNameSet/SubScopeName.Set/g' compiler/**/*.ml*
```
... and reformat
Many changes got bundled in here and would be too tedious to separate.
Closes#330
See changes in `shared_ast/definitions.ml` to check the main point.
- the biggest change is a modification of the struct and enum types in
expressions: they are now stored as `Map`s throughout passes, and no longer
converted to indexed lists after scopelang. Their accessors are also changed,
and tuples only exist in Lcalc (they're used for closure conversion).
This implied adding some more information in the contexts, to keep the mapping
between struct fields and scope output variables. It should also be much more
robust (no longer relying on assumptions upon different orderings).
- another very pervasive change is more cosmetic: the rewrite of the main AST to
use inline records, labelling individual subfields.
- moved the checks for correct definitions and accesses of structures from
`Scope_to_dcalc` to `Typing`
- defining some new shallow iterators in module `Shared_ast.Expr`, and
factorising a few same-pass rewriting functions accordingly (closure
conversion, optimisations, etc.)
- some smaller style improvements (ensuring we use the proper compare/equal
functions instead of `=` in a few `when` closes, for example)
This was the only reasonable solution I found to the issue raised
[here](https://github.com/CatalaLang/catala/pull/334#discussion_r987175884).
This was a pretty tedious rewrite, but it should now ensure we are doing things
correctly. As a bonus, the "smart" expression constructors are now used
everywhere to build expressions (so another refactoring like this one should be
much easier) and this makes the code overall feel more
straightforward (`Bindlib.box_apply` or `let+` no longer need to be visible!)
---
Basically, we were using values of type `gexpr box = naked_gexpr marked box`
throughout when (re-)building expressions. This was done 99% of the time by
using `Bindlib.box_apply add_mark naked_e` right after building `naked_e`. In
lots of places, we needed to recover the annotation of this expression later on,
typically to build its parent term (to inherit the position, or build the type).
Since it wasn't always possible to wrap these uses within `box_apply` (esp. as
bindlib boxes aren't a monad), here and there we had to call `Bindlib.unbox`,
just to recover the position or type. This had the very unpleasant effect of
forcing the resolution of the whole box (including applying any stored closures)
to reach the top-level annotation which isn't even dependant on specific
variable bindings. Then, generally, throwing away the result.
Therefore, the change proposed here transforms
- `naked_gexpr marked Bindlib.box` into
- `naked_gexpr Bindlib.box marked` (aliased to `boxed_gexpr` or `gexpr boxed` for
convenience)
This means only
1. not fitting the mark into the box right away when building, and
2. accessing the top-level mark directly without unboxing
The functions for building terms from module `Shared_ast.Expr` could be changed
easily. But then they needed to be consistently used throughout, without
manually building terms through `Bindlib.apply_box` -- which covers most of the
changes in this patch.
`Expr.Box.inj` is provided to swap back to a box, before binding for example.
Additionally, this gives a 40% speedup on `make -C examples pass_all_tests`,
which hints at the amount of unnecessary work we were doing --'
This moves dcalc/typing.ml to shared_ast, and generalises the input type, but
without yet implementing the extra cases (these are all `assert false`): it's
just a first step.
- don't print variable id on type variables, there should be no ambiguity
- print "array" as "collection" to match the language
- print just "collection" for "'a collection", which makes sense english-wise
The issue was coming from Bindlib: it stores variable bindings as closures, so
`Bindlib.box_apply f bx` actually delays the application of `f` until the term
is substituted or unboxed (likely long after we are out of the `try..with`
block).
The proposed fix is to make sure we run the wrapper outside of bindlib
applications, on explicitely unboxed terms.
This is a WIP. Current version 0.0.2 of `dates_calc` is in the process of being published, so I guess the CI should fail currently.
- [x] Test current implementation
- [x] I commented out the `duration / duration` operator, can we remove it?
- [x] Need to add support for first and last day of month
- [x] Finish porting the Z3 backend to dates_calc
This will allow to unify with types used earlier in the
pipeline (`Scopelang.Ast.typ`).
It seems cleaner! But some areas may warrant a later clean-up, in particular
handling of options and their types in the backends, or possible name conflicts
of structs/enums with built-in types when printing.
Note that there were significant differences between the two printers (see the test diff!). Overall the `dcalc` one seemed newer so that's what I took, with only the required additions from `lcalc` (exceptions, raise and catch)
Handling code should now be reasonably well sorted between `Shared_ast.{Var,Expr,Scope,Program}`
The function parameters (e.g. `make_let_in`) could be removed from the
scope handling functions since now the types are compatible, which
makes them much easier to read.
The huge benefit of this approach is that almost no changes are needed and we get compatible types between dcalc and lcalc, allowing to deduplicate a few functions.
It might not be the best in the long run: there are still benefits in factorising small parts of the AST as suggested in #157, and this forces a central AST definition that makes the nanopass-like approach a bit less legible.
Still, I think it's a step in the right direction and it doesn't really lock us in keeping to use the big GADT (as the minimal cascade of changes show).
It's not expected to stay that way forever, but some additional effort will be required for them to preserve (or restore) types; until then, be safe and don't forward possibly incorrect information.
The AST structures track annotations (e.g., at the moment, source code
position information) in a lot of places. This patch tidies up a bit and
removes some duplication, ensuring a single level of annotation wrapping
at each AST recursion level.
This will be important when adding type information in these
annotations, because there will be consitency constraints to be ensured
and duplication is a likely source of mistakes.
this patch is just a bunch of `sed` commands
```shell
cd compiler
sed -i 's/Pos.marked/Marked.pos/g' *.ml* **/*.ml*
sed -i 's/Pos.unmark/Marked.unmark/g' *.ml* **/*.ml*
sed -i 's/Pos\.get_position/Marked.get_mark/g' *.ml* **/*.ml*
sed -i 's/Pos\.same_pos_as/Marked.same_mark_as/g' *.ml* **/*.ml*
sed -i 's/Pos\.map_under_mark/Marked.map_under_mark/g' *.ml* **/*.ml*
sed -i 's/Pos\.mark/Marked.mark/g' *.ml* **/*.ml*
sed -i 's/Pos\.compare_marked/Marked.compare/g' *.ml* **/*.ml*
```
This avoids many intermediate calls to e.g. `Format.asprintf`; should result in
some cases in "more correct" use of `Format`¹, avoid the computation of unused
debug strings, and make the code more readable.
¹ for `Format` to work as expected, all intermediate calls need to go through
it. Some cases of formatting to an intermediate string then printing through Format
again are still present, but this makes the situation better.