This adds a few positions to the parser, and tweaks some others, vastly
improving the reporting of some errors (inconsistent function definitions, but
also exception cycles, etc.)
* temporary and undocumented while waiting for discussion and approval
* previous patches already allowed definition (at toplevel) but there was no
syntax for calls
* no syntax for multi-args _local_ functions yet
Closes #373
This forbids expressions such as `a and b or c`, avoiding the need to set an
implicit priority between `and`, `or` and `xor`, which I find error-prone.
Instead, when that appears, a message asking for explicit parentheses will be
shown to the user.
Implementation note: since that would be extremely tedious to do in the parser
directly, the parser is set to allow right-associativity without discrimination
for the logical operators, and the check is done during desugaring. This
required making parentheses explicit in the surface AST, to distinguish the case
where the priority was given explicitly.
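As an illustration of the desugaring-time check, here is a minimal OCaml sketch. The AST (`expr`, `Logic`, `Paren`) and the exception `Ambiguous_priority` are hypothetical simplifications, not the actual Catala types. It assumes, as described above, that the parser builds right-associative chains of logical operators and keeps explicit parentheses as a node.

```ocaml
(* Hypothetical, minimal surface AST. [Paren] records parentheses written by
   the user, so the check below can tell explicit priority apart. *)
type logop = And | Or | Xor

type expr =
  | Var of string
  | Logic of logop * expr * expr
  | Paren of expr

exception Ambiguous_priority of logop * logop

(* The parser is assumed to build right-associative chains of logical
   operators without distinction, so [a and b or c] arrives here as
   [Logic (And, a, Logic (Or, b, c))]. A logical operator whose direct,
   unparenthesised operand uses a different logical operator is rejected;
   chains of a single operator such as [a and b and c] are accepted. *)
let rec check_logic (e : expr) : unit =
  match e with
  | Var _ -> ()
  | Paren inner -> check_logic inner
  | Logic (op, l, r) ->
    let check_operand = function
      | Logic (op', _, _) when op' <> op -> raise (Ambiguous_priority (op, op'))
      | _ -> ()
    in
    check_operand l;
    check_operand r;
    check_logic l;
    check_logic r
```

In this sketch, `(a and b) or c` passes because the `and` sits under a `Paren` node. The unparenthesised form raises an error, which is where the message asking for explicit parentheses would be produced.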
Define a single expression rule with disambiguation using token priorities
instead of the many layers of intermediate rules with explicit sub-terms.
Also replaces `in` for collection operations (`x+1 for foo in [1;2]`) with
`among`, which helps a lot.
It's unlikely to be used in any law, and likely to be a cause of confusion.
Best of all, the new operator has a different return type, which
ensures that no inconsistency caused by the change can get overlooked.
Command used: `sed -i 's/\([-+*/><=]=\?\)[.$@^€$]/\1/g' **/*/*.catala_*`
The overload test, of course, is kept unchanged and ensures that explicit
operators still work.
This uses the same disambiguation mechanism put in place for
structures, calling the typer on individual rules on the desugared AST
to propagate types, in order to resolve ambiguous operators like `+`
to their strongly typed counterparts (`+!`, `+.`, `+$`, `+@`, `+^`) in
the translation to scopelang.
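The resolution step can be pictured as a lookup on the operand types that the typer propagated. This is a sketch under simplifying assumptions: the type `typ`, the operator names in `add_op`, and the function `resolve_add` are hypothetical. Only addition is shown.

```ocaml
(* Hypothetical, simplified base types: only those relevant to [+]. *)
type typ = TInt | TRat | TMoney | TDate | TDuration

(* Hypothetical names for the strongly typed addition operators. *)
type add_op = Add_int | Add_rat | Add_money | Add_date_duration | Add_duration

(* Given the operand types inferred by the typer on the desugared AST,
   pick the strongly typed operator that an ambiguous [+] stands for. *)
let resolve_add (t1 : typ) (t2 : typ) : add_op option =
  match t1, t2 with
  | TInt, TInt -> Some Add_int
  | TRat, TRat -> Some Add_rat
  | TMoney, TMoney -> Some Add_money
  | TDate, TDuration -> Some Add_date_duration
  | TDuration, TDuration -> Some Add_duration
  | _ -> None (* no matching overload: to be reported as a type error *)
```

Keeping the explicit operators as separate, already-resolved constructors is what lets the overload test check that they still work unchanged.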
The patch includes some normalisation of the definition of all the
operators, and classifies them based on their typing policy instead of
their arity. It also adds a little more flexibility:
- a couple of new operators, like `-` on dates and durations
- optional type annotation on some aggregation constructions
The `Shared_ast` lib is also lightly restructured, with the `Expr`
module split into `Type`, `Operator` and `Expr`.
Some typing errors are changed a little, because they get triggered during the
typing of the disambiguation pass, which does not specify the expected return
type (it's an expected invariant that it should not be needed for
disambiguation).
It would be possible to still specify these types during disambiguation just to
get the same errors, but since the newer ones don't appear to be clearly worse
at the moment, it has not been done.
Many changes got bundled in here and would be too tedious to separate.
Closes #330
See changes in `shared_ast/definitions.ml` to check the main point.
- the biggest change is a modification of the struct and enum types in
expressions: they are now stored as `Map`s throughout passes, and no longer
converted to indexed lists after scopelang. Their accessors are also changed,
and tuples only exist in Lcalc (they're used for closure conversion).
This implied adding some more information in the contexts, to keep the mapping
between struct fields and scope output variables. It should also be much more
robust (no longer relying on assumptions about consistent orderings).
- another very pervasive change is more cosmetic: the rewrite of the main AST to
use inline records, labelling individual subfields.
- moved the checks for correct definitions and accesses of structures from
`Scope_to_dcalc` to `Typing`
- defining some new shallow iterators in module `Shared_ast.Expr`, and
factorising a few same-pass rewriting functions accordingly (closure
conversion, optimisations, etc.)
- some smaller style improvements (ensuring we use the proper compare/equal
functions instead of `=` in a few `when` clauses, for example)
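To illustrate why storing structs as maps removes ordering assumptions, here is a small OCaml sketch. The `value` type and the helpers `make_struct`, `get_field` and `equal_value` are hypothetical and much simpler than the real definitions in `shared_ast`. They use only the standard `Map.Make (String)`.

```ocaml
module FieldMap = Map.Make (String)

(* Hypothetical value type: a struct is a map from field names to values,
   so no pass relies on the order in which fields were declared or built. *)
type value = Int of int | Struct of value FieldMap.t

let make_struct (fields : (string * value) list) : value =
  Struct (FieldMap.of_seq (List.to_seq fields))

(* Field access is by name: a missing field yields [None] instead of
   silently reading the wrong index. *)
let get_field (v : value) (field : string) : value option =
  match v with
  | Struct fields -> FieldMap.find_opt field fields
  | Int _ -> None

(* Maps with the same bindings can have different internal tree shapes, so
   they are compared with [FieldMap.equal] rather than polymorphic [=]. *)
let rec equal_value (v1 : value) (v2 : value) : bool =
  match v1, v2 with
  | Int a, Int b -> Int.equal a b
  | Struct f1, Struct f2 -> FieldMap.equal equal_value f1 f2
  | (Int _ | Struct _), _ -> false
```

The last function also illustrates the style point about dedicated equality functions. Polymorphic `=` on maps is not a reliable structural comparison.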
Normally I would make sure this is not enabled by default, or at least can be disabled;
but here the code we print may contain UTF-8 anyway, so the terminal really needs to
support it. In any case, it's just a little fancier and doesn't add much.
A quick fix for now; ideally we want an option for editor-friendly output.
But for now this is a very cheap way to at least have clickable error messages
which are a big time-saver.
Quite a few changes are included here, some of which have some extra
implications visible in the language:
- adds the `Scope of { -- input_v: value; ... }` construct in the language
- handle it down the pipeline:
* `ScopeCall` in the surface AST
* `EScopeCall` in desugared and scopelang
* expressions are now traversed to detect dependencies between scopes
* transformed into a normal function call in dcalc
- defining a scope now implicitly defines a structure with the same name, with
the output variables of the scope defined as fields. This allows us to type
the return value from a scope call and access its fields easily.
* the implications are mostly in surface/name_resolution.ml code-wise
* the `Scope_out` struct that was defined in scope_to_dcalc is no longer
needed/used and the fields are no longer renamed (changes some outputs; the
explicit suffix for variables with multiple states is ignored as well)
* one benefit is that disambiguation works just like for structures when there
are conflicts on field names
* however, it's now a conflict if a scope and a structure have the same
name (side-note: issues with conflicting enum / struct names or scope
variables / subscope names were silent and are now properly reported)
- you can consequently use scope names as types for variables as well. Writing
literals is not allowed though, they can only be obtained by calling the
scope.
Remaining TODOs:
- context variables are not handled properly at the moment
- error handling on invalid calls
- tests show a small error message regression; lots of examples will need
tweaking to avoid scope/struct name or struct fields / output variable
conflicts
- add a `->` syntax to make struct field access distinct from scope output var
access, enforced with typing. This is expected to reduce confusion of users
and add a little typing precision.
- document the new syntax & implications (tutorial, cheat-sheet)
- a consequence of the changes is that subscope variables can now also be typed.
A possible future evolution / simplification would be to rewrite subscopes as
explicit scope calls early in the pipeline. That could also allow manipulating
them as expressions (binding them in let-ins, returning them...)
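The idea that a scope call becomes an ordinary function call returning the implicitly defined structure can be sketched in plain OCaml. Everything here is hypothetical: the scope name `Tax`, its input `income`, its outputs `rate` and `amount`, and the compiled function `tax_scope`. The sketch only mirrors the shape of the translation to dcalc, not its actual output.

```ocaml
(* Hypothetical scope [Tax] with one input variable and two output
   variables. Declaring the scope implicitly declares a structure of the
   same name whose fields are the output variables. *)
type tax_in = { income : int }
type tax = { rate : int; amount : int }

(* The scope body, compiled to an ordinary function from the input
   structure to the output structure. *)
let tax_scope (input : tax_in) : tax =
  let rate = if input.income > 10_000 then 20 else 10 in
  { rate; amount = input.income * rate / 100 }
```

Under these assumptions, a call such as `Tax of { -- income: 20000 }` becomes `tax_scope { income = 20_000 }`. Its result has type `tax`, so accessing `.amount` type-checks like any other structure field access.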
Pass along a bindlib context to allow the variable names to be altered only when
disambiguation is needed. Partial fix to #240 (doesn't affect the backends, only
the printer for the intermediate ASTs).
This also has the benefit of making the output of the tests much more stable.
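The principle of renaming only on a clash can be shown without Bindlib. In this sketch, the context is simply the set of names already in scope, and `fresh_name` and `bind` are hypothetical helpers. It does not reproduce Bindlib's own context API.

```ocaml
module SSet = Set.Make (String)

(* Keep the source name unless it is already used in the printing context;
   otherwise append the first free numeric suffix. *)
let fresh_name (used : SSet.t) (base : string) : string =
  if not (SSet.mem base used) then base
  else
    let rec try_suffix n =
      let candidate = base ^ "_" ^ string_of_int n in
      if SSet.mem candidate used then try_suffix (n + 1) else candidate
    in
    try_suffix 1

(* Binding a variable returns its printed name and the extended context. *)
let bind (used : SSet.t) (base : string) : string * SSet.t =
  let name = fresh_name used base in
  (name, SSet.add name used)
```

Because printed names depend only on what is in scope, and not on a global variable counter, test outputs stay stable when unrelated code changes.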
This is a workaround (but corresponds to what was executed before) and means
that we re-explore all exprs to look for free variables.
The proper fix will be to store boxed_exprs inside scopes instead.
These are just variable renumberings, and type error message changes but still
pointing to the same information; the latter are slightly better in general,
pointing to actual expressions rather than scope declarations.
Also add some safeguards against bad propagation of types (e.g. checking the
arrow type of functions upon application); they are partly disabled at the moment
since they don't pass yet, but that will be further work.
- don't print variable id on type variables, there should be no ambiguity
- print "array" as "collection" to match the language
- print just "collection" for "'a collection", which makes sense English-wise
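The three printing rules above can be sketched as a small recursive printer. The `typ` type and `to_string` are hypothetical simplifications. Printing a type variable as `'a` is an assumption made for the illustration.

```ocaml
(* Hypothetical, simplified type representation. *)
type typ =
  | TInt
  | TBool
  | TVar of int (* type variable with an internal id *)
  | TArray of typ

let rec to_string (t : typ) : string =
  match t with
  | TInt -> "integer"
  | TBool -> "boolean"
  | TVar _ -> "'a" (* the internal id is not printed *)
  | TArray (TVar _) -> "collection" (* rather than "'a collection" *)
  | TArray elt -> "collection " ^ to_string elt
```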
The issue was coming from Bindlib: it stores variable bindings as closures, so
`Bindlib.box_apply f bx` actually delays the application of `f` until the term
is substituted or unboxed (likely long after we are out of the `try..with`
block).
The proposed fix is to make sure we run the wrapper outside of Bindlib
applications, on explicitly unboxed terms.
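To make the issue concrete, here is a toy model in OCaml. It does not use Bindlib. The `box`, `box_apply` and `unbox` below are hypothetical stand-ins that only reproduce the relevant behaviour: the function given to `box_apply` runs when the box is unboxed. `check` stands for any wrapper that may raise.

```ocaml
(* Toy stand-in for a Bindlib box: a suspended computation. *)
type 'a box = unit -> 'a

let box (x : 'a) : 'a box = fun () -> x
let box_apply (f : 'a -> 'b) (b : 'a box) : 'b box = fun () -> f (b ())
let unbox (b : 'a box) : 'a = b ()

exception Type_error of string

let check (n : int) : int =
  if n < 0 then raise (Type_error "negative") else n

(* Buggy wrapper: the [try..with] returns immediately because [check] has
   not run yet; the exception escapes later, when the box is unboxed. *)
let wrap_buggy (b : int box) : int box =
  try box_apply check b with Type_error _ -> box 0

(* Fixed wrapper: unbox explicitly first, then run [check] inside the
   handler, so any exception is raised while the handler is active. *)
let wrap_fixed (b : int box) : int =
  try check (unbox b) with Type_error _ -> 0
```

With these definitions, `wrap_fixed (box (-1))` returns `0`. Unboxing the result of `wrap_buggy (box (-1))` raises `Type_error` outside of any handler.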
Note that there were significant differences between the two printers (see the test diff!). Overall, the `dcalc` one seemed newer, so that's what I took, with only the required additions from `lcalc` (exceptions, raise and catch).
Follow-up of #287, #266 and #165.
Time spent
Pair programming sessions
Before 2022-07-11: 50 h (50 h for each person of the pair programming duo)
Refactoring sessions
Before 2022-07-11: 24 h
2022-07-14: 3 h
Legal research sessions
Before 2022-07-11: 21.5 h
Testing and debugging
Before 2022-07-11: 13,5 h
2022-07-11: 3 h with Denis
2022-07-13: 2 h with Denis
2022-07-14: 1 h with Denis
2022-07-16: 2 h with Denis
2022-07-19: 2 h with Denis
2022-07-21: 2 h with Denis
2022-08-11: 6 h with Denis
2022-08-15: 4 h with Denis
2022-08-16: 2 h with Denis
UI and form
2022-08-09: 8 h with Denis
2022-08-10: 8 h with Denis
2022-08-15: 2 h with Denis
2022-08-16: 2 h with Denis
2022-08-17: 6 h with Denis
2022-08-18: 4 h with Denis
Before: `ELEMENT in SET`; now: `SET contains ELEMENT`
Using the `in` keyword was causing conflicts and blocking #203.
Current proposal has `contient` for the French syntax, and is untranslated (`contains`) for Polish.
Nothing shocking here:
- division by zero now reported on the application rather than the
operator
- renumbering of printed bindlib variables
- some whitespace changes
I removed the '.out' extension for now to preserve the test output file names and avoid a million file renames.
This makes the patch easier to read, and we can do the rename easily in another patch afterwards, without mixing with semantic changes.
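Here is the (beautiful, quick-and-dirty) script that was used:
````bash
# For every expected-output file, append a matching catala-test block
# to the test source file it belongs to.
for f in */*/output/*; do
  # Test base name: output file name up to its first dot
  target_base=${f##*/}
  target_base=${target_base%%.*}
  # Split on dots: $1.$2 is the test file, then either MODE or SCOPE.MODE
  echo $f | awk -F. '{
    f=$1"."$2; if ($4 == "") { mode=$3; id=$3 } else { scope="-s "$3; mode=$4; id=$3"."$4}
    printf "\n```catala-test {id=\"%s\"}\ncatala %s %s\n```\n",id,mode,scope;
  }' >> $(dirname $f)/../${target_base}.*; done
````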
Closes #208 (implementing Solution 1, without adding an explicit syntax)
Two or more exceptions, e.g. `(j1 |- c1)` and `(j2 |- c2)` such that
`c1 = c2`, are collapsed by this transformation into
`((j1 |- c1) | j2 |- c2)`, introducing an arbitrary precedence that avoids the
conflict.
The transformation is not applied if any exceptions apply to the subterms
themselves: while these exceptions could be merged, that would turn more
conflicts into arbitrary outcomes than intended.
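The transformation can be sketched on a toy default calculus in OCaml. The `term` type, `eval`, `is_leaf` and `merge_exceptions` are all hypothetical. Justifications are plain booleans and consequences are integers, so the terms can be evaluated. The last paragraph is read here as "skip the transformation entirely when some exception has exceptions of its own", which is one possible interpretation.

```ocaml
(* [Default {excepts; just; cons}] stands for <excepts | just :- cons>. *)
type term = Default of { excepts : term list; just : bool; cons : int }

exception Conflict

(* [Some v] if the term produces a value, [None] if it is empty; two or
   more applicable exceptions are a conflict. *)
let rec eval (Default { excepts; just; cons }) : int option =
  match List.filter_map eval excepts with
  | [ v ] -> Some v
  | _ :: _ :: _ -> raise Conflict
  | [] -> if just then Some cons else None

let is_leaf (Default { excepts; _ }) : bool =
  match excepts with [] -> true | _ :: _ -> false

(* Chain exceptions sharing a consequence: each later one takes the
   previous chain as its only exception, i.e. ((j1 |- c) | j2 |- c). *)
let merge_exceptions (excepts : term list) : term list =
  let chain first rest =
    List.fold_left
      (fun acc (Default { just; cons; _ }) -> Default { excepts = [ acc ]; just; cons })
      first rest
  in
  let rec group = function
    | [] -> []
    | (Default { cons = c; _ } as t) :: rest ->
      let same, different = List.partition (fun (Default { cons; _ }) -> cons = c) rest in
      chain t same :: group different
  in
  if List.for_all is_leaf excepts then group excepts else excepts
```

In this sketch, two applicable exceptions with the same consequence no longer conflict after merging. Exceptions with different consequences are left as they were, so they still raise `Conflict`.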