I am not convinced about this; would there be a better place to put it?
My worry is that it could get confusing and should not be presented as "another
way to define subscopes", but it ended up a bit verbose.
There is also no section yet on "let-in" definitions, so I did not use them in
the example.
Quite a few changes are included here, some of which have some extra
implications visible in the language:
- adds the `Scope of { -- input_v: value; ... }` construct in the language
- handle it down the pipeline:
* `ScopeCall` in the surface AST
* `EScopeCall` in desugared and scopelang
* expressions are now traversed to detect dependencies between scopes
* transformed into a normal function call in dcalc
- defining a scope now implicitly defines a structure with the same name, with
the output variables of the scope defined as fields. This allows us to type
the return value from a scope call and access its fields easily.
* the implications are mostly in surface/name_resolution.ml code-wise
* the `Scope_out` struct that was defined in scope_to_dcalc is no longer
needed/used and the fields are no longer renamed (changes some outputs; the
explicit suffix for variables with multiple states is ignored as well)
* one benefit is that disambiguation works just like for structures when there
are conflicts on field names
* however, it's now a conflict if a scope and a structure have the same
name (side-note: issues with conflicting enum / struct names or scope
variables / subscope names were silent and are now properly reported)
- you can consequently use scope names as types for variables as well. Writing
literals is not allowed, though: such values can only be obtained by calling
the scope.
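
As a rough OCaml analogy of the points above (not the compiler's actual dcalc output), a scope can be pictured as a function from its inputs to a record holding its outputs, with the record type sharing the scope's name. The scope `tax_computation`, its input `income`, and its outputs `tax` and `net` are all hypothetical names chosen for illustration:

```ocaml
(* Hypothetical scope "tax_computation": input [income], outputs [tax], [net]. *)

(* The structure implicitly defined alongside the scope: one field per output
   variable. *)
type tax_computation = { tax : float; net : float }

(* The scope itself, pictured as an ordinary function from its inputs to that
   structure. The 25% rate is an arbitrary value for the example. *)
let tax_computation ~(income : float) : tax_computation =
  let tax = income *. 0.25 in
  { tax; net = income -. tax }

(* A scope call is then a plain application followed by a field access. *)
let net_of (income : float) : float = (tax_computation ~income).net
```

In this picture, the return value of a call is typed by the record, which is why field disambiguation can work the same way as for ordinary structures.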
Remaining TODOs:
- context variables are not handled properly at the moment
- error handling on invalid calls
- tests show a small error message regression; lots of examples will need
tweaking to avoid scope/struct name or struct fields / output variable
conflicts
- add a `->` syntax to make struct field access distinct from scope output var
access, enforced with typing. This is expected to reduce user confusion and
add a little typing precision.
- document the new syntax & implications (tutorial, cheat-sheet)
- a consequence of the changes is that subscope variables can now be typed too.
A possible future evolution / simplification would be to rewrite subscopes as
explicit scope calls early in the pipeline. That could also allow manipulating
them as expressions (bind them in let-ins, return them...)
Pass along a bindlib context to allow the variable names to be altered only when
disambiguation is needed. Partial fix to #240 (doesn't affect the backends, only
the printer for the intermediate ASTs).
This also has the benefit of making the output of the tests much more stable.
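
The idea of renaming only on conflict can be sketched as follows. This is a self-contained toy, not Bindlib's context type or its actual renaming policy: the `ctx` type, the `fresh` function, and the `_N` suffix scheme are assumptions made for the example.

```ocaml
module S = Set.Make (String)

(* Hypothetical context: the set of names already in use. *)
type ctx = S.t

let empty_ctx : ctx = S.empty

(* Return [name] unchanged if it is not yet used in [ctx]; otherwise append the
   first numeric suffix that makes it unique. The updated context is returned
   alongside the chosen name. *)
let fresh (ctx : ctx) (name : string) : string * ctx =
  let rec pick i =
    let candidate = Printf.sprintf "%s_%d" name i in
    if S.mem candidate ctx then pick (i + 1) else candidate
  in
  let chosen = if S.mem name ctx then pick 1 else name in
  (chosen, S.add chosen ctx)
```

Because a suffix only appears when two names actually collide, printed names do not depend on a global counter, which is consistent with the more stable test output mentioned above.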
This is a workaround (but corresponds to what was executed before) and means
that we re-explore all exprs to look for free variables.
The proper fix will be to store boxed_exprs inside scopes instead.
This was the only reasonable solution I found to the issue raised
[here](https://github.com/CatalaLang/catala/pull/334#discussion_r987175884).
This was a pretty tedious rewrite, but it should now ensure we are doing things
correctly. As a bonus, the "smart" expression constructors are now used
everywhere to build expressions (so another refactoring like this one should be
much easier) and this makes the code overall feel more
straightforward (`Bindlib.box_apply` or `let+` no longer need to be visible!)
---
Basically, we were using values of type `gexpr box = naked_gexpr marked box`
throughout when (re-)building expressions. This was done 99% of the time by
using `Bindlib.box_apply add_mark naked_e` right after building `naked_e`. In
lots of places, we needed to recover the annotation of this expression later on,
typically to build its parent term (to inherit the position, or build the type).
Since it wasn't always possible to wrap these uses within `box_apply` (esp. as
bindlib boxes aren't a monad), here and there we had to call `Bindlib.unbox`,
just to recover the position or type. This had the very unpleasant effect of
forcing the resolution of the whole box (including applying any stored closures)
to reach the top-level annotation, which isn't even dependent on specific
variable bindings, and then, generally, throwing away the result.
Therefore, the change proposed here transforms
- `naked_gexpr marked Bindlib.box` into
- `naked_gexpr Bindlib.box marked` (aliased to `boxed_gexpr` or `gexpr boxed` for
convenience)
This means only
1. not fitting the mark into the box right away when building, and
2. accessing the top-level mark directly without unboxing
The functions for building terms from module `Shared_ast.Expr` could be changed
easily. But then they needed to be consistently used throughout, without
manually building terms through `Bindlib.apply_box` -- which covers most of the
changes in this patch.
`Expr.Box.inj` is provided to swap back to a box, before binding for example.
Additionally, this gives a 40% speedup on `make -C examples pass_all_tests`,
which hints at the amount of unnecessary work we were doing --'
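
To illustrate the difference between the two layouts, here is a minimal sketch that does not use Bindlib: `box` is a toy stand-in modelled as a thunk, `forced` is a counter added only to make the cost visible, and `pos`, `naked`, `mark_before`, `mark_after` and `inj` are hypothetical names (the last one only in the spirit of `Expr.Box.inj`).

```ocaml
(* Toy stand-in for a box: a deferred computation. Not Bindlib's real type. *)
type 'a box = unit -> 'a

(* Counts how many times a box gets resolved. *)
let forced = ref 0

let box (x : 'a) : 'a box = fun () -> incr forced; x
let unbox (b : 'a box) : 'a = b ()

(* Hypothetical annotation and expression types. *)
type pos = { line : int }
type 'a marked = { value : 'a; mark : pos }
type naked = Lit of int | Add of naked * naked

(* Before: the mark lives inside the box, so reading it resolves the box. *)
let mark_before (e : naked marked box) : pos = (unbox e).mark

(* After: the mark sits outside the box and is read directly. *)
let mark_after (e : naked box marked) : pos = e.mark

(* Swap back to a plain box when one is required, e.g. before binding. *)
let inj (e : naked box marked) : naked marked box =
 fun () -> { value = unbox e.value; mark = e.mark }
```

With this toy model, `mark_before` increments `forced` on every call whereas `mark_after` never does, which mirrors the unnecessary unboxing work described above.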
These are just variable renumberings and changes to type error messages that
still point to the same information; the latter are slightly better in general,
pointing to actual expressions rather than scope declarations.
it's now done explicitly from the driver, which allows doing it before typing
and is more consistent; the information was already forwarded to the later
compilation stages separately from the program AST anyway.