(* This file is part of the Catala compiler, a specification language for tax
   and social benefits computation rules. Copyright (C) 2020-2022 Inria,
   contributor: Denis Merigoux <denis.merigoux@inria.fr>, Alain Delaët-Tixeuil
   <alain.delaet--tixeuil@inria.fr>, Louis Gesbert <louis.gesbert@inria.fr>

   Licensed under the Apache License, Version 2.0 (the "License"); you may not
   use this file except in compliance with the License. You may obtain a copy of
   the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
   WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
   License for the specific language governing permissions and limitations under
   the License. *)

open Catala_utils
open Definitions

Swap boxing and annotations in expressions
This was the only reasonable solution I found to the issue raised
[here](https://github.com/CatalaLang/catala/pull/334#discussion_r987175884).
This was a pretty tedious rewrite, but it should now ensure we are doing things
correctly. As a bonus, the "smart" expression constructors are now used
everywhere to build expressions (so another refactoring like this one should be
much easier) and this makes the code overall feel more
straightforward (`Bindlib.box_apply` or `let+` no longer need to be visible!)
---
Basically, we were using values of type `gexpr box = naked_gexpr marked box`
throughout when (re-)building expressions. This was done 99% of the time by
using `Bindlib.box_apply add_mark naked_e` right after building `naked_e`. In
lots of places, we needed to recover the annotation of this expression later on,
typically to build its parent term (to inherit the position, or build the type).
Since it wasn't always possible to wrap these uses within `box_apply` (esp. as
bindlib boxes aren't a monad), here and there we had to call `Bindlib.unbox`,
just to recover the position or type. This had the very unpleasant effect of
forcing the resolution of the whole box (including applying any stored closures)
to reach the top-level annotation, which isn't even dependent on specific
variable bindings, and then generally throwing away the result.
Therefore, the change proposed here transforms
- `naked_gexpr marked Bindlib.box` into
- `naked_gexpr Bindlib.box marked` (aliased to `boxed_gexpr` or `gexpr boxed` for
convenience)
This means only
1. not fitting the mark into the box right away when building, and
2. accessing the top-level mark directly without unboxing
The functions for building terms from module `Shared_ast.Expr` could be changed
easily. But then they needed to be consistently used throughout, without
manually building terms through `Bindlib.apply_box` -- which covers most of the
changes in this patch.
`Expr.Box.inj` is provided to swap back to a box, before binding for example.
Additionally, this gives a 40% speedup on `make -C examples pass_all_tests`,
which hints at the amount of unnecessary work we were doing --'
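The difference between the two layouts can be sketched with toy types. This is only an illustration under stated assumptions: `box`, `unbox`, `mark`, `old_expr`, `boxed` and `inj` below are simplified stand-ins written for this example, not the real `Bindlib` or `Shared_ast` definitions, and the `forced` counter exists only to make the cost of unboxing visible.

```ocaml
(* Toy stand-ins: NOT the real Bindlib / Catala types. *)
type 'a box = { force : unit -> 'a } (* a delayed term construction *)
type mark = { pos : string } (* a simplified annotation *)
type 'a marked = 'a * mark

(* Counts how many times a box had to be resolved. *)
let forced = ref 0
let box (v : 'a) : 'a box = { force = (fun () -> incr forced; v) }
let unbox (b : 'a box) : 'a = b.force ()

(* Old layout: the mark lives inside the box. *)
type 'a old_expr = 'a marked box

(* Reading the position requires resolving the whole box. *)
let old_pos (e : 'a old_expr) : string = (snd (unbox e)).pos

(* New layout: the mark sits outside the box. *)
type 'a boxed = 'a box marked

(* Reading the position never touches the box. *)
let new_pos ((_, m) : 'a boxed) : string = m.pos

(* Swap back to a plain box, e.g. before binding
   (hypothetical analogue of the role described for Expr.Box.inj). *)
let inj ((b, m) : 'a boxed) : 'a marked box =
  { force = (fun () -> (unbox b, m)) }
```

With these definitions, calling `old_pos` increments `forced` while `new_pos` leaves it unchanged, which is the unnecessary work the change avoids.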
let map_exprs_in_lets :
    ?typ:(typ -> typ) ->
    f:('expr1 -> 'expr2 boxed) ->
    varf:('expr1 Var.t -> 'expr2 Var.t) ->
    'expr1 scope_body_expr ->
    'expr2 scope_body_expr Bindlib.box =
 fun ?(typ = Fun.id) ~f ~varf scope_body_expr ->
  let f e = Expr.Box.lift (f e) in
  BoundList.map ~last:f
    ~f:(fun v scope_let ->
      ( varf v,
        Bindlib.box_apply
          (fun scope_let_expr ->
            {
              scope_let with
              scope_let_expr;
              scope_let_typ = typ scope_let.scope_let_typ;
            })
          (f scope_let.scope_let_expr) ))
    scope_body_expr

Implement safe renaming of idents for backend printing
Previously we had some heuristics in the backends trying to achieve this, with a
lot of holes; this should be much more solid, relying on `Bindlib` to do the
correct renamings.
**Note1**: it's not plugged into the backends other than OCaml at the moment.
**Note2**: the related, obsolete heuristics haven't been cleaned out yet.
**Note3**: we conservatively suppose a single namespace at the moment. This is
required for e.g. Python, but it forces vars named like struct fields to be
renamed, which is more verbose in e.g. OCaml. The renaming engine could be
improved to support different namespaces, with a way to select how to route the
different kinds of identifiers into them.
Similarly, customisation for what needs to be uppercase or lowercase is not
available yet.
**Note4**: besides excluding keywords, we should also be careful to exclude (or
namespace):
- the idents used in the runtime (e.g. `o_add_int_int`)
- the dynamically generated idents (e.g. `embed_*`)
**Note5**: module names themselves aren't handled yet. The reason is that they
must be discoverable by the user, and even need to match the filenames, etc. In
other words, imagine that `Mod` is a keyword in the target language. You can't
rename a module called `Mod` to `Mod1` without knowing the whole module context,
because that would destroy the mapping for a module already called `Mod1`.
A reliable solution would be to translate all module names to e.g.
`CatalaModule_*`, which we can assume will never conflict with any built-in, and
forbid idents starting with that prefix. We may also want to restrict their
names to ASCII? Currently we use a projection, but what if I have two modules
called `Là` and `La`?
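The single-namespace renaming idea, and the module-name problem from Note5, can be illustrated with a small sketch. It is an assumption-laden toy: `fresh` and `rename_all` are hypothetical helpers written for this example, and the numeric-suffix strategy is just one plausible scheme, not the algorithm `Bindlib` actually applies.

```ocaml
module S = Set.Make (String)

(* Hypothetical helper: return [base] if it is free, otherwise the first
   [base ^ "1"], [base ^ "2"], ... that is not in [taken]. *)
let fresh (taken : S.t) (base : string) : string =
  if not (S.mem base taken) then base
  else
    let rec go i =
      let candidate = base ^ string_of_int i in
      if S.mem candidate taken then go (i + 1) else candidate
    in
    go 1

(* Rename identifiers in order within ONE namespace. Reserved words
   (keywords, runtime idents, ...) start out as already taken. *)
let rename_all ~(reserved : string list) (idents : string list) :
    (string * string) list =
  let _, acc =
    List.fold_left
      (fun (taken, acc) id ->
        let id' = fresh taken id in
        S.add id' taken, (id, id') :: acc)
      (S.of_list reserved, [])
      idents
  in
  List.rev acc
```

For instance, with `Mod` reserved, `rename_all ~reserved:["Mod"] ["Mod"; "Mod1"]` maps `Mod` to `Mod1` and therefore has to push the existing `Mod1` to `Mod11`. The outcome depends on every other name in scope, which is why module names, which must stay discoverable by the user, cannot be renamed this way in isolation.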
let map_last_item ~varf last =
  Bindlib.box_list
  @@ List.map
       (function EVar v -> Bindlib.box_var (varf v) | _ -> assert false)
       last

let map_exprs ?(typ = Fun.id) ~f ~varf scopes =
  let f v = function
    | ScopeDef (name, body) ->
      let scope_input_var, scope_lets = Bindlib.unbind body.scope_body_expr in
      let new_body_expr = map_exprs_in_lets ~typ ~f ~varf scope_lets in
      let new_body_expr =
        Bindlib.bind_var (varf scope_input_var) new_body_expr
      in
      ( varf v,
        Bindlib.box_apply
          (fun scope_body_expr ->
            ScopeDef (name, { body with scope_body_expr }))
          new_body_expr )
    | Topdef (name, ty, expr) ->
      ( varf v,
        Bindlib.box_apply
          (fun e -> Topdef (name, typ ty, e))
          (Expr.Box.lift (f expr)) )
  in
  BoundList.map ~f ~last:(map_last_item ~varf) scopes

let fold_exprs ~f ~init scopes =
  let f acc def _ =
    match def with
    | Topdef (_, typ, e) -> f acc e typ
    | ScopeDef (_, scope) ->
      let _, body = Bindlib.unbind scope.scope_body_expr in
      let acc, last =
        BoundList.fold_left body ~init:acc ~f:(fun acc sl _ ->
            f acc sl.scope_let_expr sl.scope_let_typ)
      in
      f acc last (TStruct scope.scope_body_output_struct, Expr.pos last)
  in
  fst @@ BoundList.fold_left ~f ~init scopes

let typ body =
  let pos = Mark.get (StructName.get_info body.scope_body_input_struct) in
  let input_typ = Mark.add pos (TStruct body.scope_body_input_struct) in
  let result_typ = Mark.add pos (TStruct body.scope_body_output_struct) in
  Mark.add pos (TArrow ([input_typ], result_typ))

let get_body_mark scope_body =
  let m0 =
    match Bindlib.unbind scope_body.scope_body_expr with
    | _, Last (_, m) | _, Cons ({ scope_let_expr = _, m; _ }, _) -> m
  in
  Expr.with_ty m0 (typ scope_body)

let unfold_body_expr (_ctx : decl_ctx) (scope_let : 'e scope_body_expr) =
  BoundList.fold_right scope_let ~init:Expr.rebox ~f:(fun sl var acc ->
      Expr.make_let_in var sl.scope_let_typ
        (Expr.rebox sl.scope_let_expr)
        acc sl.scope_let_pos)

let input_type ty io =
  match io, ty with
  | (Runtime.Reentrant, iopos), (TArrow (args, ret), tpos) ->
    TArrow (args, (TDefault ret, iopos)), tpos
  | (Runtime.Reentrant, iopos), (ty, tpos) -> TDefault (ty, tpos), iopos
  | _, ty -> ty

let to_expr (ctx : decl_ctx) (body : 'e scope_body) : 'e boxed =
  let var, body_expr = Bindlib.unbind body.scope_body_expr in
  let body_expr = unfold_body_expr ctx body_expr in
  let pos = Expr.pos body_expr in
  Expr.make_abs [| var |] body_expr
    [TStruct body.scope_body_input_struct, pos]
    pos

let unfold (ctx : decl_ctx) (s : 'e code_item_list) (main_scope : ScopeName.t) :
    'e boxed =
  BoundList.fold_lr s ~top:None
    ~down:(fun v item main ->
      match main, item with
      | None, ScopeDef (name, body) when ScopeName.equal name main_scope ->
        Some (Expr.make_var v (get_body_mark body))
      | r, _ -> r)
    ~bottom:(fun _vlist -> function Some v -> v | None -> raise Not_found)
    ~up:(fun var item next ->
      let e, typ =
        match item with
        | ScopeDef (_, body) -> to_expr ctx body, typ body
        | Topdef (_, typ, expr) -> Expr.rebox expr, typ
      in
      Expr.make_let_in var typ e next (Expr.pos e))

let free_vars_body_expr scope_lets =
  BoundList.fold_right scope_lets ~init:Expr.free_vars ~f:(fun sl v acc ->
      Var.Set.union (Var.Set.remove v acc) (Expr.free_vars sl.scope_let_expr))

let free_vars_item = function
  | ScopeDef (_, { scope_body_expr; _ }) ->
    let v, body = Bindlib.unbind scope_body_expr in
    Var.Set.remove v (free_vars_body_expr body)
  | Topdef (_, _, expr) -> Expr.free_vars expr

let free_vars scopes =
  BoundList.fold_right scopes
    ~init:(fun _vlist -> Var.Set.empty)
    ~f:(fun item v acc ->
      Var.Set.union (Var.Set.remove v acc) (free_vars_item item))
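
The free-variable functions above all follow the same right-to-left pattern: remove the variable bound by a definition from what the rest of the list needs, then add what the definition itself needs. A minimal self-contained sketch of that pattern, assuming a toy expression type and a toy chain of lets (`expr`, `lets`, `expr_fv` and `lets_fv` are hypothetical names for this example, not the `Shared_ast` or `BoundList` definitions):

```ocaml
module VS = Set.Make (String)

(* Toy expression language, just enough to mention variables. *)
type expr = Var of string | Add of expr * expr | Lit of int

let rec expr_fv = function
  | Var x -> VS.singleton x
  | Add (a, b) -> VS.union (expr_fv a) (expr_fv b)
  | Lit _ -> VS.empty

(* A chain [let v1 = e1 in let v2 = e2 in ... last]. *)
type lets = Last of expr | Cons of string * expr * lets

(* Same shape as the fold above: the bound variable is removed from the
   free variables of the tail, but NOT from those of its own definition. *)
let rec lets_fv = function
  | Last e -> expr_fv e
  | Cons (v, e, rest) -> VS.union (VS.remove v (lets_fv rest)) (expr_fv e)
```

In `let x = a in let y = x + b in y + x`, this leaves only `a` and `b` free, while in `let x = x in x` the `x` on the right-hand side of the definition stays free because it refers to an outer binding.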