catala/compiler/shared_ast/scope.mli

(* This file is part of the Catala compiler, a specification language for tax
   and social benefits computation rules. Copyright (C) 2020-2022 Inria,
   contributor: Denis Merigoux <denis.merigoux@inria.fr>, Alain Delaët-Tixeuil
   <alain.delaet--tixeuil@inria.fr>, Louis Gesbert <louis.gesbert@inria.fr>

   Licensed under the Apache License, Version 2.0 (the "License"); you may not
   use this file except in compliance with the License. You may obtain a copy of
   the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
   WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
   License for the specific language governing permissions and limitations under
   the License. *)

(** Functions handling the code item structures of [shared_ast], in particular
    the scopes *)

open Catala_utils
open Definitions

(** {2 Traversal functions} *)

val map_exprs_in_lets :
  ?typ:(typ -> typ) ->
  f:('expr1 -> 'expr2 boxed) ->
  varf:('expr1 Var.t -> 'expr2 Var.t) ->
  'expr1 scope_body_expr ->
  'expr2 scope_body_expr Bindlib.box
(** Usage
    [map_exprs_in_lets ~f:(fun e -> ...) ~varf:(fun var -> ...) scope_body_expr],
    where [e] is the right-hand side of a scope let or the result of the scope
    body, and [var] is the left-hand-side variable of a scope let. [~varf] is
    usually the identity function, or [Var.translate] when the map sends the
    expression to a new flavor of the shared AST. If the optional [?typ]
    function is given, it is applied to the types of the scope let
    left-hand sides, for instance to reset them to [TAny]. *)
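
(** Illustrative sketch, not the compiler's implementation: a toy model of
    what [map_exprs_in_lets] does. It assumes a scope body is a plain list of
    lets ending in a result, and that variables are strings rather than
    Bindlib variables. The names [body], [map_lets] and the reduced [typ]
    below are hypothetical. The real function also rebuilds binders and
    returns a box.
    {[
      type typ = TAny | TInt | TBool

      type ('var, 'expr) body =
        | Last of 'expr (* result of the scope body *)
        | Cons of 'var * typ * 'expr * ('var, 'expr) body
            (* let var : typ = expr in next *)

      (* [f] maps every right-hand side and the result, [varf] maps every
         bound variable, and the optional [typ] maps every let type. *)
      let rec map_lets ?(typ = Fun.id) ~f ~varf = function
        | Last e -> Last (f e)
        | Cons (v, t, e, next) ->
          Cons (varf v, typ t, f e, map_lets ~typ ~f ~varf next)

      let body = Cons ("x", TInt, 1, Cons ("y", TBool, 0, Last 42))

      (* Change the expression flavor (int to string) and reset let types. *)
      let body' =
        map_lets ~typ:(fun _ -> TAny) ~f:string_of_int
          ~varf:String.uppercase_ascii body
      (* body' = Cons ("X", TAny, "1", Cons ("Y", TAny, "0", Last "42")) *)
    ]} *)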

val map_exprs :
  ?typ:(typ -> typ) ->
  f:('expr1 -> 'expr2 boxed) ->
  varf:('expr1 Var.t -> 'expr2 Var.t) ->
  'expr1 code_item_list ->
  'expr2 code_item_list Bindlib.box
(** This is the main map visitor for all the expressions inside all the scopes
    of the program. *)

val map_last_item :
  varf:(('a, 'm) naked_gexpr Bindlib.var -> 'e2 Bindlib.var) ->
  ('a, 'm) naked_gexpr list ->
  'e2 list Bindlib.box
(** Helper function to handle the [code_item_list] terminator when manually
    mapping over a [code_item_list]. *)

val fold_exprs :
  f:('acc -> 'expr -> typ -> 'acc) -> init:'acc -> 'expr code_item_list -> 'acc
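
(** Illustrative sketch of a fold in the style of [fold_exprs], not the
    compiler's code. It makes three assumptions for illustration: each
    toplevel definition contributes its expression with its declared type,
    each scope let contributes its expression with its declared type, and the
    scope result is visited last with the output structure type. The
    [code_item] type and its constructors here are hypothetical
    simplifications.
    {[
      type typ = TInt | TBool | TStruct of string

      type 'e code_item =
        | Topdef of string * typ * 'e
        | ScopeDef of string * (string * typ * 'e) list * ('e * typ)
            (* scope name, scope lets, then the result and its type *)

      let fold_exprs ~f ~init items =
        List.fold_left
          (fun acc item ->
            match item with
            | Topdef (_, t, e) -> f acc e t
            | ScopeDef (_, lets, (res, res_t)) ->
              let acc =
                List.fold_left (fun acc (_, t, e) -> f acc e t) acc lets
              in
              f acc res res_t)
          init items

      let lets = [ ("a", TInt, 1); ("b", TBool, 0) ]
      let items =
        [ Topdef ("pi", TInt, 3); ScopeDef ("S", lets, (42, TStruct "S_out")) ]

      (* Count the integer-typed expressions, here those of pi and a. *)
      let count_int acc _ t = if t = TInt then acc + 1 else acc
      let n_int = fold_exprs ~f:count_int ~init:0 items
    ]} *)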

(** {2 Conversions} *)

val to_expr : decl_ctx -> ('a any, 'm) gexpr scope_body -> ('a, 'm) boxed_gexpr
(** Usage: [to_expr ctx body] converts the scope body [body] into a single
    expression: a function of the scope's input structure, whose body chains
    the scope lets as nested let-bindings around the scope result. *)
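
(** A toy sketch of the conversion performed by [to_expr]. This is not the
    real implementation. It assumes the scope body binds one variable to its
    input structure, and that each scope let becomes a nested let-binding
    around the scope result. The [expr] and [scope_body] types below are
    hypothetical simplifications without Bindlib or annotations.
    {[
      type expr =
        | Var of string
        | Int of int
        | Add of expr * expr
        | Let of string * expr * expr (* let x = e1 in e2 *)
        | Abs of string * expr (* fun x -> e *)

      type scope_body = {
        input_var : string; (* bound to the scope input structure *)
        lets : (string * expr) list; (* scope lets, in order *)
        result : expr; (* scope result *)
      }

      let to_expr (body : scope_body) : expr =
        let inner =
          List.fold_right
            (fun (x, e) acc -> Let (x, e, acc))
            body.lets body.result
        in
        Abs (body.input_var, inner)

      let e =
        to_expr
          { input_var = "in";
            lets = [ ("a", Int 1); ("b", Add (Var "a", Int 1)) ];
            result = Var "b" }
      (* e = Abs ("in", Let ("a", Int 1,
                 Let ("b", Add (Var "a", Int 1), Var "b"))) *)
    ]} *)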

val unfold :
  decl_ctx -> ((_, 'm) gexpr as 'e) code_item_list -> ScopeName.t -> 'e boxed
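
(** Illustrative toy model of [unfold], not the compiler's code. It assumes
    each code item has already been turned into an expression (a scope, for
    example, via [to_expr]). It also assumes the result is a chain of
    let-bindings, one per code item, ending with the variable bound to the
    requested scope. Items are identified by hypothetical string names
    instead of Bindlib variables and [ScopeName.t].
    {[
      type expr = Var of string | Int of int | Let of string * expr * expr

      let unfold (items : (string * expr) list) (main_scope : string) : expr =
        if not (List.mem_assoc main_scope items) then
          invalid_arg "unfold: unknown main scope";
        (* One let per item, innermost body is the main scope variable. *)
        List.fold_right
          (fun (x, e) acc -> Let (x, e, acc))
          items (Var main_scope)

      let e = unfold [ ("pi", Int 3); ("S", Var "pi") ] "S"
      (* e = Let ("pi", Int 3, Let ("S", Var "pi", Var "S")) *)
    ]} *)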

val typ : _ scope_body -> typ
(** Builds the arrow type of the given scope body, from its input structure
    to its output structure. *)
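
(** A minimal sketch of the arrow type built by [typ], not the compiler's
    implementation. It assumes a scope body is described only by the names of
    its input and output structures. The reduced [typ] type and the
    [scope_body] record below are hypothetical.
    {[
      type typ =
        | TStruct of string (* named structure *)
        | TArrow of typ list * typ (* argument types, return type *)

      type scope_body = { input_struct : string; output_struct : string }

      (* A scope is a function from its input structure to its output one. *)
      let scope_typ (b : scope_body) : typ =
        TArrow ([ TStruct b.input_struct ], TStruct b.output_struct)

      let t = scope_typ { input_struct = "S_in"; output_struct = "S_out" }
      (* t = TArrow ([ TStruct "S_in" ], TStruct "S_out") *)
    ]} *)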

val input_type : typ -> Runtime.io_input Mark.pos -> typ
(** Returns the correct input type for a scope input variable. This is [typ]
    itself for non-reentrant variables. For reentrant variables, it is wrapped
    in a [TDefault], and for function types only the return type is wrapped.
    Note that this does not account for thunking, which is added during the
    scopelang->dcalc translation. *)
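
(** Illustrative sketch of [input_type] without source positions, not the
    compiler's code. The three-constructor [io_input] mirrors the intent of
    [Runtime.io_input], but it is declared here as an assumption, as is the
    reduced [typ] type.
    {[
      type typ = TInt | TArrow of typ list * typ | TDefault of typ
      type io_input = NoInput | OnlyInput | Reentrant

      let input_type (t : typ) (io : io_input) : typ =
        match io, t with
        | Reentrant, TArrow (args, ret) ->
          (* reentrant function: only its return type becomes a default *)
          TArrow (args, TDefault ret)
        | Reentrant, t -> TDefault t
        | (NoInput | OnlyInput), t -> t

      let t1 = input_type (TArrow ([ TInt ], TInt)) Reentrant
      (* t1 = TArrow ([ TInt ], TDefault TInt) *)
    ]} *)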

(** {2 Analysis and tests} *)

val free_vars_body_expr : 'e scope_body_expr -> 'e Var.Set.t
val free_vars_item : 'e code_item -> 'e Var.Set.t
val free_vars : 'e code_item_list -> 'e Var.Set.t
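
(** A toy sketch of what [free_vars_body_expr] computes, not the real
    implementation. It assumes string variables, a hypothetical string set
    module [SSet], and a scope body given as a list of lets followed by a
    result. Each let binds its variable in the following lets and in the
    result, but not in its own right-hand side.
    {[
      module SSet = Set.Make (String)

      type expr = Var of string | Int of int | Add of expr * expr

      let rec fv = function
        | Var x -> SSet.singleton x
        | Int _ -> SSet.empty
        | Add (a, b) -> SSet.union (fv a) (fv b)

      let rec free_vars_body (lets : (string * expr) list) (result : expr) =
        match lets with
        | [] -> fv result
        | (x, e) :: rest ->
          (* x is bound in rest and result, so remove it from their vars *)
          SSet.union (fv e) (SSet.remove x (free_vars_body rest result))

      let lets = [ ("a", Var "input"); ("b", Add (Var "a", Var "c")) ]
      let vars = SSet.elements (free_vars_body lets (Add (Var "b", Var "a")))
      (* vars = [ "c"; "input" ] *)
    ]} *)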