mirror of https://github.com/tweag/nickel.git
Commit 61cd48090c by Yann Hamdaoui, 2022-01-12 15:59:48 +01:00 (parent c10f74c285)
2 changed files with 1043 additions and 0 deletions

rfcs/001-overriding.md (new file, 960 lines)
---
feature: overriding
start-date: 2021-03-18
author: Yann Hamdaoui
---
# Overriding
This document is a proposal for an overriding mechanism in Nickel. It is
expected to evolve while we put these ideas in practice, but shall serve as
a design and implementation baseline.
Related issues: [#103](https://github.com/tweag/nickel/issues/103),
[#240](https://github.com/tweag/nickel/issues/240),
[#255](https://github.com/tweag/nickel/issues/255),
[#279](https://github.com/tweag/nickel/issues/279).
## Context
### Definition of the problem
*Overriding* is the ability to take an existing configuration - in our case, a
record - and update a specific part of it, such as a field or the field of a
nested record.
A naive and unergonomic way of doing it is to repeat all the original
values in a new record:
```nickel
let record = {
a = 1,
b = "str",
c = false
} in
{
a = record.a,
b = record.b,
c = true
}
```
This quickly becomes unmanageable for large records, which are common in Nix
and Nixpkgs. In functional languages, this is usually better done using what's
called a functional update: an operation that takes the original record, one or
several fields to update and the corresponding new values, and returns a new,
updated record. It has the same semantics as our first snippet, but doesn't
require rewriting all the unchanged fields.
It can have builtin syntax, such as OCaml's `with` (`{record with field =
new_value}`), Haskell's `record {field = newValue}`, Nix's `//` operator
(`record // {field = newValue}`), or Rust's struct update syntax
(`RecordDataType {field: new_value, ..record}`). There are more advanced
programming techniques that make updating deeply nested records ergonomic, such
as [lenses](https://www.fpcomplete.com/haskell/tutorial/lens/) in Haskell, but
these rely too heavily on advanced language and typing features to be practical
in Nickel.
In Nickel, we can already do a functional update using merging, although the
updated field must have been marked as `default`:
```nickel
let record = {
a = 1,
b = "str",
c | default = false,
} in
record & {c = true}
```
As explained in the next section though, this is not satisfying.
### Overriding recursive records
Nickel's records are different from the ones of OCaml, Haskell or Rust. They
are lazy and recursive by default. They are thus better understood as
[codata](https://link.springer.com/chapter/10.1007/978-3-030-17184-1_5)
rather than data. Take the following example:
```nickel
let record = {
protocol | default = `Http,
port | default = switch protocol {
`Http => 80,
`Ftp => 21,
_ => 8181,
},
} in
record & { protocol = `Ftp }
```
Here, our final record will be ``{protocol = `Ftp, port = 80}``. `port` is an
expression depending on `protocol`, but it is not updated in the result,
although such an update is precisely what one would intuitively expect from a
recursive record. This kind of recursive update is a pattern actively used in
Nix, and is probably a common scenario for configurations in general: if I
patch a base configuration by changing an option, I usually want other values
depending on this option to be updated as well.
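For reference, here is the result one would hope for (the desired behavior, not
what Nickel currently does):
```nickel
record & { protocol = `Ftp }
// currently evaluates to: { protocol = `Ftp, port = 80 }
// desired (recursive):    { protocol = `Ftp, port = 21 }
```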
## Comparing override mechanisms
We are going to review various overriding mechanisms of Nix and other related
languages. Let us sketch some general traits of overriding mechanisms as a
framework for comparison:
- **(ERG)**: Ergonomics. A mechanism is ergonomic if it avoids
complex encodings for overridable records, doesn't require extra library
functions, and so on. It would ideally work out of the box with the native
records of the language.
- **(NEST)**: Nested field overriding. The ease of overriding nested fields,
such as the value of `bar` in `{foo = {bar = "baz"}}`. Ideally, overriding
nested fields is no harder than top-level fields.
- **(COMP)**: Composability. Overriding is composable when one can seamlessly
apply several overrides to an initial record.
- **(EXPR)**: Expressivity. An expressive mechanism is one that allows
expressing and doing more things: for example, dropping some fields, or
accessing the previous version of the data (`super`), and so on.
## Overriding in Nix
Nix expressions don't have a built-in way of overriding recursive records that
automatically updates the dependent recursive fields. Since the feature is
still actively needed, it is provided by mechanisms implemented in library
code.
Keeping a language lean is usually a good design guideline: to provide an
expressive yet simple set of features upon which others can be built as
libraries. However, in the case of Nix, overriding is a fundamental pattern, and
having it implemented in user code leads to some general well-known issues:
- ~~**(ERG)**~~ Definition, update or field access is not as ergonomic as with
vanilla native records.
- Several competing mechanisms have been implemented in slightly different
contexts. Overriding in general is already hard for beginners, but having to
learn several ones, all similar but still different, is even harder.
- User-land implementations don't have first-class access to information such
as source locations and metavalues. In a dynamic language, this can make good
error reporting much harder.
- It is potentially harder to make user-land implementations efficient.
See [this gist](https://gist.github.com/edolstra/29ce9d8ea399b703a7023073b0dbc00d)
for more details. We continue with an overview of existing mechanisms in Nix and
related languages.
### Nixpkgs overriding
The basic idea is to represent a recursive record explicitly as a function from
a `self` record to the final record (here, in Nix syntax):
```nix
r = rec {
a = 1;
b = a + 1;
}
# r is represented as:
rRepr = self: {
a = 1;
b = self.a + 1;
}
```
`self` is a self-reference, akin to `this` in object oriented languages. It is
computed as a [fixpoint](https://en.wikipedia.org/wiki/Fixed-point_combinator),
simply realized by auto-application in Nix, thanks to laziness:
```nix
r = rRepr r
```
The explicit dependency on `self` gives the ability to provide a different
definition for `self.a` in the final value. Overriding is achieved by
functionally updating the `a` field of the `self` parameter before passing it
to the original representation:
```nix
let extension = {a = 2;}; in
# The fixpoint of result is { a = 2; b = 3; }
resultRepr = self: (rRepr (self // extension)) // extension
```
The second outer update ensures that the final result is also set to `a = 2`,
and not only the `a` appearing in `b`.
Some details are left out, but this is the gist of it. See also [the Nix pill on
overriding](https://nixos.org/guides/nix-pills/override-design-pattern.html) or
[this article on fixpoints in Nix](http://r6.ca/blog/20140422T142911Z.html).
#### Limits
- ~~**(ERG)**~~ Uses a specific representation, rather than handling good old plain records
(although the actual representation in Nixpkgs is more ergonomic than a plain
function).
- ~~**(NEST)**~~ Overriding nested attribute sets is painful. If one does the naive thing,
the whole subrecord is erased:
```nix
let rRepr = self: {
a = {b = self.a.c;};
}; in
let extension = {a = {c = 2;};}; in
rExt = let fixpoint = rRepr (fixpoint // extension); in
fixpoint // extension
# Gives {a = {c = 2;};} instead of expected {a = {b = 2; c = 2;};}
```
### Nixpkgs overlays
[Overlays](https://nixos.wiki/wiki/Overlays) can be seen as a sequence of
transformations from a base record, each layer having access to a `super`
reference to the previous layer and the `self` reference to the final value.
Take two consecutive transformations:
1. Set the `a` field to `1`
2. Set the `b` field to `1`, add `1` to the `a` field
```nix
let baseRepr = self: {c = self.a + self.b;}; in
let overlay1 = self: super: {a = 1;}; in
let overlay2 = self: super: {b = 1; a = super.a + 1;}; in
let applyOverlays = self:
let base = baseRepr self; in
let first = base // overlay1 self base; in
let second = first // overlay2 self first; in
second; in
let fixpoint = applyOverlays fixpoint; in fixpoint
```
In practice, the `super // ..` and fixpoint parts can be factored out into
dedicated helper functions.
#### Advantages
- **(EXPR)**: The explicit representation of layers through `super`
and `self` gives the user a great deal of control.
- **(COMP)**: Composition is first-class.
#### Limits
- ~~**(ERG)**~~ Users still need to manipulate representations.
- **order-dependency**: The result is order-dependent. Applying both `self:
super: {a = super.a + 1}` and `self: super: {a = super.a / 2}` can give two
different results depending on which one is the first layer. This also means
that overrides must be grouped by layers, which is not necessarily the most
logical structure.
- ~~**(NEST)**~~ Overriding nested fields is still clumsy. For example, to
override `lib.firefoxVersion`:
```nix
self: super: { lib = (super.lib or {}) // { firefoxVersion = ...; }; }
```
### NixOS module system
The NixOS module system takes a different approach. There, all the basic blocks
— the modules — are merged together in an unspecified order. What decides the
priority of one option over another are attributes explicitly stated in the
modules themselves. The declaration of options (type, priority, etc.) and
their assignment must be made separately.
Compared to overlays, the explicit reference to `super` has disappeared. This
makes it look like modules can only override one value with another, instead of
combining different pieces of data together.
This apparent shortcoming is solved by custom merge functions, which can
redefine how the different values given for a field are combined. By default,
when merging two list values, the module system only knows how to replace one
value with the other, because there's no canonical and commutative way of
merging two lists. However, the user can specify that these lists should in
fact be concatenated, making it possible to define a list of paths
piecewise:
```nix
# Some module
{
paths = mkOption {
type = types.listOf types.path;
};
}
# Config
[..]
someModule.paths = [foo/bar];
# Other config
[..]
someModule.paths = [/bar/baz];
# Resulting config: [foo/bar /bar/baz]
```
#### Advantages
- **order-independence**: as opposed to overlays, modules can be merged in any
order, and the final result will be the same.
#### Limits
- ~~**(EXPR)**~~: Despite custom merge functions, this model is less powerful
than overlays which have access to an explicit reference to `super`.
Modules, on the other hand, are all merged at the same level in an
unspecified order.
For example, the following has no equivalent in a merging model:
```nix
let overlay = self: super: {b = self.c + 1; a = super.c + 1;}; in #...
```
However, modules are quite capable in practice. Removing fields and being
order-dependent as in the overlay example above is not necessarily a good idea,
and this limitation of expressivity may in fact be a good thing.
## Overriding elsewhere
### CUE
CUE allows a form of late-binding for recursive attributes:
```
$cat test.cue
fields: {
a: int
b: a + 1
} & {
a: 2
}
$cue eval test.cue
fields: {
a: 2
b: 3
}
```
Combined with [default values](https://cuelang.org/docs/tutorials/tour/types/defaults/),
this provides an overriding mechanism:
```
$cat test2.cue
fields: {
a: int | *1
b: a + 1
} & {
a: 2
}
$cue eval test2.cue
fields: {
a: 2
b: 3
}
```
The basics of this overriding mechanism are close in spirit to what we want to
achieve in this proposal. This is similar to Nickel's current `default`
behavior, but with the desired late-bound merging. The expressivity is however
too limited to replace e.g. the NixOS module system.
#### Advantages
- **(ERG)**: Overriding works seamlessly with native records. Note however that
it requires fields to be explicitly marked as overridable (default).
#### Limits
- ~~**(EXPR)**~~: One can only replace a value with another one. It is not
possible, for example, to add one to the previous value.
- ~~**(COMP)**~~: Overriding a record a second time is not possible.
### Jsonnet
Jsonnet refers to its overriding mechanism as *inheritance*, implemented by the
`+` operator on objects:
```jsonnet
local obj = {
name: "Alice",
greeting: "Hello, " + self.name,
};
[
obj,
obj + { name: "Bob" },
obj + { greeting: super.greeting + "!"},
obj + { name: "Bob", greeting: super.greeting + "!"},
]
```
This is similar to the Nix operator `//`, but it performs recursive overriding
in the expected way out of the box. The extension can access the previous
version in the same way as Nixpkgs overlays, using the `super` keyword.
#### Advantages
- **(ERG)**: Integrated with native records.
- **(COMP)**: Overriding is easily iterated.
- **(EXPR)**: Achieves the same level of expressivity as overlays, thanks to
the special keywords `self` and `super`.
#### Limits
- **order-dependency**: As with overlays, the order of application of overrides
matters.
## Taking a step back
While not exactly the same, all these overriding mechanisms are based on the
same underlying principles:
1. Represent recursive records (explicitly or implicitly) as a function of
`self` (and `super` in some cases)
2. Compute the combination as a **fixpoint** of several recursive records,
giving back a standard record.
As hinted by Jsonnet's terminology, the semantics of recursive records
strikingly resemble the semantics of objects and classes in OOP. Replace
records with objects, fields with methods and overriding with inheritance. This
is not so surprising: there's actually a history of encoding objects in
functional languages as recursive records (see the introduction of
[The Recursive Record Semantics of Objects Revisited](https://hal.inria.fr/inria-00072423)
for a good overview) going back to 1988
\[[Cardelli](http://lucacardelli.name/papers/inheritance.pdf)\].
This is also mentioned in the
[README](https://github.com/MuKnIO/nixpkgs/blob/devel/lib/pop.md#some-historical-context)
of POP (an object system in Nix), where the author observes that overriding
mechanisms in Nix (and Jsonnet for that matter) are a simplified lazy object
system (simplified because objects lack proper state and there is no distinction
between classes and instances). Their logical conclusion is to embrace this fact
and design a proper object system helped by existing literature, rather than
reinventing the wheel. Similarly, Nix [overlays](https://nixos.wiki/wiki/Overlays)
can be seen as a single inheritance mechanism.
Inheritance-based overriding imposes an order on the overrides. A single level
inheritance is usually fine, but a complex hierarchy can become hard to maintain
and to reason about.
The NixOS module system is designed differently. It is based on merging: the
configuration is created by combining a set of unordered records following
specific rules. Of course, there's still a need for ordering information
somewhere, but it is rather expressed as priorities. This system has the
advantage of making merging commutative (in contrast with inheritance or the
`//` operator), as in CUE, and of untying data definition from precedence
specification: one can define a module where each field has a different
priority, if it makes sense to group them logically. With inheritance, values
are required to be grouped in layers of the same priority, instead of logically.
Summing up the differences between inheritance-based and merge-based
mechanisms:
**order-dependency**
Inheritance is order-dependent: the chain of extensions must be defined with
the precise order they will be applied in mind, and definitions must be grouped
accordingly. On the other hand, merging is commutative, and the precedence
information is encoded as priorities. Thus, overriding by merging can be
defined using stand-alone pieces of data, although the behavior of priorities
is not local.
**(EXPR)**
Merged records only have access to the final computed fixpoint `self`, while
objects also have access to the previous stage of extension via `super`.
However, as in the NixOS module system, it is possible to address this issue
using custom merge functions. This is a bit less expressive, but in a good way:
it forces the merge strategy to be uniform for each field, while mechanisms
like overlays or inheritance can do pretty much anything.
## Proposal overview
We basically propose to adopt the same kind of merge, priority, and custom
merge function-based scheme as the NixOS module system. One big difference with
the NixOS module system is that it would be built into the language, usable on
any native record whatsoever, and well integrated with the rest of the language
features. The implementation would have first-class access to locations, the
AST, the memory layout, the metadata, and so on.
We first describe the ideas at a high level. Then, we review the issues and
challenges of this approach, and propose concrete solutions to overcome or
mitigate them. Finally, the precise operational semantics is laid out in a
dedicated section.
### Recursive records & merging
**This section defines the semantics, rather than an actual efficient
implementation**.
As before, it is useful to see recursive records represented as functions - or
constructors - that take a `self` parameter and return a final non-recursive
record, in the same way as the original overriding mechanism of Nixpkgs (the
made-up syntax `def := value` is used to emphasize that we are defining new
objects):
```
r = {
a = 1;
b = a + 1;
}
// Definition of the representation of r
repr(r) := fun self => {
a = 1;
b = self.a + 1;
}
```
Field access amounts to computing a fixpoint:
```
repr(r).foo := let fix = repr(r) fix in fix.foo
```
Merging simply merges the underlying representations:
```
repr(r1) & repr(r2) := fun self => repr(r1) self & repr(r2) self
```
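As a worked sketch of these rules (reusing the existing `default` annotation,
with the priority semantics described in the next subsection):
```
r1 = { a | default = 1, b = a + 1 }
r2 = { a = 2 }

repr(r1) & repr(r2) := fun self =>
  { a | default = 1, b = self.a + 1 } & { a = 2 }

// Field access computes the fixpoint:
(repr(r1) & repr(r2)).b
  := let fix = (repr(r1) & repr(r2)) fix in fix.b
// In the fixpoint, a = 2 (the default is overridden), hence b = 3:
// the recursive field is updated as desired.
```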
### Priorities
Merging is fundamentally commutative. The problem is that not all fields should
be treated the same: some are default values that ought to be overridden, some
are high-priority values that ought to override. The solution is a priority
system, encoding the precedence of each value while retaining commutativity
(only the priority annotation counts, not the order in which values are
merged).
Priorities have drawbacks. One is non-locality: the final result depends on the
priorities of the values your field is merged with, which are potentially
defined elsewhere. However, this can be mitigated by good defaults and an
adapted set of priorities, such as a bottom `default`, a top `force`, and an
infinite range of integer priorities in between, the default priority (when no
priority is provided) being `0`. This way, it's easy to just use `default` or
`force` to be overridden by, or to override, anything, without knowing any
precise integer priority. Integer priorities still give freedom with an
infinite supply of levels if required.
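To illustrate the intended precedence (a sketch, using the `priority` keyword
introduced later in this document):
```nickel
{a | default = 1} & {a = 2}             // => {a = 2}: `default` loses to priority 0
{a | priority 10 = 1} & {a = 2}         // => {a = 1}: 10 > 0
{a | force = 1} & {a | priority 99 = 2} // => {a = 1}: `force` beats any integer
```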
### Custom merge functions
Custom merge functions would be specified by a `merge` metavalue attribute. In
order to enforce commutativity, they would receive their arguments in an order
determined by priority rather than by the order of merging, using a type such
as `{lower: Dyn, higher: Dyn, priority: <Different, Equal>} -> Dyn`. If both
values have the same priority, the order is not specified, and may even be
randomized by the interpreter; the `priority` field indicates when this is the
case, should it need special handling.
```nickel
let mergeLists :
forall a. {lower: List a, higher: List a, priority: <Different, Equal>} -> List a
= fun args => args.lower @ args.higher in
let Contract = {
path | List Str
| merge mergeLists
| doc "A list of paths to search in."
} in
let block1 | #Contract = {
path = ["/usr/local/bin"]
} in
let block2 = {
path = ["/bin"]
} in
// { path = ["usr/local/bin", "/bin"] }
block1 & block2
```
## Issues & challenges
The proposal raises some questions. While none seems insuperable, users will
need to have a good mental picture of the system, and we should be careful to
avoid ending up with a complicated system full of ad-hoc fixes to edge cases.
### Implementation
Recomputing the fixpoint at each field access is wasteful, because recursive
records as functions satisfy the following property: as long as the record is
not merged, the `self` argument is constant. Thus, we can memoize the fixpoint
and only invalidate it on a merge. Also, fields that do not depend on a
recursive variable can be hoisted out of the function, using a representation
like:
```
repr(r) := {
a = 1;
b = fun self => self.a + 1;
}
```
Concretely, each field of a record may either be a thunk as usual, in the case
of non-recursive expressions like `a`, or a thunk together with the original
expression in the recursive case. All record operations except merge operate
on the thunk. A merge operation, on the other hand, starts from the original
expression again.
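In other words, the field representation could look something like this (an
informal sketch in the same notation as above, not actual interpreter code):
```
Field := Std(thunk)                // non-recursive field, e.g. `a`: a plain thunk
       | Rec(thunk, original_expr) // recursive field, e.g. `b`: a cached thunk
                                   // plus the original expression, restored on merge
```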
There are more potential optimizations, but this first step should be a
reasonable trade-off between implementation complexity and performance. See
[#103](https://github.com/tweag/nickel/issues/103) for more details.
### Scoping
Should a record be able to access a yet undefined field because it is expected
to be provided by a subsequent merge? Two possible approaches:
- **dynamic scoping**: records can reference fields that are not explicitly
defined locally, such as:
```nickel
{a = b} & {b = 1}
```
Dynamic scoping has a number of issues, and is usually considered bad
practice.
- **lexical scoping**: as is currently the case, require self-referenced fields
to be defined locally. Note that, thanks to contracts, one can require the
existence of a field without defining it. For example, we could write the
previous example as:
```nickel
{a = b, b | Num} & {b = 1}
```
Explicitly stating the fields whose presence is assumed is also better
practice in general.
This RFC proposes to adopt **lexical scoping**. We could have an even lighter
syntax, such as `{a = b, b} & {b = 1}` for requiring the presence of a field
`b`.
### Priorities
#### Levels
This RFC proposes to adopt the following ordered set of priorities:
- `default` is the bottom element
- Integer priorities in the middle
- `force` is the top element
That is, `Priorities := default \/ {n | n integer} \/ force` with `default <= ...
<= -1 <= 0 <= 1 <= ... <= force`.
If not specified, the normal priority (the default priority, not to be confused
with the `default` priority) is `0`. This provides an infinite supply of
priorities both below (`default \/ {n | n < 0}`) and above (`force \/ {n
| n > 0}`).
Integer priorities are specified using the `priority` keyword. Defining more
than one priority in the same meta-value is an error.
Example:
```nickel
{
foo | Num
| default = 1,
bar | Str,
//equivalent to `bar | Str | priority 0`
baz.boo.bor | priority -4 = "value",
final | force = `CantOverrideMe,
}
```
#### Recursive priorities
As noted in [#240](https://github.com/tweag/nickel/issues/240), configurations
should be easily overridable, and the approach outlined so far can annoyingly
end up requiring configurations to be written with `default` or `force`
annotations everywhere.
This RFC proposes to add *recursive* (or "leafy", or "push down") priorities, as
described in [#279](https://github.com/tweag/nickel/issues/279). We define the
new meta-values `default rec` and `force rec`, whose semantics are defined as:
- `eval(expr | default rec)`: by case analysis on `eval(expr)`:
  * `{field1 = value1, .., fieldn = valuen} | annots`: gives `{field1 = (value1
    | default rec), .., fieldn = (valuen | default rec)} | annots`
  * `v | annots` if `v` is not a record: gives `v | defaulted(annots)`, where
    `defaulted(annots)` is defined below.
- `defaulted(annots)`:
  * if `annots` contains the priority metavalue `force`, then `defaulted(annots)
    := annots`
  * otherwise, let `annots' | prio` be the decomposition of `annots` into a
    priority `prio` (possibly empty) and the other metavalues; then
    `defaulted(annots) := annots' | default`
That is, `default rec` recursively overwrites all the priorities of the leaves
of a record with `default`, except for `force`, which is left untouched
(metavalues are written in a liberal way, in that they can be empty).
`force rec` is defined similarly, except that it erases all priorities, even
default ones. The names `default rec`/`force rec` are just suggestions.
Example:
```nickel
let neutralConf = {
foo = 1,
bar.baz = "stuff",
bar.blorg = false,
} in
let defaulted | default rec = neutralConf in
// ^ Will evaluate to:
// {
// foo | default = 1,
// bar = {
// baz | default = "stuff",
// blorg | default = false,
// },
// }
// This is different from `neutralConf | default`! The latter version
// would be overridden at once, as illustrated below.
defaulted & {bar.baz = "shapoinkl"}
// ^ Gives the expected:
// {
// foo | default = 1,
// bar = {
// baz = "shapoinkl";
// blorg | default = false,
// },
// }
// While
(neutralConf | default) & {bar.baz = "shapoinkl"}
// ^ This gives only:
// {bar.baz = "shapoinkl"}
```
This way, an existing definition (arbitrarily complex: it could be the root of
all Nixpkgs) can easily (and lazily) be turned into a fully overriding or
overridable configuration. A possible extension is to have a user-provided
function that is mapped over priorities: `default rec`/`force rec` are just
special cases of this.
### Custom merge functions
In the case of the NixOS module system, all the configuration is merged in one
final phase that conceptually evaluates the merge AST.
In Nickel, merging can be done partially and its result observed by
expressions. The natural semantics of custom merge functions should be to
affect all subsequent merges, as for contract application. Indeed, we want to
specify a contract or a custom merge function once and for all, not to repeat
it at every field definition:
```nickel
let add = fun args => args.lower + args.higher in
// {a = 3}
{a | merge add = 1} & {a = 1} & {a = 1}
```
But what about `{a = 1} & {a | merge add = 1} & {a = 2}`? If a merge annotation
only affects terms on its right-hand side, this breaks commutativity.
```nickel
let r1 = {a = 1} in
let r2 = {a = 1} in
let r3 = {a | merge add = 1} in
// {a = 2}
r1 & r2 & r3
// {a = 3}
r3 & r1 & r2
```
Possible solutions:
1. Make an n-ary merge behave differently from repeated binary merging, and
   apply the custom merge function to all arguments. This doesn't solve the
   commutativity problem per se: we would also have to decide that, in ordinary
   merging, custom functions no longer affect subsequent merges. This raises
   yet other problems, one being finding a good distinctive syntax. More
   generally, it sounds confusing and ad hoc.
2. What seems to be the right, though not trivial, solution: symmetrize custom
   merging and make it "distribute backward" as well. That is (simplifying by
   ignoring priorities and the `lower`/`higher` arguments):
```nickel
val1 & val2 & (val3 | merge func)
<=> (val1 | merge func) & val2 & val3
<=> func (func val1 val2) val3
// instead of the naive
<=/=> func (val1 & val2) val3
```
This RFC proposes solution 2. This raises some difficulties:
- What if `val1 & val2` is in a thunk that happens to be forced in between?
What should `let x = val1 & val2 in x & (val3 | merge func)` evaluate to, or
`let x = val1 & val2 in builtins.seq x (x & (val3 | merge func))`?
In some sense, we would like to have call-by-name semantics rather than
call-by-need.
* (a) In the case of record fields, overriding has to solve the
same problem of remembering an original expression alongside its evaluated
version. If we allow custom merge annotations only on record fields, it is
possible to recover the original merge AST (abstract syntax tree) and interpret
it with the custom merge function as wanted. This may prove a bit tricky to
implement in practice, as the merge expression `val1 & val2` may be obfuscated
by program transformations or evaluation artifacts, but it is doable.
* (b) A more extreme take is to generalize this to any term: a
merge expression would behave like a lazy datatype `Merge(t1,t2)` with
respect to evaluation. Evaluating it would amount to automatically applying
`eval : Merge(t1,t2) -> t1 & t2` with a potential merge function, without
erasing the original top-level thunk `Merge(t1,t2)`. This is in fact
how default values are currently handled. Doing so, we don't need to
restrict custom merge functions to record fields. This incurs an additional
cost for merging, though (remembering all the original ASTs of merge
expressions), even when one doesn't actually use custom merge functions.
We propose to directly implement the general solution (b).
## Operational semantics
This section defines the operational semantics of merging with the choices
previously described.
The choices made in the [custom merge functions section](#custom-merge-functions-2),
in particular symmetrizing merge with respect to custom merge function
annotations, require the evaluation process to:
1. Determine the abstract syntax tree (AST) of a merge expression
2. Decide the merge function to use
3. Interpret the tree with the given function
Let us define the notion of a *merge tree*. The definition determines the
area of influence of a merge function annotation, answering questions such as
the following:
```
let add = fun x y => x + y in
let var = 1 & 1 in
var & (1 | merge add) // result?
(var | merge add) & 1 // result?
((1 & 1) + (1 & 1)) & (1 | merge add) // result?
((1 & 1) + (1 & 1) | merge add) & 1 // result?
// file: somefile.ncl
1 & 1
// file: other.ncl
(import "somefile") & (1 | merge add) // result?
(import "somefile" | merge add) & 1 // result?
```
In the following, we will write "meta"-code (think of the code of the Nickel
interpreter) in ML pseudo-code. To distinguish this meta-code from Nickel
expressions, we use the quote syntax `[| exp |]` to denote a Nickel expression.
For example, `hasAnnot [| f (1 + 1) |] = false` means that the value of the
`hasAnnot` (meta-)function on the Nickel expression `f (1 + 1)` is `false`.
### Merge tree
A merge tree is a binary tree whose nodes are labelled by a `metadata`. Leaves
are additionally labelled with a Nickel expression `e` in WHNF (weak
head normal form, that is, evaluated). In this context, a `metadata` is a
(meta-)record whose field names correspond to the meta attributes
`merge, priority, contracts` and so on. Fields are optional, except
`priority`, which must always be defined.
```haskell
data Metadata = Metadata {
  priority :: Priority,
  merge :: Maybe Expression,
  contracts :: Maybe [Contract]
  -- ...
}
data AbsMergeTree a =
  Merge a a
  | Exp Expression
newtype MergeTree = MergeTree (AbsMergeTree MergeTree, Metadata)
```
The function `mergeTree: Expression -> MergeTree` computes the merge tree of an
expression. It needs to evaluate leaves to see if they themselves contain merge
expressions, such that `let x = a & b in x & c` and `a & b & c` have the same
merge tree: that is, merge trees commute with evaluation.
```
// we maintain an environment of bindings in `env`
metaData [| e | attr = val, ... |] ::=
{ attr = val, ... } if priority is set
{ priority = normal, attr = val, ...} otherwise
mergeTree e ::= (absMergeTree e, metaData e)
absMergeTree [| e1 & e2 |] = Merge(mergeTree(e1), mergeTree(e2))
// whnf = Weak head normal form, result of evaluation
absMergeTree [| whnf |] @ e = Exp(e)
// Should mergeTree cross import boundaries? Probably not
absMergeTree [| import path |] @ e = Exp(e)
// All other cases
absMergeTree e = absMergeTree (weakEval e)
// weakEval is defined exactly as standard evaluation, except that it stops at
// merge expressions, as if they were a lazy datatype in weak head normal form
weakEval [| e1 & e2 |] @ e = e
// all other cases are defined exactly as for eval
weakEval e = ...
```
### Semantics
Now that we have defined merge trees, we need to extract a potential merge
function from one:
```
type MergeFunction =
{value: Dyn, priority: Priority} ->
{value: Dyn, priority: Priority} -> Dyn
extractMergeFuns : MergeTree -> List MergeFunction
extractMergeFuns (Merge(t1, t2), _) = extractMergeFuns t1 @ extractMergeFuns t2
extractMergeFuns (Exp(e), meta) = [f] if meta.merge == Some([| f |])
                                  [] otherwise
mergeFunction : MergeTree -> Result MergeFunction ()
mergeFunction t = let funs = extractMergeFuns t in
  if lists.length funs == 0 then
    Ok(__builtinMerge) // the standard `&` merge function
  else if lists.length funs == 1 then
    Ok(head funs)
  else
    Err(())
```
And the interpretation of a merge tree by a merge function:
```
// can be extended, as long as it is a partially ordered set
Priority = Number | -inf | +inf | ...
interpret (Merge((ast_1, meta_1), (ast_2, meta_2)), _) f =
  f {value = interpret (ast_i, meta_i) f, priority = meta_i.priority}
    {value = interpret (ast_j, meta_j) f, priority = meta_j.priority}
  where
    i,j s.t. meta_i.priority <= meta_j.priority
// leaves are already evaluated expressions
interpret (Exp(e), _) f = e
```
We can finally define the evaluation of merge:
```
eval [| e1 & e2 |] =
let t = mergeTree [| e1 & e2 |] in
interpret t (mergeFunction t)
```
Please keep in mind that `mergeTree (e1 & e2)` refers to the **original** merge
tree of this expression. It needs to remain accessible to the implementation
even after evaluation.
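As a sanity check, these rules settle the first question from the merge tree
section (a sketch, with the same simplified `add` function used there):
```
let add = fun x y => x + y in
let var = 1 & 1 in
var & (1 | merge add)
// mergeTree crosses the thunk bound to `var`, giving
// Merge(Merge(Exp(1), Exp(1)), Exp(1 | merge add)).
// extractMergeFuns finds the single function [add], hence
// interpret computes add (add 1 1) 1 = 3
```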
In practice, this means we can't update a thunk that contains `e1 & e2` with
the result of the evaluation. This is already the case with `default` values
currently (`default (1 + 1)` isn't updated to `2`, but to `default 2`,
otherwise the semantics would change). A good way to view this is that the
semantics is inherently call-by-name, and that any caching mechanism (including
call-by-need) is a practical semantics-preserving optimization.

(second changed file: new, 83 lines)
---
feature: merge types and terms syntax
start-date: 2022-01-12
author: Yann Hamdaoui
---
# Merge types and terms syntax
The present RFC aims to make the experience of writing Nickel -- and in
particular writing contracts -- more streamlined by using a common syntax for
terms and types.
In a standard statically typed language, terms and types are usually two
distinct syntactic categories. Most of the constructs on one side indeed don't
make sense on the other: what would the type `1 + 1` mean, or the term
`forall a. a -> a`? Separating the two syntactically also makes for a cleaner
mental separation for the user. Finally, it gives more room in the grammar of
the language to reuse syntax in different ways: for example, Nickel uses record
syntax for both record literals and record types, and the pipe operator `|`
both for metavalues (terms) and row tails (types).
However, Nickel is different from a standard typed functional language.
Because of the static typing/contracts duo, we have to:
- Give a static meaning to terms. This is the `#` operator, lifting a contract
  (a term) to a type. Currently, this type is quite rigid: it is an opaque
  type.
- Give a dynamic meaning to types. Because each type annotation gives rise
  to a contract, we have to derive a contract from a type. There is currently no
  first-class operator to do so in Nickel, but the interpreter does it
  internally in the `open_contract()` function. Throughout this document, we
  will use the `§` operator to denote this operation, pretending it exists in
  the language.
We could thus give a precise meaning to a type `1 + 1` (`#(1 + 1)`) and to a
term `forall a. a -> a` (`§(forall a. a -> a)`). Why we propose to do so is
explained in the next section.
## Motivation
The original motivation is that, when writing contracts, we may want to mix
user-defined contracts with contracts derived from builtin types. A very
simple example is the following nullable contract, which takes another contract
as a parameter:
```nickel
let Nullable = fun contract label value =>
if value == null then null
else contracts.apply contract label value
in
...
```
Now, say we want to encode a nullable list:
```nickel
{
list_maybe | #(Nullable §(List Num))
}
```
This is already not great-looking, but bearable. However, mixing types and
contracts even further is not improbable. For example, to express that
a field is either `null` or a list of ports:
```nickel
{
list_ports_maybe | #(Nullable §(List #Port))
}
```
This starts to be really verbose and hard to parse. It's not only a matter of
aesthetics: when pair-programming Nickel with beginners, they were confused
about why we sometimes need a hash and sometimes not, or why we sometimes use
`List` and sometimes `list` (the equivalent of `§List` in this document).
What's more, the type application syntax (`List Num`) actually coincides with
standard function application within contracts. That is, `§(List Num) ~ §List §Num`. It
would thus make sense to define type application for user-defined contracts by
`#Foo Type ~ #(Foo §Type)`, and to be able to write directly `#Nullable Num` or
`#Nullable #Port` instead of `#(Nullable §Num)` and `#(Nullable Port)`
respectively.
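Under this convention, the earlier examples could be written directly as (a
sketch of the proposed syntax):
```nickel
{
  list_maybe | #Nullable (List Num),
  list_ports_maybe | #Nullable (List #Port),
}
```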