mirror of https://github.com/tweag/nickel.git
Commit 61cd48090c by Yann Hamdaoui, 2022-01-12 15:59:48 +01:00 (parent c10f74c285)
2 changed files with 1043 additions and 0 deletions

rfcs/001-overriding.md (new file, 960 lines)
---
feature: overriding
start-date: 2021-03-18
author: Yann Hamdaoui
---
# Overriding
This document is a proposal for an overriding mechanism in Nickel. It is
expected to evolve while we put these ideas in practice, but shall serve as
a design and implementation baseline.
Related issues: [#103](https://github.com/tweag/nickel/issues/103),
[#240](https://github.com/tweag/nickel/issues/240),
[#255](https://github.com/tweag/nickel/issues/255),
[#279](https://github.com/tweag/nickel/issues/279).
## Context
### Definition of the problem
*Overriding* is the ability to take an existing configuration - in our case, a
record - and update a specific part of it, such as a field or the field of a
nested record.
A naive and unergonomic way of doing it is to repeat all the original
values in a new record:
```nickel
let record = {
a = 1,
b = "str",
c = false
} in
{
a = record.a,
b = record.b,
c = true
}
```
This quickly becomes unmanageable for large records, which are common in Nix
and Nixpkgs. In functional languages, this is usually better done using what's
called a functional update: an operation that takes the original record, one or
several fields to update and the corresponding new values, and returns a new,
updated record. It has the same semantics as our first snippet, but doesn't
require rewriting all the unchanged fields.
It can have builtin syntax, such as OCaml's `with` (`{record with field =
new_value}`), Haskell's `record {field = newValue}`, Nix's `//` operator
(`record // {field = newValue}`), or Rust's struct update syntax
(`RecordDataType {field: new_value, ..record}`). There are more advanced
programming techniques that make updating deeply nested records ergonomic, such
as [lenses](https://www.fpcomplete.com/haskell/tutorial/lens/) in Haskell, but
these rely too heavily on advanced language and typing features to be practical
in Nickel.
In Nickel, we can already do a functional update using merging, although the
updated field must have been marked as `default`:
```nickel
let record = {
a = 1,
b = "str",
c | default = false,
} in
record & {c = true}
```
As explained in the next section though, this is not satisfying.
### Overriding recursive records
Nickel's records are different from the ones of OCaml, Haskell or Rust. They
are lazy and recursive by default. They are thus better understood as
[codata](https://link.springer.com/chapter/10.1007/978-3-030-17184-1_5)
rather than data. Take the following example:
```nickel
let record = {
protocol | default = `Http,
port | default = switch protocol {
`Http => 80,
`Ftp => 21,
_ => 8181,
},
} in
record & { protocol = `Ftp }
```
Here, our final record will be ``{protocol = `Ftp, port = 80}``. `port` is an
expression depending on `protocol`, but it is not updated in the result,
although such an update is precisely what one would intuitively expect from a
recursive record. This kind of recursive update is a pattern actively used in
Nix, and is probably a common scenario for configurations in general: if I
patch a base configuration by changing an option, I usually want other values
depending on this option to be updated as well.
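For reference, here is the result one would hope for (the desired behavior, not
what Nickel currently does):
```nickel
record & { protocol = `Ftp }
// currently evaluates to: { protocol = `Ftp, port = 80 }
// desired (recursive):    { protocol = `Ftp, port = 21 }
```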
## Comparing override mechanisms
We are going to review various overriding mechanisms of Nix and other related
languages. Let us sketch some general traits of overriding mechanisms as a
framework for comparison:
- **(ERG)**: Ergonomics. A mechanism is ergonomic if it avoids
complex encodings for overridable records, doesn't require extra library
functions, and so on. It would ideally work out of the box with the native
records of the language.
- **(NEST)**: Nested field overriding. The ease of overriding nested fields,
such as the value of `bar` in `{foo = {bar = "baz"}}`. Ideally, overriding
nested fields is no harder than top-level fields.
- **(COMP)**: Composability. Overriding is composable when one can seamlessly
apply several overrides to an initial record.
- **(EXPR)**: Expressivity. An expressive mechanism is one that allows
expressing and doing more things: for example, dropping some fields, or
accessing the previous version of the data (`super`), and so on.
## Overriding in Nix
Nix expressions don't have a built-in way of overriding recursive records that
automatically updates the dependent recursive fields. Since the feature is
still actively needed, it is provided by mechanisms implemented in library
code.
Keeping a language lean is usually a good design guideline: to provide an
expressive yet simple set of features upon which others can be built as
libraries. However, in the case of Nix, overriding is a fundamental pattern, and
having it implemented in user code leads to some general well-known issues:
- ~~**(ERG)**~~ Definition, update or field access is not as ergonomic as with
vanilla native records.
- Several competing mechanisms have been implemented in slightly different
contexts. Overriding in general is already hard for beginners, but having to
learn several ones, all similar but still different, is even harder.
- User-land implementations don't have first-class access to information such
as source locations and metavalues. In a dynamic language, this can make good
error reporting much harder.
- It is potentially harder to make user-land implementations efficient.
See [this gist](https://gist.github.com/edolstra/29ce9d8ea399b703a7023073b0dbc00d)
for more details. We continue with an overview of existing mechanisms in Nix and
related languages.
### Nixpkgs overriding
The basic idea is to represent a recursive record explicitly as a function from
a `self` record to the final record (here, in Nix syntax):
```nix
r = rec {
a = 1;
b = a + 1;
}
# r is represented as:
rRepr = self: {
a = 1;
b = self.a + 1;
}
```
`self` is a self-reference, akin to `this` in object oriented languages. It is
computed as a [fixpoint](https://en.wikipedia.org/wiki/Fixed-point_combinator),
simply realized by auto-application in Nix, thanks to laziness:
```nix
r = rRepr r
```
The explicit dependency on `self` gives the ability to provide a different
definition for `self.a` in the final value. Overriding is achieved by
functionally updating the `a` field of the `self` parameter before passing it
to the original representation:
```nix
let extension = {a = 2;}; in
# The fixpoint of result is { a = 2; b = 3; }
resultRepr = self: (rRepr (self // extension)) // extension
```
The second outer update ensures that the final result is also set to `a = 2`,
and not only the `a` appearing in `b`.
Some details are left out, but this is the gist of it. See also [the Nix pill on
overriding](https://nixos.org/guides/nix-pills/override-design-pattern.html) or
[this article on fixpoints in Nix](http://r6.ca/blog/20140422T142911Z.html).
#### Limits
- ~~**(ERG)**~~ Uses a specific representation, rather than handling good old plain records
(although the actual representation in Nixpkgs is more ergonomic than a plain
function).
- ~~**(NEST)**~~ Overriding nested attribute sets is painful. If one does the naive thing,
the whole subrecord is erased:
```nix
let rRepr = self: {
a = {b = self.a.c;};
}; in
let extension = {a = {c = 2;};}; in
rExt = let fixpoint = rRepr (fixpoint // extension); in
fixpoint // extension
# Gives {a = {c = 2;};} instead of expected {a = {b = 2; c = 2;};}
```
### Nixpkgs overlays
[Overlays](https://nixos.wiki/wiki/Overlays) can be seen as a sequence of
transformations from a base record, each layer having access to a `super`
reference to the previous layer and the `self` reference to the final value.
Take two consecutive transformations:
1. Set the `a` field to `1`
2. Set the `b` field to `1`, add `1` to the `a` field
```nix
let baseRepr = self: {c = self.a + self.b;}; in
let overlay1 = self: super: {a = 1;}; in
let overlay2 = self: super: {b = 1; a = super.a + 1;}; in
let applyOverlays = self:
let base = baseRepr self; in
let first = base // overlay1 self base; in
let second = first // overlay2 self first; in
second; in
let fixpoint = applyOverlays fixpoint; in fixpoint
```
In practice, the `super // ..` and fixpoint parts can be factored out into
dedicated helper functions.
#### Advantages
- **(EXPR)**: The explicit representation of layers through `super`
and `self` gives the user a great deal of control.
- **(COMP)**: Composition is first-class.
#### Limits
- ~~**(ERG)**~~ Users still need to manipulate representations.
- **order-dependency**: The result is order-dependent. Applying both `self:
super: {a = super.a + 1}` and `self: super: {a = super.a / 2}` can give two
different results depending on which one is the first layer. This also means
that overrides must be grouped by layers, which is not necessarily the most
logical structure.
- ~~**(NEST)**~~ Overriding nested fields is still clumsy. For example, to
override `lib.firefoxVersion`:
```nix
self: super: { lib = (super.lib or {}) // { firefoxVersion = ...; }; }
```
### NixOS module system
The NixOS module system takes a different approach. There, all the basic blocks
— the modules — are merged together in an unspecified order. What decides the
priority of one option over another are attributes explicitly stated in the
modules themselves. The declaration of options (type, priority, etc.) and
their assignment must be made separately.
Compared to overlays, the explicit reference to `super` has disappeared. This
makes it look like modules can only override one value with another, instead of
combining different pieces of data together.
This apparent shortcoming is solved by custom merge functions, which can
redefine how the different values given for a field are combined. By default,
when merging two list values, the module system only knows how to replace one
value with the other, because there's no canonical and commutative way of
merging two lists. However, the user can specify that these lists should in
fact be concatenated, making it possible to define a list of paths
piecewise:
```nix
# Some module
{
paths = mkOption {
type = types.listOf types.path;
};
}
# Config
[..]
someModule.paths = [foo/bar];
# Other config
[..]
someModule.paths = [/bar/baz];
# Resulting config: [foo/bar /bar/baz]
```
#### Advantages
- **order-independence**: as opposed to overlays, modules can be merged in any
order, and the final result will be the same.
#### Limits
- ~~**(EXPR)**~~: Despite custom merge functions, this model is less powerful
than overlays which have access to an explicit reference to `super`.
Modules, on the other hand, are all merged at the same level in an
unspecified order.
For example, the following has no equivalent in a merging model:
```nix
let overlay = self: super: {b = self.c + 1; a = super.c + 1;}; in #...
```
However, modules are quite capable in practice. Removing fields and being
order-dependent as in the overlay example above is not necessarily a good idea,
and this limitation of expressivity may in fact be a good thing.
## Overriding elsewhere
### CUE
CUE allows a form of late-binding for recursive attributes:
```
$cat test.cue
fields: {
a: int
b: a + 1
} & {
a: 2
}
$cue eval test.cue
fields: {
a: 2
b: 3
}
```
Combined with [default values](https://cuelang.org/docs/tutorials/tour/types/defaults/),
this provides an overriding mechanism:
```
$cat test2.cue
fields: {
a: int | *1
b: a + 1
} & {
a: 2
}
$cue eval test2.cue
fields: {
a: 2
b: 3
}
```
The basics of this overriding mechanism are close in spirit to what we want to
achieve in this proposal. This is similar to Nickel's current `default`
behavior, but with the desired late-bound merging. The expressivity is however
too limited to replace e.g. the NixOS module system.
#### Advantages
- **(ERG)**: Overriding works seamlessly with native records. Note however that
it requires fields to be explicitly marked as overridable (default).
#### Limits
- ~~**(EXPR)**~~: One can only replace a value with another one. It is not
possible, for example, to add one to the previous value.
- ~~**(COMP)**~~: Overriding a record a second time is not possible.
### Jsonnet
Jsonnet refers to its overriding mechanism as *inheritance*, implemented by the
`+` operator on objects:
```jsonnet
local obj = {
name: "Alice",
greeting: "Hello, " + self.name,
};
[
obj,
obj + { name: "Bob" },
obj + { greeting: super.greeting + "!"},
obj + { name: "Bob", greeting: super.greeting + "!"},
]
```
This is similar to the Nix operator `//`, but it performs recursive overriding
in the expected way out of the box. The extension can access the previous
version in the same way as Nixpkgs overlays, using the `super` keyword.
#### Advantages
- **(ERG)**: Integrated with native records.
- **(COMP)**: Overriding is easily iterated.
- **(EXPR)**: Achieves the same level of expressivity as overlays, thanks to
the special keywords `self` and `super`.
#### Limits
- **order-dependency**: As with overlays, the order of application of overrides
matters.
## Taking a step back
While not exactly the same, all these overriding mechanisms are based on the
same underlying principles:
1. Represent recursive records (explicitly or implicitly) as a function of
`self` (and `super` in some cases)
2. Compute the combination as a **fixpoint** of several recursive records,
giving back a standard record.
As hinted by Jsonnet's terminology, the semantics of recursive records
strikingly resemble the semantics of objects and classes in OOP. Replace
records with objects, fields with methods and overriding with inheritance. This
is not so surprising: there's actually a history of encoding objects in
functional languages as recursive records (see the introduction of
[The Recursive Record Semantics of Objects Revisited](https://hal.inria.fr/inria-00072423)
for a good overview) going back to 1988
\[[Cardelli](http://lucacardelli.name/papers/inheritance.pdf)\].
This is also mentioned in the
[README](https://github.com/MuKnIO/nixpkgs/blob/devel/lib/pop.md#some-historical-context)
of POP (an object system in Nix), where the author observes that overriding
mechanisms in Nix (and Jsonnet for that matter) are a simplified lazy object
system (simplified because objects lack proper state and there is no distinction
between classes and instances). Their logical conclusion is to embrace this fact
and design a proper object system helped by existing literature, rather than
reinventing the wheel. Similarly, Nix [overlays](https://nixos.wiki/wiki/Overlays)
can be seen as a single inheritance mechanism.
Inheritance-based overriding imposes an order on the overrides. A single level
inheritance is usually fine, but a complex hierarchy can become hard to maintain
and to reason about.
The NixOS module system is designed differently. It is based on merging: the
configuration is created by combining a set of unordered records following
specific rules. Of course, there's still a need for ordering information
somewhere, but it is rather expressed as priorities. This system has the
advantage of making merging commutative (in contrast with inheritance or the
`//` operator), as in CUE, and of untying data definition from precedence
specification: one can define a module where each field has a different
priority, if it makes sense to group them logically. With inheritance, values
are required to be grouped in layers of the same priority, instead of logically.
Summing up the differences between inheritance-based and merge-based
mechanisms:
**order-dependency**
Inheritance is order-dependent: the chain of extensions must be defined with
the precise order they will be applied in mind, and definitions must be grouped
accordingly. On the other hand, merging is commutative, and the precedence
information is encoded as priorities. Thus, overriding by merging can be
defined using stand-alone pieces of data, although the behavior of priorities
is not local.
**(EXPR)**
Merged records only have access to the final computed fixpoint `self`, while
objects also have access to the previous stage of extension via `super`.
However, as in the NixOS module system, it is possible to address this issue
using custom merge functions. This is a bit less expressive, but in a good way:
it forces the merge strategy to be uniform for each field, while mechanisms
like overlays or inheritance can do pretty much anything.
## Proposal overview
We basically propose to adopt the same kind of merge, priority, and custom
merge function-based scheme as the NixOS module system. One big difference with
the NixOS module system is that it would be built into the language, usable on
any native record whatsoever, and well integrated with the rest of the language
features. The implementation would have first-class access to locations, the
AST, the memory layout, the metadata, and so on.
We first describe the ideas at a high level. Then, we review the issues and
challenges of this approach, and propose concrete solutions to overcome or
mitigate them. Finally, the precise operational semantics is laid out in a
dedicated section.
### Recursive records & merging
**This section defines the semantics, rather than an actual efficient
implementation**.
As before, it is useful to see recursive records represented as functions - or
constructors - that take a `self` parameter and return a final non-recursive
record, in the same way as the original overriding mechanism of Nixpkgs (the
made-up syntax `def := value` is used to emphasize that we are defining new
objects):
```
r = {
a = 1;
b = a + 1;
}
// Definition of the representation of r
repr(r) := fun self => {
a = 1;
b = self.a + 1;
}
```
Field access amounts to computing a fixpoint:
```
repr(r).foo := let fix = repr(r) fix in fix.foo
```
Merging simply merges the underlying representations:
```
repr(r1) & repr(r2) := fun self => repr(r1) self & repr(r2) self
```
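As a worked sketch of these rules (reusing the existing `default` annotation,
with the priority semantics described in the next subsection):
```
r1 = { a | default = 1, b = a + 1 }
r2 = { a = 2 }

repr(r1) & repr(r2) := fun self =>
  { a | default = 1, b = self.a + 1 } & { a = 2 }

// Field access computes the fixpoint:
(repr(r1) & repr(r2)).b
  := let fix = (repr(r1) & repr(r2)) fix in fix.b
// In the fixpoint, a = 2 (the default is overridden), hence b = 3:
// the recursive field is updated as desired.
```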
### Priorities
Merging is fundamentally commutative. The problem is that not all fields should
be treated the same: some are default values that ought to be overridden, some
are high-priority values that ought to override. The solution is a priority
system, encoding the precedence of each value while retaining commutativity
(only the priority annotation counts, not the order in which values are
merged).
Priorities have drawbacks. One is non-locality: the final result depends on the
priorities of the values your field is merged with, which are potentially
defined elsewhere. However, this can be mitigated by good defaults and an
adapted set of priorities, such as a bottom `default`, a top `force`, and an
infinite range of integer priorities in between, the default priority (when no
priority is provided) being `0`. This way, it's easy to just use `default` or
`force` to be overridden by, or to override, anything, without knowing any
precise integer priority. Integer priorities still give freedom with an
infinite supply of levels if required.
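To illustrate the intended precedence (a sketch, using the `priority` keyword
introduced later in this document):
```nickel
{a | default = 1} & {a = 2}             // => {a = 2}: `default` loses to priority 0
{a | priority 10 = 1} & {a = 2}         // => {a = 1}: 10 > 0
{a | force = 1} & {a | priority 99 = 2} // => {a = 1}: `force` beats any integer
```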
### Custom merge functions
Custom merge functions would be specified by a `merge` metavalue attribute. In
order to enforce commutativity, they would receive their arguments in an order
determined by priority rather than by the order of merging, using a type such
as `{lower: Dyn, higher: Dyn, priority: <Different, Equal>} -> Dyn`. If both
values have the same priority, the order is not specified, and may even be
randomized by the interpreter; the `priority` field indicates when this is the
case, should it need special handling.
```nickel
let mergeLists :
forall a. {lower: List a, higher: List a, priority: <Different, Equal>} -> List a
= fun args => args.lower @ args.higher in
let Contract = {
path | List Str
| merge mergeLists
| doc "A list of paths to search in."
} in
let block1 | #Contract = {
path = ["/usr/local/bin"]
} in
let block2 = {
path = ["/bin"]
} in
// { path = ["usr/local/bin", "/bin"] }
block1 & block2
```
## Issues & challenges
The proposal raises some questions. While none seems insuperable, users will
need to have a good mental picture of the system, and we should be careful to
avoid ending up with a complicated system full of ad-hoc fixes to edge cases.
### Implementation
Recomputing the fixpoint at each field access is wasteful, because recursive
records as functions satisfy the following property: as long as the record is
not merged, the `self` argument is constant. Thus, we can memoize the fixpoint
and only invalidate it on a merge. Also, fields that do not depend on a
recursive variable can be hoisted out of the function, using a representation
like:
```
repr(r) := {
a = 1;
b = fun self => self.a + 1;
}
```
Concretely, each field of a record may either be a thunk as usual, in the case
of non-recursive expressions like `a`, or a thunk together with the original
expression in the recursive case. All record operations except merge operate
on the thunk. A merge operation, on the other hand, starts from the original
expression again.
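In other words, the field representation could look something like this (an
informal sketch in the same notation as above, not actual interpreter code):
```
Field := Std(thunk)                // non-recursive field, e.g. `a`: a plain thunk
       | Rec(thunk, original_expr) // recursive field, e.g. `b`: a cached thunk
                                   // plus the original expression, restored on merge
```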
There are more potential optimizations, but this first step should be a
reasonable trade-off between implementation complexity and performance. See
[#103](https://github.com/tweag/nickel/issues/103) for more details.
### Scoping
Should a record be able to access a yet undefined field because it is expected
to be provided by a subsequent merge? Two possible approaches:
- **dynamic scoping**: records can reference fields that are not explicitly
defined locally, such as:
```nickel
{a = b} & {b = 1}
```
Dynamic scoping has a number of issues, and is usually considered bad
practice.
- **lexical scoping**: as is currently the case, require self-referenced fields
to be defined locally. Note that, thanks to contracts, one can require the
existence of a field without defining it. For example, we could write the
previous example as:
```nickel
{a = b, b | Num} & {b = 1}
```
Explicitly stating the fields whose presence is assumed is also better
practice in general.
This RFC proposes to adopt **lexical scoping**. We could have an even lighter
syntax, such as `{a = b, b} & {b = 1}` for requiring the presence of a field
`b`.
### Priorities
#### Levels
This RFC proposes to adopt the following ordered set of priorities:
- `default` is the bottom element
- Integer priorities in the middle
- `force` is the top element
That is, `Priorities := default \/ {n | n integer} \/ force` with `default <= ...
<= -1 <= 0 <= 1 <= ... <= force`.
If not specified, the normal priority (the default priority, not to be confused
with the `default` priority) is `0`. This provides an infinite supply of
priorities both below (`default \/ {n | n < 0}`) and above (`force \/ {n
| n > 0}`).
Integer priorities are specified using the `priority` keyword. Defining more
than one priority in the same meta-value is an error.
Example:
```nickel
{
foo | Num
| default = 1,
bar | Str,
//equivalent to `bar | Str | priority 0`
baz.boo.bor | priority -4 = "value",
final | force = `CantOverrideMe,
}
```
#### Recursive priorities
As noted in [#240](https://github.com/tweag/nickel/issues/240), configurations
should be easily overridable, and the approach outlined so far can annoyingly
end up requiring configurations to be written with `default` or `force`
annotations everywhere.
This RFC proposes to add *recursive* (or "leafy", or "push down") priorities, as
described in [#279](https://github.com/tweag/nickel/issues/279). We define the
new meta-values `default rec` and `force rec`, whose semantics are defined as:
- `eval(expr | default rec)`: by case analysis on `eval(expr)`:
  * `{field1 = value1, .., fieldn = valuen} | annots`: gives `{field1 = (value1
    | default rec), .., fieldn = (valuen | default rec)} | annots`
  * `v | annots` if `v` is not a record: gives `v | defaulted(annots)`, where
    `defaulted(annots)` is defined below.
- `defaulted(annots)`:
  * if `annots` contains the priority metavalue `force`, then `defaulted(annots)
    := annots`
  * otherwise, let `annots' | prio` be the decomposition of `annots` into a
    priority `prio` (possibly empty) and the other metavalues; then
    `defaulted(annots) := annots' | default`
That is, `default rec` recursively overwrites all the priorities of the leaves
of a record with `default`, except for `force`, which is left untouched
(metavalues are written in a liberal way, in that they can be empty).
`force rec` is defined similarly, except that it erases all priorities, even
default ones. The names `default rec`/`force rec` are just suggestions.
Example:
```nickel
let neutralConf = {
foo = 1,
bar.baz = "stuff",
bar.blorg = false,
} in
let defaulted | default rec = neutralConf in
// ^ Will evaluate to:
// {
// foo | default = 1,
// bar = {
// baz | default = "stuff",
// blorg | default = false,
// },
// }
// This is different from `neutralConf | default`! The latter version
// would be overridden at once, as illustrated below.
defaulted & {bar.baz = "shapoinkl"}
// ^ Gives the expected:
// {
// foo | default = 1,
// bar = {
// baz = "shapoinkl";
// blorg | default = false,
// },
// }
// While
(neutralConf | default) & {bar.baz = "shapoinkl"}
// ^ This gives only:
// {bar.baz = "shapoinkl"}
```
This way, an existing definition (arbitrarily complex: it could be the root of
all Nixpkgs) can easily (and lazily) be turned into a fully overriding or
overridable configuration. A possible extension is to have a user-provided
function that is mapped over priorities: `default rec`/`force rec` are just
special cases of this.
### Custom merge functions
In the case of the NixOS module system, all the configuration is merged in one
final phase that conceptually evaluates the merge AST.
In Nickel, merging can be done partially and its result observed by
expressions. The natural semantics of custom merge functions should be to
affect all subsequent merges, as for contract application. Indeed, we want to
specify a contract or a custom merge function once and for all, not to repeat
it at every field definition:
```nickel
let add = fun args => args.lower + args.higher in
// {a = 3}
{a | merge add = 1} & {a = 1} & {a = 1}
```
But what about `{a = 1} & {a | merge add = 1} & {a = 2}`? If a merge annotation
only affects terms on its right-hand side, this breaks commutativity.
```nickel
let r1 = {a = 1} in
let r2 = {a = 1} in
let r3 = {a | merge add = 1} in
// {a = 2}
r1 & r2 & r3
// {a = 3}
r3 & r1 & r2
```
Possible solutions:
1. Make an n-ary merge behave differently from repeated binary merging, and
   apply the custom merge function to all arguments. This doesn't solve the
   commutativity problem per se: we would also have to decide that, in ordinary
   merging, custom functions no longer affect subsequent merges. This raises
   yet other problems, one being finding a good distinctive syntax. More
   generally, it sounds confusing and ad hoc.
2. What seems to be the right, though not trivial, solution: symmetrize custom
   merging and make it "distribute backward" as well. That is (simplifying by
   ignoring priorities and the `lower`/`higher` arguments):
```nickel
val1 & val2 & (val3 | merge func)
<=> (val1 | merge func) & val2 & val3
<=> func (func val1 val2) val3
// instead of the naive
<=/=> func (val1 & val2) val3
```
This RFC proposes solution 2. This raises some difficulties:
- What if `val1 & val2` is in a thunk that happens to be forced in between?
What should `let x = val1 & val2 in x & (val3 | merge func)` evaluate to, or
`let x = val1 & val2 in builtins.seq x (x & (val3 | merge func))`?
In some sense, we would like to have call-by-name semantics rather than
call-by-need.
* (a) In the case of record fields, overriding has to solve the
same problem of remembering an original expression alongside its evaluated
version. If we allow custom merge annotations only on record fields, it is
possible to recover the original merge AST (abstract syntax tree) and interpret
it with the custom merge function as wanted. This may prove a bit tricky to
implement in practice, as the merge expression `val1 & val2` may be obfuscated
by program transformations or evaluation artifacts, but it is doable.
* (b) A more extreme take is to generalize this to any term: a
merge expression would behave like a lazy datatype `Merge(t1,t2)` with
respect to evaluation. Evaluating it would amount to automatically applying
`eval : Merge(t1,t2) -> t1 & t2` with a potential merge function, without
erasing the original top-level thunk `Merge(t1,t2)`. This is in fact
how default values are currently handled. Doing so, we don't need to
restrict custom merge functions to record fields. This incurs an additional
cost for merging, though (remembering all the original ASTs of merge
expressions), even when one doesn't actually use custom merge functions.
We propose to directly implement the general solution (b).
## Operational semantics
This section defines the operational semantics of merging with the choices
previously described.
The choices made in the [custom merge functions section](#custom-merge-functions-2),
in particular symmetrizing merge with respect to custom merge function
annotations, require the evaluation process to:
1. Determine the abstract syntax tree (AST) of a merge expression
2. Decide the merge function to use
3. Interpret the tree with the given function
Let us define the notion of a *merge tree*. The definition determines the
area of influence of a merge function annotation, answering questions such as
the following:
```
let add = fun x y => x + y in
let var = 1 & 1 in
var & (1 | merge add) // result?
(var | merge add) & 1 // result?
((1 & 1) + (1 & 1)) & (1 | merge add) // result?
((1 & 1) + (1 & 1) | merge add) & 1 // result?
// file: somefile.ncl
1 & 1
// file: other.ncl
(import "somefile") & (1 | merge add) // result?
(import "somefile" | merge add) & 1 // result?
```
In the following, we will write "meta"-code (think of the code of the Nickel
interpreter) in ML pseudo-code. To distinguish this meta-code from Nickel
expressions, we use the quote syntax `[| exp |]` to denote a Nickel expression.
For example, `hasAnnot [| f (1 + 1) |] = false` means that the value of the
`hasAnnot` (meta-)function on the Nickel expression `f (1 + 1)` is `false`.
### Merge tree
A merge tree is a binary tree whose nodes are labelled by a `metadata`. Leaves
are additionally labelled with a Nickel expression `e` in WHNF (weak
head normal form, that is, evaluated). In this context, a `metadata` is a
(meta-)record whose field names correspond to the meta attributes
`merge, priority, contracts` and so on. Fields are optional, except
`priority`, which must always be defined.
```haskell
data Metadata = Metadata {
  priority :: Priority,
  merge :: Maybe Expression,
  contracts :: Maybe [Contract]
  -- ...
}
data AbsMergeTree a =
  Merge a a
  | Exp Expression
newtype MergeTree = MergeTree (AbsMergeTree MergeTree, Metadata)
```
The function `mergeTree: Expression -> MergeTree` computes the merge tree of an
expression. It needs to evaluate leaves to see if they themselves contain merge
expressions, such that `let x = a & b in x & c` and `a & b & c` have the same
merge tree: that is, merge trees commute with evaluation.
```
// we maintain an environment of bindings in `env`
metaData [| e | attr = val, ... |] ::=
{ attr = val, ... } if priority is set
{ priority = normal, attr = val, ...} otherwise
mergeTree e ::= (absMergeTree e, metaData e)
absMergeTree [| e1 & e2 |] = Merge(mergeTree(e1), mergeTree(e2))
// whnf = Weak head normal form, result of evaluation
absMergeTree [| whnf |] @ e = Exp(e)
// Should mergeTree cross import boundaries? Probably not
absMergeTree [| import path |] @ e = Exp(e)
// All other cases
absMergeTree e = absMergeTree (weakEval e)
// weakEval is defined exactly as standard evaluation, except that it stops at
// merge expressions, as if they were a lazy datatype in weak head normal form
weakEval [| e1 & e2 |] @ e = e
// all other cases are defined exactly as for eval
weakEval e = ...
```
### Semantics
Now that we have defined merge trees, we need to extract a potential merge
function from one:
```
type MergeFunction =
{value: Dyn, priority: Priority} ->
{value: Dyn, priority: Priority} -> Dyn
extractMergeFuns : MergeTree -> List MergeFunction
extractMergeFuns (Merge(t1, t2), _) = extractMergeFuns t1 @ extractMergeFuns t2
extractMergeFuns (Exp(e), meta) = [f] if meta.merge == Some([| f |])
                                  [] otherwise
mergeFunction : MergeTree -> Result MergeFunction ()
mergeFunction t = let funs = extractMergeFuns t in
  if lists.length funs == 0 then
    Ok(__builtinMerge) // the standard `&` merge function
  else if lists.length funs == 1 then
    Ok(head funs)
  else
    Err(())
```
And the interpretation of a merge tree by a merge function:
```
// can be extended, as long as it is a partially ordered set
Priority = Number | -inf | +inf | ...
interpret (Merge((ast_1, meta_1), (ast_2, meta_2)), _) f =
  f {value = interpret (ast_i, meta_i) f, priority = meta_i.priority}
    {value = interpret (ast_j, meta_j) f, priority = meta_j.priority}
  where
    i,j s.t. meta_i.priority <= meta_j.priority
// leaves are already evaluated expressions
interpret (Exp(e), _) f = e
```
We can finally define the evaluation of merge:
```
eval [| e1 & e2 |] =
let t = mergeTree [| e1 & e2 |] in
interpret t (mergeFunction t)
```
Please keep in mind that `mergeTree (e1 & e2)` refers to the **original** merge
tree of this expression. It needs to remain accessible to the implementation
even after evaluation.
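As a sanity check, these rules settle the first question from the merge tree
section (a sketch, with the same simplified `add` function used there):
```
let add = fun x y => x + y in
let var = 1 & 1 in
var & (1 | merge add)
// mergeTree crosses the thunk bound to `var`, giving
// Merge(Merge(Exp(1), Exp(1)), Exp(1 | merge add)).
// extractMergeFuns finds the single function [add], hence
// interpret computes add (add 1 1) 1 = 3
```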
In practice, this means we can't update a thunk that contains `e1 & e2` with
the result of the evaluation. This is already the case with `default` values
currently (`default (1 + 1)` isn't updated to `2`, but to `default 2`,
otherwise the semantics would change). A good way to view this is that the
semantics is inherently call-by-name, and that any caching mechanism (including
call-by-need) is a practical semantics-preserving optimization.

(second changed file: new, 83 lines)
---
feature: merge types and terms syntax
start-date: 2022-01-12
author: Yann Hamdaoui
---
# Merge types and terms syntax
The present RFC aims to make the experience of writing Nickel -- and in
particular writing contracts -- more streamlined by using a common syntax for
terms and types.
In a standard statically typed language, terms and types are usually two
distinct syntactic categories. Most of the constructs on one side indeed don't
make sense on the other: what would the type `1 + 1` mean, or the term
`forall a. a -> a`? Separating the two syntactically also makes for a cleaner
mental separation for the user. Finally, it gives more room in the grammar of
the language to reuse syntax in different ways: for example, Nickel uses record
syntax for both record literals and record types, and the pipe operator `|`
both for metavalues (terms) and row tails (types).
However, Nickel is different from a standard typed functional language.
Because of the static typing/contracts duo, we have to:
- Give a static meaning to terms. This is the `#` operator, lifting a contract
  (a term) to a type. Currently, this type is quite rigid: it is an opaque
  type.
- Give a dynamic meaning to types. Because each type annotation gives rise
  to a contract, we have to derive a contract from a type. There is currently no
  first-class operator to do so in Nickel, but the interpreter does it
  internally in the `open_contract()` function. Throughout this document, we
  will use the `§` operator to denote this operation, pretending it exists in
  the language.
We could thus give a precise meaning to a type `1 + 1` (`#(1 + 1)`) and to a
term `forall a. a -> a` (`§(forall a. a -> a)`). Why we propose to do so is
explained in the next section.
## Motivation
The original motivation is that, when writing contracts, we may want to mix
user-defined contracts with contracts derived from builtin types. A very
simple example is the following nullable contract, which takes another contract
as a parameter:
```nickel
let Nullable = fun contract label value =>
if value == null then null
else contracts.apply contract label value
in
...
```
Now, say we want to encode a nullable list:
```nickel
{
list_maybe | #(Nullable §(List Num))
}
```
This is already not great-looking, but bearable. However, mixing types and
contracts even further is not improbable. For example, to express that
a field is either `null` or a list of ports:
```nickel
{
list_ports_maybe | #(Nullable §(List #Port))
}
```
This starts to be really verbose and hard to parse. It's not only a matter of
aesthetics: when pair-programming Nickel with beginners, they were confused
about why we sometimes need a hash and sometimes not, or why we sometimes use
`List` and sometimes `list` (the equivalent of `§List` in this document).
What's more, the type application syntax (`List Num`) actually coincides with
standard function application within contracts. That is, `§(List Num) ~ §List §Num`. It
would thus make sense to define type application for user-defined contracts by
`#Foo Type ~ #(Foo §Type)`, and to be able to write directly `#Nullable Num` or
`#Nullable #Port` instead of `#(Nullable §Num)` and `#(Nullable Port)`
respectively.
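Under this convention, the earlier examples could be written directly as (a
sketch of the proposed syntax):
```nickel
{
  list_maybe | #Nullable (List Num),
  list_ports_maybe | #Nullable (List #Port),
}
```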