mirror of
https://github.com/tweag/nickel.git
synced 2024-10-04 23:27:15 +03:00
WIP
This commit is contained in:
parent
c10f74c285
commit
61cd48090c
960
rfcs/001-overriding.md
Normal file
960
rfcs/001-overriding.md
Normal file
@ -0,0 +1,960 @@
|
||||
---
|
||||
feature: overriding
|
||||
start-date: 2021-03-18
|
||||
author: Yann Hamdaoui
|
||||
---
|
||||
|
||||
# Overriding
|
||||
|
||||
This document is a proposal for an overriding mechanism in Nickel. It is
|
||||
expected to evolve while we put these ideas in practice, but shall serve as
|
||||
a design and implementation baseline.
|
||||
|
||||
Related issues: [#103](https://github.com/tweag/nickel/issues/103),
|
||||
[#240](https://github.com/tweag/nickel/issues/240),
|
||||
[#255](https://github.com/tweag/nickel/issues/255),
|
||||
[#279](https://github.com/tweag/nickel/issues/279).
|
||||
|
||||
## Context
|
||||
|
||||
### Definition of the problem
|
||||
|
||||
*Overriding* is the ability to take an existing configuration - in our case, a
|
||||
record - and update a specific part of it, such as a field or the field of a
|
||||
nested record.
|
||||
|
||||
A naive and unergonomic of way of doing it is to repeat all the original
|
||||
values in a new record:
|
||||
|
||||
```
|
||||
let record = {
|
||||
a = 1,
|
||||
b = "str",
|
||||
c = false
|
||||
} in
|
||||
{
|
||||
a = record.a,
|
||||
b = record.b,
|
||||
c = true
|
||||
}
|
||||
```
|
||||
|
||||
This quickly becomes unmanageable for large records, which are common in Nix and
|
||||
Nixpkgs. In functional languages, this is usually better done using what's
|
||||
called functional update. Functional update is an operation that takes the
|
||||
original record, one or several fields to update, the corresponding new value,
|
||||
and returns a new updated record. It has the same semantics as our first
|
||||
snippet, but doesn't require to rewrite all unchanged fields.
|
||||
|
||||
It can have a builtin syntax, such as OCaml's `with`: `{record with field =
|
||||
new_value}`, Haskell's `record {field = newValue}`, Nix `//` operator `record //
|
||||
{field = newValue}`, or Rust's syntax `RecordDataType {field: new_value,
|
||||
..record}`. There are more advanced programming techniques that make updating
|
||||
deeply nested records ergonomic such as
|
||||
[Lenses](https://www.fpcomplete.com/haskell/tutorial/lens/) in Haskell, but
|
||||
these rely too heavily on advanced language and typing features to be practical
|
||||
in Nickel.
|
||||
|
||||
In Nickel, we can already do a functional update using merging, although the
|
||||
updated field must have been marked as `default`:
|
||||
|
||||
```nickel
|
||||
let record = {
|
||||
a = 1,
|
||||
b = "str",
|
||||
c | default = false,
|
||||
} in
|
||||
record & {c = true}
|
||||
```
|
||||
|
||||
As explained in the next section though, this is not satisfying.
|
||||
|
||||
### Overriding recursive records
|
||||
|
||||
Nickel's records are different from the ones of OCaml, Haskell or Rust. They
|
||||
are lazy and recursive by default. They are thus better understood as
|
||||
[codata](https://link.springer.com/chapter/10.1007/978-3-030-17184-1_5)
|
||||
rather than data. Take the following example:
|
||||
|
||||
```nickel
|
||||
let record = {
|
||||
protocol | default = `Http,
|
||||
port | default = switch protocol {
|
||||
`Http => 80,
|
||||
`Ftp => 21,
|
||||
_ => 8181,
|
||||
},
|
||||
} in
|
||||
record & { protocol = `Ftp }
|
||||
```
|
||||
|
||||
Here, our final record will be ``{protocol = `Ftp, port = 80}``. `port` is an
|
||||
expression depending on `protocol`, but it is not updated in the new version,
|
||||
which is something one could intuitively expect from a recursive record. Such
|
||||
recursive update is a pattern actively used in Nix, and is probably a common
|
||||
scenario for configurations in general: if I patch a base configuration by
|
||||
changing an option, I usually want other values depending on this option to be
|
||||
updated as well.
|
||||
|
||||
## Comparing override mechanisms
|
||||
We are going to review various overriding mechanisms of Nix and other related
|
||||
languages. Let us sketch some general traits of overriding mechanisms as a
|
||||
framework for comparison:
|
||||
|
||||
- **(ERG)**: Ergonomy. A mechanism is ergonomic if it avoids
|
||||
complex encodings for overridable records, if it doesn't require extra library
|
||||
functions, and so on. It would ideally work out of the box with the native
|
||||
records of the language.
|
||||
- **(NEST)**: Nested field overriding. The ease of overriding nested fields,
|
||||
such as the value of `bar` in `{foo = {bar = "baz"}}`. Ideally, overriding
|
||||
nested fields is no harder than top-level fields.
|
||||
- **(COMP)**: Composability. Overriding is composable when one can seamlessly
|
||||
apply several overrides to an initial record.
|
||||
- **(EXPR)**: Expressivity. An expressive mechanism is one that allow to express
|
||||
and do more things. For example, it can be the ability to drop some fields, to
|
||||
access the previous record's version of the data (`super`), and so on.
|
||||
|
||||
## Overriding in Nix
|
||||
|
||||
Nix expressions don't have a built-in way of overriding recursive records while
|
||||
automatically updating the dependent recursive fields either. Since the feature
|
||||
is still actively needed, it is provided by mechanisms implemented in library
|
||||
code.
|
||||
|
||||
Keeping a language lean is usually a good design guideline: to provide an
|
||||
expressive yet simple set of features upon which others can be built as
|
||||
libraries. However, in the case of Nix, overriding is a fundamental pattern, and
|
||||
having it implemented in user code leads to some general well-known issues:
|
||||
|
||||
- ~~**(ERG)**~~ Definition, update or field access is not as ergonomic as with
|
||||
vanilla native records.
|
||||
- Several competing mechanisms have been implemented in slightly different
|
||||
contexts. Overriding in general is already hard for beginners, but having to
|
||||
learn several ones, all similar but still different, is even harder.
|
||||
- User-land implementations don't have a first-class access to information such
|
||||
as source location and metavalues. In a dynamic language, this can make good
|
||||
error reporting much harder.
|
||||
- It is potentially harder to make user land implementations efficient.
|
||||
|
||||
See [this gist](https://gist.github.com/edolstra/29ce9d8ea399b703a7023073b0dbc00d)
|
||||
for more details. We continue with an overview of existing mechanisms in Nix and
|
||||
related languages.
|
||||
|
||||
### Nixpkgs overriding
|
||||
|
||||
The basic idea is to represent a recursive record explicitly as a function from
|
||||
a `self` record to the final record (here, in Nix syntax):
|
||||
|
||||
```nix
|
||||
r = rec {
|
||||
a = 1;
|
||||
b = a + 1;
|
||||
}
|
||||
# r is represented as:
|
||||
rRepr = self: {
|
||||
a = 1;
|
||||
b = self.a + 1;
|
||||
}
|
||||
```
|
||||
|
||||
`self` is a self-reference, akin to `this` in object oriented languages. It is
|
||||
computed as a [fixpoint](https://en.wikipedia.org/wiki/Fixed-point_combinator),
|
||||
simply realized by auto-application in Nix, thanks to laziness:
|
||||
|
||||
```nix
|
||||
r = rRepr r
|
||||
```
|
||||
|
||||
The explicit dependency to `self` gives the ability to provide a different
|
||||
definition for `self.a` in the final value. Overriding is achieved exactly by
|
||||
functionally updating the `a` field of the `self` parameter before passing it to
|
||||
the original representation:
|
||||
|
||||
```nix
|
||||
let extension = {a = 2;}; in
|
||||
# The fixpoint of result is { a = 2; b = 3; }
|
||||
resultRepr = self: (rRepr (self // extension)) // extension
|
||||
```
|
||||
|
||||
The second outer update ensures that the final result is also set to `a = 2`,
|
||||
and not only the `a` appearing in `b`.
|
||||
|
||||
Some details are left out, but this is the gist of it. See also [the Nix pill on
|
||||
overriding](https://nixos.org/guides/nix-pills/override-design-pattern.html) or
|
||||
[this article on fixpoints in Nix](http://r6.ca/blog/20140422T142911Z.html).
|
||||
|
||||
#### Limits
|
||||
|
||||
- ~~**(ERG)**~~ Use a specific representation, rather than handling good old plain records
|
||||
(although the actual representation in Nixpkgs is more ergonomic than a plain
|
||||
function).
|
||||
- ~~**(NEST)**~~ Overriding nested attribute sets is painful. If one do the naive thing,
|
||||
the whole subrecord is erased:
|
||||
```nix
|
||||
let rRepr = self: {
|
||||
a = {b = self.a.c;};
|
||||
}; in
|
||||
let extension = {a = {c = 2;};}; in
|
||||
rExt = let fixpoint = rRepr (fixpoint // extension); in
|
||||
fixpoint // extension
|
||||
# Gives {a = {c = 2;};} instead of expected {a = {b = 2; c = 2;};}
|
||||
```
|
||||
|
||||
### Nixpkgs overlays
|
||||
|
||||
[Overlays](https://nixos.wiki/wiki/Overlays) can be seen as a sequence of
|
||||
transformations from a base record, each layer having access to a `super`
|
||||
reference to the previous layer and the `self` reference to the final value.
|
||||
|
||||
Take two consecutive transformations:
|
||||
|
||||
1. Set the `a` field to `1`
|
||||
2. Set the `b` field to `1`, add `1` to the `a` field
|
||||
|
||||
```nix
|
||||
let baseRepr = self: {c = self.a + self.b;}; in
|
||||
let overlay1 = self: super: {a = 1;}; in
|
||||
let overlay2 = self: super: {b = 1; a = super.a + 1;}; in
|
||||
let applyOverlays = self:
|
||||
let base = baseRepr self; in
|
||||
let first = base // overlay1 self base; in
|
||||
let second = first // overlay2 self first; in
|
||||
second; in
|
||||
let fixpoint = applyOverlays fixpoint; in fixpoint
|
||||
```
|
||||
|
||||
In practice, the `super // ..` and fixpoints parts can be factorised in
|
||||
dedicated helper functions.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- **(EXPR)**: The explicit representation of layers by `super`
|
||||
and `self` gives a large control to the user.
|
||||
- **(COMP)**: Composition is first-class.
|
||||
|
||||
#### Limits
|
||||
|
||||
- ~~**(ERG)**~~ Users still need to manipulate representations.
|
||||
- **order-dependency**: The result is order-dependent. Applying both `self:
|
||||
super: {a = super.a + 1}` and `self: super: {a = super.a / 2}` can give two
|
||||
different results depending on which one is the first layer. This also means
|
||||
that overrides must be grouped by layers, which is not necessarily the most
|
||||
logical structure.
|
||||
- ~~**(NEST)**~~ Overriding nested fields is still clumsy. For example, to
|
||||
override `lib.firefoxVersion`:
|
||||
```nix
|
||||
self: super: { lib = (super.lib or {}) // { firefoxVersion = ...; }; }
|
||||
```
|
||||
|
||||
### NixOs module system
|
||||
|
||||
The NixOS module system takes a different approach. There, all the basic blocks
|
||||
— the modules — are merged together in an unspecified order. What's deciding the
|
||||
priority of one option over the other are attributes that are explicitly stated
|
||||
in the modules themselves. The declaration of options (type, priority, etc.) and
|
||||
their assignment must be made separately.
|
||||
|
||||
Compared to overlays, the explicit reference to `super` has disappeared. This
|
||||
makes it look like modules can only override one value with another, instead of
|
||||
combining different pieces of data together.
|
||||
|
||||
This apparent shortcoming is solved by custom merge functions, that can redefine
|
||||
how to combine different values of the field of a configuration. By default,
|
||||
when merging two list values, the module system only knows how to replace one
|
||||
value with the other, because there's no canonical and commutative way of
|
||||
merging two lists. However, the user can specify that these lists should in
|
||||
fact be concatenated, resulting in the possibility of defining a list of paths
|
||||
by pieces:
|
||||
|
||||
```nix
|
||||
# Some module
|
||||
{
|
||||
paths = mkOption {
|
||||
type = types.listOf types.path;
|
||||
};
|
||||
}
|
||||
|
||||
# Config
|
||||
[..]
|
||||
someModule.paths = [foo/bar];
|
||||
|
||||
# Other config
|
||||
[..]
|
||||
someModule.paths = [/bar/baz];
|
||||
|
||||
# Resulting config: [foo/bar /bar/baz]
|
||||
```
|
||||
|
||||
#### Advantages
|
||||
|
||||
- **order-independence**: as opposed to overlays, modules can be merged in any
|
||||
order, and the final result will be the same.
|
||||
|
||||
#### Limits
|
||||
|
||||
- ~~**(EXPR)**~~: Despite custom merge functions, this model is less powerful
|
||||
than overlays which have access to an explicit reference to `super`.
|
||||
Modules, on the other hand, are all merged at the same level in an
|
||||
unspecified order.
|
||||
|
||||
For example, the following has no equivalent in a merging model:
|
||||
```nix
|
||||
let overlay = self: super: {b = self.c + 1; a = super.c + 1;}; in #...
|
||||
```
|
||||
|
||||
However, modules are quite capable in practice. Removing fields and being
|
||||
order-dependent as in the overlay example above is not necessarily a good idea,
|
||||
and this limitation of expressivity may in fact be a good thing.
|
||||
|
||||
## Overriding elsewhere
|
||||
|
||||
### CUE
|
||||
CUE allows a form of late-binding for recursive attributes:
|
||||
|
||||
```
|
||||
$cat test.cue
|
||||
fields: {
|
||||
a: int
|
||||
b: a + 1
|
||||
} & {
|
||||
a: 2
|
||||
}
|
||||
$cue eval test.cue
|
||||
fields: {
|
||||
a: 2
|
||||
b: 3
|
||||
}
|
||||
```
|
||||
|
||||
Combined with [default values](https://cuelang.org/docs/tutorials/tour/types/defaults/),
|
||||
this provides an overriding mechanism:
|
||||
|
||||
```
|
||||
$cat test2.cue
|
||||
fields: {
|
||||
a: int | *1
|
||||
b: a + 1
|
||||
} & {
|
||||
a: 2
|
||||
}
|
||||
$cue eval test2.cue
|
||||
fields: {
|
||||
a: 2
|
||||
b: 3
|
||||
}
|
||||
```
|
||||
|
||||
The basics of this overriding mechanism are close in spirit to what we want to
|
||||
achieve in this proposal. This is similar to Nickel's current `default`
|
||||
behavior, but with the desired late-bound merging. The expressivity is however
|
||||
too limited to replace e.g. the NixOS module system.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- **(ERG)**: Overriding works seamlessly with native records. Note however that
|
||||
it requires fields to be explicitly marked as overridable (default).
|
||||
|
||||
#### Limits
|
||||
|
||||
- ~~**(EXP)**~~: One can only replace a value by another one. It is not
|
||||
possible to add one to the previous value, for example.
|
||||
- ~~**(COMP)**~~: Overriding a record a second time is not possible.
|
||||
|
||||
### Jsonnet
|
||||
|
||||
Jsonnet refers to its overriding mechanism as *inheritance*, implemented by the
|
||||
`+` operator on objects:
|
||||
|
||||
```jsonnet
|
||||
local obj = {
|
||||
name: "Alice",
|
||||
greeting: "Hello, " + self.name,
|
||||
};
|
||||
[
|
||||
obj,
|
||||
obj + { name: "Bob" },
|
||||
obj + { greeting: super.greeting + "!"},
|
||||
obj + { name: "Bob", greeting: super.greeting + "!"},
|
||||
]
|
||||
```
|
||||
|
||||
This is similar to the Nix operator `//`, but doing recursive overriding in the
|
||||
expected way out of the box. The extension can access the previous version in
|
||||
the same way as Nixpkgs overlays, using the `super` keyword.
|
||||
|
||||
#### Advantages
|
||||
|
||||
- **(ERG)**: Integrated with native records.
|
||||
- **(COMP)**: Overriding is easily iterated.
|
||||
- **(EXP)**: Achieve the same level of expressivity than overlays with the
|
||||
special keywords `self` and `super`.
|
||||
|
||||
#### Limits
|
||||
|
||||
- **order-dependency**: As with overlays, the order of application of overrides
|
||||
matters.
|
||||
|
||||
## Taking a step back
|
||||
|
||||
While not exactly the same, all these overriding mechanisms are based on the
|
||||
same underlying principles:
|
||||
1. Represent recursive records (explicitly or implicitly) as a function of
|
||||
`self` (and `super` in some cases)
|
||||
2. Compute the combination as a **fixpoint** of several recursive records,
|
||||
giving back a standard record.
|
||||
|
||||
As hinted by Jsonnet's terminology, the semantics of recursive records
|
||||
strikingly resemble the semantics of objects and classes in OOP. Replace
|
||||
records with objects, fields with methods and overriding with inheritance. This
|
||||
is not so surprising: there's actually an history of encoding objects in
|
||||
functional languages as recursive records (see the introduction of
|
||||
[The Recursive Record Semantics of Objects Revisited](https://hal.inria.fr/inria-00072423)
|
||||
for a good overview) going back to 1988
|
||||
\[[Cardelli](http://lucacardelli.name/papers/inheritance.pdf)\].
|
||||
|
||||
This is also mentioned in the
|
||||
[README](https://github.com/MuKnIO/nixpkgs/blob/devel/lib/pop.md#some-historical-context)
|
||||
of POP (an object system in Nix), where the author observes that overriding
|
||||
mechanisms in Nix (and Jsonnet for that matter) are a simplified lazy object
|
||||
system (simplified because objects lack proper state and there is no distinction
|
||||
between classes and instances). Their logical conclusion is to embrace this fact
|
||||
and design a proper object system helped by existing literature, rather than
|
||||
reinventing the wheel. Similarly, Nix [overlays](https://nixos.wiki/wiki/Overlays)
|
||||
can be seen as a single inheritance mechanism.
|
||||
|
||||
Inheritance-based overriding imposes an order on the overrides. A single level
|
||||
inheritance is usually fine, but a complex hierarchy can become hard to maintain
|
||||
and to reason about.
|
||||
|
||||
The NixOS module system is designed differently. It is based on merging: the
|
||||
configuration is created by combining a set of unordered records following
|
||||
specific rules. Of course, there's still a need for ordering information
|
||||
somewhere, but it is rather expressed as priorities. This system has the
|
||||
advantage of making merge commutative (in contrast with inheritance or the `//`
|
||||
operator), as in CUE, and to untie data definition from precedence
|
||||
specification: one can define a module where each field has a different
|
||||
priority, if it makes sense to group them logically. With inheritance, values
|
||||
are required to be grouped in layers of the same priority, instead of logically.
|
||||
|
||||
Summing up the differences between inheritance-based mechanisms and merge-based
|
||||
mechanism:
|
||||
|
||||
**order-dependency**
|
||||
|
||||
Inheritance is order dependent: the chain of extensions must be defined with the
|
||||
precise order they will be applied in mind, and definitions must be grouped in
|
||||
consequence. On the other hand, merging is commutative, and the precedence
|
||||
information is encoded as priorities. Thus, overriding by merging can be defined
|
||||
using stand-alone pieces of data, although the behavior of priorities is not
|
||||
local.
|
||||
|
||||
**(EXP)**
|
||||
|
||||
Merged records have only access to the final computed fixpoint `self`, while
|
||||
objects have access to the previous stage of extension via `super`. However, as
|
||||
in the NixOS module system, it is possible to address this issue using custom
|
||||
merge functions. This is a bit less expressive, but in a good way: it forces the
|
||||
merge strategy to be uniform along each field, while mechanisms like overlays or
|
||||
inheritance can do pretty much anything.
|
||||
|
||||
## Proposal overview
|
||||
|
||||
We basically propose to adopt the same kind of merge, priority, and custom merge
|
||||
function-based scheme as the NixOS module system. One big difference with the
|
||||
NixOS module system is that it would be built-in in the language, being usable
|
||||
for any native record whatsoever, and having good interaction with the rest of
|
||||
the language features. The implementation would have first class access to
|
||||
locations, to the AST, the memory layout, the metadata, and so on.
|
||||
|
||||
We first describe the ideas at a high-level. Then, we review the issue and
|
||||
challenges of this approach, and propose concrete solutions to overcome or
|
||||
mitigate them. Finally, the precise operational semantics is laid out in a
|
||||
dedicated section.
|
||||
|
||||
### Recursive records & merging
|
||||
|
||||
**This section defines the semantics, rather than an actual efficient
|
||||
implementation**.
|
||||
|
||||
As before, it is useful to see recursive records represented as functions - or
|
||||
constructors - that take a `self` parameter and return a final non recursive record,
|
||||
in the same way as the original overriding mechanism of Nixpkgs (the non-existing
|
||||
syntax `def := value` is used to insist on the fact that we are defining new objects):
|
||||
|
||||
```
|
||||
r = {
|
||||
a = 1;
|
||||
b = a + 1;
|
||||
}
|
||||
// Definition of the representation of r
|
||||
repr(r) := fun self => {
|
||||
a = 1;
|
||||
b = self.a + 1;
|
||||
}
|
||||
```
|
||||
|
||||
Field access amounts to compute a fix-point:
|
||||
|
||||
```
|
||||
repr(r).foo := let fix = repr(r) fix in fix.foo
|
||||
```
|
||||
|
||||
Merging simply merges the underlying representations:
|
||||
|
||||
```
|
||||
repr(r1) & repr(r2) := fun self => r1 self & r2 self
|
||||
```
|
||||
|
||||
### Priorities
|
||||
|
||||
Merge is fundamentally commutative. The problem is, not all fields should be
|
||||
treated the same: some are default values that ought to be overridden, some are
|
||||
high-priority values that ought to override. The solution is to use a priority
|
||||
system, encoding the precedence of each value while retaining commutativity
|
||||
(only the priority annotation counts, not the order in which values are merged).
|
||||
|
||||
Priorities have drawbacks. One is non-locality: the final result depends on
|
||||
other priorities of the field's values you are being merged with, potentially
|
||||
defined elsewhere. However, this can be mitigated by good defaults and an
|
||||
adapted set of priorities, such as a bottom `default`, a top `force`, and a
|
||||
infinite range of integers priorities in between, the default priority (when no
|
||||
priority is provided) being `0`. Doing so, it's easy to just use `default` or
|
||||
`force` to override or be overridden by anything, without knowing the precise
|
||||
integer priority. Integer priorities still gives freedom with an infinite supply
|
||||
of levels if required.
|
||||
|
||||
### Custom merge functions
|
||||
|
||||
Custom merge functions would be specified by a `merge` metavalue attribute. In
|
||||
order to enforce commutativity, they would receive their arguments in an
|
||||
indistinguishable order, such as having the type `{lower: Dyn, higher: Dyn,
|
||||
priority: <Different, Equal>} -> Dyn`. If both have the same priority, the order
|
||||
is not specified, and may even be randomized by the interpreter. The `priority`
|
||||
field indicates when it is the case, if this case needs special handling.
|
||||
|
||||
```nickel
|
||||
let mergeLists :
|
||||
forall a. {lower: List a, higher: List a, priority: <Different, Equal>} -> List a
|
||||
= fun args => args.lower @ args.higher in
|
||||
|
||||
let Contract = {
|
||||
path | List Str
|
||||
| merge mergeLists
|
||||
| doc "A list of paths to search in."
|
||||
} in
|
||||
|
||||
let block1 | #Contract = {
|
||||
path = ["/usr/local/bin"]
|
||||
} in
|
||||
let block2 = {
|
||||
path = ["/bin"]
|
||||
} in
|
||||
|
||||
// { path = ["usr/local/bin", "/bin"] }
|
||||
block1 & block2
|
||||
```
|
||||
|
||||
## Issues & challenges
|
||||
|
||||
The proposal raises some questions. While none seems insuperable, users will
|
||||
need to have a good mental picture of the system, and we should be careful to
|
||||
avoid ending up with a complicated system full of ad-hoc fixes to edge cases.
|
||||
|
||||
### Implementation
|
||||
|
||||
Recomputing the fixpoint at each field access is wasteful, because recursive
|
||||
records as functions satisfy the following property: as long as it is not
|
||||
merged, the `self` argument is constant. Thus, we can memoize the fixpoint and
|
||||
only invalidate `self` on a merge. Also, fields that do not depend on a
|
||||
recursive variable can be hoisted out of the function, using a representation
|
||||
like:
|
||||
|
||||
```
|
||||
repr(r) := {
|
||||
a = 1;
|
||||
b = fun self => self.a + 1;
|
||||
}
|
||||
```
|
||||
|
||||
Concretely, each field of a record may either be a thunk as usual, in the case
|
||||
of non recursive expressions like `a`, or a thunk together with the original
|
||||
expression in the recursive case. All record operations except merge operate
|
||||
on the thunk. A merge operation, on the other hand, restores the original
|
||||
expression again.
|
||||
|
||||
There's more potential optimizations, but this first step should be a reasonable
|
||||
trade-off between implementation complexity and performance. See
|
||||
[#103](https://github.com/tweag/nickel/issues/103) for more details.
|
||||
|
||||
### Scoping
|
||||
|
||||
Should a record be able to access a yet undefined field because it is expected
|
||||
to be provided by a subsequent merge? Two possible approaches:
|
||||
|
||||
- **dynamic scoping**: records can reference fields that are not explicitly
|
||||
defined locally, such as:
|
||||
```nickel
|
||||
{a = b} & {b = 1}
|
||||
```
|
||||
Dynamic scoping have a number of issues, and is usually considered bad
|
||||
practice.
|
||||
- **lexical scoping**: as currently, require self-referenced fields to be
|
||||
defined locally. Note that thanks to contracts, one can require the existence
|
||||
of a field without defining it. For example, we could write the previous
|
||||
example as:
|
||||
```nickel
|
||||
{a = b, b | Num} & {b = 1}
|
||||
```
|
||||
This is also a better practice to explicitly state the fields whose presence
|
||||
is assumed in general.
|
||||
|
||||
This RFC proposes to adopt **lexical scoping**. We could have an even lighter
|
||||
syntax, such as `{a = b, b} & {b = 1}` for requiring the presence of a field
|
||||
`b`.
|
||||
|
||||
### Priorities
|
||||
|
||||
#### Levels
|
||||
|
||||
This RFC proposes to adopt the following ordered set of priorities:
|
||||
|
||||
- `default` is the bottom element
|
||||
- Integers priorities in the middle
|
||||
- `force` is the top element
|
||||
|
||||
That is, `Priorities := default \/ {n | n integer} \/ top` with `default <= ...
|
||||
<= -1 <= 0 <= 1 <= ... <= force`.
|
||||
|
||||
If not specified, the normal priority (the default priority, no to be confused
|
||||
with the `default` priority) is `0`. This provides an infinite supply of
|
||||
priorities both below (`default \/ {n | n < 0}`) as well as above (`force \/ {n
|
||||
| n > 0}`).
|
||||
|
||||
Integer priorities are specified using the `priority` keyword. Defining more
|
||||
than one priority in the same meta-value is an error.
|
||||
|
||||
Example:
|
||||
```nickel
|
||||
{
|
||||
foo | Num
|
||||
| default = 1,
|
||||
|
||||
bar | Str,
|
||||
//equivalent to `bar | Str | priority 0`
|
||||
|
||||
baz.boo.bor | priority -4 = "value",
|
||||
|
||||
final | force = `CantOverrideMe,
|
||||
}
|
||||
```
|
||||
|
||||
#### Recursive priorities
|
||||
|
||||
As noted in [#240](https://github.com/tweag/nickel/issues/240), configurations
|
||||
should be easily overridable, and the approach outlined until now can end up
|
||||
annoyingly requiring configurations to be written with either `default` or
|
||||
`force` everywhere.
|
||||
|
||||
This RFC proposes to add *recursive* (or "leafy", or "push down") priorities, as
|
||||
described in [#279](https://github.com/tweag/nickel/issues/279). We define the
|
||||
new meta-values `default rec` and `force rec`, whose semantics are defined as:
|
||||
|
||||
- `eval(expr | default rec)`: case of `eval(expr)`:
|
||||
* `{field1 = value1, .., fieldn = valuen} | annots`: `{field1 = (value1 | default rec aux), .., fieldn =
|
||||
(valuen | default rec aux)} | annots`
|
||||
* `v | annots` if `v` is not a record: `v | defaulted(annots)` where
|
||||
`defaulted(annots)` is defined below.
|
||||
|
||||
- `defaulted(annots)`:
|
||||
* if `annots` contains the priority metavalue `force`, then `defaulted(annots)
|
||||
:= annots`
|
||||
* otherwise, let `annots' | prio` be the decomposition of `annots` into a
|
||||
priority `prio` (possibly empty) and the other metavalues, then
|
||||
`defaulted(annots) := annots' | default`
|
||||
|
||||
That is, `default rec` recursively overwrites all the priorities of the leafs of
|
||||
a record to `default`, excepted for `force` that is left untouched (metavalues
|
||||
are written in a liberal way, in that they can be empty).
|
||||
|
||||
`force rec` is defined similarly, excepted that it erases all priorites, even
|
||||
default ones. The names `default rec/force rec` are just suggestions.
|
||||
|
||||
Example:
|
||||
```nickel
|
||||
let neutralConf = {
|
||||
foo = 1,
|
||||
bar.baz = "stuff",
|
||||
bar.blorg = false,
|
||||
}
|
||||
|
||||
let defaulted | default rec = neutralConf
|
||||
// ^ Will evaluate to:
|
||||
// {
|
||||
// foo | default = 1,
|
||||
// bar = {
|
||||
// baz | default = "stuff",
|
||||
// bar.blorg | default = false,
|
||||
// },
|
||||
// }
|
||||
// This is different from `neutralConf | default`! The latter version
|
||||
// would be overrided at once, as illustrated below.
|
||||
|
||||
defaulted & {bar.baz = "shapoinkl"}
|
||||
// ^ Gives the expected:
|
||||
// {
|
||||
// foo | default = 1,
|
||||
// bar = {
|
||||
// baz = "shapoinkl";
|
||||
// bar.blor | default = false,
|
||||
// },
|
||||
// }
|
||||
// While
|
||||
|
||||
(neutralConf | default) & {bar.baz = "shapoinkl"}
|
||||
// ^ This gives only:
|
||||
// {bar.baz = "shapoinkl"}
|
||||
```
|
||||
|
||||
This way, an existing definition (arbitrarily complex: that could be the root of
|
||||
all Nixpkgs) can easily (and lazily) be turned into a full overriding or
|
||||
overridable configuration. A possible extension is to have a user-provided
|
||||
function that is mapped on priotities: `default/force rec` are just special
|
||||
cases of this.
|
||||
|
||||
### Custom merge functions
|
||||
|
||||
In the case of the NixOS module system, all the configuration is merged in one
|
||||
final phase that conceptually evaluates the merge AST.
|
||||
|
||||
In Nickel, merging can be done partially and observed by expressions. The
|
||||
natural semantics of custom merge functions should be to affect all subsequent
|
||||
merging, as for contracts application. Indeed, we want to specify a contract or
|
||||
a custom merge function once and for all, not to repeat it at every field
|
||||
definition:
|
||||
|
||||
```nickel
|
||||
let add = fun args => args.lower + args.higher in
|
||||
// {a = 3}
|
||||
{a | merge add = 1} & {a = 1} & {a = 1}
|
||||
```
|
||||
|
||||
But what about `{a = 1} & {a | merge add = 1} & {a = 2}`? If a merge annotation
|
||||
only affects terms on the right hand side, this breaks commutativity.
|
||||
|
||||
```
|
||||
let r1 = {a = 1} in
|
||||
let r2 = {a = 1} in
|
||||
let r3 = {a | merge add = 1} in
|
||||
|
||||
// {val = 2}
|
||||
r1 & r2 & r3
|
||||
|
||||
// {val = 3}
|
||||
r3 & r1 & r2
|
||||
```
|
||||
|
||||
Possible solutions:
|
||||
|
||||
1. Make a `n-ary` merging behave differently than repeated merging, which would
|
||||
apply the custom merge function to all arguments. This doesn't solve the
|
||||
commutativity problem per se: we also have to decide that in ordinary
|
||||
merging, custom function wouldn't affect subsequent merges anymore. This does
|
||||
raise yet other problems, one being of a good distinctive syntax. More
|
||||
generally, it sounds confusing and ad-hoc.
|
||||
|
||||
2. What seems the right, but not trivial solution: symmetrize custom merging and
|
||||
make it "distribute backward" as well. That is (simplifying by forgetting
|
||||
about priority and `lower`/`higher` arguments):
|
||||
|
||||
```nickel
|
||||
val1 & val2 & (val3 | merge func)
|
||||
<=> (val1 | merge func) & val2 & val3
|
||||
<=> func (func val1 val2) val3
|
||||
// instead of the naive
|
||||
<=/=> func (val1 & val2) val3
|
||||
```
|
||||
|
||||
This RFC proposes solution 2. This raises some difficulties:
|
||||
|
||||
- What if `val1 & val2` is in a thunk that happens to be forced in between?
|
||||
What should `let x = val1 & val2 in x & (val3 | merge func)` evaluate to, or
|
||||
`let x = val1 & val2 in builtins.seq x (x & (val3 | merge func))`?
|
||||
|
||||
In some sense, we would like to have a call-by-name semantics rather than a
|
||||
call-by-need.
|
||||
|
||||
* (a) In the case of record fields, overriding has to solve the
|
||||
same problem of remembering an original expression along an evaluated version.
|
||||
If we allow custom merge annotations only on record fields, it is possible to
|
||||
recover the original merge AST (abstract syntax tree) and interpret it with
|
||||
the custom merge function as wanted. This may prove a bit tricky to implement
|
||||
in practice, for the merge expression `val1 & val2` may be obfuscated by some
|
||||
program transformations or evaluation artifacts, but doable.
|
||||
|
||||
* (b) A more extreme take is to generalize this to any term: a
|
||||
merge expression would behave like a lazy datatype `Merge(t1,t2)` with
|
||||
respect to evaluation. Evaluating it would amount to automatically apply
|
||||
`eval : Merge(t1,t2) -> t1 & t2` with a potential merge function, which
|
||||
wouldn't erase the original top-level thunk `Merge(t1,t2)`. This is in fact
|
||||
how default values are currently handled. Doing so, we don't need to
|
||||
restrict custom merge functions to record fields. This incurs an additional
|
||||
cost though for merging (remembering all the original ASTs of merge
|
||||
expressions), even when one doesn't actually use custom merge functions.
|
||||
|
||||
We propose to implement directly the general solution of (b).
|
||||
|
||||
## Operational semantics
|
||||
|
||||
This section defines the operational semantics of merging with the choices
|
||||
previously described.
|
||||
|
||||
The choices made in [custom merge function section](#custom-merge-functions-2),
|
||||
in particular symmetrizing merge with respect to custom merge function
|
||||
annotations, require the evaluation process to:
|
||||
|
||||
1. Determine the abstract syntax tree (AST) of a merge expression
|
||||
2. Decide the merge function to use
|
||||
3. Interpret the tree with the given function
|
||||
|
||||
Let us define the notion of a *merge tree*. The definition determines what is
|
||||
the area of influence of a merge function annotation, answering the following
|
||||
questions:
|
||||
|
||||
```
|
||||
let add = fun x y => x + y in
|
||||
|
||||
let var = 1 & 1 in
|
||||
var & (1 | merge add) // result?
|
||||
(var | merge add) & 1 // result?
|
||||
|
||||
((1 & 1) + (1 & 1)) & (1 | merge add) // result?
|
||||
((1 & 1) + (1 & 1) | merge add) & 1 // result?
|
||||
|
||||
// file: somefile.ncl
|
||||
1 & 1
|
||||
// file: other.ncl
|
||||
(import "somefile") & (1 | merge add) // result?
|
||||
(import "somefile" | merge add) & 1 // result?
|
||||
```
|
||||
|
||||
In the following, we will write "meta"-code (think of the code of the Nickel
|
||||
interpreter) in ML pseudo-code. To distinguish this meta-code from Nickel
|
||||
expressions, we use the quote syntax `[| exp |]` to denote a Nickel expression.
|
||||
For example, `hasAnnot [| f (1 + 1) |] = false` means that the value of the
|
||||
`hasAnnot` (meta-)function on the Nickel expression `f (1 + 1)` is `false`.
|
||||
|
||||
### Merge tree
|
||||
|
||||
A merge tree is a binary tree whose nodes are labelled by a `metadata`. Leafs
|
||||
are labelled with both a `metadata` and a Nickel expression `e` in WHNF (weak
|
||||
head normal form, that is, evaluated). In this context, a `metadata` is a
|
||||
(meta-)record which field names corresponds to meta attributes
|
||||
`custom,priority,default,contract` and so on. Fields are optional, excepted
|
||||
`priority`, which must always be defined.
|
||||
|
||||
```haskell
|
||||
data Metadata = Metadata {
|
||||
priority :: Priority,
|
||||
merge :: Option Priority,
|
||||
contracts :: Option (List Contract),
|
||||
// ...
|
||||
}
|
||||
|
||||
data AbsMergeTree a =
|
||||
Merge a a
|
||||
| Exp Expression
|
||||
|
||||
type MergeTree = MergeTree (AbsMergeTree (MergeTree, Metadata))
|
||||
```
|
||||
|
||||
The function `mergeTree: Expression -> MergeTree` compute the merge tree of an
|
||||
expression. It needs to evaluate leafs to see if they contains themselves merge
|
||||
expressions, such that `let x = a & b in x & c` and `a & b & c` has the same
|
||||
merge tree: that is, merge treee commute with evaluation.
|
||||
|
||||
```
|
||||
// we maintain an environment of bindings in `env`
|
||||
|
||||
metaData [| e | attr = val, ... |] ::=
|
||||
{ attr = val, ... } if priority is set
|
||||
{ priority = normal, attr = val, ...} otherwise
|
||||
|
||||
mergeTree e ::= (absMergeTree e, metaData e)
|
||||
|
||||
absMergeTree [| e1 & e2 |] = Merge(mergeTree(e1), mergeTree(e2))
|
||||
|
||||
// whnf = Weak head normal form, result of evaluation
|
||||
absMergeTree [| whnf |] @ e = Exp(e)
|
||||
|
||||
// Should mergeTree cross import boundaries? Probably not
|
||||
absMergeTree [| import path |] @ e = Exp(e)
|
||||
|
||||
// All other cases
|
||||
absMergeTree e = weakEval e
|
||||
|
||||
// weakEval is defined exactly as standard evaluation, excepted that it stops at
|
||||
// merge expressions, as if they were a lazy datatype in weak head normal form
|
||||
|
||||
weakEval [| e1 & e2 |] @ e = e
|
||||
|
||||
// all other cases are defined exactly as for eval
|
||||
weakEval e = ...
|
||||
```
|
||||
|
||||
### Semantics
|
||||
|
||||
Now that we defined merge trees, we need to extract a potential merge fonction
|
||||
from it:
|
||||
|
||||
```
|
||||
type MergeFunction =
|
||||
{value: Dyn, priority: Priority} ->
|
||||
{value: Dyn, priority: Priority} -> Dyn
|
||||
|
||||
extractMergeFuns : MergeTree -> List MergeFunction
|
||||
|
||||
extractMergeFuns (Merge(ast1,_),Merge(ast2,_)) = extractMergeFuns ast1 @ extractMergeFuns ast2
|
||||
extractMergeFuns (Leaf(e,meta)) = [f] if meta.merge == Some([| f |])
|
||||
[] otherwise
|
||||
|
||||
mergeFunction : MergeTree -> Result MergeFunction ()
|
||||
mergeFunction t = let funs = extractMergeFuns t in
|
||||
if lists.length t == 0 then
|
||||
Ok(__builtinMerge) // the standard `&` merge function
|
||||
else if lists.length t == 1 then
|
||||
Ok(head t)
|
||||
else
|
||||
Err(())
|
||||
```
|
||||
|
||||
And the interpretation of a merge tree by a merge function:
|
||||
|
||||
```
|
||||
// can be extended, as long as it is a partially ordered set
|
||||
Priority = Number | -inf | +inf | ...
|
||||
|
||||
interpret (Merge(ast_1,meta_1),Merge(ast_2,meta_2)) f =
|
||||
f {value = interpret ast_i f, prio = meta_i.priority}
|
||||
{value = interpret ast_j f, prio = meta_j.priority}
|
||||
where
|
||||
i,j s.t meta_i.priority <= meta_j.priority
|
||||
```
|
||||
|
||||
We can finally define the evaluation of merge:
|
||||
|
||||
```
|
||||
eval [| e1 & e2 |] =
|
||||
let t = mergeTree [| e1 & e2 |] in
|
||||
interpret t (mergeFunction t)
|
||||
```
|
||||
|
||||
Please keep in mind that `mergeTree (e1 & e2)` refers to the **original** merge
|
||||
tree of this expression. It needs to be accessible even after evaluation by the
|
||||
implementation.
|
||||
|
||||
In practice, we can't update a thunk that contains `e1 & e2` with the result of
|
||||
the evaluation. This is already the case with `default` values currenlty
|
||||
(`default (1 + 1)` isn't updated to `2`, but to `default 2`, otherwise the
|
||||
semantics would change). A good view on this is that the semantics is inherently
|
||||
call-by-name, and that any caching mechanism (including call-by-need) is a
|
||||
practical semantics-preserving optimization.
|
83
rfcs/002-merge-types-terms-syntax.md
Normal file
83
rfcs/002-merge-types-terms-syntax.md
Normal file
@ -0,0 +1,83 @@
|
||||
---
|
||||
feature: merge types and terms syntax
|
||||
start-date: 2022-01-12
|
||||
author: Yann Hamdaoui
|
||||
---
|
||||
|
||||
# Merge types and terms syntax
|
||||
|
||||
The present RFC aims to make the experience of writing Nickel -- and in
|
||||
particular writing contracts -- more streamlined by using a common syntax for
|
||||
terms and types.
|
||||
|
||||
In a standard statically typed language, terms and types are usually two
|
||||
distinct syntactic categories. Most of the constructs on one side indeed don't
|
||||
make sense on the other side: what could mean the type `1 + 1`, or the term
|
||||
`forall a. a -> a`? Separating the two syntactically also allows for a better
|
||||
mental separation for the user as well. Finally, separating the two gives more
|
||||
room in the grammar of the language to reuse syntax in a different way. For
|
||||
example in Nickel we are using the record syntax for both record literals et
|
||||
record types, and we are using the pipe operator `|` both for metavalues (terms)
|
||||
and row tails (types).
|
||||
|
||||
However, Nickel is different from the standard typed functional language.
|
||||
Because of the duo static typing/contracts, we do have to:
|
||||
|
||||
- Give a static meaning to terms. This is the `#` operator, lifting a contracts
|
||||
(a term) to a type. Currently, this type is quite rigid: this is an opaque
|
||||
type.
|
||||
- Give a dynamic meaning to types. Because each type annotation gives rise
|
||||
to a contract, we have to derive a contract from a type. There is currently no
|
||||
first-class operator to do that in Nickel, but the interpreter does it
|
||||
internally in the `open_contract()` function. Throughout this document, we
|
||||
will use the `§` operator to denote this operation, pretending it exists in
|
||||
the language.
|
||||
|
||||
We could precisely give a meaning to a type `1 + 1` (`#(1 + 1)`) and to a term
|
||||
`forall a. a -> a` (`§(forall a. a -> a)`). Why we propose to do so is explained
|
||||
in the next section.
|
||||
|
||||
## Motivation
|
||||
|
||||
The original motivation is that, when writing contracts, we may want to mix both
|
||||
user-defined contracts as well as contracts derived from builtin types. A very
|
||||
simple example is the following nullable contract, that takes another contract
|
||||
as a parameter:
|
||||
|
||||
```nickel
|
||||
let Nullable = fun contract label value =>
|
||||
if value == null then null
|
||||
else contracts.apply contract label value
|
||||
in
|
||||
...
|
||||
```
|
||||
|
||||
Now, say we want to encode a nullable list:
|
||||
|
||||
```nickel
|
||||
{
|
||||
list_maybe | #(Nullable §(List Num))
|
||||
}
|
||||
```
|
||||
|
||||
This is already not looking great, but bearable. However, mixing types and
|
||||
contract even further is not improbable. For example, if we want to express that
|
||||
a field is either `null` or a list of ports:
|
||||
|
||||
```nickel
|
||||
{
|
||||
list_ports_maybe | #(Nullabe §(List #Port))
|
||||
}
|
||||
```
|
||||
|
||||
This starts to be really verbose and hard to parse. It's not only aesthetic:
|
||||
when peer-programming Nickel with other Nickel beginners, they happened to be
|
||||
confused as why sometimes we need a hash and sometimes not, why we used `List`
|
||||
and sometimes `list` (the equivalent of `§` list in this document).
|
||||
|
||||
What's more, the type application syntax (`List Num`) actually coincides to the
|
||||
standard application within contracts. That is, `§(List Num) ~ §List §Num`. It
|
||||
would thus make sense to define type application for user-defined contracts by
|
||||
`#Foo Type ~ #(Foo §Type)`, and to be able to write directly `#Nullable Num` or
|
||||
`#Nullable #Port` instead of `#(Nullable §Num)` and `#(Nullable Port)`
|
||||
respectively.
|
Loading…
Reference in New Issue
Block a user