1
1
mirror of https://github.com/tweag/nickel.git synced 2024-10-06 08:07:37 +03:00
nickel/rfcs/004-typechecking.md
Andreas Deininger 0f192ea0ae Fixes typos
2023-02-02 11:20:09 +01:00

18 KiB
Raw Blame History

feature start-date author
typechecking 2022-05-16 Yann Hamdaoui

Typechecking

The goals of this RFC are:

  1. Identify the main shortcomings of the current implementation of typechecking, in particular with respect to the interaction between typed and untyped code.
  2. Write a proper formalization of a type system and companion type inference algorithm which overcome those limitations as much as possible. If this proves too hard, a proper definition and formalization of the current type system would already be a positive outcome.

The motivation for the first step is self-explanatory: a more expressive type system directly translates to users being able to write more programs (or clearer versions of the same program, without superfluous annotations).

The second step is motivated by improving the experience of extending the type system in the future and maintaining the Nickel codebase. We already hit edge cases that led to unsatisfying ad-hoc fixes. The current implementation interleaves a number of different phases, which makes it harder to get into and to modify. Finally, a clean and well-designed specification often leads to a simpler implementation, by removing accumulations of ad-hoc treatments that become subsumed by a generic case.

Background

There is a substantial literature on type systems for programming languages, as well as implementations, some of them both cutting-edge and of industrial strength. For the purely static part, the ML languages family and their cousins (Haskell, OCaml, Scala, Purescript, etc.) has proved over time to be a solid and expressive foundation. The role of the static type system of Nickel is to be able to type various generic functions operating on primitive types, and doing so doesn't seem to require new developments or very fancy types. The current implementation, which supports polymorphism and row polymorphism, appears to already cover most cases.

However, the co-existence of statically typed code and dynamically typed code makes Nickel different from most of the aforementioned inspirations. Nickel is indeed a gradually typed language.

Note that there are different flavours of gradual typing. A cornerstone of gradual type systems derived from the original work of Siek and Taha1 is to statically accept potentially unsafe conversions from and to the dynamic type Dyn, and more complex types with Dyn inside like converting Dyn -> Num to Num -> Num. Such compatible types are said to be consistent with each others. These implicit conversions may or may not be guarded at runtime by a check (sound vs unsound gradual typing).

# If Nickel was gradually typed, this would be accepted
{
  add : Num -> Num -> Num = fun x y => x + y,
  mixed : Dyn -> Num -> Num = fun x y => add x (y + 1),
}

In Nickel, such implicit conversions are purposely not supported. Running this examples gives:

error: incompatible types
  ┌─ repl-input-0:3:46
  │
3 │   mixed : Dyn -> Num -> Num = fun x y => add x (y + 1),
  │                                              ^ this expression
  │
  = The type of the expression was expected to be `Num`
  = The type of the expression was inferred to be `Dyn`
  = These types are not compatible

A second — and directly related — peculiarity of Nickel are contract annotations. Gradually typed languages à la Siek and Taha usually use dynamic checks called casts to validate the implicit conversions at runtime, but those casts are rarely part of the source language and rather an implementation device. Contract annotations are in some way a first-class version of the implicit casts of gradual typing. Thus, it is possible to make the previous example work in Nickel by manually adding a contract annotation which indicates what type the user expects the expression to be:

{
  add : Num -> Num -> Num = fun x y => x + y,
  mixed : Dyn -> Num -> Num = fun x y => add (x | Num) (y + 1),
}

TypedRacket is an example of a gradually typed language that departs from the original design and which requires explicit contract annotations, as Nickel does.

Summary

The important dimensions of typing in Nickel are:

  • Statically typed code is checked using a standard ML-like/SystemF system with polymorphism and row polymorphism
  • Nickel is stricter than a vanilla gradual type system, in that it doesn't allow implicit casts from and to the dynamic type.
  • Nickel has contract annotations, which can be used to write type casts explicitly. They incur a runtime check.

Motivation

This section attempts to motivate this RFC practically: what are programs which are natural write and that we expect to be accepted by the typechecker, but aren't currently?

The dynamic type

The dynamic type acts as a top type statically (with a caveat though2), like Any or Object in some languages. Although casts from the dynamic type are unsafe in general, cast to the dynamic type are safe. But the typechecker currently isn't smart enough and requires explicit annotations:

# rejected because some_data has type `{script: Str, vars: Array Str}`, which is
# not equal to `Dyn`
{
  serialized : Array Str =
    let some_data = {script = "echo ${hello}", vars = ["hello"] } in
    let other_data = ["one", "two", "three"] in
    [
      builtin.serialize `Json some_data,
      builtin.serialize `Yaml other_data
    ]
}

To make it work, the user needs to explicitly add missing Dyn contract annotations:

{
  serialized : Array Str =
    let some_data = {script = "echo ${hello}", vars = ["hello"] } in
    let other_data = ["one", "two", "three"] in
    [
      builtin.serialize `Json (some_data | Dyn),
      builtin.serialize `Yaml (other_data | Dyn)
    ]
}

Dyn versus forall

Why do we need Dyn inside typed code at all? Untyped values are given the type Dyn, but this is a different matter. This RFC is concerned with functions consuming values of type Dyn. In statically typed code, we can already express operating over generic values using polymorphism, as in the definition of head:

head : forall a. Array a -> a

However, well-typed polymorphic functions are parametric. Concretely, they can't inspect their polymorphic arguments. The contract system also enforces this invariant dynamically. For example, the following function is rejected by the contract system:

let fake_id
  | forall a. a -> a
  = fun x => if builtin.is_num x then x + 1 else x in
fake_id 10

Remark: currently (as of version 0.2.1) this just returns false, but this is a bug, see #727.

Any typed function that needs to inspect its argument needs to use Dyn, or at least a less general type than a. A typical example is array.elem, which is of type Dyn -> Array Dyn -> Bool, because it compares elements. This is a source of Dyn inside typed functions:

nickel>
let has_one : Bool =
  let l = [1, 2, 4, 5] in
  array.elem 1 l

error: incompatible types
  ┌─ repl-input-7:3:14
  │
3 │   array.elem 1 l
  │              ^ this expression
  │
  = The type of the expression was expected to be `Dyn`
  = The type of the expression was inferred to be `Num`
  = These types are not compatible

Indeed, we need to insert explicit casts:

nickel>
let has_one : Bool =
  let l = [1, 2, 4, 5] in
  array.elem (1 | Dyn) (l | Array Dyn)
nickel> has_one
true

The following are two real life examples taken from the Nixpkgs list library, converted to Nickel and statically typed. They both use array.elem:

subtractLists: forall a. Array a -> Array a -> Array a
  | doc m%"
      Subtracts list 'e' from another list. O(nm) complexity.
     Example:
       subtractLists [ 3, 2 ] [ 1, 2, 3, 4, 5, 3 ]
       >> [ 1, 4, 5 ]
  "%
  = fun e => array.filter (fun x => !(array.elem (x | Dyn) (e | Array Dyn))),

mutuallyExclusive: forall a. Array a -> Array a -> Bool
  | doc m%"
      Test if two lists have no common element.
     It should be slightly more efficient than (intersectLists a b == [])
  "%
  = fun a b =>
    array.length a == 0
    || !(any (fun x => array.elem (x | Dyn) (a | Array Dyn)) b),

Flow-sensitive typing

Another possible future use-case is flow-sensitive typing, as in e.g. TypeScript. If we are to implement such a feature in Nickel, taking an argument of type Dyn and then type-casing (using builtin.typeof or builtin.is_xxx) would make sense inside statically typed code, naturally producing functions that consume values of type Dyn. Here is another example from the Nixpkgs list library which could use flow-sensitive typing:

flatten: Dyn -> Array Dyn
  | doc m%"
      Flatten the argument into a single list, that is, nested lists are
     spliced into the top-level lists.
     Example:
       flatten [1, [2, [3], 4], 5]
       >> [1, 2, 3, 4, 5]
       flatten 1
       >> [1]
  "%
  = fun x =>
     if builtin.is_array x then
       concatMap (fun y => flatten y) (x | Array Dyn)
     else
       [x],

Dictionaries

In essence, an upcast can be seen as forgetting part of the type information. The most drastic loss of information is casting to Dyn: we pretty much forget everything about the original type of the value.

There are other useful loss of information. One concerns dictionary types {_: T}. In practice, records serve several purposes in Nickel:

  • A key-value mapping with a statically known structure. This is usually the case for configurations: one knows in advance what key will appear in the final value and for each key, the type of allowed values.
  • A key-value mapping with a dynamic structure. Keys are added dynamically and depend on runtime values.

The first case is best modeled using record types. For example:

{
  name = "virtal",
  version = "1.0.1",
  enabled = true,
} : {
  name : Str,
  version : Str,
  enabled : Bool,
}

But for records which fields are not statically known, record types are too rigid. We use dictionary types instead:

(let data = {ten = 10} in
data
|> record.insert "one" 1
|> record.insert "two" 2
|> record.insert "three" 3) : {_: Num}

Unfortunately, this example doesn't work as it is:

error: incompatible types
  ┌─ repl-input-6:2:1
  │
2 │ data
  │ ^^^^ this expression
  │
  = The type of the expression was expected to be `{_: Num}`
  = The type of the expression was inferred to be `{ten: Num}`
  = These types are not compatible

Record literals are always inferred to have a precise record type (here, {ten : Num}) which is thus different from the expected dictionary type {_ : Num}. A special casing in the typechecker still makes this example pass with an additional annotation:

(let data : {_ : Num} = {ten = 10} in
data
|> record.insert "one" 1
|> record.insert "two" 2
|> record.insert "three" 3) : {_: Num}

Alas, this special case is fragile: it only works if we annotate directly the record literal. If we use a variable as a proxy, inference is once again broken:

nickel> (let x = {ten = 10} in
let data : {_ : Num} = x in
data
|> record.insert "one" 1
|> record.insert "two" 2
|> record.insert "three" 3) : {_: Num}

error: incompatible types
  ┌─ repl-input-8:2:24
  │
2 │ let data : {_ : Num} = x in
  │                        ^ this expression
  │
  = The type of the expression was expected to be `{_: Num}`
  = The type of the expression was inferred to be `{ten: Num}`
  = These types are not compatible

Proposal

Given the precedent sections, we propose to add a subtyping relation generated by T <: Dyn for all type T that is not a type variable, and {l1 : T, .., ln : T} <: {_: T} for the dictionaries use-case. Data structure are covariant, and the arrow has the usual co/contra-variance.

Dyn inference

As long as there is no Dyn involved by the type given by either the signature of a function or the annotation of an argument, we want subtyping to not change the current behavior. For the typechecker, this means following this guideline:

Never infer a Dyn type implicitly. Dyn, and dictionary subtyping, must always stem from an explicit annotation.

For example, the following program is expected to fail typechecking:

let uni_pair : forall a. a -> a -> {fst: a, snd: a} = fun x y =>
  { fst = x, snd = y } in

uni_pair 1 "a" : _

We could infer the type Dyn for the instantiation of a. But this is currently an error, and it should arguably stay this way. Dyn can make a lot of such cases pass, only to manifest as a later type error and probably a disconcerting one involving a Dyn coming from nowhere when the result of the function is actually used. However, if one of the argument is explicitly of type Dyn, then it is reasonable to allow the other one to be silently upcast:

let foo : Dyn = 1 in
uni_pair foo 1 : _

Unfortunately, we don't have yet a good characterization of this as a property of the declarative type system or declarative type inference. It is an informal guideline, which is only reflected in the algorithmic type inference for now.

Polymorphism and subtyping

The previous example shows an interesting aspect of combining subtyping and polymorphism: if we instantiate type variables at the first type we see, as we do currently, typechecking is sensitive to the order of arguments:

# ok, a is instantiated to Dyn, and then 1 : Dyn
uni_pair foo 1 : _
# would fail, a is instantiated to Num, but foo is not of type Num
uni_pair 1 foo : _

This problem is dubbed the Tabby-first problem in the review of bidirectional typing of Dunfield and Krishnaswami. Surprisingly, it doesn't seem to have been deeply studied (outside of the subtyping relation induced by polymorphism). For subtyping induced by polymorphism, there are different trade-offs: Dunfield and Krishnaswami restricts their system to predicative polymorphism, which is then order-independent 3. In exchange, the arrow type has the expected variance (that is, eta-expansion preserves typeability). On the other hand, Haskell deemed impredicative polymorphism to be more important and chose to abandon the variance of arrows and other type constructors. Coupled with a so-called QuickLook phase, Haskell is able to guess most common impredicative instantiations.

We can't apply the Haskell approach naively, because in the presence of Dyn, an impredicative instantiation is not the only possibility anymore, even if we don't do deep instantiation or deep skolemization:

let foo : Dyn = null in
let ids : Array (forall a. a -> a) = [fun x => x] in
let concat : forall b. Array b -> Array b -> Array b = (@) in

concat ids [null]

In the QuickLook setting, when we see the argument ids, we know that b must be instantiated with forall a. a -> a. This is not true here, as b could also be Dyn (and in this case, given the second argument, it should!). We can't delay solving constraints involving polymorphic function types as we could do with monomorphic types (in that case, just generating ?a >: forall a. a -> a at the first argument without solving right away): when applying a function with type ?a, we need to know if ?a must be instantiated with an arrow or if it has potential heading foralls which need to be instantiated.

We may extend the QuickLook phase to also cover Dyn, but this needs more investigation. Another fix could be to make all type constructors invariant also for the subtyping relation generated by Dyn, but that is quite restrictive in practice (several examples in this document would need Array T <: Array Dyn in order to get rid of the casts).

Packing up

To sum up:

  • We only want to infer Dyn because of an explicit type annotation.
  • Polymorphism and subtyping interact in a non-trivial way. It is not obvious how to apply QuickLook in the presence of our proposed subtyping relation.

Given that impredicativity looks complex to combine with subtyping, and isn't really a strong motivation for a language like Nickel, a reasonable choice is to rather follow the path of Dunfield and Krishnaswami. However, the recent uproar upon the removing of the subsumption rule for arrows in Haskell4 shows that such things (variance of arrows with respect to polymorphism-induced subtyping) are way easier and safer to add than to remove, from a user experience perspective. In consequence, we propose to take a more restrictive approach first, which is still open to either pursing one road or the other if there is such a need later.

The final proposal is a system that:

  • Has subtyping generated by the dynamic type and dictionaries.
  • Support higher-rank polymorphism
  • Is predicative
  • Doesn't do deep skolemization/instantiation, or equivalently, in which type constructors are invariant with respect to subtyping induced by polymorphism

A declarative specification as well as a constraint-solving algorithm are provided in this repository. See specs/type-system

Alternatives

No subtyping

An obvious alternative is to abandon the idea of subtyping. From there we can either choose to follow the QuickLook approach, the Dunfield and Krishnaswami's one, or the restricted one described above (but without subtyping, of course). We will have to put up with adding contract annotations for Dyn and dictionaries.

Dictionary-only subtyping

Another middle-ground could be to only keep the dictionary part of the subtyping relation. Because it is then guarded by the {_ : _} type constructor, it doesn't interact with polymorphism (in that a polymorphic type can never be a subtype of anything). We lose the upcasts to Dyn but we still improve on the current situation for dictionaries.


  1. Gradual Typing for Functional Languages ↩︎

  2. Dyn is not greater than type variables (also referred as to existentials, skolem variables or rigid type variables), because that would break parametricity. ↩︎

  3. Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism ↩︎

  4. This reddit discussion gather some reactions of Haskell users to the removal of arrow subsumption ↩︎