mirror of
https://github.com/idris-lang/Idris2.git
synced 2024-12-27 13:43:28 +03:00
494 lines
20 KiB
ReStructuredText
494 lines
20 KiB
ReStructuredText
***********************
|
|
Implementation Overview
|
|
***********************
|
|
|
|
These are some unsorted notes on aspects of the implementation. Sketchy, and
|
|
not always completely up to date, but hopefully give some hints as to what's
|
|
going on and some ideas where to look in the code to see how certain features
|
|
work.
|
|
|
|
Introduction
|
|
------------
|
|
|
|
Core language TT (defined in ``Core.TT``), based on quantitative type theory
|
|
(see https://bentnib.org/quantitative-type-theory.html). Binders have
|
|
"multiplicities" which are either *0*, *1* or *unlimited*.
|
|
|
|
Terms are indexed over the names in scope so that we know terms are always well
|
|
scoped. Values (i.e. normal forms) are defined in ``Core.Value`` as ``NF``;
|
|
constructors do not evaluate their arguments until explicitly requested.
|
|
|
|
Elaborate to *TT* from a higher level language *TTImp* (defined in ``TTImp.TTImp``),
|
|
which is TT with implicit arguments, local function definitions, case blocks,
|
|
as patterns, qualified names with automatic type-directed disambiguation, and
|
|
proof search.
|
|
|
|
Elaboration relies on unification (in ``Core.Unify``), which allows postponing
|
|
of unification problems. Essentially works the same way as Agda as described
|
|
in Ulf Norell's thesis.
|
|
|
|
General idea is that high level languages will provide a translation to TT.
|
|
In the ``Idris/`` namespace we define the high level syntax for Idris, which
|
|
translates to TTImp by desugaring operators, do notation, etc.
|
|
|
|
There is a separate linearity check after elaboration, which updates types of
|
|
holes (and is aware of case blocks). This is implemented in
|
|
``Core.LinearCheck``. During this check, we also recalculate the multiplicities
|
|
in hole applications so that they are displayed appropriately (e.g. if a
|
|
linear variable is unused elsewhere, it will always appear with multiplicity
|
|
1 in holes).
|
|
|
|
|
|
Where to find things:
|
|
|
|
* ``Core/`` -- anything related to the core TT, typechecking and unification
|
|
* ``TTImp/`` -- anything related to the implicit TT and its elaboration
|
|
|
|
+ ``TTImp/Elab/`` -- Elaboration state and elaboration of terms
|
|
+ ``TTImp/Interactive/`` -- Interactive editing infrastructure
|
|
|
|
* ``Parser/`` -- various utilities for parsing and lexing TT and TTImp (and other things)
|
|
* ``Utils/`` -- some generally useful utilities
|
|
* ``Idris/`` -- anything relating to the high level language, translating to TTImp
|
|
|
|
+ ``Idris/Elab/`` -- High level construct elaboration machinery (e.g. interfaces)
|
|
|
|
* ``Compiler/`` -- back ends
|
|
|
|
The Core Type, and Ref
|
|
----------------------
|
|
|
|
``Core`` is a "monad" (not really, for efficiency reasons, at the moment...)
|
|
supporting ``Error``'s and ``IO`` (I did originally plan to allow restricting this to
|
|
some specific IO operations, but haven't yet). The raw syntax is defined by a
|
|
type ``RawImp`` which has a source location at each node, and any errors in
|
|
elaboration note the location at the point where the error occurred, as
|
|
a file context ``FC``.
|
|
|
|
``Ref`` is essentially an ``IORef``. Typically we pass them implicitly and use
|
|
labels to disambiguate which one we mean. See ``Core.Core`` for their
|
|
definition. Again, ``IORef`` is for efficiency - even if it would be neater to
|
|
use a state monad this turned out to be about 2-3 times faster, so I'm
|
|
going with the "ugly" choice...
|
|
|
|
Term representation
|
|
-------------------
|
|
|
|
Terms in the core language are indexed by a list of the names in scope,
|
|
most recently defined first:
|
|
|
|
.. code-block:: idris
|
|
|
|
data Term : List Name -> Type
|
|
|
|
This means that terms are always well scoped, and we can use the type system
|
|
to keep us right when manipulating names. For example, we have:
|
|
|
|
.. code-block:: idris
|
|
|
|
Local : FC -> (isLet : Maybe Bool) ->
|
|
(idx : Nat) -> (0 p : IsVar name idx vars) -> Term vars
|
|
|
|
So local variables are represented by an index into the local context (a de
|
|
Bruijn index, ``idx``), and a proof, erased at run time, that the index
|
|
is valid. So everything is de Bruijn indexed, but the type checker still
|
|
keeps track of the indices so that we don't have to think too hard!
|
|
|
|
``Core.TT`` contains various handy tools for manipulating terms with their
|
|
indices, such as:
|
|
|
|
.. code-block:: idris
|
|
|
|
weaken : Term vars -> Term (n :: vars) -- actually in an interface, Weaken
|
|
embed : Term vars -> Term (ns ++ vars)
|
|
refToLocal : (x : Name) -> -- explicit name of a reference
|
|
(new : Name) -> -- name to bind as
|
|
Term vars -> Term (new :: vars)
|
|
|
|
Note that the types are explicit about when the ``vars`` needs to be passed at
|
|
run time, and when it isn't. Mostly where it's needed it's to help with
|
|
displaying names, or name generation, rather than any fundamental reason in
|
|
the core. In general, this isn't expensive at run time.
|
|
|
|
Environments, defined in ``Core.Env``, map local variables to binders:
|
|
|
|
.. code-block:: idris
|
|
|
|
data Env : (tm : List Name -> Type) -> List Name -> Type
|
|
|
|
A binders is typically a *lambda*, a *pi*, or a *let* (with a value), but can
|
|
also be a *pattern variable*. See the definition of ``TT`` for more details.
|
|
Where we have a term, we usually also need an ``Env``.
|
|
|
|
We also have values, which are in head normal form, and defined in
|
|
``Core.Value``:
|
|
|
|
.. code-block:: idris
|
|
|
|
data NF : List Name -> Type
|
|
|
|
We can convert a term to a value by normalising...
|
|
|
|
.. code-block:: idris
|
|
|
|
nf : {vars : _} ->
|
|
Defs -> Env Term vars -> Term vars -> Core (NF vars)
|
|
|
|
...and back again, by quoting:
|
|
|
|
.. code-block:: idris
|
|
|
|
quote : {vars : _} ->
|
|
Defs -> Env Term vars -> tm vars -> Core (Term vars)
|
|
|
|
Both ``nf`` and ``quote`` are defined in ``Core.Normalise``. We don't
|
|
always know whether we'll need to work with ``NF`` or ``Term``, so
|
|
we also have a "glued" representation, ``Glued vars``, again defined in
|
|
``Core.Normalise``, which lazily computes either a ``NF`` or ``Term`` as
|
|
required. Elaborating a term returns the type as a ``Glued vars``.
|
|
|
|
``Term`` separates ``Ref`` (global user defined names) from ``Meta``, which
|
|
are globally defined metavariables. For efficiency, metavariables are only
|
|
substituted into terms if they have non-0 multiplicity, to preserve sharing as
|
|
much as possible.
|
|
|
|
Unification
|
|
-----------
|
|
Unification is probably the most important part of the elaboration process,
|
|
and infers values for implicit arguments. That is, it finds values for the
|
|
things which are referred to by ``Meta`` in ``Term``. It is defined in
|
|
``Core.Unify``, as the top level unification function has the following
|
|
type:
|
|
|
|
.. code-block:: idris
|
|
|
|
unify : Unify tm =>
|
|
{vars : _} ->
|
|
{auto c : Ref Ctxt Defs} ->
|
|
{auto u : Ref UST UState} ->
|
|
UnifyInfo ->
|
|
FC -> Env Term vars ->
|
|
tm vars -> tm vars ->
|
|
Core UnifyResult
|
|
|
|
The ``Unify`` interface is there because it is convenient to be able to
|
|
define unification on ``Term`` and ``NF``, as well as ``Closure`` (which
|
|
is part of ``NF`` to represent unevaluated arguments to constructors).
|
|
|
|
This is one place where indexing over ``vars`` is extremely valuable: we
|
|
have to keep the environment consistent, so unification won't accidentally
|
|
introduce any scoping bugs!
|
|
|
|
Idris 2 implements pattern unification - see Adam Gundry's thesis for an
|
|
accessible introduction.
|
|
|
|
Context
|
|
-------
|
|
|
|
``Core.Context`` defines all the things needed for TT. Most importantly: ``Def``
|
|
gives definitions of names (case trees, builtins, constructors and
|
|
holes, mostly); ``GlobalDef`` is a definition with all the other information
|
|
about it (type, visibility, totality, etc); ``Context`` is a context mapping names
|
|
to ``GlobalDef``, and ``Defs`` is the core data structure with everything needed to
|
|
typecheck more definitions.
|
|
|
|
The main Context type stores definitions in an array, indexed by a "resolved
|
|
name id", an integer, for fast look up. This means that it also needs to be
|
|
able to convert between resolved names and full names. The ``HasNames``
|
|
interface defines methods for going back and forth between structures with
|
|
human readable names, and structures with resolved integer names.
|
|
|
|
Since we store names in an array, all the lookup functions need to be in the
|
|
``Core`` monad. This also turns out to help with loading checked files (see
|
|
below).
|
|
|
|
Elaboration Overview
|
|
--------------------
|
|
|
|
Elaboration of ``RawImp`` to ``TT`` is driven by ``TTImp.Elab``, with the
|
|
top level function for elaborating terms defined in ``TTImp.Elab.Term``,
|
|
support functions defined in ``TTImp.Elab.Check``, and elaborators for the
|
|
various TTImp constructs defined in separate files under ``TTImp.Elab.*``.
|
|
|
|
Laziness
|
|
--------
|
|
|
|
Like Idris 1, laziness is marked in types using ``Lazy``, ``Delay`` and ``Force``, or
|
|
``Inf`` (instead of ``Lazy``) for codata. Unlike Idris 1, these are language primitives
|
|
rather than special purpose names.
|
|
|
|
Implicit laziness resolution is handled during unification (in ``Core.Unify``).
|
|
When unification is invoked (by ``convert`` in ``TTImp.Elab.Check``) with
|
|
the ``withLazy`` flag set, it checks whether it is converting a lazy type
|
|
with a non-lazy type. If so, it continues with unification, but returning
|
|
that either a ``Force`` or ``Delay`` needs inserting as appropriate.
|
|
|
|
TTC format
|
|
----------
|
|
|
|
We can save things to binary if we have an implementation of the TTC interface
|
|
for it. See ``Utils.Binary`` to see how this is done. It uses a global reference
|
|
``Ref Bin Binary`` which uses ``Data.Buffer`` underneath.
|
|
|
|
When we load checked TTC files, we don't process the definitions immediately,
|
|
but rather store them as a ``ContextEntry``, which is either a ``Binary`` blob, or
|
|
a processed definition. We only process the definitions the first time they
|
|
are looked up, since converting Binary to the definition is fairly costly
|
|
(due to having to construct a lot of AST nodes), and often definitions in an
|
|
imported file are never used.
|
|
|
|
Bound Implicits
|
|
---------------
|
|
|
|
The ``RawImp`` type has a constructor ``IBindVar``. The first time we encounter an
|
|
``IBindVar``, we record the name as one which will be implicitly bound. At the
|
|
end of elaboration, we decide which holes should turn into bound variables
|
|
(Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by
|
|
looking at the list of names bound as ``IBindVar``, the things they depend on,
|
|
and sorting them so that they are bound in dependency order. This happens
|
|
in ``TTImp.Implicit.getToBind``.
|
|
|
|
Once we know what the bound implicits need to be, we bind them in
|
|
``bindImplicits``. Any application of a hole which stands for a bound implicit
|
|
gets turned into a local binding (either Pi or Pat as appropriate, or PLet for
|
|
@-patterns).
|
|
|
|
Unbound Implicits
|
|
-----------------
|
|
|
|
Any name beginning with a lower case letter is considered an unbound implicit.
|
|
They are elaborated as holes, which may depend on the initial environment of
|
|
the elaboration, and after elaboration they are converted to an implicit pi
|
|
binding, with multiplicity 0. So, for example:
|
|
|
|
.. code-block:: idris
|
|
|
|
map : {f : _} -> (a -> b) -> f a -> f b
|
|
|
|
becomes:
|
|
|
|
.. code-block:: idris
|
|
|
|
map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b
|
|
|
|
Bindings are ordered according to dependency. It'll infer any additional
|
|
names, e.g. in:
|
|
|
|
.. code-block:: idris
|
|
|
|
lookup : HasType i xs t -> Env xs -> t
|
|
|
|
... where ``xs`` is a ``Vect n a``, it infers bindings for ``n`` and ``a``.
|
|
|
|
The ``%unbound_implicits`` directive means that it will no longer automatically
|
|
bind names (that is, ``a`` and ``b`` in ``map`` above) but it will still
|
|
infer the types for any additional names, e.g. if you write:
|
|
|
|
.. code-block:: idris
|
|
|
|
lookup : forall i, x, t . HasType i xs t -> Env xs -> t
|
|
|
|
... it will still infer a type for ``xs`` and infer bindings for ``n`` and
|
|
``a``.
|
|
|
|
Implicit arguments
|
|
------------------
|
|
|
|
When we encounter an implicit argument (``_`` in the raw syntax, or added when
|
|
we elaborate an application and see that there is an implicit needed) we
|
|
make a new hole which is a fresh name applied to the current environment,
|
|
and return that as the elaborated term. This happens in ``TTImp.Elab.Check``,
|
|
with the function ``metaVar``. If there's enough information elsewhere we'll
|
|
find the definition of the hole by unification.
|
|
|
|
We never substitute holes in a term during elaboration and rely on
|
|
normalisation if we need to look inside it. If there are holes remaining after
|
|
elaboration of a definition, report an error (it's okay for a hole in a type
|
|
as long as it's resolved by the time the definition is done).
|
|
|
|
See ``Elab.App.makeImplicit``, ``Elab.App.makeAutoImplicit`` to see where we
|
|
add holes for the implicit arguments in applications.
|
|
|
|
``Elab.App`` does quite a lot of tricky stuff! In an attempt to help with
|
|
resolving ambiguous names and record updates, it will sometimes delay
|
|
elaboration of an argument (see ``App.checkRestApp``) so that it can get more
|
|
information about its type first.
|
|
|
|
``Core.Unify.solveConstraints`` revisits all of the currently unsolved holes
|
|
and constrained definitions, and tries again to unify any constraints which
|
|
they require. It also tries to resolve anything defined by proof search.
|
|
The current state of unification is defined in ``Core.UnifyState``, and
|
|
unification constraints record which metavariables are blocking them. This
|
|
improves performance, since we'll only retry a constraint if one of the
|
|
blocking metavariables has been resolved.
|
|
|
|
Additional type inference
|
|
-------------------------
|
|
|
|
A ``?`` in a type means "infer this part of the type". This is distinct from
|
|
``_`` in types, which means "I don't care what this is". The distinction is in
|
|
what happens when inference fails. If inference fails for ``_``, we implicitly
|
|
bind a new name (just like pattern matching on the lhs - i.e. it means match
|
|
anything). If inference fails for ``?``, we leave it as a hole and try to fill
|
|
it in later. As a result, we can say:
|
|
|
|
.. code-block:: idris
|
|
|
|
foo : Vect ? Int
|
|
foo = [1,2,3,4]
|
|
|
|
... and the ``?`` will be inferred to be 4. But if we say:
|
|
|
|
.. code-block:: idris
|
|
|
|
foo : Vect _ Int
|
|
foo = [1,2,3,4]
|
|
|
|
... we'll get an error, because the ``_`` has been bound as a new name.
|
|
Both ``?`` and ``_`` are represented in ``RawImp`` by the ``Implicit``
|
|
constructor, which has a boolean flag meaning "bind if unresolved".
|
|
|
|
So the meaning of ``_`` is now consistent on the lhs and in types (i.e. it
|
|
means infer a value and bind a variable on failure to infer anything). In
|
|
practice, using ``_`` will get you the old Idris behaviour, but ``?`` might
|
|
get you a bit more type inference.
|
|
|
|
Auto Implicits
|
|
--------------
|
|
|
|
Auto implicits are resolved by proof search, and can be given explicit
|
|
arguments in the same way as ordinary implicits: i.e. ``{x = exp}`` to give
|
|
``exp`` as the value for auto implicit ``x``. Interfaces are syntactic sugar for
|
|
auto implicits (it is the same resolution mechanism - interfaces translate into
|
|
records, and implementations translate into hints for the search).
|
|
|
|
The argument syntax ``@{exp}`` means that the value of the next auto implicit
|
|
in the application should be ``exp`` - this is the same as the syntax for
|
|
invoking named implementations in Idris 1, but interfaces and auto implicits
|
|
have been combined now.
|
|
|
|
Implicit search is defined in ``Core.AutoSearch``. It will only begin a
|
|
search if all the *determining arguments* of the goal are defined, meaning
|
|
that they don't contain *any* holes. This avoids committing too early to
|
|
the solution of a hole by resolving it by search, rather than unification,
|
|
unless a programmer has explicitly said (via a ``search`` option on a data
|
|
type) that that's what they want.
|
|
|
|
Dot Patterns
|
|
------------
|
|
|
|
``IMustUnify`` is a constructor of ``RawImp``. When we elaborate this, we generate a
|
|
hole, then elaborate the term, and add a constraint that the generated hole
|
|
must unify with the term which was explicitly given (in ``UnifyState.addDot``),
|
|
without resolving any holes. This is finally checked in ``UnifyState.checkDots``.
|
|
|
|
Proof Search
|
|
------------
|
|
|
|
A definition constructed with ``Core.Context.BySearch`` is a hole which will
|
|
be resolved by searching for something which fits the type. This happens in
|
|
``Core.AutoSearch``. It checks all possible hints for a term, to ensure that
|
|
only one is possible.
|
|
|
|
@-Patterns
|
|
----------
|
|
|
|
Names which are bound in types are also bound as @-patterns, meaning that
|
|
functions have access to them. For example, we can say:
|
|
|
|
.. code-block:: idris
|
|
|
|
vlength : {n : Nat} -> Vect n a -> Nat
|
|
vlength [] = n
|
|
vlength (x :: xs) = n
|
|
|
|
As patterns are implemented as a constructor of ``TT``, which makes a lot
|
|
of things more convenient (especially case tree compilation).
|
|
|
|
Linear Types
|
|
------------
|
|
|
|
Following Conor McBride and Bob Atkey's work, all binders have a multiplicity
|
|
annotation (``RigCount``). After elaboration in ``TTImp.Elab``, we do a
|
|
separate linearity check which: a) makes sure that linear variables are used
|
|
exactly once; b) updates hole types to properly reflect usage information.
|
|
|
|
Local definitions
|
|
-----------------
|
|
|
|
We elaborate relative to an environment, meaning that we can elaborate local
|
|
function definitions. We keep track of the names being defined in a nested
|
|
block of declarations, and ensure that they are lifted to top level definitions
|
|
in TT by applying them to every name in scope.
|
|
|
|
Since we don't know how many times a local definition will be applied, in
|
|
general, anything bound with multiplicity 1 is passed to the local definition
|
|
with multiplicity 0, so if you want to use it in a local definition, you need
|
|
to pass it explicitly.
|
|
|
|
Case blocks
|
|
-----------
|
|
|
|
Similar to local definitions, these are lifted to top level definitions which
|
|
represent the case block, which is immediately applied to the scrutinee of
|
|
the case. We don't attempt to calculate the multiplicities of arguments when
|
|
elaborating the case block, since we'll probably get it wrong - instead, these
|
|
are checked during linearity checking, which knows about case functions.
|
|
|
|
Case blocks in the scope of local definitions are tricky, because the names
|
|
need to match up, and the types might be refined, but we also still need to
|
|
apply the local names to the scope in which they were defined. This is a bit
|
|
fiddly, and dealt with by the ``ICaseLocal`` constructor of ``RawImp``.
|
|
|
|
Various parts of the system treat case blocks specially, even though they
|
|
aren't strictly part of the core. In particular, these are linearity checking
|
|
and totality checking.
|
|
|
|
Parameters
|
|
----------
|
|
|
|
The parameters to a data type are taken to be the arguments which appear,
|
|
unchanged, in the same position, everywhere across a data definition.
|
|
|
|
Erasure
|
|
-------
|
|
|
|
Unbound implicits are given ``0`` multiplicity, so the rule is now that if you
|
|
don't explicitly write it in the type of a function or constructor, the
|
|
argument is erased at run time.
|
|
|
|
Elaboration and the case tree compiler check ensure that 0-multiplicity
|
|
arguments are not inspected in case trees. In the compiler, 0-multiplicity
|
|
arguments to constructors are erased completely, whereas 0-multiplicity
|
|
arguments to functions are replaced with a placeholder erased value.
|
|
|
|
Namespaces and name visibility
|
|
------------------------------
|
|
|
|
Same rules mostly apply as in Idris 1. The difference is that visibility is
|
|
*per namespace* not *per file* (that is, files have no relevance other except
|
|
in that they introduce their own namespace, and in that they allow separate
|
|
typechecking).
|
|
|
|
One effect of this is that when a file defines nested namespaces, the inner
|
|
namespace can see what's in the outer namespace, but not vice versa unless
|
|
names defined in the inner namespace are explicitly exported. The visibility
|
|
modifiers ``export``, ``public export``, and ``private`` control whether the name
|
|
can be seen in any other namespace, and it's nothing to do with the file
|
|
they're defined in at all.
|
|
|
|
Unlike Idris 1, there is no restriction on whether public definitions can
|
|
refer to private names. The only restriction on ``private`` names is that
|
|
they can't be referred to directly (i.e. in code) outside the namespace.
|
|
|
|
Records
|
|
-------
|
|
|
|
Records are part of TTImp (rather than the surface language). Elaborating a
|
|
record declaration creates a data type and associated projection functions.
|
|
Record setters are generated on demand while elaborating TTImp (in
|
|
``TTImp.Elab.Record``). Setters are translated directly to ``case`` blocks,
|
|
which means that update of dependent fields works as one might expect (i.e.
|
|
it's safe as long as all of the fields are updated at the same time
|
|
consistently).
|