mirror of
https://github.com/idris-lang/Idris2.git
synced 2024-12-26 21:23:53 +03:00
Merge pull request #451 from edwinb/impl-docs
Add some implementation notes to the docs
This commit is contained in:
commit
606059e3d9
23
docs/source/implementation/index.rst
Normal file
23
docs/source/implementation/index.rst
Normal file
@ -0,0 +1,23 @@
|
||||
.. _sect-implnotes:
|
||||
|
||||
********************
|
||||
Implementation Notes
|
||||
********************
|
||||
|
||||
.. note::
|
||||
|
||||
The documentation for Idris has been published under the Creative
|
||||
Commons CC0 License. As such to the extent possible under law, *The
|
||||
Idris Community* has waived all copyright and related or neighboring
|
||||
rights to Documentation for Idris.
|
||||
|
||||
More information concerning the CC0 can be found online at: http://creativecommons.org/publicdomain/zero/1.0/
|
||||
|
||||
This section contains (or will contain, hopefully) a vareity of notes on
|
||||
aspects of the implementation of Idris 2, in the hope that they help with
|
||||
debugging and future contributions.
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
overview
|
473
docs/source/implementation/overview.rst
Normal file
473
docs/source/implementation/overview.rst
Normal file
@ -0,0 +1,473 @@
|
||||
***********************
|
||||
Implementation Overview
|
||||
***********************
|
||||
|
||||
These are some unsorted notes on aspects of the implementation. Sketchy, and
|
||||
not always completely up to date, but hopefully give some hints as to what's
|
||||
going on and some ideas where to look in the code to see how certain features
|
||||
work.
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Core language TT (defined in Core.TT), based on quantitative type theory
|
||||
(see https://bentnib.org/quantitative-type-theory.html). Binders have
|
||||
"multiplicities" which are either 0, 1 or unlimited.
|
||||
|
||||
Terms are indexed over the names in scope so that we know terms are always well
|
||||
scoped. Values (i.e. normal forms) are defined in Core.Value as NF;
|
||||
constructors do not evaluate their arguments until explicitly requested.
|
||||
|
||||
Elaborate to TT from a higher level language TTImp (defined in ``TTImp.TTImp``),
|
||||
which is TT with implicit arguments, local function definitions, case blocks,
|
||||
as patterns, qualified names with automatic type-directed disambiguation, and
|
||||
proof search.
|
||||
|
||||
Elaboration relies on unification (in ``Core.Unify``), which allows postponing
|
||||
of unification problems. Essentially works the same way as Agda as described
|
||||
in Ulf Norell's thesis.
|
||||
|
||||
General idea is that high level languages will provide a translation to TT.
|
||||
In the ``Idris/`` namespace we define the high level syntax for Idris, which
|
||||
translates to TTImp by desugaring operators, do notation, etc.
|
||||
|
||||
There is a separate linearity check after elaboration, which updates types of
|
||||
holes (and is aware of case blocks). This is implemented in
|
||||
``Core.LinearCheck``. During this check, we also recalculate the multiplicities
|
||||
in hole applications so that they are displayed appropriately (e.g. if a
|
||||
linear variable is unused elsewhere, it will always appear with multiplicity
|
||||
1 in holes).
|
||||
|
||||
|
||||
Where to find things:
|
||||
|
||||
* ``Core/`` -- anything related to the core TT, typechecking and unification
|
||||
* ``TTImp/`` -- anything related to the implicit TT and its elaboration
|
||||
|
||||
+ ``TTImp/Elab/`` -- Elaboration state and elaboration of terms
|
||||
+ ``TTImp/Interactive/`` -- Interactive editing infrastructure
|
||||
|
||||
* ``Parser/`` -- various utilities for parsing and lexing TT and TTImp (and other things)
|
||||
* ``Utils/`` -- some generally useful utilities
|
||||
* ``Idris/`` -- anything relating to the high level language, translating to TTImp
|
||||
|
||||
+ ``Idris/Elab/`` -- High level construct elaboration machinery (e.g. interfaces)
|
||||
|
||||
* ``Compiler/`` -- back ends
|
||||
|
||||
The Core Type, and Ref
|
||||
----------------------
|
||||
Core is a "monad" (not really, for efficiency reasons, at the moment...)
|
||||
supporting Errors and IO (I did originally plan to allow restricting this to
|
||||
some specific IO operations, but haven't yet). The raw syntax is defined by a
|
||||
type ``RawImp`` which has a source location at each node, and any errors in
|
||||
elaboration note the location at the point where the error occurred, as
|
||||
a file context ``FC``.
|
||||
|
||||
``Ref`` is essentially an ``IORef``. Typically we pass them implicitly and use
|
||||
labels to disambiguate which one we mean. See Core.Core for their
|
||||
definition. Again, ``IORef`` is for efficiency - even if it would be neater to
|
||||
use a state monad this turned out to be about 2-3 times faster, so I'm
|
||||
going with the "ugly" choice...
|
||||
|
||||
Term representation
|
||||
-------------------
|
||||
|
||||
Terms in the core language are indexed by a list of the names in scope,
|
||||
most recently defined first:
|
||||
|
||||
::
|
||||
|
||||
data Term : List Name -> Type
|
||||
|
||||
This means that terms are always well scoped, and we can use the type system
|
||||
to keep us right when manipulating names. For example, we have:
|
||||
|
||||
::
|
||||
|
||||
Local : FC -> (isLet : Maybe Bool) ->
|
||||
(idx : Nat) -> (0 p : IsVar name idx vars) -> Term vars
|
||||
|
||||
So local variables are represented by an index into the local context (a de
|
||||
Bruijn index, ``idx``), and a proof, erased at run time, that the index
|
||||
is valid. So everything is de Bruijn indexed, but the type checker still
|
||||
keeps track of the indices so that we don't have to think too hard!
|
||||
|
||||
``Core.TT`` contains various handy tools for manipulating terms with their
|
||||
indices, such as:
|
||||
|
||||
::
|
||||
|
||||
weaken : Term vars -> Term (n :: vars) -- actually in an interface, Weaken
|
||||
embed : Term vars -> Term (ns ++ vars)
|
||||
refToLocal : (x : Name) -> -- explicit name of a reference
|
||||
(new : Name) -> -- name to bind as
|
||||
Term vars -> Term (new :: vars)
|
||||
|
||||
Note that the types are explicit about when the ``vars`` needs to be passed at
|
||||
run time, and when it isn't. Mostly where it's needed it's to help with
|
||||
displaying names, or name generation, rather than any fundamental reason in
|
||||
the core. In general, this isn't expensive at run time.
|
||||
|
||||
Environments, defined in ``Core.Env``, map local variables to binders:
|
||||
|
||||
::
|
||||
|
||||
data Env : (tm : List Name -> Type) -> List Name -> Type
|
||||
|
||||
A binders is typically a lambda, a pi, or a let (with a value), but can
|
||||
also be a pattern variable. See the definition of ``TT`` for more details.
|
||||
Where we have a term, we usually also need an ``Env``.
|
||||
|
||||
We also have values, which are in head normal form, and defined in
|
||||
``Core.Value``:
|
||||
|
||||
::
|
||||
|
||||
data NF : List Name -> Type
|
||||
|
||||
We can convert a term to a value by normalising...
|
||||
|
||||
::
|
||||
|
||||
nf : {vars : _} ->
|
||||
Defs -> Env Term vars -> Term vars -> Core (NF vars)
|
||||
|
||||
...and back again, by quoting:
|
||||
|
||||
::
|
||||
|
||||
quote : {vars : _} ->
|
||||
Defs -> Env Term vars -> tm vars -> Core (Term vars)
|
||||
|
||||
Both ``nf`` and ``quote`` are defined in ``Core.Normalise``. We don't
|
||||
always know whether we'll need to work with ``NF`` or ``Term``, so
|
||||
we also have a "glued" representation, ``Glued vars``, again defined in
|
||||
``Core.Normalise``, which lazily computes either a ``NF`` or ``Term`` as
|
||||
required. Elaborating a term returns the type as a ``Glued vars``.
|
||||
|
||||
``Term`` separates ``Ref`` (global user defined names) from ``Meta``, which
|
||||
are globally defined metavariables. For efficiency, metavariables are only
|
||||
substituted into terms if they have non-0 multiplicity, to preserve sharing as
|
||||
much as possible.
|
||||
|
||||
Unification
|
||||
-----------
|
||||
Unification is probably the most important part of the elaboration process,
|
||||
and infers values for implicit arguments. That is, it finds values for the
|
||||
things which are referred to by ``Meta`` in ``Term``. It is defined in
|
||||
``Core.Unify``, as the top level unification function has the following
|
||||
type:
|
||||
|
||||
::
|
||||
|
||||
unify : Unify tm =>
|
||||
{vars : _} ->
|
||||
{auto c : Ref Ctxt Defs} ->
|
||||
{auto u : Ref UST UState} ->
|
||||
UnifyInfo ->
|
||||
FC -> Env Term vars ->
|
||||
tm vars -> tm vars ->
|
||||
Core UnifyResult
|
||||
|
||||
The ``Unify`` interface is there because it is convenient to be able to
|
||||
define unification on ``Term`` and ``NF``, as well as ``Closure`` (which
|
||||
is part of ``NF`` to represent unevaluated arguments to constructors).
|
||||
|
||||
This is one place where indexing over ``vars`` is extremely valuable: we
|
||||
have to keep the environment consistent, so unification won't accidentally
|
||||
introduce any scoping bugs!
|
||||
|
||||
Idris 2 implements pattern unification - see Adam Gundry's thesis for an
|
||||
accessible introduction.
|
||||
|
||||
Context
|
||||
-------
|
||||
``Core.Context`` defines all the things needed for TT. Most importantly: Def
|
||||
gives definitions of names (case trees, builtins, constructors and
|
||||
holes, mostly); ``GlobalDef`` is a definition with all the other information
|
||||
about it (type, visibility, totality, etc); ``Context`` is a context mapping names
|
||||
to ``GlobalDef``, and ``Defs`` is the core data structure with everything needed to
|
||||
typecheck more definitions.
|
||||
|
||||
The main Context type stores definitions in an array, indexed by a "resolved
|
||||
name id", an integer, for fast look up. This means that it also needs to be
|
||||
able to convert between resolved names and full names. The ``HasNames``
|
||||
interface defines methods for going back and forth between structures with
|
||||
human readable names, and structures with resolved integer names.
|
||||
|
||||
Since we store names in an array, all the lookup functions need to be in the
|
||||
``Core`` monad. This also turns out to help with loading checked files (see
|
||||
below).
|
||||
|
||||
Elaboration Overview
|
||||
--------------------
|
||||
Elaboration of ``RawImp`` to ``TT`` is driven by ``TTImp.Elab``, with the
|
||||
top level function for elaborating terms defined in ``TTImp.Elab.Term``,
|
||||
support functions defined in ``TTImp.Elab.Check``, and elaborators for the
|
||||
various TTImp constructs defined in separate files under ``TTImp.Elab.*``.
|
||||
|
||||
Laziness
|
||||
--------
|
||||
Like Idris 1, laziness is marked in types using ``Lazy``, ``Delay`` and ``Force``, or
|
||||
``Inf`` (instead of ``Lazy``) for codata. Unlike Idris 1, these are language primitives
|
||||
rather than special purpose names.
|
||||
|
||||
Implicit laziness resolution is handled during unification (in Core.Unify).
|
||||
When unification is invoked (by ``convert`` in ``TTImp.Elab.Check``) with
|
||||
the ``withLazy`` flag set, it checks whether it is converting a lazy type
|
||||
with a non-lazy type. If so, it continues with unification, but returning
|
||||
that either a ``Force`` or ``Delay`` needs inserting as appropriate.
|
||||
|
||||
TTC format
|
||||
----------
|
||||
We can save things to binary if we have an implementation of the TTC interface
|
||||
for it. See ``Utils.Binary`` to see how this is done. It uses a global reference
|
||||
``Ref Bin Binary`` which uses ``Data.Buffer`` underneath.
|
||||
|
||||
When we load checked TTC files, we don't process the definitions immediately,
|
||||
but rather store them as a ``ContextEntry``, which is either a Binary blob, or
|
||||
a processed definition. We only process the definitions the first time they
|
||||
are looked up, since converting Binary to the definition is fairly costly
|
||||
(due to having to construct a lot of AST nodes), and often definitions in an
|
||||
imported file are never used.
|
||||
|
||||
Bound Implicits
|
||||
---------------
|
||||
The ``RawImp`` type has a constructor ``IBindVar``. The first time we encounter an
|
||||
``IBindVar``, we record the name as one which will be implicitly bound. At the
|
||||
end of elaboration, we decide which holes should turn into bound variables
|
||||
(Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by
|
||||
looking at the list of names bound as ``IBindVar``, the things they depend on,
|
||||
and sorting them so that they are bound in dependency order. This happens
|
||||
in ``TTImp.Implicit.getToBind``.
|
||||
|
||||
Once we know what the bound implicits need to be, we bind them in
|
||||
'bindImplicits'. Any application of a hole which stands for a bound implicit
|
||||
gets turned into a local binding (either Pi or Pat as appropriate, or PLet for
|
||||
@-patterns).
|
||||
|
||||
Unbound Implicits
|
||||
-----------------
|
||||
Any name beginning with a lower case letter is considered an unbound implicit.
|
||||
They are elaborated as holes, which may depend on the initial environment of
|
||||
the elaboration, and after elaboration they are converted to an implicit pi
|
||||
binding, with multiplicity 0. So, for example:
|
||||
|
||||
::
|
||||
|
||||
map : {f : _} -> (a -> b) -> f a -> f b
|
||||
|
||||
becomes:
|
||||
|
||||
::
|
||||
|
||||
map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b
|
||||
|
||||
Bindings are ordered according to dependency. It'll infer any additional
|
||||
names, e.g. in:
|
||||
|
||||
::
|
||||
|
||||
lookup : HasType i xs t -> Env xs -> t
|
||||
|
||||
...where ``xs`` is a ``Vect n a``, it infers bindings for ``n`` and ``a``.
|
||||
|
||||
The ``%auto_implicits`` directive means that it will no longer automatically
|
||||
bind names (that is, ``a`` and ``b`` in ``map`` above) but it will still
|
||||
infer the types for any additional names, e.g. if you write:
|
||||
|
||||
::
|
||||
|
||||
lookup : forall i, x, t . HasType i xs t -> Env xs -> t
|
||||
|
||||
...it will still infer a type for ``xs`` and infer bindings for ``n`` and
|
||||
``a``.
|
||||
|
||||
Implicit arguments
|
||||
------------------
|
||||
When we encounter an implicit argument (``_`` in the raw syntax, or added when
|
||||
we elaborate an application and see that there is an implicit needed) we
|
||||
make a new hole which is a fresh name applied to the current environment,
|
||||
and return that as the elaborated term. This happens in ``TTImp.Elab.Check``,
|
||||
with the function ``metaVar``. If there's enough information elsewhere we'll
|
||||
find the definition of the hole by unification.
|
||||
|
||||
We never substitute holes in a term during elaboration and rely on
|
||||
normalisation if we need to look inside it. If there are holes remaining after
|
||||
elaboration of a definition, report an error (it's okay for a hole in a type
|
||||
as long as it's resolved by the time the definition is done).
|
||||
|
||||
See ``Elab.App.makeImplicit``, ``Elab.App.makeAutoImplicit`` to see where we
|
||||
add holes for the implicit arguments in applications.
|
||||
|
||||
``Elab.App`` does quite a lot of tricky stuff! In an attempt to help with
|
||||
resolving ambiguous names and record updates, it will sometimes delay
|
||||
elaboration of an argument (see ``App.checkRestApp``) so that it can get more
|
||||
information about its type first.
|
||||
|
||||
``Core.Unify.solveConstraints`` revisits all of the currently unsolved holes
|
||||
and constrained definitions, and tries again to unify any constraints which
|
||||
they require. It also tries to resolve anything defined by proof search.
|
||||
The current state of unification is defined in ``Core.UnifyState``, and
|
||||
unification constraints record which metavariables are blocking them. This
|
||||
improves performance, since we'll only retry a constraint if one of the
|
||||
blocking metavariables has been resolved.
|
||||
|
||||
Additional type inference
|
||||
-------------------------
|
||||
A ``?`` in a type means "infer this part of the type". This is distinct from
|
||||
``_`` in types, which means "I don't care what this is". The distinction is in
|
||||
what happens when inference fails. If inference fails for ``_``, we implicitly
|
||||
bind a new name (just like pattern matching on the lhs - i.e. it means match
|
||||
anything). If inference fails for ``?``, we leave it as a hole and try to fill
|
||||
it in later. As a result, we can say:
|
||||
|
||||
::
|
||||
|
||||
foo : Vect Int ?
|
||||
foo = [1,2,3,4]
|
||||
|
||||
...and the ``?`` will be inferred to be 4. But if we say:
|
||||
|
||||
::
|
||||
|
||||
foo : Vect Int _
|
||||
foo = [1,2,3,4]
|
||||
|
||||
...we'll get an error, because the ``_`` has been bound as a new name.
|
||||
Both ``?`` and ``_`` are represented in ``RawImp`` by the ``Implicit``
|
||||
constructor, which has a boolean flag meaning "bind if unresolved".
|
||||
|
||||
So the meaning of ``_`` is now consistent on the lhs and in types (i.e. it
|
||||
means infer a value and bind a variable on failure to infer anything). In
|
||||
practice, using ``_`` will get you the old Idris behaviour, but ``?`` might
|
||||
get you a bit more type inference.
|
||||
|
||||
Auto Implicits
|
||||
--------------
|
||||
Auto implicits are resolved by proof search, and can be given explicit
|
||||
arguments in the same way as ordinary implicits: i.e. ``{x = exp}`` to give
|
||||
``exp`` as the value for auto implicit ``x``. Interfaces are syntactic sugar for
|
||||
auto implicits (it is the same resolution mechanism - interfaces translate into
|
||||
records, and implementations translate into hints for the search).
|
||||
|
||||
The argument syntax ``@{exp}`` means that the value of the next auto implicit
|
||||
in the application should be ``exp`` - this is the same as the syntax for
|
||||
invoking named implementations in Idris 1, but interfaces and auto implicits
|
||||
have been combined now.
|
||||
|
||||
Implicit search is defined in ``Core.AutoSearch``. It will only begin a
|
||||
search if all the *determining arguments* of the goal are defined, meaning
|
||||
that they don't contain *any* holes. This avoids committing too early to
|
||||
the solution of a hole by resolving it by search, rather than unification,
|
||||
unless a programmer has explicitly said (via a ``search`` option on a data
|
||||
type) that that's what they want.
|
||||
|
||||
Dot Patterns
|
||||
------------
|
||||
``IMustUnify`` is a constructor of RawImp. When we elaborate this, we generate a
|
||||
hole, then elaborate the term, and add a constraint that the generated hole
|
||||
must unify with the term which was explicitly given (in ``UnifyState.addDot``),
|
||||
without resolving any holes. This is finally checked in ``UnifyState.checkDots``.
|
||||
|
||||
Proof Search
|
||||
------------
|
||||
A definition constructed with ``Core.Context.BySearch`` is a hole which will
|
||||
be resolved by searching for something which fits the type. This happens in
|
||||
``Core.AutoSearch``. It checks all possible hints for a term, to ensure that
|
||||
only one is possible.
|
||||
|
||||
@-Patterns
|
||||
----------
|
||||
Names which are bound in types are also bound as @-patterns, meaning that
|
||||
functions have access to them. For example, we can say:
|
||||
|
||||
::
|
||||
|
||||
vlength : Vect n a -> Nat
|
||||
vlength [] = n
|
||||
vlength (x :: xs) = n
|
||||
|
||||
As patterns are implemented as a constructor of ``TT``, which makes a lot
|
||||
of things more convenient (especially case tree compilation).
|
||||
|
||||
Linear Types
|
||||
------------
|
||||
Following Conor McBride and Bob Atkey's work, all binders have a multiplicity
|
||||
annotation (``RigCount``). After elaboration in ``TTImp.Elab``, we do a
|
||||
separate linearity check which: a) makes sure that linear variables are used
|
||||
exactly once; b) updates hole types to properly reflect usage information.
|
||||
|
||||
Local definitions
|
||||
-----------------
|
||||
We elaborate relative to an environment, meaning that we can elaborate local
|
||||
function definitions. We keep track of the names being defined in a nested
|
||||
block of declarations, and ensure that they are lifted to top level definitions
|
||||
in TT by applying them to every name in scope.
|
||||
|
||||
Since we don't know how many times a local definition will be applied, in
|
||||
general, anything bound with multiplicity 1 is passed to the local definition
|
||||
with multiplicity 0, so if you want to use it in a local definition, you need
|
||||
to pass it explicitly.
|
||||
|
||||
Case blocks
|
||||
-----------
|
||||
Similar to local definitions, these are lifted to top level definitions which
|
||||
represent the case block, which is immediately applied to the scrutinee of
|
||||
the case. We don't attempt to calculate the multiplicities of arguments when
|
||||
elaborating the case block, since we'll probably get it wrong - instead, these
|
||||
are checked during linearity checking, which knows about case functions.
|
||||
|
||||
Case blocks in the scope of local definitions are tricky, because the names
|
||||
need to match up, and the types might be refined, but we also still need to
|
||||
apply the local names to the scope in which they were defined. This is a bit
|
||||
fiddly, and dealt with by the ``ICaseLocal`` constructor of ``RawImp``.
|
||||
|
||||
Various parts of the system treat case blocks specially, even though they
|
||||
aren't strictly part of the core. In particular, these are linearity checking
|
||||
and totality checking.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
The parameters to a data type are taken to be the arguments which appear,
|
||||
unchanged, in the same position, everywhere across a data definition.
|
||||
|
||||
Erasure
|
||||
-------
|
||||
Unbound implicits are given ``0`` multiplicity, so the rule is now that if you
|
||||
don't explicitly write it in the type of a function or constructor, the
|
||||
argument is erased at run time.
|
||||
|
||||
Elaboration and the case tree compiler check ensure that 0-multiplicity
|
||||
arguments are not inspected in case trees. In the compiler, 0-multiplicity
|
||||
arguments to constructors are erased completely, whereas 0-multiplicity
|
||||
arguments to functions are replaced with a placeholder erased value.
|
||||
|
||||
Namespaces and name visibility
|
||||
------------------------------
|
||||
Same rules mostly apply as in Idris 1. The difference is that visibility is
|
||||
*per namespace* not *per file* (that is, files have no relevance other except
|
||||
in that they introduce their own namespace, and in that they allow separate
|
||||
typechecking).
|
||||
|
||||
One effect of this is that when a file defines nested namespaces, the inner
|
||||
namespace can see what's in the outer namespace, but not vice versa unless
|
||||
names defined in the inner namespace are explicitly exported. The visibility
|
||||
modifiers "export", "public export", and "private" control whether the name
|
||||
can be seen in any other namespace, and it's nothing to do with the file
|
||||
they're defined in at all.
|
||||
|
||||
Unlike Idris 1, there is no restriction on whether public definitions can
|
||||
refer to private names. The only restriction on ``private`` names is that
|
||||
they can't be referred to directly (i.e. in code) outside the namespace.
|
||||
|
||||
Records
|
||||
-------
|
||||
Records are part of TTImp (rather than the surface language). Elaborating a
|
||||
record declaration creates a data type and associated projection functions.
|
||||
Record setters are generated on demand while elaborating TTImp (in
|
||||
``TTImp.Elab.Record``). Setters are translated directly to ``case`` blocks,
|
||||
which means that update of dependent fields works as one might expect (i.e.
|
||||
it's safe as long as all of the fields are updated at the same time
|
||||
consistently).
|
@ -26,4 +26,5 @@ Documentation for the Idris 2 Language
|
||||
ffi/index
|
||||
proofs/index
|
||||
faq/faq
|
||||
implementation/index
|
||||
reference/index
|
||||
|
Loading…
Reference in New Issue
Block a user