Add some notes

Mostly copied from Blodwen and brought up to date (more or less).
Edwin Brady 2019-06-24 13:05:19 +01:00
parent af9dc69c61
commit 18f269bbef
6 changed files with 576 additions and 0 deletions

43
INSTALL.md Normal file

@ -0,0 +1,43 @@
# Installation
Idris 2 is built using Idris, and provides support for three default code
generation targets: Chez, Chicken, and Racket. It currently requires the
(not yet released) latest Idris master.
## Idris
There are several sets of instructions on how to install Idris from source and
binary.
+ [Official pkg installer for macOS](http://www.idris-lang.org/pkgs/idris-current.pkg)
+ [Source build instructions from the Idris GitHub Wiki](https://github.com/idris-lang/Idris-dev/wiki/Installation-Instructions)
+ [Binary installation using Cabal from the Idris Manual](https://idris.readthedocs.io/en/latest/tutorial/starting.html)
## Code Generation Targets
Only one of these is absolutely necessary, and you can type check code even
with none of them installed. The default code generator targets Chez Scheme.
### Chez
Chez Scheme is available to build from source:
+ https://cisco.github.io/ChezScheme/
Many popular package managers will provide a binary distribution for Chez.
### Chicken
Chicken Scheme offers binary distributions (and source tarballs) from
+ https://code.call-cc.org/
Chicken is also available from many package managers.
After installing Chicken Scheme you may need to install the 'numbers' package.
### Racket
Racket is available from:
+ https://download.racket-lang.org/

101
Notes/Directives.md Normal file

@ -0,0 +1,101 @@
# Compiler Directives
There are a number of directives (instructions to the compiler, beginning with
a `%` symbol) which are not part of the Idris 2 language, but nevertheless
provide useful information to the compiler. Mostly these will be useful for
library (especially Prelude) authors. They are ordered here approximately
by "likelihood of being useful to most programmers".
## Language directives
### %name
Syntax: `%name <name> <name_list>`
For interactive editing purposes, use `<name_list>` as the preferred names
for variables with type `<name>`.
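
As a small sketch of how this might look in a source file: the `Tree` type is made up for illustration, and the comma separated name list is an assumption carried over from the Idris 1 convention.

```idris
-- A hypothetical binary tree type, defined only for this example.
data Tree : Type -> Type where
  Leaf : Tree a
  Node : Tree a -> a -> Tree a -> Tree a

-- When an interactive editing command (e.g. case splitting) has to invent
-- names for variables of type `Tree`, prefer `t`, then `l`, then `r`.
%name Tree t, l, r
```

With this in place, case splitting on an argument of type `Tree a` should suggest names such as `t` and `l` rather than generic machine generated ones.
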
### %hide
Syntax: `%hide <name>`
Throughout the rest of the file, consider the `<name>` to be private to its
own namespace.
### %auto_lazy
Syntax: `%auto_lazy <on | off>`
Turn the automatic insertion of laziness annotations on or off. Default is
`on` as long as `%lazy` names have been defined (see below).
### %runElab
Syntax: `%runElab <expr>`
NOT YET IMPLEMENTED
Run the elaborator reflection expression `<expr>`, which must be of type
`Elab a`. Note that this is only minimally implemented at the moment.
### %pair
Syntax: `%pair <pair_type> <fst_name> <snd_name>`
Use the given names in `auto` implicit search for projecting from pair types.
### %rewrite
Syntax: `%rewrite <rewrite_name>`
Use the given name as the default rewriting function in the `rewrite` syntax.
### %integerLit
Syntax: `%integerLit <fromInteger_name>`
Apply the given function to all integer literals when elaborating.
The default Prelude sets this to `fromInteger`.
### %stringLit
Syntax: `%stringLit <fromString_name>`
Apply the given function to all string literals when elaborating.
The default Prelude does not set this.
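
A minimal sketch of what setting this might look like, assuming a Prelude which does not already set `%stringLit`. The names `Ident`, `MkIdent`, `toIdent` and `entry` are made up for the example.

```idris
-- A hypothetical wrapper type for identifiers.
data Ident = MkIdent String

-- The function to apply to every string literal.
toIdent : String -> Ident
toIdent = MkIdent

%stringLit toIdent

-- The literal elaborates as `toIdent "main"`, so it has type `Ident`.
entry : Ident
entry = "main"
```

Note that after the directive every string literal goes through `toIdent`, so a literal used where a plain `String` is expected would no longer type check unless the function is overloaded (for example via an interface, in the same way as `fromInteger`).
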
### %charLit
Syntax: `%charLit <fromChar_name>`
Apply the given function to all character literals when elaborating.
The default Prelude does not set this.
### %allow_overloads
Syntax: `%allow_overloads <name>`
This is primarily for compatibility with the Idris 1 notion of name overloading
which allows in particular `(>>=)` and `fromInteger` to be overloaded in an
ad-hoc fashion as well as via the `Monad` and `Num` interfaces respectively.
Its effect is: If `<name>` is one of the possibilities in an ambiguous
application, and one of the other possibilities is immediately resolvable by
return type, remove `<name>` from the list of possibilities.
If this sounds quite hacky, it's because it is. It's probably better not to
use it other than for the specific cases where we need it for compatibility.
It might be removed, if we can find a better way to resolve ambiguities with
`(>>=)` and `fromInteger` in particular!
## Implementation/debugging directives
### %logging
Syntax: `%logging <level>`
Set the logging level. In general, `1` tells you which top level declaration
the elaborator is working on, levels up to `5` give you more specific details
of each stage of the elaboration, and levels up to `10` give you full details
of type checking, including progress on unification problems.
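
For example, logging might be turned up around a single definition which is causing trouble. This is only a sketch: `append` is a made up function, and it is an assumption here that level `0` switches logging back off.

```idris
-- Show details of each elaboration stage for the next definition only.
%logging 5

append : List a -> List a -> List a
append [] ys = ys
append (x :: xs) ys = x :: append xs ys

-- Assumed to restore quiet output for the rest of the file.
%logging 0
```
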

43
Notes/IDE-mode.md Normal file

@ -0,0 +1,43 @@
IDE protocol, version 2
=======================
The IDE protocol is (or rather, will be) compatible with the IDE protocol for
Idris 1, as described here:
http://docs.idris-lang.org/en/latest/reference/ide-protocol.html
On start up, it reports:
`000018(:protocol-version 2 0)`
So far, there are three extended commands and one new command.
Extended commands
-----------------
`(:type-of STRING LINE COLUMN)`
If `type-of` is given a line and a column, this looks up the type of the name
at that location, which may be a local variable or a specialisation of a
global definition.
`(:proof-search LINE NAME HINTS :all)`
The optional additional argument `:all` means that the expression search will
return all of the results it finds, one per line (up to a currently hard coded
search depth limit, which will probably be settable as an option in the
future).
`(:case-split LINE COLUMN NAME)`
Case splitting can take an optional additional `COLUMN` argument.
New commands
------------
`(:generate-def LINE NAME)`
Generates a pattern matching definition, if it can find one, for the function
declared with the given name, on the given line. It will only return the
first definition it finds, as a list of pattern clauses. This works via a
combination of case splitting and expression search.

13
Notes/Makefile Normal file

@ -0,0 +1,13 @@
# Makefile to render documents
CC=pandoc
DOCS=implementation-notes.md Directives.md IDE-mode.md
PDFS=$(patsubst %.md,%.pdf,$(DOCS))
%.pdf: %.md
	$(CC) $< -o $@
all: $(PDFS)
clean:
	rm $(PDFS)


@ -0,0 +1,288 @@
Some unsorted notes on aspects of the implementation. They are sketchy, and not
always completely up to date, but hopefully they give some hints as to what's
going on and some ideas of where to look in the code to see how certain
features work.
Overview
--------
Core language TT (defined in Core.TT), based on quantitative type theory
(see https://bentnib.org/quantitative-type-theory.html). Binders have
"multiplicities" which are either 0, 1 or unlimited.
Terms are indexed over the names in scope so that we know terms are always well
scoped. Values (i.e. normal forms) are defined in Core.Value as NF;
constructors do not evaluate their arguments until explicitly requested.
Elaborate to TT from a higher level language TTImp (defined in TTImp.TTImp),
which is TT with implicit arguments, local function definitions, case blocks,
as patterns, qualified names with automatic type-directed disambiguation, and
proof search.
Elaboration relies on unification (in Core.Unify), which allows postponing
of unification problems. Essentially works the same way as Agda as described
in Ulf Norell's thesis.
General idea is that high level languages will provide a translation to TT.
In the Idris/ namespace we define the high level syntax for Idris, which
translates to TTImp by desugaring operators, do notation, etc.
TT separates 'Ref' (global user defined names) from 'Meta', which are globally
defined metavariables. For efficiency, metavariables are only substituted into
terms if they have non-0 multiplicity, to preserve sharing as much as possible.
There is a separate linearity check after elaboration, which updates types of
holes (and is aware of case blocks).
Where to find things:
* Core/ -- anything related to the core TT, typechecking and unification
* TTImp/ -- anything related to the implicit TT and its elaboration
* TTImp/Elab/ -- Elaboration state and elaboration of terms
* TTImp/Interactive/ -- Interactive editing infrastructure
* Parser/ -- various utilities for parsing and lexing TT and TTImp (and other things)
* Utils/ -- some generally useful utilities
* Idris/ -- anything relating to the high level language, translating to TTImp
* Idris/Elab/ -- High level construct elaboration machinery (e.g. interfaces)
The Core Type, and Ref
----------------------
Core is a "monad" (not really, for efficiency reasons, at the moment...)
supporting Errors and IO [TODO: Allow restricting to specific IO operations]
The raw syntax is defined by a type RawImp which has a source location at each
node, and any errors in elaboration note the location at the point where the
error occurred.
'Ref' is essentially an IORef. Typically we pass them implicitly and use
labels to disambiguate which one we mean. See Core.Core for their
definition. Again, IORef is for efficiency - even if it would be neater to
use a state monad this turned out to be about 2-3 times faster, so I'm
going with the "ugly" choice...
Context
-------
Core.Context defines all the things needed for TT. Most importantly: Def
gives definitions of names (case trees, builtins, constructors and
holes, mostly); GlobalDef is a definition with all the other information
about it (type, visibility, totality, etc); Gamma is a context mapping names
to GlobalDef, and 'Defs' is the core data structure with everything needed to
typecheck more definitions.
The main Context type stores definitions in an array, indexed by a "resolved
name id" for fast look up. This means that it also needs to be able to convert
between resolved names and full names.
Since we store names in an array, all the lookup functions need to be in the
Core monad. This also turns out to help with loading checked files (see below).
Laziness
--------
Like Idris 1, laziness is marked in types using Lazy, Delay and Force, or
Inf (instead of Lazy) for codata. Unlike Idris 1, these are language primitives
rather than special purpose names.
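
A small sketch of how this looks from the user's side. The names `myIf`, `InfList` and `countFrom` are made up for the example, and it assumes that `Delay` and `Force` are inserted automatically (i.e. `%auto_lazy` is on).

```idris
-- Only the branch which is selected gets evaluated. The result `t` has
-- type `Lazy a`, so an implicit `Force` is assumed to be inserted here.
myIf : Bool -> Lazy a -> Lazy a -> a
myIf True  t e = t
myIf False t e = e

-- Codata: the tail is marked with `Inf` rather than `Lazy`.
data InfList : Type -> Type where
  (::) : a -> Inf (InfList a) -> InfList a

-- The recursive call is under an (implicitly inserted) `Delay`, so this
-- builds an infinite structure on demand rather than looping.
countFrom : Nat -> InfList Nat
countFrom n = n :: countFrom (S n)
```
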
TTC format
----------
We can save things to binary if we have an implementation of the TTC interface
for it. See Utils.Binary to see how this is done. It uses a global reference
'Ref Bin Binary' which uses Data.Buffer underneath.
When we load checked TTC files, we don't process the definitions immediately,
but rather store them as a 'ContextEntry', which is either a Binary blob, or
a processed definition. We only process the definitions the first time they
are looked up, since converting Binary to the definition is fairly costly,
and often definitions in an imported file are never used.
Bound Implicits
---------------
The RawImp type has a constructor IBindVar. The first time we encounter an
IBindVar, we record the name as one which will be implicitly bound. At the
end of elaboration, we decide which holes should turn into bound variables
(Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by
looking at the list of names bound as IBindVar, the things they depend on,
and sorting them so that they are bound in dependency order. This happens
in State.getToBind.
Once we know what the bound implicits need to be, we bind them in
'bindImplicits'. Any application of a hole which stands for a bound implicit
gets turned into a local binding (either Pi or Pat as appropriate, or PLet for
@-patterns).
Unbound Implicits
-----------------
Any unbound name in a type which begins with a lower case letter is considered
an unbound implicit.
They are elaborated as holes, which may depend on the initial environment of
the elaboration, and after elaboration they are converted to an implicit pi
binding, with multiplicity 0. So, for example:
```idris
map : {f : _} -> (a -> b) -> f a -> f b
```
becomes
```idris
map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b
```
Bindings are ordered according to dependency. It'll infer any additional
names, e.g. in
```idris
lookup : HasType i xs t -> Env xs -> t
```
...where 'xs' is a Vect n a, it infers bindings for n and a.
(TODO: %auto_implicits directive)
Implicit arguments
------------------
When we encounter an implicit argument ('\_' in the raw syntax, or added when
we elaborate an application and see that there is an implicit needed) we
make a new hole which is a fresh name applied to the current environment,
and return that as the elaborated term. If there's enough information elsewhere
we'll find the definition of the hole by unification.
We never substitute holes in a term during elaboration and rely on normalisation
if we need to look inside it. If there are holes remaining after elaboration of
a definition, report an error (it's okay for a hole in a type as long as it's
resolved by the time the definition is done).
See Elab.App.makeImplicit and Elab.App.makeAutoImplicit to see where we add holes
for the implicit arguments in applications.
Elab.App does quite a lot of tricky stuff! In an attempt to help with resolving
ambiguous names and record updates, it will sometimes delay elaboration of an
argument (see App.checkRestApp) so that it can get more information about its
type first.
Core.Unify.solveConstraints revisits all of the currently unsolved holes and
constrained definitions, and tries again to unify any constraints which they
require. It also tries to resolve anything defined by proof search.
Additional type inference
-------------------------
A '?' in a type means "infer this part of the type". This is distinct from "\_"
in types, which means "I don't care what this is". The distinction is in what
happens when inference fails. If inference fails for "\_", we implicitly bind a
new name (just like pattern matching on the lhs - i.e. it means match
anything). If inference fails for "?", we leave it as a hole and try to fill it
in later. As a result, we can say:
```idris
foo : Vect ? Int
foo = [1,2,3,4]
```
...and the ? will be inferred to be 4. But if we say
```idris
foo : Vect _ Int
foo = [1,2,3,4]
```
...we'll get an error, because the '\_' has been bound as a new name.
So the meaning of "\_" is now consistent on the lhs and in types (i.e. it
means infer a value and bind a variable on failure to infer anything). In
practice, using "\_" will get you the old Idris behaviour, but "?" might get
you a bit more type inference.
Auto Implicits
--------------
Auto implicits are resolved by proof search, and can be given explicit
arguments in the same way as ordinary implicits: i.e. {x = exp} to give
'exp' as the value for auto implicit 'x'. Interfaces are syntactic sugar for
auto implicits (it uses the resolution mechanism - interfaces translate into
records, and implementations translate into hints for the search).
The argument syntax `@{exp}` means that the value of the next auto implicit in
the application should be 'exp' - this is the same as the syntax for invoking
named implementations in Idris 1, but interfaces and auto implicits have been
combined now.
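
The three ways of supplying an auto implicit can be sketched as follows. `Positive`, `IsPositive` and `safePred` are made up for the example, and the `impossible` clause is there on the assumption that coverage checking wants the `Z` case ruled out explicitly.

```idris
-- A proof that a number is not zero.
data Positive : Nat -> Type where
  IsPositive : Positive (S k)

-- `ok` is found by proof search at each call site.
safePred : (n : Nat) -> {auto ok : Positive n} -> Nat
safePred Z impossible
safePred (S k) = k

-- 1. Resolved by proof search.
viaSearch : Nat
viaSearch = safePred 3

-- 2. Given explicitly by name, like an ordinary implicit.
viaName : Nat
viaName = safePred 3 {ok = IsPositive}

-- 3. Given as the value of the next auto implicit in the application.
viaPosition : Nat
viaPosition = safePred 3 @{IsPositive}
```

All three definitions should elaborate to the same term, since proof search can only find `IsPositive` here.
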
Dot Patterns
------------
IMustUnify is a constructor of RawImp. When we elaborate this, we generate a
hole, then elaborate the term, and add a constraint that the generated hole
must unify with the term which was explicitly given (in UnifyState.addDot).
These constraints are finally checked in 'UnifyState.checkDots'.
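
At the source level, a dot pattern might look like the following sketch. `sameNat` is a made up function, and the `.(x)` surface syntax is assumed to be the same as in Idris 1.

```idris
-- Matching on `Refl` forces the second argument to be the same as the
-- first. The dot pattern says that `x` there is not a new binding: its
-- value must be determined by unification, and this is checked at the end.
sameNat : (x : Nat) -> (y : Nat) -> x = y -> Nat
sameNat x .(x) Refl = x
```
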
Proof Search
------------
A definition with the body 'BySearch' is a hole which will be resolved
by searching for something which fits the type. This happens in
Core.AutoSearch. It checks all possible hints for a term, to ensure that only
one is possible.
@-Patterns
----------
Names which are bound in types are also bound as @-patterns, meaning that
functions have access to them. For example, we can say:
```idris
vlength : {n : Nat} -> Vect n a -> Nat
vlength [] = n
vlength (x :: xs) = n
```
Linear Types
------------
Following Conor McBride and Bob Atkey's work, all binders have a multiplicity
annotation ("RigCount"). After elaboration in TTImp.Elab, we do a separate
linearity check which: a) makes sure that linear variables are used exactly
once; b) updates hole types to properly reflect usage information.
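
As an illustration of what the linearity check accepts and rejects, here is a minimal sketch. The type `LinPair` and the functions `swapLin` and `dupLin` are made up for the example.

```idris
-- A pair whose fields are both linear.
data LinPair : Type -> Type -> Type where
  MkLinPair : (1 x : a) -> (1 y : b) -> LinPair a b

-- Accepted: `p` is consumed exactly once by the pattern match, and
-- `x` and `y` are each used exactly once on the right hand side.
swapLin : (1 p : LinPair a b) -> LinPair b a
swapLin (MkLinPair x y) = MkLinPair y x

-- Rejected by the linearity check, because `x` is used twice:
--
--   dupLin : (1 x : a) -> LinPair a a
--   dupLin x = MkLinPair x x
```
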
Local definitions
-----------------
We elaborate relative to an environment, meaning that we can elaborate local
function definitions. We keep track of the names being defined in a nested
block of declarations, and ensure that they are lifted to top level definitions
in TT by applying them to every name in scope.
Since we don't know how many times a local definition will be applied, in general,
anything bound with multiplicity 1 is passed to the local definition with
multiplicity 0, so if you want to use it in a local definition, you need to
pass it explicitly.
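
For example, in the following sketch (with made up names) the local function `go` uses `k` from the enclosing scope:

```idris
scaleAll : Nat -> List Nat -> List Nat
scaleAll k xs = go xs
  where
    -- `go` refers to `k` without taking it as an argument. After
    -- lifting, it becomes a top level definition which is applied to
    -- the names in scope at this point, including `k`.
    go : List Nat -> List Nat
    go [] = []
    go (y :: ys) = k * y :: go ys
```

If `k` had been bound with multiplicity 1 in the type of `scaleAll`, then by the rule above it would only be available in `go` with multiplicity 0, and would need to be passed as an explicit argument instead.
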
Case blocks
-----------
Similar to local definitions, these are lifted to top level definitions which
represent the case block, which is immediately applied to the scrutinee of
the case. The function which defines the block takes as arguments: the entire
current environment (so that it can use any name in scope); any names in
the environment which the scrutinee's type depends on (to support dependent
case, but not counting parameters which are unchanged across the structure).
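
Roughly, the lifting can be pictured as follows. This is a hand written approximation, not the code the elaborator actually generates, and the names `classify`, `classifyCase` and `classify'` are made up.

```idris
-- A definition using a case block...
classify : Nat -> String
classify n = case n of
                  Z => "zero"
                  S _ => "positive"

-- ...behaves as if the block were a separate top level function, which
-- takes the environment (here just `n`) followed by the scrutinee...
classifyCase : (n : Nat) -> (scrutinee : Nat) -> String
classifyCase n Z = "zero"
classifyCase n (S _) = "positive"

-- ...and which is immediately applied to the scrutinee.
classify' : Nat -> String
classify' n = classifyCase n n
```
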
Parameters
----------
The parameters to a data type are taken to be the arguments which appear,
unchanged, in the same position, everywhere across a data definition.
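
For example, in a vector type such as the following sketch (called `Vec` here to avoid clashing with any library definition), `a` is a parameter but the length is not:

```idris
data Vec : Nat -> Type -> Type where
  -- The second argument is `a` in every constructor type and in every
  -- recursive occurrence, so it is a parameter. The first argument
  -- changes (`Z`, `k`, `S k`), so it is an index.
  Nil  : Vec Z a
  (::) : a -> Vec k a -> Vec (S k) a
```
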
Erasure
-------
Unbound implicits are given '0' multiplicity, so the rule is now that if you
don't explicitly write it in the type of a function or constructor, the
argument is erased at run time.
Elaboration and the case tree compiler check ensure that 0-multiplicity
arguments are not inspected in case trees.
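
A sketch of the consequences, assuming `Data.Vect` is available from the base library. The function names are made up.

```idris
import Data.Vect

-- Rejected: `n` is an unbound implicit, so it has multiplicity 0 and
-- cannot be inspected by pattern matching:
--
--   isEmptyBad : Vect n a -> Bool
--   isEmptyBad {n = Z} xs = True
--   isEmptyBad {n = S k} xs = False

-- Accepted: inspect the vector itself rather than the erased length.
isEmpty : Vect n a -> Bool
isEmpty [] = True
isEmpty (x :: xs) = False

-- Accepted: `n` is written explicitly in the type, so it is kept at
-- run time and can be matched on.
isEmpty' : {n : Nat} -> Vect n a -> Bool
isEmpty' {n = Z} xs = True
isEmpty' {n = S k} xs = False
```
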
Namespaces and name visibility
------------------------------
Same rules mostly apply as in Idris 1. The difference is that visibility is
*per namespace* not *per file* (that is, files have no relevance except in
that they introduce their own namespace, and in that they allow separate
typechecking).
One effect of this is that when a file defines nested namespaces, the inner
namespace can see what's in the outer namespace, but not vice versa unless
names defined in the inner namespace are explicitly exported. The visibility
modifiers "export", "public export", and "private" control whether the name
can be seen in any other namespace, and it's nothing to do with the file
they're defined in at all.
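
The following sketch shows the idea for a file `Shapes.idr`. All of the names are made up for the example.

```idris
module Shapes

-- Private by default, but visible to namespaces nested inside `Shapes`.
base : Nat
base = 1

namespace Inner
  -- Private to `Shapes.Inner`: the enclosing namespace cannot see it,
  -- even though it is in the same file.
  hidden : Nat
  hidden = base + base

  -- Exported, so visible in the enclosing namespace (and elsewhere).
  export
  doubled : Nat
  doubled = hidden

result : Nat
result = doubled

-- Rejected, because `hidden` is private to `Shapes.Inner`:
--
--   bad : Nat
--   bad = hidden
```
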
Records
-------
Records are part of TTImp (rather than the surface language). Elaborating a
record declaration creates a data type and associated projection functions.
Record setters are generated on demand while elaborating TTImp (in
TTImp.Elab.Record). Setters are translated directly to 'case' blocks, which
means that update of dependent fields works as one might expect (i.e. it's safe
as long as all of the fields are updated at the same time consistently).
In TTImp, unlike in Idris 1, records are not implicitly put into their own
namespace, but higher level languages (e.g. Idris itself) can do so explicitly
themselves.
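
A minimal sketch of a record declaration and what elaboration gives you. The `Person` record is made up, and the `record { ... }` update syntax is assumed to be the one inherited from Idris 1.

```idris
record Person where
  constructor MkPerson
  name : String
  age : Nat

-- The projections `name` and `age` are ordinary functions generated by
-- elaborating the record declaration.
greeting : Person -> String
greeting p = "Hello, " ++ name p

-- The setter for `age` is generated on demand when this update is
-- elaborated, and is translated to a `case` on the record.
birthday : Person -> Person
birthday p = record { age = S (age p) } p
```
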

88
README.md Normal file

@ -0,0 +1,88 @@
Idris 2
=======
This is a pre-alpha implementation of Idris 2, the successor to Idris.
Idris 2 is mostly backwards compatible with Idris 1, with some minor
exceptions. The most notable user visible differences, which might cause Idris
1 programs to fail to type check, are:
+ Unbound implicit arguments are always erased, so it is a type error to
attempt to pattern match on one.
+ Simplified resolution of ambiguous names.
+ Minor differences in the meaning of export modifiers `private`, `export`,
and `public export`, which now refer to visibility of names from other
*namespaces* rather than visibility from other *files*.
+ Module names must match the filename in which they are defined (unless
the module's name is "Main").
+ Anything which uses a `%language` pragma in Idris 1 is likely to be different.
Notably, elaborator reflection will exist, but most likely in a slightly
different form because the internal details of the elaborator are different.
+ The `Prelude` is much smaller (and easier to replace with an alternative).
Watch this space for more details and the rationale for the changes, as I
get around to writing it...
Summary of new features:
+ A core language based on "Quantitative Type Theory" which allows explicit
annotation of erased types, and linear types.
+ `let` bindings are now more expressive, and can be used to define pattern
matching functions locally.
+ Names which are in scope in a type are also always in scope in the body of
the corresponding definition.
+ Better inference. Holes are global to a source file, rather than local to
a definition, meaning that some holes can be left in function types to be
inferred by the type checker. This also gives better inference for the types
of `case` expressions, and means fewer annotations are needed in interface
declarations.
+ Better type checker implementation which minimises the need for compile
time evaluation.
+ New Chez Scheme based back end which both compiles and runs faster than the
default Idris 1 back end. (Also, optionally, Chicken Scheme and Racket can
be used as targets).
+ Everything works faster :).
A significant change in the implementation is that there is an intermediate
language `TTImp`, which is essentially a desugared Idris, and is cleanly
separated from the high level language which means it is potentially usable
as a core language for other high level syntaxes.
Installation
============
To build and install what exists of it so far:
+ Optionally, set the `PREFIX` in `Makefile`
+ `make idris2`
+ `make install`
You'll need to add `$PREFIX/bin` to your `PATH`.
You may also want to set `IDRIS_CC` to `clang`, since this seems to build
the generated C significantly faster.
Note: If you edit `idris2.ipkg` to use the `opts` with optimisation set
(`--cg-opt -O2`) you'll find it runs about twice as fast, at the cost of
taking a couple of minutes to generate the `idris2` executable.
You can check that building succeeded by running
- `make test`
I make no promises how well this works yet, but you are welcome to have a
play. Good luck :).
Information about external dependencies is presented in [INSTALL.md](INSTALL.md).
Things still missing
====================
+ Some high level syntax, notably numeric ranges
+ 'using' blocks
+ Cumulativity
+ 'rewrite' doesn't yet work on dependent types
+ Some details of 'with' not yet done (notably recursive with call syntax)
+ Parts of the ide-mode, particularly syntax highlighting
+ Documentation strings and HTML documentation generation
+ ':search' and ':apropos' at the REPL
+ The rest of this "Things still missing" list