Add some notes

Mostly copied from Blodwen and brought up to date (more or less).
2024-09-11 16:05:51 +03:00 · 2019-06-24 13:05:19 +01:00 · 2019-06-24 13:05:19 +01:00 · 18f269bbef
commit 18f269bbef
parent af9dc69c61
6 changed files with 576 additions and 0 deletions
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -0,0 +1,43 @@
+# Installation
+
+Idris 2 is built using Idris, and provides support for three default code
+generation targets: Chez, Chicken, and Racket. It currently requires the
+(not yet released) latest Idris master.
+
+## Idris
+
+There are several sets of instructions on how to install Idris, either from
+source or from a binary distribution:
+
+* [Official ipkg installer for macOS](http://www.idris-lang.org/pkgs/idris-current.pkg)
+* [Source build instructions from the Idris GitHub Wiki](https://github.com/idris-lang/Idris-dev/wiki/Installation-Instructions)
+* [Binary installation using Cabal from the Idris Manual](https://idris.readthedocs.io/en/latest/tutorial/starting.html)
+
+## Code Generation Targets
+
+Only one of these is absolutely necessary, and you can type check code even
+with none of them installed. The default code generator targets Chez Scheme.
+
+### Chez
+
+Chez Scheme is available to build from source:
+
+ https://cisco.github.io/ChezScheme/
+
+Many popular package managers will provide a binary distribution for Chez.
+
+### Chicken
+
+Chicken Scheme offers binary distributions (and source tarballs) from
+
+ https://code.call-cc.org/
+
+Chicken is also available from many package managers.
+
+After installing Chicken Scheme you may need to install the 'numbers' package.
+
+### Racket
+
+Racket is available from:
+
+ https://download.racket-lang.org/
--- a/Notes/Directives.md
+++ b/Notes/Directives.md
@@ -0,0 +1,101 @@
+# Compiler Directives
+
+There are a number of directives (instructions to the compiler, beginning with
+a `%` symbol) which are not part of the Idris 2 language, but nevertheless
+provide useful information to the compiler. Mostly these will be useful for
+library (especially Prelude) authors. They are ordered here approximately
+by "likelihood of being useful to most programmers".
+
+## Language directives
+
+### %name
+
+Syntax: `%name <name> <name_list>`
+
+For interactive editing purposes, use `<name_list>` as the preferred names
+for variables with type `<name>`.
+
+### %hide
+
+Syntax: `%hide <name>`
+
+Throughout the rest of the file, consider the `<name>` to be private to its
+own namespace.
+
+### %auto_lazy
+
+Syntax: `%auto_lazy <on | off>`
+
+Turn the automatic insertion of laziness annotations on or off. Default is
+`on` as long as `%lazy` names have been defined (see below).
+
+### %runElab
+
+Syntax: `%runElab <expr>`
+
+NOT YET IMPLEMENTED
+
+Run the elaborator reflection expression `<expr>`, which must be of type
+`Elab a`. Note that this is only minimally implemented at the moment.
+
+### %pair
+
+Syntax: `%pair <pair_type> <fst_name> <snd_name>`
+
+Use the given names in `auto` implicit search for projecting from pair types.
+
+### %rewrite
+
+Syntax: `%rewrite <rewrite_name>`
+
+Use the given name as the default rewriting function in the `rewrite` syntax.
+
+### %integerLit
+
+Syntax: `%integerLit <fromInteger_name>`
+
+Apply the given function to all integer literals when elaborating.
+The default Prelude sets this to `fromInteger`.
+
+### %stringLit
+
+Syntax: `%stringLit <fromString_name>`
+
+Apply the given function to all string literals when elaborating.
+The default Prelude does not set this.
+
+### %charLit
+
+Syntax: `%charLit <fromChar_name>`
+
+Apply the given function to all character literals when elaborating.
+The default Prelude does not set this.
+
+### %allow_overloads
+
+Syntax: `%allow_overloads <name>`
+
+This is primarily for compatibility with the Idris 1 notion of name overloading
+which allows in particular `(>>=)` and `fromInteger` to be overloaded in an
+ad-hoc fashion as well as via the `Monad` and `Num` interfaces respectively.
+
+Its effect is: if `<name>` is one of the possibilities in an ambiguous
+application, and one of the other possibilities is immediately resolvable by
+return type, remove `<name>` from the list of possibilities.
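The pruning rule above can be modelled in a few lines. This is a toy sketch in Python, not the elaborator's code: the function name, the representation of candidates as plain strings, and the `resolvable_by_return_type` set are all assumptions made for illustration.

```python
def prune_allow_overloads(candidates, allow_overloads, resolvable_by_return_type):
    """Drop %allow_overloads names when a rival candidate already resolves.

    candidates: candidate names for an ambiguous application, in order.
    allow_overloads: names declared with %allow_overloads (hypothetical set).
    resolvable_by_return_type: candidates whose return type fits immediately.
    """
    kept = []
    for name in candidates:
        # Is there some *other* candidate that the return type already picks out?
        rival_resolves = any(
            other != name and other in resolvable_by_return_type
            for other in candidates
        )
        if name in allow_overloads and rival_resolves:
            continue  # the rule removes this possibility
        kept.append(name)
    return kept
```

With candidates `["Prelude.>>=", "Parser.>>="]`, where `Parser.>>=` is marked `%allow_overloads` and `Prelude.>>=` is resolvable by return type, only `Prelude.>>=` survives. If nothing is immediately resolvable, the list is left unchanged.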
+
+If this sounds quite hacky, it's because it is. It's probably better not to
+use it other than for the specific cases where we need it for compatibility.
+It might be removed, if we can find a better way to resolve ambiguities with
+`(>>=)` and `fromInteger` in particular!
+
+## Implementation/debugging directives
+
+### %logging
+
+Syntax: `%logging <level>`
+
+Set the logging level. In general, `1` tells you which top level declaration
+the elaborator is working on, levels up to `5` give more specific details of
+each stage of the elaboration, and levels up to `10` give full details of
+type checking, including progress on unification problems.
+
--- a/Notes/IDE-mode.md
+++ b/Notes/IDE-mode.md
@@ -0,0 +1,43 @@
+IDE protocol, version 2
+=======================
+
+The IDE protocol is (or rather, will be) compatible with the IDE protocol for
+Idris 1, as described here:
+http://docs.idris-lang.org/en/latest/reference/ide-protocol.html
+
+On start up, it reports:
+
+`000018(:protocol-version 2 0)`
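The `000018` prefix in the start-up message reads as a six-digit hexadecimal length. A minimal Python sketch of that framing follows, assuming the length counts a trailing newline (which is what makes the 23-character payload come out as 0x18) and that lengths are measured in bytes. The function names `frame` and `unframe` are hypothetical.

```python
def frame(sexp: str) -> bytes:
    """Prefix an s-expression with its length as six hex digits."""
    payload = (sexp + "\n").encode("utf-8")  # assumption: newline is counted
    return f"{len(payload):06x}".encode("ascii") + payload


def unframe(message: bytes) -> str:
    """Read one framed message and return the s-expression text."""
    length = int(message[:6].decode("ascii"), 16)
    payload = message[6:6 + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return payload.decode("utf-8").rstrip("\n")
```

Framing `(:protocol-version 2 0)` this way reproduces the start-up message byte for byte, followed by the newline.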
+
+So far, there are three extended commands and one new command.
+
+Extended commands
+-----------------
+
+`(:type-of STRING LINE COLUMN)`
+
+If `type-of` is given a line and a column, this looks up the type of the name
+at that location, which may be a local variable or a specialisation of a
+global definition.
+
+`(:proof-search LINE NAME HINTS :all)`
+
+The optional additional argument `:all` means that the expression search will
+return all of the results it finds, one per line (up to a currently hard coded
+search depth limit, which will probably be settable as an option in the
+future).
+
+`(:case-split LINE COLUMN NAME)`
+
+Case splitting can take an optional additional `COLUMN` argument.
+
+New commands
+------------
+
+`(:generate-def LINE NAME)`
+
+Generates a pattern matching definition, if it can find one, for the function
+declared with the given name, on the given line. It will only return the
+first definition it finds, as a list of pattern clauses. This works via a
+combination of case splitting and expression search.
+
--- a/Notes/Makefile
+++ b/Notes/Makefile
@@ -0,0 +1,13 @@
+# Makefile to render documents
+
+CC=pandoc
+DOCS=implementation-notes.md Directives.md IDE-mode.md
+PDFS=$(patsubst %.md,%.pdf,$(DOCS))
+
+%.pdf: %.md
+	$(CC) $< -o $@
+
+all: $(PDFS)
+
+clean:
+	rm $(PDFS)
--- a/Notes/implementation-notes.md
+++ b/Notes/implementation-notes.md
@@ -0,0 +1,288 @@
+Some unsorted notes on aspects of the implementation. Sketchy, and not always
+completely up to date, but hopefully give some hints as to what's going on and
+some ideas where to look in the code to see how certain features work.
+
+Overview
+--------
+Core language TT (defined in Core.TT), based on quantitative type theory
+(see https://bentnib.org/quantitative-type-theory.html). Binders have
+"multiplicities" which are either 0, 1 or unlimited.
+
+Terms are indexed over the names in scope so that we know terms are always well
+scoped. Values (i.e. normal forms) are defined in Core.Value as NF;
+constructors do not evaluate their arguments until explicitly requested.
+
+Elaborate to TT from a higher level language TTImp (defined in TTImp.TTImp),
+which is TT with implicit arguments, local function definitions, case blocks,
+as patterns, qualified names with automatic type-directed disambiguation, and
+proof search.
+
+Elaboration relies on unification (in Core.Unify), which allows postponing
+of unification problems. Essentially works the same way as Agda as described
+in Ulf Norell's thesis.
+
+General idea is that high level languages will provide a translation to TT.
+In the Idris/ namespace we define the high level syntax for Idris, which
+translates to TTImp by desugaring operators, do notation, etc.
+
+TT separates 'Ref' (global user defined names) from 'Meta', which are globally
+defined metavariables. For efficiency, metavariables are only substituted into
+terms if they have non-0 multiplicity, to preserve sharing as much as possible.
+
+There is a separate linearity check after elaboration, which updates types of
+holes (and is aware of case blocks).
+
+Where to find things:
+
+* Core/ -- anything related to the core TT, typechecking and unification
+* TTImp/ -- anything related to the implicit TT and its elaboration
+  * TTImp/Elab/ -- Elaboration state and elaboration of terms 
+  * TTImp/Interactive/ -- Interactive editing infrastructure
+* Parser/ -- various utilities for parsing and lexing TT and TTImp (and other things)
+* Utils/ -- some generally useful utilities
+* Idris/ -- anything relating to the high level language, translating to TTImp
+  * Idris/Elab/ -- High level construct elaboration machinery (e.g. interfaces)
+
+The Core Type, and Ref
+----------------------
+Core is a "monad" (not really, for efficiency reasons, at the moment...)
+supporting Errors and IO [TODO: Allow restricting to specific IO operations].
+The raw syntax is defined by a type RawImp which has a source location at each
+node, and any errors in elaboration note the location at the point where the
+error occurred.
+
+'Ref' is essentially an IORef. Typically we pass them implicitly and use
+labels to disambiguate which one we mean. See Core.Core for their
+definition. Again, IORef is for efficiency - even if it would be neater to
+use a state monad this turned out to be about 2-3 times faster, so I'm
+going with the "ugly" choice...
+
+Context
+-------
+Core.Context defines all the things needed for TT. Most importantly: Def 
+gives definitions of names (case trees, builtins, constructors and
+holes, mostly); GlobalDef is a definition with all the other information
+about it (type, visibility, totality, etc); Gamma is a context mapping names
+to GlobalDef, and 'Defs' is the core data structure with everything needed to
+typecheck more definitions.
+
+The main Context type stores definitions in an array, indexed by a "resolved
+name id" for fast look up. This means that it also needs to be able to convert
+between resolved names and full names.
+
+Since we store names in an array, all the lookup functions need to be in the
+Core monad. This also turns out to help with loading checked files (see below).
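To make the "array indexed by resolved name id" idea concrete, here is a toy model in Python. It is only a sketch of the data layout described above, not the actual `Core.Context` code; the class and method names are invented for illustration.

```python
class Context:
    """Toy context: definitions live in an array, names map to array slots."""

    def __init__(self):
        self._defs = []    # resolved id -> definition
        self._names = []   # resolved id -> full name
        self._ids = {}     # full name -> resolved id

    def add(self, full_name, definition):
        """Add or update a definition and return its resolved id."""
        if full_name in self._ids:
            idx = self._ids[full_name]
            self._defs[idx] = definition
            return idx
        idx = len(self._defs)
        self._ids[full_name] = idx
        self._names.append(full_name)
        self._defs.append(definition)
        return idx

    def resolve(self, full_name):
        return self._ids[full_name]   # full name -> id (dictionary lookup)

    def full_name(self, idx):
        return self._names[idx]       # id -> full name

    def lookup(self, idx):
        return self._defs[idx]        # fast path: plain array indexing
```

Once a name has been resolved, every later lookup is a single array index, and the two auxiliary tables provide the conversion between resolved and full names that the text mentions.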
+
+Laziness
+--------
+Like Idris 1, laziness is marked in types using Lazy, Delay and Force, or
+Inf (instead of Lazy) for codata. Unlike Idris 1, these are language primitives
+rather than special purpose names.
+
+TTC format
+----------
+We can save things to binary if we have an implementation of the TTC interface
+for it. See Utils.Binary to see how this is done. It uses a global reference
+'Ref Bin Binary' which uses Data.Buffer underneath.
+
+When we load checked TTC files, we don't process the definitions immediately,
+but rather store them as a 'ContextEntry', which is either a Binary blob, or
+a processed definition. We only process the definitions the first time they
+are looked up, since converting Binary to the definition is fairly costly,
+and often definitions in an imported file are never used.
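The deferred decoding described above amounts to a cache that is filled on first access. A small Python sketch, with JSON standing in for the TTC binary format (an assumption made purely to keep the example self-contained) and with a hypothetical `decode_count` field added to make the behaviour observable:

```python
import json


class ContextEntry:
    """Either a raw binary blob or an already processed definition."""

    def __init__(self, blob: bytes):
        self._blob = blob        # undecoded bytes, or None once processed
        self._decoded = None
        self.decode_count = 0    # illustration only: counts costly decodes

    def get(self):
        """Return the definition, decoding the blob on first use only."""
        if self._blob is not None:
            self._decoded = json.loads(self._blob.decode("utf-8"))
            self._blob = None
            self.decode_count += 1
        return self._decoded
```

An entry that is never looked up is never decoded, and one that is looked up repeatedly pays the decoding cost once.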
+
+Bound Implicits
+---------------
+The RawImp type has a constructor IBindVar. The first time we encounter an
+IBindVar, we record the name as one which will be implicitly bound. At the
+end of elaboration, we decide which holes should turn into bound variables
+(Pi bound in types, Pattern bound on a LHS, still holes on the RHS) by
+looking at the list of names bound as IBindVar, the things they depend on,
+and sorting them so that they are bound in dependency order. This happens
+in State.getToBind.
+
+Once we know what the bound implicits need to be, we bind them in 
+'bindImplicits'. Any application of a hole which stands for a bound implicit
+gets turned into a local binding (either Pi or Pat as appropriate, or PLet for
+@-patterns).
+
+Unbound Implicits
+-----------------
+Any name beginning with a lower case letter is considered an unbound implicit.
+They are elaborated as holes, which may depend on the initial environment of
+the elaboration, and after elaboration they are converted to an implicit pi
+binding, with multiplicity 0. So, for example:
+```idris
+map : {f : _} -> (a -> b) -> f a -> f b
+```
+becomes
+```idris
+map : {f : _} -> {0 a : _} -> {0 b : _} -> (a -> b) -> f a -> f b
+```
+
+Bindings are ordered according to dependency.  It'll infer any additional
+names, e.g. in
+```idris
+lookup : HasType i xs t -> Env xs -> t
+```
+...where 'xs' is a Vect n a, it infers bindings for n and a.
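Ordering bindings by dependency is a topological sort. The Python sketch below illustrates the idea on the `lookup` example; the dependency table is written by hand (in the compiler it would come from the inferred types), and `bind_order` is a hypothetical name, not a function from the Idris 2 source.

```python
def bind_order(deps):
    """Return names so that each name comes after everything it depends on.

    deps maps a name to the names its type mentions. Dependencies are
    assumed to be acyclic, as they are for well-typed bindings.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)           # bind dependencies first
        order.append(name)

    for name in deps:
        visit(name)
    return order


# 'xs : Vect n a' mentions n and a, which therefore get bound before xs.
example = {"i": [], "xs": ["n", "a"], "t": []}
```

Here `n` and `a` never appear in the written type, but they are still bound, and bound ahead of `xs`.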
+
+(TODO: %auto_implicits directive)
+
+Implicit arguments
+------------------
+When we encounter an implicit argument ('\_' in the raw syntax, or added when
+we elaborate an application and see that there is an implicit needed) we
+make a new hole which is a fresh name applied to the current environment,
+and return that as the elaborated term. If there's enough information elsewhere
+we'll find the definition of the hole by unification.
+
+We never substitute holes in a term during elaboration and rely on normalisation
+if we need to look inside it. If there are holes remaining after elaboration of
+a definition, we report an error (it's okay for a hole in a type as long as it's
+resolved by the time the definition is done).
+
+See Elab.App.makeImplicit and Elab.App.makeAutoImplicit to see where we add holes
+for the implicit arguments in applications.
+
+Elab.App does quite a lot of tricky stuff! In an attempt to help with resolving
+ambiguous names and record updates, it will sometimes delay elaboration of an
+argument (see App.checkRestApp) so that it can get more information about its
+type first.
+
+Core.Unify.solveConstraints revisits all of the currently unsolved holes and
+constrained definitions, and tries again to unify any constraints which they
+require. It also tries to resolve anything defined by proof search.
+
+Additional type inference
+-------------------------
+A '?' in a type means "infer this part of the type".  This is distinct from "\_"
+in types, which means "I don't care what this is". The distinction is in what
+happens when inference fails.  If inference fails for "\_", we implicitly bind a
+new name (just like pattern matching on the lhs - i.e. it means match
+anything). If inference fails for "?", we leave it as a hole and try to fill it
+in later. As a result, we can say:
+
+```idris
+foo : Vect ? Int
+foo = [1,2,3,4]
+```
+...and the ? will be inferred to be 4. But if we say
+
+```idris
+foo : Vect _ Int
+foo = [1,2,3,4]
+```
+...we'll get an error, because the '\_' has been bound as a new name.
+
+So the meaning of "\_" is now consistent on the lhs and in types (i.e. it
+means infer a value and bind a variable on failure to infer anything). In
+practice, using "\_" will get you the old Idris behaviour, but "?" might get
+you a bit more type inference.
+
+Auto Implicits
+--------------
+Auto implicits are resolved by proof search, and can be given explicit
+arguments in the same way as ordinary implicits: i.e. {x = exp} to give
+'exp' as the value for auto implicit 'x'. Interfaces are syntactic sugar for
+auto implicits (they use the same resolution mechanism: interfaces translate
+into records, and implementations translate into hints for the search).
+
+The argument syntax `@{exp}` means that the value of the next auto implicit in
+the application should be 'exp' - this is the same as the syntax for invoking
+named implementations in Idris 1, but interfaces and auto implicits have been
+combined now.
+
+Dot Patterns
+------------
+IMustUnify is a constructor of RawImp. When we elaborate this, we generate a
+hole, then elaborate the term, and add a constraint that the generated hole
+must unify with the term which was explicitly given (in UnifyState.addDot).
+The constraint is finally checked in 'UnifyState.checkDots'.
+
+Proof Search
+------------
+A definition with the body 'BySearch' is a hole which will be resolved
+by searching for something which fits the type. This happens in
+Core.AutoSearch. It checks all possible hints for a term, to ensure that only
+one is possible.
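The uniqueness requirement can be illustrated with a deliberately flat model in Python. Real search in Core.AutoSearch is recursive and type-directed; here hints are just a table from hint name to the type it produces, and all names are hypothetical.

```python
def resolve_by_search(goal, hints):
    """Return the single hint whose type matches the goal.

    hints: dict mapping a hint name to the type it provides (as a string).
    Every hint is checked, so an ambiguous goal is reported, not guessed.
    """
    candidates = [name for name, ty in hints.items() if ty == goal]
    if not candidates:
        raise LookupError(f"no hint found for {goal}")
    if len(candidates) > 1:
        raise LookupError(f"ambiguous hints for {goal}: {candidates}")
    return candidates[0]
```

Checking every hint rather than stopping at the first match is what lets the search reject a goal with two possible solutions.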
+
+@-Patterns
+----------
+Names which are bound in types are also bound as @-patterns, meaning that
+functions have access to them. For example, we can say:
+
+```idris
+vlength : {n : Nat} -> Vect n a -> Nat
+vlength [] = n
+vlength (x :: xs) = n
+```
+
+Linear Types
+------------
+Following Conor McBride and Bob Atkey's work, all binders have a multiplicity
+annotation ("RigCount"). After elaboration in TTImp.Elab, we do a separate
+linearity check which: a) makes sure that linear variables are used exactly
+once; b) updates hole types to properly reflect usage information.
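Part (a) of the check can be sketched on a tiny lambda calculus in Python. This is a toy under stated assumptions: terms are nested tuples, multiplicities are `0`, `1` or `None` (unlimited), and every syntactic occurrence counts as a use, which is cruder than the real checker. Part (b), updating hole types, is not modelled.

```python
def count_uses(term, name):
    """Count free occurrences of a variable in a term."""
    tag = term[0]
    if tag == "var":
        return 1 if term[1] == name else 0
    if tag == "app":
        return count_uses(term[1], name) + count_uses(term[2], name)
    if tag == "lam":
        _, bound, _mult, body = term
        return 0 if bound == name else count_uses(body, name)  # shadowing
    raise ValueError(f"unknown term: {term!r}")


def check_linearity(term):
    """True if every binder's variable is used as its multiplicity demands."""
    tag = term[0]
    if tag == "var":
        return True
    if tag == "app":
        return check_linearity(term[1]) and check_linearity(term[2])
    if tag == "lam":
        _, bound, mult, body = term
        uses = count_uses(body, bound)
        if mult == 1 and uses != 1:
            return False   # linear variables must be used exactly once
        if mult == 0 and uses != 0:
            return False   # erased variables must not be used
        return check_linearity(body)
    raise ValueError(f"unknown term: {term!r}")
```

For example, `("lam", "x", 1, ("var", "x"))` passes, while a linear `x` that is used twice or not at all is rejected.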
+
+Local definitions
+-----------------
+We elaborate relative to an environment, meaning that we can elaborate local
+function definitions. We keep track of the names being defined in a nested
+block of declarations, and ensure that they are lifted to top level definitions
+in TT by applying them to every name in scope.
+
+Since we don't know how many times a local definition will be applied, in general,
+anything bound with multiplicity 1 is passed to the local definition with
+multiplicity 0, so if you want to use it in a local definition, you need to
+pass it explicitly.
+
+Case blocks
+-----------
+Similar to local definitions, these are lifted to top level definitions which
+represent the case block, which is immediately applied to the scrutinee of
+the case.  The function which defines the block takes as arguments: the entire
+current environment (so that it can use any name in scope); any names in
+the environment which the scrutinee's type depends on (to support dependent
+case, but not counting parameters which are unchanged across the structure).
+
+Parameters
+----------
+The parameters to a data type are taken to be the arguments which appear,
+unchanged, in the same position, everywhere across a data definition.
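As an illustration of this rule, the Python sketch below takes every occurrence of a type in its own definition, written as a tuple of argument strings, and reports the positions that never change. The encoding and the name `parameter_positions` are assumptions for the example, not how the compiler represents data definitions.

```python
def parameter_positions(occurrences):
    """Positions whose argument is identical in every occurrence of the type.

    occurrences: one tuple of arguments per appearance of the type
    constructor in the data definition (constructor argument and return types).
    """
    first = occurrences[0]
    return [
        i for i in range(len(first))
        if all(occ[i] == first[i] for occ in occurrences)
    ]


# Vect as it appears in:  Nil : Vect Z a   and   (::) : a -> Vect k a -> Vect (S k) a
vect_occurrences = [("Z", "a"), ("k", "a"), ("S k", "a")]
```

For `Vect`, position 0 varies (`Z`, `k`, `S k`), so it is an index, while position 1 is always `a` and is therefore a parameter.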
+
+Erasure
+-------
+Unbound implicits are given '0' multiplicity, so the rule is now that if you
+don't explicitly write it in the type of a function or constructor, the 
+argument is erased at run time.
+
+Elaboration and the case tree compiler check ensure that 0-multiplicity
+arguments are not inspected in case trees.
+
+Namespaces and name visibility
+------------------------------
+Same rules mostly apply as in Idris 1. The difference is that visibility is
+*per namespace* not *per file* (that is, files have no relevance except
+in that they introduce their own namespace, and in that they allow separate
+typechecking).
+
+One effect of this is that when a file defines nested namespaces, the inner
+namespace can see what's in the outer namespace, but not vice versa unless
+names defined in the inner namespace are explicitly exported. The visibility
+modifiers "export", "public export", and "private" control whether the name
+can be seen in any other namespace, and it's nothing to do with the file
+they're defined in at all.
+
+Records
+-------
+Records are part of TTImp (rather than the surface language). Elaborating a
+record declaration creates a data type and associated projection functions.
+Record setters are generated on demand while elaborating TTImp (in
+TTImp.Elab.Record). Setters are translated directly to 'case' blocks, which
+means that update of dependent fields works as one might expect (i.e. it's safe
+as long as all of the fields are updated at the same time consistently).
+
+In TTImp, unlike in Idris 1, records are not implicitly put into their own
+namespace, but higher level languages (e.g. Idris itself) can do so explicitly
+themselves.
--- a/README.md
+++ b/README.md
@@ -0,0 +1,88 @@
+Idris 2
+=======
+
+This is a pre-alpha implementation of Idris 2, the successor to Idris.
+
+Idris 2 is mostly backwards compatible with Idris 1, with some minor
+exceptions. The most notable user visible differences, which might cause Idris
+1 programs to fail to type check, are:
+
+- Unbound implicit arguments are always erased, so it is a type error to
+  attempt to pattern match on one.
+- Simplified resolution of ambiguous names.
+- Minor differences in the meaning of export modifiers `private`, `export`,
+  and `public export`, which now refer to visibility of names from other
+  *namespaces* rather than visibility from other *files*.
+- Module names must match the filename in which they are defined (unless
+  the module's name is "Main").
+- Anything which uses a `%language` pragma in Idris 1 is likely to be different.
+  Notably, elaborator reflection will exist, but most likely in a slightly
+  different form because the internal details of the elaborator are different.
+- The `Prelude` is much smaller (and easier to replace with an alternative).
+
+Watch this space for more details and the rationale for the changes, as I
+get around to writing it...
+
+Summary of new features:
+
+- A core language based on "Quantitative Type Theory" which allows explicit
+  annotation of erased types, and linear types.
+- `let` bindings are now more expressive, and can be used to define pattern
+  matching functions locally.
+- Names which are in scope in a type are also always in scope in the body of
+  the corresponding definition.
+- Better inference. Holes are global to a source file, rather than local to
+  a definition, meaning that some holes can be left in function types to be
+  inferred by the type checker. This also gives better inference for the types
+  of `case` expressions, and means fewer annotations are needed in interface
+  declarations.
+- Better type checker implementation which minimises the need for compile
+  time evaluation.
+- New Chez Scheme based back end which both compiles and runs faster than the
+  default Idris 1 back end. (Also, optionally, Chicken Scheme and Racket can
+  be used as targets).
+- Everything works faster :).
+
+A significant change in the implementation is that there is an intermediate
+language `TTImp`, which is essentially a desugared Idris, and is cleanly
+separated from the high level language which means it is potentially usable
+as a core language for other high level syntaxes.
+
+Installation
+============
+
+To build and install what exists of it so far:
+
+- Optionally, set the `PREFIX` in `Makefile`
+- `make idris2`
+- `make install`
+
+You'll need to add `$PREFIX/bin` to your `PATH`.
+You may also want to set `IDRIS_CC` to `clang`, since this seems to build
+the generated C significantly faster.
+
+Note: If you edit `idris2.ipkg` to use the `opts` with optimisation set
+(`--cg-opt -O2`) you'll find it runs about twice as fast, at the cost of
+taking a couple of minutes to generate the `idris2` executable.
+
+You can check that building succeeded by running
+
+- `make test`
+
+I make no promises how well this works yet, but you are welcome to have a
+play. Good luck :).
+
+Information about external dependencies is presented in [INSTALL.md](INSTALL.md).
+
+Things still missing
+====================
+
+- Some high level syntax, notably numeric ranges
+- 'using' blocks
+- Cumulativity
+- 'rewrite' doesn't yet work on dependent types
+- Some details of 'with' not yet done (notably recursive with call syntax)
+- Parts of the ide-mode, particularly syntax highlighting
+- Documentation strings and HTML documentation generation
+- ':search' and ':apropos' at the REPL
+- The rest of this "Things still missing" list