Idris2/docs/source/backends/backend-cookbook.rst
2021-10-13 14:25:32 +01:00

764 lines
34 KiB
ReStructuredText

Custom backend cookbook
=======================
This document addresses the details on how to implement a custom code generation
backend for the Idris compiler.
This part has no insights about how to implement the dependently typed bits.
For that part of the compiler Edwin Brady gave lectures at SPLV20_ which are
available online.
The architecture of the Idris2 compiler makes it easy to implement a custom code
generation back-end.
The way to extend Idris with new back-ends is to use it as a library.
The module ``Idris.Driver`` exports the function ``mainWithCodegens``,
that takes a list of ``(String, Codegen)``, starting idris with these codegens
in addition to the built-in ones.
The first codegen in the list will be set as the default codegen.
Anyone who is interested in implementing a custom back-end needs to answer the
following questions:
- Which Intermediate Representation (IR) should be consumed by the custom back-end?
- How to represent primitive values defined by the ``Core.TT.Constant`` type?
- How to represent Algebraic Data Types?
- How to implement special values?
- How to implement primitive operations?
- How to compile IR expressions?
- How to compile Definitions?
- How to implement Foreign Function Interface?
- How to compile modules?
- How to embed code snippets?
- What should the runtime system support?
First of all, we should know that Idris2 is not an optimizing compiler.
Currently its focus is only to compile dependently typed functional code
in a timely manner.
Its main purpose is to check if the given program is correct in a dependently
typed setting and generate code in form of a lambda-calculus like IR where
higher-order functions are present.
Idris has 3 intermediate representations for code generation.
At every level we get a simpler representation, closer to machine code, but it
should be stressed that all the aggressive code optimizations should happen in
the custom back-ends.
The quality and readability of the generated back-end code is on the shoulders
of the implementor of the back-end. Idris erases type information, in the IRs as
it compiles to scheme by default, and there is no need to keep the type
information around.
With this in mind let's answer the questions above.
The architecture of an Idris back-end
-------------------------------------
Idris compiles its dependently typed front-end language into a representation
which is called ``Compile.TT.Term`` .
This data type has a few constructors and it represents a dependently typed
term.
This ``Term`` is transformed to ``Core.CompileExpr.CExp`` which has more
constructors than ``Term`` and it is a very similar construct to a lambda
calculus with let bindings, structured and tagged data representation,
primitive operations, external operations, and case expressions.
The ``CExp`` is closer in the compiling process to code generation.
The custom code generation back-end gets
a context of definitions,
a template directory and an output directory,
a ``Core.TT.ClosedTerm`` to compile and a path to an output file.
.. code-block:: idris
compile : Ref Ctxt Defs -> (tmpDir : String) -> (outputDir : String)
-> ClosedTerm -> (outfile : String) -> Core (Maybe String)
compile defs tmpDir outputDir term file = ?
The ``ClosedTerm`` is a special ``Term`` where the list of the unbound
variables is empty.
This technicality is not important for the code generation of the custom
back-end as the back-end needs to call the ``getCompileData`` function
which produces the ``Compiler.Common.CompileData`` record.
The ``CompileData`` contains:
- A main expression that will be the entry point for the program in ``CExp``
- A list of ``Core.CompileExpr.NamedDef``
- A list of lambda-lifted definitions ``Compiler.LambdaLift.LiftedDef``
- A list of ``Compiler.ANF.ANFDef``
- A list of ``Compiler.VMCode.VMDef`` definitions
These lists contain:
- Functions
- Top-level data definitions
- Runtime crashes which represent unfilled holes,
explicit calls by the user to ``idris_crash``,
and unreachable branches in case trees
- Foreign call constructs
The job of the custom code generation back-end is to transform one of the phase
encoded definitions (``NamedDef``, ``LiftedDef``, ``CExp``, ``ANF``, or ``VM``)
into the intermediate representation of the code generator.
It can then run optimizations and generate some form of executable.
In summary, the code generator has to understand how to represent tagged data
and function applications (even if the function application is partial), how
to handle let expressions, how to implement and invoke primitive operations,
how to handle ``Erased`` arguments, and how to do runtime crashes.
The implementor of the custom back-end should pick the closest Idris IR which
fits to the abstraction of the technology that is aimed to compile to.
The implementor should also consider how to transform the simple main expression
which is represented in CExp.
As Idris does not focus on memory management and threading. The custom back-end
should model these concepts for the program that is compiled.
One possible approach is to target a fairly high level language and reuse as
much as possible from it for the custom back-end.
Another possibility is to implement a runtime that is capable of handling memory
management and threading.
Which Intermediate Representation (IR) should be consumed by the custom back-end?
---------------------------------------------------------------------------------
Now lets turn our attention to the different intermediate representations (IRs)
that Idris provides.
When the ``getCompiledData`` function is invoked with the ``Phase`` parameter
it will produce a ``CompileData`` record, which will contain lists of top-level
definitions that needs to be compiled. These are:
- ``NamedDef``
- ``LiftedDef``
- ``ANFDef``
- ``VMDef``
The question to answer here is: Which one should be picked?
Which one fits to the custom back-end?
How to represent primitive values defined by the ``Core.TT.Constant`` type?
---------------------------------------------------------------------------
After one selects the IR to be used during code generation, the next question
to answer is how primitive types should be represented in the back-end.
Idris has the following primitive types:
- ``Int``
- ``Integer`` (arbitrary precision)
- ``Bits(8/16/32/64)``
- ``Char``
- ``String``
- ``Double``
- ``WorldVal`` (token for IO computations)
And as Idris allows pattern matching on types all the primitive types have
their primitive counterpart for describing a type:
- ``IntType``
- ``IntegerType``
- ``Bits(8/16/32/64)Type``
- ``StringType``
- ``CharType``
- ``DoubleType``
- ``WorldType``
The representation of these primitive types should be a well-thought out
design decision as it affects many parts of the code generation, such as
conversion from the back-end values when FFI is involved, big part of the
data during the runtime is represented in these forms.
Representation of primitive types affect the possible optimisation techniques,
and they also affect the memory management and garbage collection.
There are two special primitive types: String and World.
**String**
As its name suggest this type represent a string of characters. As mentioned in
`Primitive FFI Types <https://idris2.readthedocs.io/en/latest/ffi/ffi.html#primitive-ffi-types>`_,
Strings are encoded in UTF-8.
It is not always clear who is responsible for freeing up a ``String`` created by
a component other than the Idris runtime. Strings created in Idris will
always have value, unlike possible String representation of the host technology,
where for example NULL pointer can be a value, which can not happen on the Idris side.
This creates constraints on the possible representations of the Strings in the
custom back-end and diverging from the Idris representation is not a good idea.
The best approach here is to build a conversion layer between the string
representation of the custom back-end and the runtime.
**World**
In pure functional programming, causality needs to be represented whenever we
want to maintain the order in which subexpressions are executed.
In Idris a token is used to chain IO function calls.
This is an abstract notion about the state of the world. For example this
information could be the information that the runtime needs for bookkeeping
of the running program.
The ``WorldVal`` value in Idris programs is accessed via the ``primIO``
construction which leads us to the ``PrimIO`` module.
Let's see the relevant snippets:
.. code-block:: idris
data IORes : Type -> Type where
MkIORes : (result : a) -> (1 x : %World) -> IORes a
fromPrim : (1 fn : (1 x : %World) -> IORes a) -> IO a
fromPrim op = MkIO op
primIO : HasIO io => (1 fn : (1 x : %World) -> IORes a) -> io a
primIO op = liftIO (fromPrim op)
The world value is referenced as ``%World`` in Idris.
It is created by the runtime when the program starts.
Its content is changed by the custom runtime.
More precisely, the World is created when the ``WorldVal`` is evaluated during
the execution of the program.
This can happen when the program gets initialized or when an ``unsafePerformIO``
function is executed.
How to represent Algebraic Data Types?
--------------------------------------
In Idris there are two different ways to define a data type: tagged unions are
introduced using the ``data`` keyword while structs are declared via the
``record`` keyword.
Declaring a ``record`` amounts to defining a named collection of fields.
Let's see examples for both:
.. code-block:: idris
data Either a b
= Left a
| Right b
.. code-block:: idris
record Pair a b
constructor MkPair
fst : a
snd : b
Idris offers not only algebraic data types but also indexed families. These
are tagged union where different constructors may have different return types.
Here is ``Vect`` an example of a data type which is an indexed family
corresponding to a linked-list whose length is known at compile time.
It has one index (of type ``Nat``) representing the length of the list (the
value of this index is therefore different for the ``[]`` and ``(::)``
constructors) and a parameter (of type ``Type``) corresponding to the type
of values stored in the list.
.. code-block:: idris
data Vect : (size : Nat) -> Type -> Type where
Nil : Vect 0 a -- empty list: size is 0
(::) : a -> Vect n a -> Vect (1 + n) a -- extending a list of size n: size is 1+n
Both data and record are compiled to constructors in the intermediate
representations. Two examples of such Constructors are
``Core.CompileExpr.CExp.CCon`` and ``Core.CompileExpr.CDef.MkCon``.
Compiling the ``Either`` data type will produce three constructor definitions
in the IR:
- One for the ``Either`` type itself, with the arity of two.
Arity tells how many parameters of the constructor should have.
Two is reasonable in this case as the original Idris ``Either`` type has
two parameters.
- One for the ``Left`` constructor with arity of three.
Three may be surprising, as the constructor only has one argument in Idris,
but we should keep in mind the type parameters for the data type too.
- One for the ``Right`` constructor with arity of three.
In the IR constructors have unique names. For efficiency reasons,
Idris assigns a unique integer tag to each data constructors so that constructor
matching is reduced to comparisons of integers instead of strings.
In the ``Either`` example above ``Left`` gets tag 0 and ``Right`` gets tag 1.
Constructors can be considered structured information: a name
together with parameters.
The custom back-end needs to decide how to represent such data.
For example using ``Dict`` in Python, ``JSON`` in JavaScript, etc.
The most important aspect to consider is that these structured values
are heap related values, which should be created and stored dynamically.
If there is an easy way to map in the host technology, the memory management
for these values could be inherited. If not, then the host technology is
responsible for implementing an appropriate memory management.
For example ``RefC`` is a C backend that implements its own memory management
based on reference counting.
How to implement special values?
--------------------------------
Apart from the data constructors there are two special kind of values present
in the Idris IRs: type constructors and ``Erased``.
Type constructors
~~~~~~~~~~~~~~~~~
Type and data constructors that are not relevant for the program's runtime
behaviour may be used at compile butand will be erased from the intermediate
representation.
However some type constructors need to be kept around even at runtime
because pattern matching on types is allowed in Idris:
.. code-block:: idris
notId : {a : Type} -> a -> a
notId {a=Int} x = x + 1
notId x = x
Here we can pattern match on ``a`` and ensure that ``notId`` behaves differently
on ``Int`` than all the other types.
This will generate an IR that will contain a ``Case`` expression with two
branches:
one ``Alt`` matching on the ``Int`` type constructor
and a default for the non-``Int`` matching part of the ``notId`` function.
This is not that special: ``Type`` is a bit like an infinite data type that
contains all of the types a user may ever declare or use.
This can be handled in the back-end and host language using the same mechanisms
that were mobilised to deal with data constructors.
The reason for using the same approach is that in dependently typed languages,
the same language is used to form both type and value level expressions.
Compilation of type level terms will be the same as that of value level terms.
This is one of the things that make dependently typed abstraction elegant.
``Erased``
~~~~~~~~~~
The other kind of special value is ``Erased``.
This is generated by the Idris compiler and part of the IR if the original value
is only needed during the type elaboration process. For example:
.. code-block:: idris
data Subset : (type : Type)
-> (pred : type -> Type)
-> Type
where
Element : (value : type)
-> (0 prf : pred value)
-> Subset type pred
Because ``prf`` has quantity ``0``, it is guaranteed to be erased during
compilation and thus not present at runtime.
Therefore ``prf`` will be represented as ``Erased`` in the IR.
The custom back-end needs to represent this value too as any other data value,
as it could occur in place of normal values.
The simplest approach is to implement it as a special data constructor and let
the host technology provided optimizations take care of its removal.
How to implement primitive operations?
--------------------------------------
Primitive operations are defined in the module ``Core.TT.PrimFn``.
The constructors of this data type represent the primitive operations that
the custom back-end needs to implement.
These primitive operations can be grouped as:
- Arithmetic operations (``Add``, ``Sub``, ``Mul``, ``Div``, ``Mod``, ``Neg``)
- Bit operations (``ShiftL``, ``ShiftR``, ``BAnd``, ``BOr``, ``BXor``)
- Comparison operations (``LT``, ``LTE``, ``EQ``, ``GTE``, ``GT``)
- String operations
(``Length``, ``Head``, ``Tail``, ``Index``, ``Cons``, ``Append``,
``Reverse``, ``Substr``)
- Double precision floating point operations
(``Exp``, ``Log``, ``Sin``, ``Cos``, ``Tan``, ``ASin``, ``ACos``, ``ATan``,
``Sqrt``, ``Floor``, ``Ceiling``)
- Casting of numeric and string values
- An unsafe cast operation ``BelieveMe``
- A ``Crash`` operation taking a type and a string and creating a value at that
type by raising an error.
BelieveMe
~~~~~~~~~
The primitive ``believe_me`` is an unsafe cast that allows users to bypass the
typechecker when they know something to be true even though it cannot be proven.
For instance, assuming that Idris' primitives are correctly implemented, it
should be true that if a boolean equality test on two ``Int`` ``i`` and ``j``
returns ``True`` then ``i`` and ``j`` are equal.
Such a theorem can be implemented by using ``believe_me`` to cast ``Refl``
(the constructor for proofs of a propositional equality) from ``i === i`` to
``i === j``. In this case, it should be safe to implement.
Boxing
~~~~~~
Idris assumes that the back-end representation of the data is not strongly typed
and that all the data type have the same kind of representation.
This could introduce a constraint on the representation of the primitives and
constructor represented data types.
One possible solution is that the custom back-end should represent primitive
data types the same way it does constructors, using special tags.
This is called boxing.
Official backends represent primitive data types as boxed ones.
- RefC: Boxes the primitives, which makes them easy to put on the heap.
- Scheme: Prints the values that are a ``Constant`` as Scheme literals.
How to compile top-level definitions?
-------------------------------------
As mentioned earlier, Idris has 4 different IRs that are available in
the ``CompileData`` record: ``Named``, ``LambdaLifted``, ``ANF``, and ``VMDef``.
When assembling the ``CompileData`` we have to tell the Idris compiler which
level we are interested in.
The ``CompileData`` contains lists of definitions that can be considered as top
level definitions that the custom back-end need to generate functions for.
There are four types of top-level definitions that the code generation back-end
needs to support:
- Function
- Constructor
- Foreign call
- Error
**Function** contains a lambda calculus like expression.
**Constructor** represents a data or a type constructor, and it should
be implemented as a function creating the corresponding data structure
in the custom back-end.
A top-level **foreign call** defines an entry point for calling functions
implemented outside the Idris program under compilation.
The Foreign construction contains a list of Strings which are the snippets
defined by the programmer, the type of the arguments and the return type of
the foreign function. The custom back-end should generate a wrapper function.
More on this on `How to implement the Foreign Function Interface?`_
A top-level **error** definition represents holes in Idris programs, uses of
``idris_crash``, or unreachable branches in a case tree.
Users may want to execute incomplete programs for testing purposes which is
fine as long as we never actually need the value of any of the holes.
Library writers may want to raise an exception if an unrecoverable error has
happened.
Finally, Idris compiles the unreachable branches of a case tree to runtime
error as it is dead code anyway.
How to compile IR expressions?
------------------------------
The custom back-end should decide which intermediate representation
is used as a starting point. The result of the transformation should be
expressions and functions of the host technology.
Definitions in ``ANF`` and ``Lifted`` are represented as a tree like expression,
where control flow is based on the ``Let`` and ``Case`` expressions.
Case expressions
~~~~~~~~~~~~~~~~
There are two types of case expressions,
one for matching and branching on primitive values such as ``Int``,
and the second one is matching and branching on constructor values.
The two types of case expressions will have two different representation for
alternatives of the cases. These are ``ConstCase`` (for matching on constant
values) and ``ConCase`` (for matching on constructors).
Matching on constructors can be implemented as matching on their tags or,
less efficiently, as matching on the name of the constructor. In both cases
a match should bind the values of the constructor's arguments to variables
in the body of the matching branch.
This can be implemented in various ways depending on the host technology:
switch expressions, case with pattern matching, or if-then-else chains.
When pattern matching binds variables, the number of arguments can be different
from the arity of the constructor defined in top-level definitions and in
``GlobalDef``. This is because all the arguments are kept around at typechecking
time, but the code generator for the case tree removes the ones which are marked
as erased. The code generator of the custom back-end also needs to remove the
erased arguments in the constructor implementation.
In ``GlobalDef``, ``eraseArg`` contains this information, which can be used to
extract the number of arguments which needs to be kept around.
Creating values
~~~~~~~~~~~~~~~
Values can be created in two ways.
If the value is a primitive value, it will be handed to the back-end as
a ``PrimVal``. It should be compiled to a constant in the host language
following the design decisions made in
the 'How to represent primitive values?' section.
If it is a structured value (i.e. a ``Con``) it should be compiled to a function
in the host language which creates a dynamic value. Design decisions made for
'How to represent constructor values?' is going to have effect here.
Function calls
~~~~~~~~~~~~~~
There are four types of function calls:
- Saturated function calls (all the arguments are there)
- Under-applied function calls (some arguments are missing)
- Primitive function calls (necessarily saturated, ``PrimFn`` constructor)
- Foreign Function calls (referred to by its name)
The ``ANF`` and ``Lifted`` intermediate representations support under-applied
function calls (using the ``UnderApp`` constructor in both IR).
The custom back-end needs to support partial application of functions and
creating closures in the host technology.
This is not a problem with back-ends like Scheme where we get the partial
application of a function for free.
But if the host language does not have this tool in its toolbox, the custom
back-end needs to simulate closures.
One possible solution is to manufacture a closure as a special object storing
the function and the values it is currently applied to and wait until all the
necessary arguments have been received before evaluating it.
The same approach is needed if the ``VMCode`` IR was chosen for code generation.
Let bindings
~~~~~~~~~~~~
Both the ``ANF`` and ``Lifted`` intermediate representations have a
``Let`` construct that lets users assign values to local variables.
These two IRs differ in their representation of bound variables.
``Lifted`` is a type family indexed by the ``List Name`` of local variables
in scope. A variable is represented using ``LLocal``, a constructor that
stores a ``Nat`` together with a proof that it points to a valid name in
the local scope.
``ANF`` is a lower level representation where this kind of guarantees are not
present anymore. A local variable is represented using the ``AV`` constructor
which stores an ``AVar`` whose definition we include below.
The ``ALocal`` constructor stores an ``Int`` that corresponds to the ``Nat``
we would have seen in ``Lifted``.
The ``ANull`` constructor refers to an erased variable and its representation
in the host language will depend on the design choices made in
the 'How to represent ``Erased`` values' section.
.. .code-block:: idri
data AVar : Type where
ALocal : Int -> AVar
ANull : AVar
VMDef specificities
~~~~~~~~~~~~~~~~~~~
``VMDef`` is meant to be the closest IR to machine code.
In ``VMDef``, all the definitions have been compiled to instructions for a small
virtual machine with registers and closures.
Instead of ``Let`` expressions, there only are ``ASSIGN`` statements
at this level.
Instead of ``Case`` expressions binding variables when they successfully match
on a data constructor, ``CASE`` picks a branch based on the constructor itself.
An extra operation called ``PROJECT`` is introduced to explicitly extract a
constructor's argument based on their position.
There are no ``App`` or ``UnderApp``. Both are replaced by ``APPLY`` which
applies only one value and creates a closure from the application. For erased
values the operation ``NULL`` assigns an empty/null value for the register.
How to implement the Foreign Function Interface?
------------------------------------------------
The Foreign Function Interface (FFI) plays a big role in running Idris programs.
The primitive operations which are mentioned above are functions for
manipulating values and those functions aren't meant for complex interaction
with the runtime system.
Many of the primitive types can be thought of as abstract types provided via
``external`` and foreign functions to manipulate them.
The responsibility of the custom back-end and the host technology is
to represent these computations the operationally correct way.
The design decisions with respect to representing primitive types in the host
technology will inevitably have effects on the design of the FFI.
Foreign Types
~~~~~~~~~~~~~
Originally Idris had an official back-end implementation in C. Even though
this has changed, the names in the types for the FFI kept their C prefix.
The ``Core.CompileExpr.CFType`` contains the following definitions, many of
them one-to-one mapping from the corresponding primitive type, but some of
them needs explanation.
The foreign types are:
- ``CFUnit``
- ``CFInt``
- ``CFUnsigned(8/16/32/64)``
- ``CFString``
- ``CFDouble``
- ``CFChar``
- ``CFFun`` of type ``CFType -> CFType -> CFType``
Callbacks can be registered in the host technology via parameters that have
CFFun type. The back-end should be able to handle functions that are
defined in Idris side and compiled to the host technology. If the custom
back-end supports higher order functions then it should
be used to implement the support for this kind of FFI type.
- ``CFIORes`` of type ``CFType -> CFType``
Any ``PrimIO`` defined computation will have this extra layer.
Pure functions shouldn't have any observable IO effect on the program state
in the host technology implemented runtime.
NOTE: ``IORes`` is also used when callback functions are registered in the
host technology.
- ``CFWorld``
Represents the current state of the world. This should refer to a token that
is passed around between function calls.
The implementation of the World value should contain back-end
specific values and information about the state of the Idris runtime.
- ``CFStruct`` of type ``String -> List (String, CFType) -> CFType`` is the
foreign type associated with the ``System.FFI.Struct``.
It represents a C like structure in the custom back-end.
``prim__getField`` and ``prim__setField`` primitives should be implemented
to support this CFType.
- ``CFUser`` of type ``Name -> List CFType -> CFType``
Types defined with [external] are represented with ``CFUser``. For example
``data MyType : Type where [external]`` will be represented as
``CFUser Module.MyType []``
- ``CFBuffer``
Foreign type defined for ``Data.Buffer``.
Although this is an external type, Idris builds on a random access buffer.
- ``CFPtr`` The ``Ptr t`` and ``AnyPtr`` are compiled to ``CFPtr``
Any complex structured data that can not be represented as a simple primitive
can use this CFPtr to keep track where the value is used.
In Idris ``Ptr t`` is defined as external type.
- ``CFGCPtr`` The ``GCPtr t`` and ``GCAnyPtr`` are compiled to ``CFGCPtr``.
``GCPtr`` is inferred from a Ptr value calling the ``onCollect`` function and
has a special property. The ``onCollect`` attaches a finalizer for the pointer
which should run when the pointer is freed.
Examples
~~~~~~~~
Let's step back and look into how this is represented at the Idris source level.
The simplest form of a definition involving the FFI a function definition with
a ``%foreign`` pragma. The pragma is passed a list of strings corresponding to
a mapping from backends to names for the foreign calls. For instance:
.. .code-block:: idris
%foreign "C:add,libsmallc"
prim__add : Int -> Int -> Int
this function should be translated by the C back end as a call to the ``add``
function defined in the ``smallc.c`` file. In the FFI, ``Int`` is translated to
``CFInt``. The back-end assumes that the data representation specified in the
library file correspond to that of normal Idris values.
We can also define ``external`` types like in the following examples:
.. .code-block:: idris
data ThreadID : Type where [external]
%foreign "scheme:blodwen-thread"
prim__fork : (1 prog : PrimIO ()) -> PrimIO ThreadID
Here ``ThreadID`` is defined as an external type and this type will be
represented as ``CFUser "ThreadID" []`` internally. The value which is
created by the scheme runtime will be considered as a black box.
The type of ``prim__fork``, once translated as a foreign type, is
``[%World -> IORes Unit, %World] -> IORes Main.ThreadID``
Here we see that the ``%World`` is added to the IO computations.
The ``%World`` parameter is always the last in the argument list.
For the FFI functions, the type information and the user defined string can
be found in the top-level definitions.
The custom back-end should use the definitions to generate wrapper code,
which should convert the types that are described by the ``CFType`` to the
types that the function in the ``%foreign`` directive needs..
How to compile modules?
-----------------------
The Idris compiler generates intermediate files for modules, the content of
the files are neither part of ``Lifted``, ``ANF``, nor ``VMCode``.
Because of this, when the compilation pipeline enters the stage of code
generation, all the information will be in one instance of the ``CompileData``
record and the custom code generator back-end can process them as it would
see the whole program.
The custom back-end has the option to introduce some hierarchy for the functions
in different namespaces and organize some module structure to let the host
technology process the bits and pieces in different sized chunks.
However, this feature is not in the scope of the Idris compiler.
It is worth noting that modules can be mutually recursive in Idris.
So a direct compilation of Idris modules to modules in the host language
may be unsuccessful.
How to embed code snippets?
---------------------------
A possible motivation for implementing a custom back-end for Idris is to
generate code that is meant to be used in a larger project. This project
may be bound to another language that has many useful librarie but could
benefit from relying on Idris' strong type system in places.
When writing a code generator for this purpose, the interoperability of the
host technology and Idris based on the Foreign Interface can be inconvenient.
In this situation, the need to embed code of the host technology arises
naturally. Elaboration can be an answer for that.
Elaboration is a typechecking time code generation technique.
It relies on the ``Elab`` monad to write scripts that can interact with the
typechecking machinery to generate Idris code in ``Core.TT``.
When code snippets need to be embedded a custom library should be provided
with the custom back-end to turn the valid code snippets into their
representation in ``Core.TT``.
What should the runtime system support?
---------------------------------------
As a summary, a custom back-end for the Idris compiler should create an environment
in the host technology that is able to run Idris programs. As Idris is part of
the family of functional programming languages, its computation model is based
on graph reduction. Programs represented as simple graphs in the memory are based
on the closure creation mechanism during evaluation. Closure creation exist even
on the lowest levels of IRs. For that reason any runtime in
any host technology needs to support some kind of representation of closures
and be able to store them on the heap, thus the responsibility of memory management
falls on the lap of the implementor of the custom back-end. If the host technology
has memory management, the problem is not difficult. It is also likely
that storing closures can be easily implemented via the tools of the host technology.
Although it is not clear how much functionality a back-end should support.
Tools from the Scheme back-end are brought into the Idris world via external types and primitive operations
around them. This is a good practice and gives the community the ability to focus on
the implementation of a quick compiler for a dependently typed language.
One of these hidden features is the concurrency primitives. These are part of the
different libraries that could be part of the compiler or part of the
contribution package. If the threading model is different for the host technology
that the Idris default back-end inherits currently from the Scheme technology it could be a bigger
piece of work.
IO in Idris is implemented using an abstract ``%World`` value, which serves as token for
functions that operate interactively with the World through simple calls to the
underlying runtime system. The entry point of the program is the main function, which
has the type of the IO unit, such as ``main : IO ()``. This means that every
program which runs, starts its part of some IO computation. Under the hood this is
implemented via the creation of the ``%World`` abstract value, and invoking the main
function, which is compiled to pass the abstract %World value for IO related
foreign or external operations.
There is an operation called ``unsafePerformIO`` in the ``PrimIO`` module.
The type signature of ``unsafePerformIO`` tells us that it is capable of
evaluating an ``IO`` computation in a pure context.
Under the hood it is run in exactly the same way the ``main`` function is.
It manufactures a fresh ``%World`` token and passes it to the ``IO`` computations.
This leads to a design decision: How to
represent the state of the World, and how to
represent the world that is instantiated for the sake of the ``unsafePerformIO`` operation via the
``unsafeCreateWorld``? Both the mechanisms of ``main`` and ``unsafeCreateWorld``
use the ``%MkWorld`` constructor, which will be compiled to ``WorldVal`` and
its type to ``WorldType``, which means the implementation of the runtime
is responsible for creating the abstraction around the World. Implementation of an
abstract World value could be based on a singleton pattern, where we can have
just one world, or we could have more than one world, resulting in parallel
universes for ``unsafePerformIO``.
.. _SPLV20: https://www.youtube.com/playlist?list=PLmYPUe8PWHKqBRJfwBr4qga7WIs7r60Ql
.. _Elaboration: https://github.com/stefan-hoeck/idris2-elab-util/blob/main/src/Doc/Index.md