mirror of
https://github.com/urbit/ares.git
synced 2024-11-22 15:08:54 +03:00
125 lines
9.0 KiB
Markdown
125 lines
9.0 KiB
Markdown
|
# Compiling Nock to LLVM
|
||
|
|
||
|
Nock formulae are nouns, which are binary trees whose leaves are natural numbers.
|
||
|
The LLVM IR is a mostly sequential language comprised of operations arranged into basic blocks and functions.
|
||
|
|
||
|
In order to generate LLVM IR which executes a Nock formula, we can conceptually lower Nock to NockIR.
|
||
|
NockIR is a mostly-sequential language which contains a set of instructions optimized as a target for Nock.
|
||
|
It abstracts memory management and is intended to run with the stack allocator described in [memory.md](memory.md).
|
||
|
|
||
|
## The NockIR VM
|
||
|
The machine semantically has three registers.
|
||
|
Two of these registers, `sub` and `res`, contain noun values.
|
||
|
The third, `pc`, contains a function pointer.
|
||
|
|
||
|
- `sub` contains the subject of the current Nock computation.
|
||
|
- `res` stores the result of a completed Nock computation.
|
||
|
- `pc` can semantically be read to saved on the stack, and set to jump.
|
||
|
|
||
|
It is important to note that we do not actually implement `pc` directly in LLVM, rather we create new functions at save points and tail call to jump.
|
||
|
|
||
|
The machine also has a stack with the number of "slots" in a frame specified per-frame when it is pushed.
|
||
|
Each slot can hold a noun, the first slot (slot 0) can optionally hold a function pointer.
|
||
|
We reference slots in the current frame as `frame[n]` where `n` is the 0-based index of the slot.
|
||
|
|
||
|
We denote the NockIR for a nock formula as `ir[x]` where `x` is a formula.
|
||
|
|
||
|
NockIR provides the following instructions:
|
||
|
|
||
|
| Instruction | Operation
|
||
|
|-------------|----------------------------------------------------------------
|
||
|
| `axe[x]` | `res := /[x sub]`
|
||
|
| `cel[s]` | `res := [frame[s] res]`
|
||
|
| `puh[x]` | push a frame with x slots
|
||
|
| `pop` | pop a frame
|
||
|
| `put[s]` | `frame[s] := res`
|
||
|
| `get[s]` | `res := frame[s]`
|
||
|
| `sub` | `sub := res`
|
||
|
| `noc` | `sub := /[2 res]; res := /[3 res]`
|
||
|
| `sav[s]` | `frame[s] := sub`
|
||
|
| `reo[s]` | `sub := frame[s]`
|
||
|
| `con[x]` | `res := x`
|
||
|
| `cl?` | if res is a cell, `res := 0` else `res := 1`
|
||
|
| `inc` | `res := res + 1` (crash if cell)
|
||
|
| `eq?[s]` | if `frame[s] == res` then `res :=0` else `res := 1`
|
||
|
| `edt[x]` | `res := #[x res sub]`
|
||
|
| `ext` | `sub := [res sub]`
|
||
|
| `lnt` | `pc := ir[res]`
|
||
|
| `lnk` | `frame[0] := pc; pc := ir[res]`
|
||
|
| `don` | `pc = frame[0]`
|
||
|
| `br0[x,y]` | if `res` is 0, `pc = x`, if res is 1, `pc = y`, crash otherwise
|
||
|
| `spy` | External jump to a runtime-provided function producing a noun.
|
||
|
| `hns[b]` | Look up a static hint in the hint table, jump if entry
|
||
|
| `hnd[b]` | Look up a dynamic hint from res in the table, jump if entry.
|
||
|
|
||
|
The translation of Nock to NockIR takes the Nock formula plus two extra input bits, both of which are 0 if unspecified:
|
||
|
|
||
|
| Code generation | Generated code
|
||
|
|-------------------------|------------------------------------------------------------------------------
|
||
|
| `ir[0 0 [[b c] d]]` | `puh[1]; ir[1 1 [b c]]; put[0]; ir[0 1 d]; cel[0]; pop; don`
|
||
|
| `ir[s 1 [[b c] d]]` | `puh[1]; ir[1 1 [b c]]; put[0]; ir[s 1 d]; cel[0]; pop`
|
||
|
| `ir[0 0 [0 b]]` | `axe[b]; don`
|
||
|
| `ir[s 1 [0 b]]` | `axe[b]`
|
||
|
| `ir[0 0 [1 x]]` | `con[x]; don`
|
||
|
| `ir[s 1 [1 x]]` | `con[x]; don`
|
||
|
| `ir[0 0 [2 b c]]` | `puh[1]; ir[1 1 c]; put[0]; ir[0 1 b]; cel[0]; pop; noc; lnt`
|
||
|
| `ir[0 1 [2 b c]]` | `puh[2]; ir[1 1 c]; put[1]; ir[0 1 b]; cel[1]; noc; lnk; pop`
|
||
|
| `ir[1 1 [2 b c]] | `puh[2]; ir[1 1 c]; put[1]; ir[1 1 b]; cel[1]; sav[1]; noc; lnk; reo[1]; pop`
|
||
|
| `ir[0 0 [3 b]]` | `ir[0 1 b]; cl?; don`
|
||
|
| `ir[s 1 [3 b]]` | `ir[s 1 b]; cl?`
|
||
|
| `ir[0 0 [4 b]]` | `ir[0 1 b]; inc; don`
|
||
|
| `ir[s 1 [4 b]]` | `ir[s 1 b]; inc;`
|
||
|
| `ir[0 0 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[0 1 c]; eq?[0]; pop; don`
|
||
|
| `ir[s 1 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[s 1 c]; eq?[0]; pop`
|
||
|
| `ir[s t [6 b c d]]` | `ir[1 1 b]; br0[ir[s t c] ir[s t d]]`
|
||
|
| `ir[0 t [7 b c]]` | `ir[0 1 b]; sub; ir[0 t c]`
|
||
|
| `ir[1 1 [7 b c]]` | `puh[1]; ir[0 1 b]; sav[0]; sub; ir[0 1 c]; reo[0]; pop`
|
||
|
| `ir[0 t [8 b c]]` | `ir[1 1 b]; ext; ir[0 t c]`
|
||
|
| `ir[1 1 [8 b c]]` | `puh[1]; ir[0 1 b]; sav[0]; ext; ir[0 1 c]; reo[0]; pop`
|
||
|
| `ir[0 0 [9 b c]]` | `ir[0 1 c]; sub; axe[b]; lnt`
|
||
|
| `ir[0 1 [9 b c]]` | `puh[1]; ir[0 1 c]; sub; axe[b]; lnk; pop`
|
||
|
| `ir[1 1 [9 b c]]` | `puh[2]; sav[1]; ir[0 1 c]; sub; axe[b]; lnk; reo[1]; pop`
|
||
|
| `ir[0 0 [10 [b c] d]]` | `puh[1]; ir[1 1 d]; put[0]; ir[0 1 c]; reo[0]; edt[b]; pop; don`
|
||
|
| `ir[0 1 [10 [b c] d]]` | `puh[1]; ir[1 1 d]; put[0]; ir[0 1 c]; reo[0]; edt[b]; pop;`
|
||
|
| `ir[1 1 [10 [b c] d]]` | `puh[2]; sav[1]; ir[0 1 d]; put[0]; reo[1]; ir[0 1 c]; reo[0]; edt[b]; pop
|
||
|
| `ir[s t [11 [b c] d]]` | `ir[1 1 b]; hnd[b]; ir[s t d]`
|
||
|
| `ir[s t [11 b c]] | `hns[b]; ir[s t c]`
|
||
|
| `ir[0 0 [12 b c]] | `puh[1]; ir[1 1 b]; put[0]; ir[0 1 c]; cel[0]; spy; pop; don
|
||
|
| `ir[s 1 [12 b c]] | `puh[1]; ir[1 1 b]; put[0]; ir[s 1 c]; cel[0]; spy; pop
|
||
|
|
||
|
## From NockIR to LLVM IR
|
||
|
|
||
|
NockIR is not intended to be generated separately. Rather, each NockIR instruction is implemented as a builder for some sequence of LLVM IR.
|
||
|
|
||
|
The registers for the memory allocator (the stack and frame pointers) and the VM (the subject and result) are implemented in the LLVM IR by making each basic block an LLVM function. Within a function, each mutation to a register results in a new SSA register, and the previous registers are not used after the new assignment.
|
||
|
Branching is accomplished by means of a conditional tail cail to basic blocks which contain static, unconditional jumps to the common suffix of the branch, or `don` instructions otherwise.
|
||
|
|
||
|
### Calling convention for basic blocks
|
||
|
|
||
|
We employ the `cc 10` convention, supplied for use by the Glasgow Haskell Compiler, as it matches our own needs.
|
||
|
This ensures no registers are spilled to the (LLVM) stack, that parameters are passed in registers, and that tail calls are always tail-call-optimized, i.e. compiled to jumps.
|
||
|
|
||
|
Each function representing a basic block, takes the current stack pointer, current frame pointer, current subject, and current value of the result register as arguments.
|
||
|
|
||
|
Instructions which need to be implemented as functions, such as `axe` or `pop`, take these 4 arguments, plus a function pointer to tail-call to continue the basic block.
|
||
|
|
||
|
|
||
|
|
||
|
### Calls and returns
|
||
|
|
||
|
The NockIR provides the `lnt` and `lnk` instructions for tail and non-tail calls, respectively.
|
||
|
The `lnt` instruction is relatively the simpler: it invokes the runtime system to generate or fetch cached code for the noun in the `res` register, then executes a dynamic jump to this code.
|
||
|
The `lnk` instruction will be followed by another basic block, thus, it first saves the function pointer for that basic block in `frame[0]`; and then jumps to the code in question.
|
||
|
The `don` instruction simply looks up the function pointer in `frame[0]` and jumps to it.
|
||
|
Thus the outer frame for a computation installs a handler, matching the calling convention for basic blocks, which returns the result noun to the runtime system.
|
||
|
|
||
|
|
||
|
|
||
|
### `axe`
|
||
|
|
||
|
The `axe` instruction looks up an axis in the subject and places it in the result register.
|
||
|
It may crash if the axis is not valid for the noun in the subject register, the axis is 0, or the axis is a cell.
|
||
|
|
||
|
|
||
|
(TBW: other instructions)
|