From 70916720b44cba874c36845b9d1b219ea9fd38aa Mon Sep 17 00:00:00 2001
From: Edward Amsden
Date: Tue, 29 Mar 2022 17:12:08 -0500
Subject: [PATCH] Design docs

---
 docs/heap.md              |  62 +++++++++++
 docs/llvm.md              |  93 ++++++++++++++---
 docs/memory.md            | 213 --------------------------------------
 docs/stack.md             |  80 ++++++++++++++
 docs/subject-knowledge.md | 128 +++++++++++++++++++++++
 5 files changed, 350 insertions(+), 226 deletions(-)
 create mode 100644 docs/heap.md
 delete mode 100644 docs/memory.md
 create mode 100644 docs/stack.md
 create mode 100644 docs/subject-knowledge.md

diff --git a/docs/heap.md b/docs/heap.md
new file mode 100644
index 0000000..077dcba
--- /dev/null
+++ b/docs/heap.md
@@ -0,0 +1,62 @@
+# Heaps and persistence
+
+There are, however, several reasons why we may wish to copy a noun off the stack and reference it during a computation.
+Copying a large noun off the stack and maintaining a reference instead prevents repeated copying up the stack.
+Fixing a location for a noun enables pointer equality to short-circuit equality checks.
+Snapshotting and paging require a fixed location in memory for a noun, unperturbed by computation.
+Finally, the ability to attach metadata (hints, compiled code, reference counts, etc.) to nouns is extremely helpful for runtime implementation, but costly if provided for every noun.
+Attaching metadata only to heap entries confines this cost to the nouns that need it.
+
+## Memory arenas
+### Large objects
+To prevent repeated copying of large objects, a temporary older-generation heap may suffice.
+This is a heap used alongside a Nock stack (see [stack.md](stack.md)) to which a noun may be copied instead of copying it up to the calling frame.
+
+Each allocation entry records the location of the most senior pointer to the entry; if the frame holding that pointer is popped, or the object holding it is freed, without the pointer having been copied somewhere more senior, the entry can be freed.
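The per-entry bookkeeping described above could be modeled roughly as follows. This is an illustrative Python sketch only (names such as `LargeObjectArena` are hypothetical, not from the New Mars codebase): each entry tracks the depth of the most senior stack frame holding a pointer to it (smaller depth = more senior), copying a pointer seniorward updates that record, and popping a frame frees every entry whose most senior pointer lived at or junior to the popped frame.

```python
# Illustrative sketch of a large-object arena with a free list, where each
# entry records the most senior stack frame referencing it. All names are
# hypothetical; this is not New Mars code.

class Entry:
    def __init__(self, size, senior_frame):
        self.size = size
        self.senior_frame = senior_frame  # depth of most senior referencing frame
        self.live = True

class LargeObjectArena:
    def __init__(self):
        self.entries = []
        self.free_list = []  # indices of freed slots, reused by later allocations

    def alloc(self, size, frame_depth):
        entry = Entry(size, frame_depth)
        if self.free_list:
            idx = self.free_list.pop()
            self.entries[idx] = entry
        else:
            idx = len(self.entries)
            self.entries.append(entry)
        return idx

    def note_copy(self, idx, frame_depth):
        # A pointer to this entry was copied into a more senior frame;
        # the entry must now outlive that frame instead.
        e = self.entries[idx]
        e.senior_frame = min(e.senior_frame, frame_depth)

    def pop_frame(self, frame_depth):
        # Free every live entry whose most senior pointer was held in the
        # popped frame or in a more junior one.
        for idx, e in enumerate(self.entries):
            if e.live and e.senior_frame >= frame_depth:
                e.live = False
                self.free_list.append(idx)
```

An entry whose pointer was copied to a more senior frame survives the pop; one that was not is returned to the free list for reuse, giving the "automatic and predictable" reclamation the text describes.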
+Thus, we can use a traditional alloc/free approach with a free list for allocations, while not losing automatic and predictable memory management.
+
+#### Aside: unifying equality
+The current vere (u3) implements unifying equality, meaning that when two nouns are discovered to be equal, a reference to one is replaced with a reference to the other.
+This is not required by the Nock specification, which demands only structural equality, but it is an obvious and nearly costless refinement of structural equality.
+
+However, we must be careful not to undo invariants that our memory system depends on.
+For the stack, this means that when replacing a reference to a noun with a reference to a structurally equal noun,
+we must replace references held in more junior (more recently pushed) stack frames with references to the copy in the more senior frame, never the reverse.
+We must also take care not to introduce references from any heap back to the stack; thus any heap must be treated
+as more senior than any stack frame, and its copy retained when unifying.
+
+### Persistence and paging
+The canonical persistence layer for any Urbit ship is the event log.
+However, exclusive reliance on the event log requires us to permit the event log to grow without bound and to replay a ship's entire history every time it is restarted.
+Fast restarts and event log pruning both rely on snapshots, which are persistent copies of the current Arvo state.
+
+Further, all data stored on an Urbit ship and accessible to it is present as nouns in the Arvo state.
+Unless we wish to limit Urbit forever to persisting only as much data as can reasonably be stored in RAM, we need some mechanism for storing large parts of the Arvo state on disk, and loading them into memory only when needed.
+
+Nouns which need to be persisted, or which require metadata, are "canonicalized."
+That is, they are assigned a memory address which is permanent for the lifetime of the noun.
+Once this is done, the noun can be persisted on disk, and the physical memory freed for other uses.
+The *virtual memory* space remains reserved for the noun at least as long as the noun is still reachable from the set of roots (see below).
+
+The persistent heap maintains a queue recording which persisted nouns were recently touched, and evicts the pages for the least recently used nouns (likely using `madvise()` with the `MADV_DONTNEED` advice on Linux) when the resident set size grows too large.
+User-space page fault handling (using `userfaultfd` on Linux or equivalent page fault trapping on other systems) is used to detect subsequent reads in a noun's virtual memory space and page the noun back in.
+
+The persistent heap also maintains a metadata record adjacent to each noun, which allows for:
+- Reference counting, to ensure we can predictably drop nouns which are no longer referenced by roots.
+- Code caching, for nouns representing Nock formulas.
+- Any other data helpful to the implementation.
+
+## Snapshotting
+Snapshotting makes use of the persistent arena, storing a root for each snapshot. The snapshotting process ensures that all nouns reachable from a snapshot root are synced to disk.
+
+## Event logging
+TODO: Can we also use persistent noun storage for the event log?
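The least-recently-used eviction policy described under "Persistence and paging" above might be sketched as follows. This is purely illustrative Python (all names hypothetical); the real mechanism would rest on `madvise()` and `userfaultfd` rather than dictionaries, with "disk" here simulating paged-out storage.

```python
from collections import OrderedDict

# Illustrative model of the persistent heap's eviction policy: a resident
# set kept in LRU order, with evicted nouns "paged out" to a disk map and
# faulted back in when touched. Not New Mars code.

class PersistentHeap:
    def __init__(self, max_resident):
        self.max_resident = max_resident
        self.resident = OrderedDict()  # noun id -> noun, in LRU order
        self.disk = {}                 # evicted ("paged out") nouns

    def touch(self, noun_id):
        # Read access: fault the noun back in if evicted, mark it most
        # recently used, then evict the least recently used noun if the
        # resident set is over budget.
        if noun_id in self.disk:
            self.resident[noun_id] = self.disk.pop(noun_id)
        self.resident.move_to_end(noun_id)
        while len(self.resident) > self.max_resident:
            victim, noun = self.resident.popitem(last=False)
            self.disk[victim] = noun

    def canonicalize(self, noun_id, noun):
        # Assign the noun a permanent identity and make it resident.
        self.resident[noun_id] = noun
        self.touch(noun_id)
```

Note that `touch` here implies a fault or bookkeeping step on every access, which is exactly the cost the TODO below asks how to avoid.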
+
+(TODO: strategy for maintaining use times which doesn't require faulting on every access)
+
+## Roots
+### Always
+- Any undiscarded snapshot
+- Any unpruned event log entry (if using noun persistence for event logging)
+
+### During running computations
+- The result and subject registers
+- Any slot in a Nock stack
diff --git a/docs/llvm.md b/docs/llvm.md
index 8604b6b..ca4d3e3 100644
--- a/docs/llvm.md
+++ b/docs/llvm.md
@@ -39,9 +39,9 @@ NockIR provides the following instructions:
 | `sav[s]` | `frame[s] := sub`
 | `reo[s]` | `sub := frame[s]`
 | `con[x]` | `res := x`
-| `cl?` | if res is a cell, `res := 0` else `res := 1`
+| `clq` | if res is a cell, `res := 0` else `res := 1`
 | `inc` | `res := res + 1` (crash if cell)
-| `eq?[s]` | if `frame[s] == res` then `res :=0` else `res := 1`
+| `eqq[s]` | if `frame[s] == res` then `res :=0` else `res := 1`
 | `edt[x]` | `res := #[x res sub]`
 | `ext` | `sub := [res sub]`
 | `lnt` | `pc := ir[res]`
@@ -65,12 +65,12 @@ The translation of Nock to NockIR takes the Nock formula plus two extra input bi
 | `ir[0 0 [2 b c]]` | `puh[1]; ir[1 1 c]; put[0]; ir[0 1 b]; cel[0]; pop; noc; lnt`
 | `ir[0 1 [2 b c]]` | `puh[2]; ir[1 1 c]; put[1]; ir[0 1 b]; cel[1]; noc; lnk; pop`
 | `ir[1 1 [2 b c]] | `puh[2]; ir[1 1 c]; put[1]; ir[1 1 b]; cel[1]; sav[1]; noc; lnk; reo[1]; pop`
-| `ir[0 0 [3 b]]` | `ir[0 1 b]; cl?; don`
-| `ir[s 1 [3 b]]` | `ir[s 1 b]; cl?`
+| `ir[0 0 [3 b]]` | `ir[0 1 b]; clq; don`
+| `ir[s 1 [3 b]]` | `ir[s 1 b]; clq`
 | `ir[0 0 [4 b]]` | `ir[0 1 b]; inc; don`
 | `ir[s 1 [4 b]]` | `ir[s 1 b]; inc;`
-| `ir[0 0 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[0 1 c]; eq?[0]; pop; don`
-| `ir[s 1 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[s 1 c]; eq?[0]; pop`
+| `ir[0 0 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[0 1 c]; eqq[0]; pop; don`
+| `ir[s 1 [5 b c]]` | `puh[1]; ir[1 1 b]; put[0]; ir[s 1 c]; eqq[0]; pop`
 | `ir[s t [6 b c d]]` | `ir[1 1 b]; br0[ir[s t c] ir[s t d]]`
 | `ir[0 t [7 b c]]` | `ir[0 1 b]; sub; ir[0 t c]`
 | `ir[1 1 [7 b
c]]` | `puh[1]; ir[0 1 b]; sav[0]; sub; ir[0 1 c]; reo[0]; pop` @@ -94,7 +94,7 @@ NockIR is not intended to be generated separately. Rather, each NockIR instructi The registers for the memory allocator (the stack and frame pointers) and the VM (the subject and result) are implemented in the LLVM IR by making each basic block an LLVM function. Within a function, each mutation to a register results in a new SSA register, and the previous registers are not used after the new assignment. Branching is accomplished by means of a conditional tail cail to basic blocks which contain static, unconditional jumps to the common suffix of the branch, or `don` instructions otherwise. -### Calling convention for basic blocks +## Calling convention for basic blocks We employ the `cc 10` convention, supplied for use by the Glasgow Haskell Compiler, as it matches our own needs. This ensures no registers are spilled to the (LLVM) stack, that parameters are passed in registers, and that tail calls are always tail-call-optimized, i.e. compiled to jumps. @@ -103,9 +103,7 @@ Each function representing a basic block, takes the current stack pointer, curre Instructions which need to be implemented as functions, such as `axe` or `pop`, take these 4 arguments, plus a function pointer to tail-call to continue the basic block. - - -### Calls and returns +## Calls and returns The NockIR provides the `lnt` and `lnk` instructions for tail and non-tail calls, respectively. The `lnt` instruction is relatively the simpler: it invokes the runtime system to generate or fetch cached code for the noun in the `res` register, then executes a dynamic jump to this code. @@ -113,12 +111,81 @@ The `lnk` instruction will be followed by another basic block, thus, it first sa The `don` instruction simply looks up the function pointer in `frame[0]` and jumps to it. 
Thus the outer frame for a computation installs a handler, matching the calling convention for basic blocks, which returns the result noun to the runtime system. - +## Instructions ### `axe` The `axe` instruction looks up an axis in the subject and places it in the result register. -It may crash if the axis is not valid for the noun in the subject register, the axis is 0, or the axis is a cell. +It will nock crash if the axis is not valid for the noun in the subject register. +### `cel` -(TBW: other instructions) +Creates a cell by allocating on the stack, with head from the given stack frame slot, and tail from the result register. +Cell is placed in the result register. + +### `puh` + +Pushes a new frame with a given number of slots onto the stack + +### `pop` +Pops a frame from the stack. This will cause a copy of stack-allocated nouns from within the current frame, +traced from the result register as a root, so that the result register remains valid. + +The noun in the result register will remain the same, but since it must be copied up the stack, the memory +representation may change + +### `put` +Save the noun in the result register to a frame slot. + +### `get` +Load the noun in a frame slot to the result register. + +### `sub` +Set the subject register to the noun in the result register. + +### `noc` +Set the subject register to the head of the cell in the result register + +**This operation assumes that the result register is a cell, and its behavior is undefined (e.g. not a nock crash) +if the result register is not a cell.** + +### `sav` +Save the noun in the subject register to a frame slot. + +### `reo` +Load a noun from a frame slot to the subject register + +### `con` +Load a noun immediate into the result register + +### `clq` +If the result register is a cell, set the result register to 0, else set it to 1 + +### `inc` +If the result register is an atom, increment it by one. 
If it is a cell, nock crash + +### `eqq` +Unifying equality between the noun in the given slot and the noun in the result register. +Result register set to `0` if equal and `1` if not equal + +### `edt` +Edit the noun in the subject register by replacing the subnoun at the given axis with the noun in the result register. +Place the resulting noun in the result register (subject register is unmodified). + +Nock crash if axis is not valid for the noun. + +### `ext` +Allocate a cell with the noun in the result register as a head, and the noun in the subject register as a tail. +Place the cell in the subject register. + +### `lnt` +Codegen and evaluate the noun in the result register with the current subject, in tail position. + +### `lnk` +Codegen and evaluate the noun in the result register, returning to the instruction following `lnk` + +### `don` +Return to just past the calling `lnk` instruction or the outer virtualization context. + +### `br1` +Continue if the result register is 0, branch to the given label if it is 1. Nock crash otherwise diff --git a/docs/memory.md b/docs/memory.md deleted file mode 100644 index 1d710e6..0000000 --- a/docs/memory.md +++ /dev/null @@ -1,213 +0,0 @@ -# New Mars: Memory layout and representation - -This document describes the current (2022/01/13) state of the memory layout and noun representation for New Mars. - -## Noun representation - -Nouns are represented as machine words with the MSBs as tag bits. -By treating a 0 MSB as the tag for a direct atom, we can compute directly with direct atoms without masking the tag. - -|------|-----------------------| -| MSBs | Noun | -| 0 | Direct Atom | -| 10 | Cell Pointer | -| 110 | Indirect Atom Pointer | -|------|-----------------------| - -A direct atom is an atom which fits in a machine word, less one bit for the tag. It is stored directly. - -An indirect atom is an atom which is too big to be a direct atom. -It is thus represented as a tagged pointer. 
-The memory referenced is at least 64-bit aligned. -The first 64 bits are the size of the atom in bytes. -This is followed by that number of bytes containing the atom in little-endian order. - -A cell is represented as a tagged pointer. -The memory referenced is adjacent machine words. -The machine word at the pointer is the noun representation of the head (left side/first) of the cell. -The machine word directly following (higher in memory) is the noun representation of the tail (right side/second) of the cell. - -During collection the memory for indirect atoms and cells may be rewritten to indicate a _forwarding pointer_. -This is discussed further in the "Memory Layout" section. - -## Memory Layout - -Computation proceeds in a memory arena, using two stacks growing towards each other from opposite ends. -The stack growing from lower memory to higher is termed the "west" stack and lower memory is directionally "west." -The stack growing from higher memory to lower is termed the "east" stack and, similarly, higher memory is directionally "east." - -Computational frames are always pushed onto the opposing stack from the current frame. -Thus the two stacks logically form one single computational stack. -The stack on which the current frame is located is termed the "polarity". -Thus, the current stack frame may be said to have "west" or "east" polarity. - -Two pointers are kept in registers. -The "frame pointer" is changed only when pushing a new stack frame. -When the current polarity is west, it points to the westernmost byte in the current frame. -Slots relative to the frame (for register saves or return addresses) are addressed by offsetting eastward. -When the current polarity is east, it points to the easternmost byte _west of_ the current frame. -Slots relative to the frame are in this case addressed by offsetting westward. -Note that we do not reverse which "end" of a machine word we address by the polarity. 
- -The "stack pointer" is changed when pushing a new stack frame, and also to allocate memory. -When the current polarity is west, it points to the westernmost byte east of the current frame. -When the current polarity is east, it points to the westernmost byte *of* the current frame. - -When pushing a new frame, the frame pointer is always set to the stack pointer of the *previous* frame to the current. -This achieves the polarity reversal. -The stack pointer is offset east of the frame pointer by the machine words necessary (for a west frame) or west of the frame pointer (for an east frame). -Note that the bytecode instruction for a push implicitly adds two machine words to the requested stack frame size. -This is because each frame saves the previous frame's stack and frame pointer as its first and second word slots, -respectively. -This provides both a return mechanism (restore stack and frame pointer from current frame, flip polarity), and -a polarity-swapping push (frame pointer is previous stack pointer, stack pointer offset from frame pointer, flip polarity. - -Allocation is achieved by saving the stack pointer, moving it east, and returning the saved pointer (for a west frame) or by moving the stack pointer west and returning the result (for an east frame). - -When popping a frame, the result register is examined to provide a collection root. -Pointers which fall within the current frame are recursively chased and copied to the parent frames allocation area, bumping the saved stack pointer as this progresses. This ensures that data allocated in this frame (or a child frame and then copied here) which is live from the returned result is not lost when the frame is popped. - -The collector makes use of memory just beyond the returning frame as an ephemeral stack, containing noun words to -be copied and a destination to copy them to. 
-Copying a cell results in a new cell allocation on the parent memory, and pushing both constituent nouns onto the -stack with the head and tail of the cell as destinations. -Copying a direct atom is just a matter of writing it to the destination. -Copying an indirect atom involves an allocation, copying of the atom's memory, and writing the new allocation to the saved destination. - -Frames to which a call may return should reserve the first frame slot (after the frame and stack pointers) for the return address. -The `lnk` instruction (see Bytecode Representation, below) will save the current program counter in this slot. -The `don` instruction will read the program counter from this register and jump to it. -_Note that this makes it the ***caller's*** responsibility to push a new frame for a non-tail call._ - -## Bytecode Representation - -The virtual machine which runs our bytecode provides a layer of abstraction from some details of the memory layout. -Allocation is performed by bytecode operations and the stack discipline above is relied on to free unused memory. -This allows us to simply consider registers and machine-word-sized frame slots as containing nouns. - -The virtual machine for the bytecode has three registers: `sub`, for current subject noun, `res`, for the -result of running a formula, and `pc`, for the current position in the bytecode. -The contents of the PC register are opaque to bytecode programs and can only be stored to memory as part of an -`lnk` instruction or loaded from memory as part of a `don` instruction. - -Note that Nock allows for an arbitrary noun to be read computed and then treated as a Nock formula. -Of course, if a noun is not a valid nock formula (e.g. `2` or `[14 0]`) this causes a crash. -But if it is a valid Nock formula, we may not have bytecode already generated for it. - -New Mars looks forward to caching bytecode, but for now we will simply assume that it is generated each time a nock noun is evaluated as a formula. 
-In either case, there is need of some mechanism to transfer control between linear sequences of bytecode, invoking -a sequence from some point within another, and returning to that point when the invoked sequence completes. -Further, we need to support *tail calls* to permit recursion without linear memory usage for stale frames. - -The `puh` instruction permits us to push a frame with a certain number of "slots". -In the underlying representation, this number will be increased by 2, in order to store the previous stack and frame -pointers. -However this detail is hidden from programs in the bytecode, for which slot 0 is the first slot after the -stack and frame pointers, slot one the second, and so on. - -The `pop` instruction pops a frame, and performs the limited copying collection described above, using the contents of the `res` register as a root. - -The `lnk` instruction saves the current `pc` register into the `0` slot of the frame, and then jumps to the bytecode for the noun in the `res` register. -(This may involve generating the bytecode, or it may be cached.) -By doing so, it sets up the bytecode for the formula in `res` to return to the following instruction, with its result in `res`. - -The `don` instruction accomplishes this return. -It simply loads the `0` slot of the frame into the PC register. - -The `lnt` instruction jumps to bytecode for a formula, as in `lnk`, but *does not save the pc*. -It is intended for tail calls where the result of the called formula is the result of the current formula. - -Thus any the bytecode for any formula compiled either as the initial computation or as the result of being invoked from the `res` register by `lnk` or `lnt` ends with either `lnt` or `don`. 
- -The bytecode instructions are as follows: - -| Name | Opcode | Description -|----------|--------------|------------ -| `nop` | 0x0 | Do nothing -| `axe[x]` | 0xC1 | Look up an axis in the subject, place in the result -| `cel[x]` | 0x82 | Allocate a cell, use the given frame slot as the head and the `res` register as the tail, writing the allocated cell to `res`. -| `con[x]` | 0xC3 | Place a constant in the `res` register -| `cl?` | 0x4 | Write `0` to `res` if `res` is a cell, `1` otherwise -| `inc` | 0x5 | Increment `res` if `res` is an atom. Crash otherwise -| `eq?[x]` | 0x86 | Write `0` to `res` if the nouns in the given frame slot and `res` are structurally equal -| `puh[x]` | 0x87 | Push a frame with the given number of slots -| `pop` | 0x8 | Pop a frame -| `put[x]` | 0x89 | Write the noun in `res` to the given frame slot -| `get[x]` | 0x8A | Read the noun in the given frame slot to `res` -| `sav[x]` | 0x8B | Write the noun in `sub` to the given frame slot -| `reo[x]` | 0x8C | Read the noun in the given frame slot to `sub -| `sub[x]` | 0xD | Set the `sub` register equal to the `res` register (1) -| `ext[x]` | 0xE | Allocate a cell, use the `res` register as the head and the `sub` register as the tail, writing the allocated cell to `sub`. -| `edt[x]` | 0xCF | Overwrite the given axis in the noun in `sub` with the noun in `res`, placing the result in `res`. -| `br1[x]` | 0x90 or 0xD0 | Skip over the given number of following instructions if `res` is 1, skip none if res is 0. Crash otherwise. (2) -| `bru[x]` | 0x91 or 0xD1 | Skip over the given number of following instructions. (2) -| `lnt` | 0x12 | Compile or read cache of bytecode for formula noun in `res` and jump -| `lnk` | 0x13 | Compile or read cache of bytecode for formula noun in `res`, save pc to frame slot `0`, and jump -| `don` | 0x14 | Restore pc from frame slot 0 -| `spy` | 0x15 | Ask runtime to query its namespace with cell from `res` and place result in `res`. 
- -### Compiling formulae to bytecode - -The bytecode compiler uses a "caller-save" convention for registers, including the program counter. -This means that we must track whether it is necessary to save the program counter (tail position or not) and the subject register. -Thus, the abstract procedure for compiling Nock formulae to bytecode is specified with two additional bits of input. - -One bit, when 0, specifies that the subject need not be saved prior to modification. -This is true either in tail position (since when the code is finished its subject will no longer be needed) or when the code will be immediately followed by a `sub` or `reo` instruction which will overwrite the subject. -When this bit is 1, it specifies that the subject should be saved prior to modification, and thus any overwriting of the subject should be preceded by a `sav` instruction and followed eventually by a `reo` instruction. - -The other bit, when 0, specifies that this code is being generated for "tail position", meaning that its result will be the final result either of an external invocation or a dynamic call. -Code is tail position when it is the code generated for the outermost formula of a compilation, or it is a recursive invocation to compile a subformula with no additional instructions appended after the recursive invocation. - -For instance, when compiling Nock 7, if the Nock 7 is in tail position, then firstly, no frame is pushed as no registers need to be saved. -The first subformula is compiled with bits 0,1 specifying that the subject may be mangled. -A `sub` instruction is appended, and then the second subformula is compiled with bits 0,0, specifying both that the subject may be mangled and that the second subformula is in tail position. -The result of this subcomputation will be the result of the outer computation, with no additional steps. 
- -When compiling Nock 2 or Nock 9 in tail position, we take care that any frames pushed are popped (necessary for 2 as we must save the first subformula's result while we compute the second), properly set the `sub` and `res` registers, and then generate an `lnt` instruction which ends the code for our current outer formula. -Thus the result of our outer formula will be the result of whichever formula was invoked by the Nock 2 or Nock 9. - -When compiling Nock 2 or Nock 9 in a non-tail position, we must push a frame to ensure that frame slot 0 is available for the `lnk` instruction to save the return in, and pop this frame only after control returns from the invoked formula. -If it is necessary that the subject should not be mangled, it is the caller's responsibility (i.e. the responsibility of the code generated for Nock 2 or Nock 9) to save the subject in this frame and restore it after control returns. - -The following is the complete definition of the abstract function for compiling Nock formulae to this bytecode. -A `;` represents concatenation of bytecode sequences. -Note that when this function is applied to produce bytecode for an external computation of a Nock formula, or a formula invoked by Nock 2 or Nock 9, it is invoked with both extra bits true/unset/0. - -The `len[]` function gives the offset which when provided to br1 or bru would skip over the input instruction sequence. - -| Nock | S | T | bc[s t n] -|----------------|---|---|--------- -| `[[b c] d]` | * | 0 | `puh[1]; bc[1 1 [b c]]; put[0]; bc[s 1 d]; cel[0]; pop; don` -| `[[b c] d]` | * | 1 | `puh[1]; bc[1 1 [b c]]; put[0]; bc[s 1 d]; cel[0]; pop` -| `[0 b]` | * | 0 | `axe[b]; don` -| `[0 b]` | * | 1 | `axe[b]` -| `[1 c]` | * | 0 | `con[c]; don` -| `[1 c]` | * | 1 | `con[c]` -| `[2 b c]` | 0 | 0 | `puh[1]; bc[1 1 [c]]; put[0]; bc[0 1 b]; sub; get[0]; pop; lnt` COMMENT: does this require we treat sub as a GC root? probably? 
-| `[2 b c]` | 0 | 1 | `puh[2]; bc[1 1 [c]]; put[1]; bc[0 1 b]; sub; get[0]; lnk; pop` -| `[2 b c]` | 1 | 1 | `puh[3]; sav[2]; bc[0 1 c]; put[1]; reo[2]; bc[0 1 b]; sub; get[1]; lnk; reo[2]; pop` -| `[3 b]` | 0 | 0 | `bc[0 1 b]; cl?; don` -| `[3 b]` | * | 1 | `bc[s 1 b]; cl?` -| `[4 b]` | 0 | 0 | `bc[0 1 b]; inc; don` -| `[4 b]` | * | 1 | `bc[s 1 b]; inc; don` -| `[5 b c]` | 0 | 0 | `puh[1]; bc[1 1 b]; put[0]; bc[0 1 c]; eq?[0]; pop; don` -| `[5 b c]` | * | 1 | `puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; eq?[0]; pop` -| `[6 b c d] | 0 | 0 | `bc[1 1 b]; br1[len[bc[0 0 c]]]; bc[0 0 c]; bc[0 0 d]` -| `[6 b c d] | * | 1 | `bc[1 1 b]; br1[len[bc[s 1 c]; bru[len[bc[s 1 d]]]]]; bc[s 1 c]; bru[len[bc[s 1 d]]]; bc[s 1 d]` -| `[7 b c]` | 0 | * | `bc[0 1 b]; sub; bc[0 t c]` -| `[7 b c]` | 1 | 1 | `puh[1]; sav[0]; bc[0 1 b]; sub; bc[0 1 c]; reo[0]; pop` -| `[8 b c]` | 0 | * | `bc[1 1 b]; ext; bc[0 t c]` -| `[8 b c]` | 1 | 1 | `puh[1]; sav[0]; bc[0 1 b]; reo[0]; ext; bc[0 1 c]; reo[0]; pop` -| `[9 b c]` | 0 | 0 | `bc[0 1 c]; sub; axe[b]; lnt` -| `[9 b c]` | 0 | 1 | `puh[1]; bc[0 1 c]; sub; axe[b]; lnk; pop` -| `[9 b c]` | 1 | 1 | `puh[2]; sav[1]; bc[0 1 c]; sub; axe[b]; lnk; reo[1]; pop` -| `[10 [b c] d]` | 0 | 0 | `puh[1]; bc[1 1 d]; put[0]; bc[0 1 c]; reo[0]; edt[b]; pop; don` -| `[10 [b c] d]` | 0 | 1 | `puh[1]; bc[1 1 d]; put[0]; bc[0 1 c]; reo[0]; edt[b]; pop` -| `[10 [b c] d]` | 1 | 1 | `puh[2]; sav[1]; bc[0 1 d]; put[0]; reo[1]; bc[0 1 c]; reo[0]; edt[b]; reo[1]; pop` -| `[11 [b c] d]` | * | * | `bc[1 1 c]; hns[b]; hnd; bc[s t d]` -| `[11 b c]` | * | * | `hns[b]; bc[s t c]` -| `[12 b c]` | 0 | 0 | `puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; cel[0]; pop; spy; don` -| `[12 b c]` | * | 1 | `puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; cel[0]; pop; spy` - - diff --git a/docs/stack.md b/docs/stack.md new file mode 100644 index 0000000..4082e82 --- /dev/null +++ b/docs/stack.md @@ -0,0 +1,80 @@ +# New Mars: Memory layout and representation + +This document describes the current (2022/01/13) 
state of the memory layout and noun representation for New Mars. + +## Noun representation + +Nouns are represented as machine words with the MSBs as tag bits. +By treating a 0 MSB as the tag for a direct atom, we can compute directly with direct atoms without masking the tag. + +|------|-----------------------| +| MSBs | Noun | +| 0 | Direct Atom | +| 10 | Cell Pointer | +| 110 | Indirect Atom Pointer | +|------|-----------------------| + +A direct atom is an atom which fits in a machine word, less one bit for the tag. It is stored directly. + +An indirect atom is an atom which is too big to be a direct atom. +It is thus represented as a tagged pointer. +The memory referenced is at least 64-bit aligned. +The first 64 bits are the size of the atom in bytes. +This is followed by that number of bytes containing the atom in little-endian order. + +A cell is represented as a tagged pointer. +The memory referenced is adjacent machine words. +The machine word at the pointer is the noun representation of the head (left side/first) of the cell. +The machine word directly following (higher in memory) is the noun representation of the tail (right side/second) of the cell. + +During collection the memory for indirect atoms and cells may be rewritten to indicate a _forwarding pointer_. +This is discussed further in the "Memory Layout" section. + +## Memory Layout + +Computation proceeds in a memory arena, using two stacks growing towards each other from opposite ends. +The stack growing from lower memory to higher is termed the "west" stack and lower memory is directionally "west." +The stack growing from higher memory to lower is termed the "east" stack and, similarly, higher memory is directionally "east." + +Computational frames are always pushed onto the opposing stack from the current frame. +Thus the two stacks logically form one single computational stack. +The stack on which the current frame is located is termed the "polarity". 
+Thus, the current stack frame may be said to have "west" or "east" polarity.
+
+Two pointers are kept in registers.
+The "frame pointer" is changed only when pushing a new stack frame.
+When the current polarity is west, it points to the westernmost byte in the current frame.
+Slots relative to the frame (for register saves or return addresses) are addressed by offsetting eastward.
+When the current polarity is east, it points to the easternmost byte _west of_ the current frame.
+Slots relative to the frame are in this case addressed by offsetting westward.
+Note that we do not reverse which "end" of a machine word we address by the polarity.
+
+The "stack pointer" is changed when pushing a new stack frame, and also to allocate memory.
+When the current polarity is west, it points to the westernmost byte east of the current frame.
+When the current polarity is east, it points to the westernmost byte *of* the current frame.
+
+When pushing a new frame, the frame pointer is always set to the stack pointer of the *previous* frame.
+This achieves the polarity reversal.
+The stack pointer is offset east of the frame pointer by the number of machine words needed (for a west frame) or west of the frame pointer (for an east frame).
+Note that the bytecode instruction for a push implicitly adds two machine words to the requested stack frame size.
+This is because each frame saves the previous frame's stack and frame pointers as its first and second word slots,
+respectively.
+This provides both a return mechanism (restore the stack and frame pointers from the current frame, flip polarity) and
+a polarity-swapping push (frame pointer becomes the previous stack pointer, stack pointer is offset from the frame pointer, flip polarity).
+
+Allocation is achieved by saving the stack pointer, moving it east, and returning the saved pointer (for a west frame) or by moving the stack pointer west and returning the result (for an east frame).
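The allocation rule in the preceding paragraph can be sketched in a few lines. This is an illustrative Python model with word-granularity addresses, not New Mars code: a west frame hands out the old stack pointer and bumps it east (upward), while an east frame bumps the stack pointer west (downward) first and returns the result.

```python
# Minimal sketch of polarity-aware bump allocation, as described above.
# Addresses are word indices; "east" is toward higher memory.

def bump_alloc(sp, words, polarity):
    """Return (address, new_sp) for an allocation of `words` machine words."""
    if polarity == 'west':
        return sp, sp + words          # save sp, then move it east
    else:
        return sp - words, sp - words  # move sp west, return the result
```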
+
+When popping a frame, the result register is examined to provide a collection root.
+Pointers which fall within the current frame are recursively chased and copied to the parent frame's allocation area, bumping the saved stack pointer as this progresses. This ensures that data allocated in this frame (or in a child frame and then copied here) which is live from the returned result is not lost when the frame is popped.
+
+The collector makes use of memory just beyond the returning frame as an ephemeral stack, containing noun words to
+be copied and a destination to copy them to.
+Copying a cell results in a new cell allocation in the parent memory, and pushing both constituent nouns onto the
+stack with the head and tail of the new cell as destinations.
+Copying a direct atom is just a matter of writing it to the destination.
+Copying an indirect atom involves an allocation, copying of the atom's memory, and writing the new allocation to the saved destination.
+
+Frames to which a call may return should reserve the first frame slot (after the frame and stack pointers) for the return address.
+_Note that this makes it the ***caller's*** responsibility to push a new frame for a non-tail call._
+
+
diff --git a/docs/subject-knowledge.md b/docs/subject-knowledge.md
new file mode 100644
index 0000000..34de825
--- /dev/null
+++ b/docs/subject-knowledge.md
@@ -0,0 +1,128 @@
+# Subject Knowledge Analysis
+
+Nock is inherently a very dynamic language.
+Nock has no concept corresponding directly to a procedure call, only to an "eval" of a dynamically-computed noun as a Nock formula.
+Actual procedure calls are implemented by looking up a noun corresponding to a formula from a core (a pair of a "battery" of Nock formulas and a payload, itself a pair of "sample" and context), and evaluating that formula with the core as the subject.
+This is the semantics of Nock 9.
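These Nock 9 semantics (compute a core, pull a formula out of its battery, then evaluate that formula with the core as the subject) can be sketched with a toy interpreter fragment. This is illustrative Python under stated assumptions: nouns are modeled as ints (atoms) and 2-tuples (cells), and only opcodes 0, 1, and 9 are handled.

```python
# Toy nouns: atoms are ints, cells are 2-tuples.  Illustrative only.

def slot(axis, noun):
    """Nock `/` tree addressing: axis 1 is the whole noun; for axis n,
    axis 2n addresses the head and axis 2n+1 the tail."""
    for bit in bin(axis)[3:]:        # walk the axis bits below the leading 1
        if not isinstance(noun, tuple):
            raise ValueError("crash: non-1 axis into an atom")
        noun = noun[0] if bit == "0" else noun[1]
    return noun

def nock(subject, formula):
    op, arg = formula
    if op == 0:                      # [0 b]: slot b of the subject
        return slot(arg, subject)
    if op == 1:                      # [1 b]: the constant b
        return arg
    if op == 9:                      # [9 b c]: compute a core, pull arm b, eval
        axis, core_formula = arg
        core = nock(subject, core_formula)
        arm = slot(axis, core)       # look the formula up in the battery
        return nock(core, arm)       # evaluate it with the core as subject
    raise NotImplementedError("toy interpreter: opcode %r" % (op,))

# A one-armed core: the battery is the single formula [0 3]
# (return the payload), and the payload is the atom 42.
core = ((0, 3), 42)
result = nock(0, (9, (2, (1, core))))   # "call" the arm at axis 2
```

Here the subject of the arm's evaluation is the core itself, which is exactly why fixing the formula alone is not enough for code generation: the payload half of that subject varies from call to call.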
+
+This presents a very difficult challenge for jetting and code generation of Nock, as in general the semantics of any formula can depend arbitrarily on its subject.
+However, simply fixing the subject as well is not sufficient, as the subject must vary to store state and to allow a core to receive input (e.g. a "gate" or one-armed core is "slammed" by altering the sample prior to invoking the arm of the core).
+In general, we need to know which parts of the subject are fixed and which can vary while retaining the correspondence of a formula to a jet or to generated code.
+
+## Partial nouns
+The subject of any Nock computation is a noun.
+In order to do any analysis which partially fixes the subject, we must have a partial representation of nouns.
+We can represent this as a Hoon type, which we denote `sock`:
+
+```
+|%
++$  sock
+  $%  [%know k=*]
+      [%bets p=sock q=sock]
+      [%dice ~]
+      [%gues ~]
+  ==
+--
+```
+
+The `%know` case represents full knowledge of a noun, and simply records the noun.
+The `%bets` case records partial knowledge of a cell, recursively described by the `p` (head) and `q` (tail) faces.
+The `%dice` case records knowledge that a noun is an atom, but without knowledge of its value.
+The `%gues` case records the absence of knowledge of a noun: it could be any noun whatsoever.
+
+A `sock` can be recursively normalized: a `%bets` cell consisting of two `%know` nouns normalizes to a `%know` of the cell.
+Other cases could be normalized further, but this would throw away structural information.
+
+The object of our analysis is to discover such a partial noun representing what can be known of the subject during a particular Nock computation.
+
+## Subject Knowledge Analysis
+**TODO: safety analysis**
+
+Subject Knowledge Analysis (SKA) is meant to run alongside code generation and produce a representation of what is known of the subject at each point as code generation traverses the Nock formula.
+Thus, intermediate results should not be stored, and the analysis does not add an extra traversal of the tree. Rather, it describes how knowledge propagates through the Nock formula tree.
+
+The *input* to each analysis is the knowledge of the subject, and the *output* is knowledge of the result, given the input subject knowledge.
+This may seem counterintuitive, but the algorithm describes how this knowledge is propagated such that it may be used by code generation, so it describes how the input to successive traversal steps is generated.
+
+The Nock primitives `/[a b]` and `#[[a b] c]` extend to socks in the obvious way, giving a crash when it can be *known* that the computation would crash, e.g. when attempting to access a non-1 axis of an atom, or
+when an axis is not an atom.
+
+### Cell
+SKA of a cell formula applies the analysis to both formulae with the same subject knowledge input, then combines their results into a `%bets` cell and normalizes the result.
+
+### 0
+SKA of 0 applies the `/` primitive as far as it can be applied.
+
+### 1
+SKA of 1 returns `%know` of the supplied constant, ignoring the input.
+
+### 2
+SKA of 2 analyzes the second subformula first. If it is `%know` of some noun, then the first subformula is analyzed and its result used as the subject to continue the analysis into the known formula.
+If the formula isn't known, then the result is `[%gues ~]`, though the first subformula may still be analyzed for code-generation purposes.
+
+### 3
+SKA of 3 recursively analyzes the subformula, then examines what is known of the structure of the resulting noun. Specifically, for the `%know`, `%bets`, and `%dice` cases, it is known whether the result is a cell or an atom, and thus
+the analysis can produce `[%know 0]` for a known cell or `[%know 1]` for a known atom.
For the `[%gues ~]` result, however, we must produce `[%dice ~]`.
+
+### 4
+SKA of 4 *intentionally* does not produce the obvious result, in order to short-circuit full analysis of functions which should be jetted.
+If the result of the subformula is known to be a cell, we can produce knowledge of a crash.
+Otherwise the result is `[%dice ~]`, whether or not the atom result of the subformula is known.
+This permits memoization to prevent analysis of recursive Nock 4 from degenerating into full evaluation of a function.
+
+### 5
+SKA of 5 analyzes both subformulas.
+If the results of both are fully known, then `[%know 0]` or `[%know 1]` can be returned for equality or inequality.
+If either result is not fully known, then the result is `[%dice ~]`.
+
+### 6
+SKA of 6 first analyzes the first (test) subformula.
+If the result is known to be 0, it need only analyze the second (true-branch) subformula.
+If the result is known to be 1, it need only analyze the third (false-branch) subformula.
+If the result is known to be some other atom, or a cell, it must produce knowledge of a crash.
+Otherwise, the result is the _intersection_ of the knowledge produced by analyzing both branches.
+
+The intersection of two socks produces knowledge where they agree, and ignorance where they disagree.
+This is true for both structure and content. See _Sock intersection_ below for the full semantics.
+
+### 7
+SKA of 7 analyzes the first subformula, then uses its result as the subject input to the analysis of the second subformula.
+
+### 8
+SKA of 8 is similar to SKA of 7, but the result of the subject formula must then be consed with the input subject, in the manner described for cell formulas.
+
+### 9
+SKA of 9 analyzes the subformula, then applies the `/` primitive as in 0.
+If the result is known, we can continue to analyze the resulting noun as a formula, using the computed core as a subject.
+Otherwise we must simply return `[%gues ~]`.
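A few of the per-opcode rules above can be sketched over a toy sock encoding. This is illustrative Python, not the actual analyzer: the Hoon `sock` cases become tagged tuples, and the helper names (`norm`, `ska_3`, `ska_5`) are invented for this sketch.

```python
# Toy socks as tagged tuples, mirroring the Hoon type above:
#   ('know', noun) | ('bets', p, q) | ('dice',) | ('gues',)
# Nouns are ints (atoms) or 2-tuples (cells).

def norm(s):
    """Normalize: %bets of two %know nouns becomes %know of the cell."""
    if s[0] == 'bets' and s[1][0] == 'know' and s[2][0] == 'know':
        return ('know', (s[1][1], s[2][1]))
    return s

def ska_3(sub_sock):
    """SKA of 3: the depth test over what is known of the subformula's result."""
    tag = sub_sock[0]
    if tag == 'know':
        return ('know', 0 if isinstance(sub_sock[1], tuple) else 1)
    if tag == 'bets':
        return ('know', 0)      # known to be a cell
    if tag == 'dice':
        return ('know', 1)      # known to be an atom
    return ('dice',)            # %gues: the result is still some atom (0 or 1)

def ska_5(left, right):
    """SKA of 5: equality is only decided when both sides are fully known."""
    if left[0] == 'know' and right[0] == 'know':
        return ('know', 0 if left[1] == right[1] else 1)
    return ('dice',)
```

Note that `ska_5` never descends into partial knowledge: structurally incompatible `%bets` socks could in principle decide inequality early, but per the rule above anything short of full knowledge yields `[%dice ~]`.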
+
+### 10
+SKA of 10 analyzes both the tree and patch subformulae, and returns a result representing the partial knowledge of the patch inserted into the partial knowledge of the tree.
+
+### 11
+SKA of 11 may analyze dynamic hints, but does not strictly need to, as SKA is not required to inform of crashes.
+
+### 12
+SKA treats all Nock 12 computations as returning `[%gues ~]`.
+
+## Operations on socks
+
+### Cell construction
+When analyzing a Nock computation, it may be necessary to combine partial knowledge of two nouns into partial knowledge of a cell constructed from those two nouns.
+This is done by creating a `%bets` sock combining the two and then normalizing.
+
+### Axis lookup
+Looking up an axis into a sock should return whatever can be known about what looking up that axis into the full noun would return.
+The case for `%know` is trivial, and `%bets` likewise has an obvious interpretation.
+Looking up a non-1 axis into a known atom or into a `[%dice ~]` sock should produce knowledge of a crash.
+Looking up an axis into `[%gues ~]` should produce `[%gues ~]`.
+
+### Edit
+Editing a sock at an axis involves constructing the path to the patch (described by the edit axis) from `%bets` cells, the last of which has as its head or tail (depending on the axis) the patch sock.
+Heads or tails at other levels are straightforwardly constructed from the sock being edited.
+Descending into a `[%dice ~]` case produces knowledge of a crash.
+Descending into a `[%gues ~]` case causes each branch off the path to the patch to also be `[%gues ~]`.
+
+### Sock intersection
+Intersecting two socks produces a sock with knowledge where the intersected socks agree, and ignorance where either represents ignorance or the socks disagree.
+Intersection descends into `%know` cases in order to produce the maximum degree of agreed-upon knowledge.
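The intersection described above can be sketched as follows. This is illustrative Python over the same tagged-tuple socks as before (`('know', noun)`, `('bets', p, q)`, `('dice',)`, `('gues',)`); the `intersect` and `norm` names are invented for this sketch.

```python
# Illustrative sock intersection: keep knowledge where both socks
# agree, and drop it where either is ignorant or they disagree.

def norm(s):
    """Normalize: %bets of two %know nouns becomes %know of the cell."""
    if s[0] == 'bets' and s[1][0] == 'know' and s[2][0] == 'know':
        return ('know', (s[1][1], s[2][1]))
    return s

def intersect(a, b):
    # Expand %know of a cell to %bets so we can descend and keep the
    # maximum degree of agreed-upon knowledge.
    if a[0] == 'know' and isinstance(a[1], tuple):
        a = ('bets', ('know', a[1][0]), ('know', a[1][1]))
    if b[0] == 'know' and isinstance(b[1], tuple):
        b = ('bets', ('know', b[1][0]), ('know', b[1][1]))
    if a[0] == 'bets' and b[0] == 'bets':
        return norm(('bets', intersect(a[1], b[1]), intersect(a[2], b[2])))
    if a[0] == 'know' and b[0] == 'know':
        # Both are known atoms here (cells were expanded above).
        return a if a[1] == b[1] else ('dice',)
    if a[0] in ('know', 'dice') and b[0] in ('know', 'dice'):
        return ('dice',)    # both atoms, but the value is not agreed upon
    return ('gues',)        # structural disagreement, or one side is %gues
```

For example, intersecting knowledge of the cells `[1 2]` and `[1 3]` retains the agreed head while demoting the disagreeing tail to a known atom of unknown value.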