ares/docs/memory.md
2022-03-23 11:59:45 -05:00

16 KiB

New Mars: Memory layout and representation

This document describes the current (2022/01/13) state of the memory layout and noun representation for New Mars.

Noun representation

Nouns are represented as machine words with the MSBs as tag bits. By treating a 0 MSB as the tag for a direct atom, we can compute directly with direct atoms without masking the tag.

|------|-----------------------| | MSBs | Noun | | 0 | Direct Atom | | 10 | Cell Pointer |

110 Indirect Atom Pointer

A direct atom is an atom which fits in a machine word, less one bit for the tag. It is stored directly.

An indirect atom is an atom which is too big to be a direct atom. It is thus represented as a tagged pointer. The memory referenced is at least 64-bit aligned. The first 64 bits are the size of the atom in bytes. This is followed by that number of bytes containing the atom in little-endian order.

A cell is represented as a tagged pointer. The memory referenced is adjacent machine words. The machine word at the pointer is the noun representation of the head (left side/first) of the cell. The machine word directly following (higher in memory) is the noun representation of the tail (right side/second) of the cell.

During collection the memory for indirect atoms and cells may be rewritten to indicate a forwarding pointer. This is discussed further in the "Memory Layout" section.

Memory Layout

Computation proceeds in a memory arena, using two stacks growing towards each other from opposite ends. The stack growing from lower memory to higher is termed the "west" stack and lower memory is directionally "west." The stack growing from higher memory to lower is termed the "east" stack and, similarly, higher memory is directionally "east."

Computational frames are always pushed onto the opposing stack from the current frame. Thus the two stacks logically form one single computational stack. The stack on which the current frame is located is termed the "polarity". Thus, the current stack frame may be said to have "west" or "east" polarity.

Two pointers are kept in registers. The "frame pointer" is changed only when pushing a new stack frame. When the current polarity is west, it points to the westernmost byte in the current frame. Slots relative to the frame (for register saves or return addresses) are addressed by offsetting eastward. When the current polarity is east, it points to the easternmost byte west of the current frame. Slots relative to the frame are in this case addressed by offsetting westward. Note that we do not reverse which "end" of a machine word we address by the polarity.

The "stack pointer" is changed when pushing a new stack frame, and also to allocate memory. When the current polarity is west, it points to the westernmost byte east of the current frame. When the current polarity is east, it points to the westernmost byte of the current frame.

When pushing a new frame, the frame pointer is always set to the stack pointer of the previous frame to the current. This achieves the polarity reversal. The stack pointer is offset east of the frame pointer by the machine words necessary (for a west frame) or west of the frame pointer (for an east frame). Note that the bytecode instruction for a push implicitly adds two machine words to the requested stack frame size. This is because each frame saves the previous frame's stack and frame pointer as its first and second word slots, respectively. This provides both a return mechanism (restore stack and frame pointer from current frame, flip polarity), and a polarity-swapping push (frame pointer is previous stack pointer, stack pointer offset from frame pointer, flip polarity.

Allocation is achieved by saving the stack pointer, moving it east, and returning the saved pointer (for a west frame) or by moving the stack pointer west and returning the result (for an east frame).

When popping a frame, the result register is examined to provide a collection root. Pointers which fall within the current frame are recursively chased and copied to the parent frames allocation area, bumping the saved stack pointer as this progresses. This ensures that data allocated in this frame (or a child frame and then copied here) which is live from the returned result is not lost when the frame is popped.

The collector makes use of memory just beyond the returning frame as an ephemeral stack, containing noun words to be copied and a destination to copy them to. Copying a cell results in a new cell allocation on the parent memory, and pushing both constituent nouns onto the stack with the head and tail of the cell as destinations. Copying a direct atom is just a matter of writing it to the destination. Copying an indirect atom involves an allocation, copying of the atom's memory, and writing the new allocation to the saved destination.

Frames to which a call may return should reserve the first frame slot (after the frame and stack pointers) for the return address. The lnk instruction (see Bytecode Representation, below) will save the current program counter in this slot. The don instruction will read the program counter from this register and jump to it. Note that this makes it the caller's responsibility to push a new frame for a non-tail call.

Bytecode Representation

The virtual machine which runs our bytecode provides a layer of abstraction from some details of the memory layout. Allocation is performed by bytecode operations and the stack discipline above is relied on to free unused memory.
This allows us to simply consider registers and machine-word-sized frame slots as containing nouns.

The virtual machine for the bytecode has three registers: sub, for current subject noun, res, for the result of running a formula, and pc, for the current position in the bytecode. The contents of the PC register are opaque to bytecode programs and can only be stored to memory as part of an lnk instruction or loaded from memory as part of a don instruction.

Note that Nock allows for an arbitrary noun to be read computed and then treated as a Nock formula. Of course, if a noun is not a valid nock formula (e.g. 2 or [14 0]) this causes a crash. But if it is a valid Nock formula, we may not have bytecode already generated for it.

New Mars looks forward to caching bytecode, but for now we will simply assume that it is generated each time a nock noun is evaluated as a formula. In either case, there is need of some mechanism to transfer control between linear sequences of bytecode, invoking a sequence from some point within another, and returning to that point when the invoked sequence completes. Further, we need to support tail calls to permit recursion without linear memory usage for stale frames.

The puh instruction permits us to push a frame with a certain number of "slots". In the underlying representation, this number will be increased by 2, in order to store the previous stack and frame pointers. However this detail is hidden from programs in the bytecode, for which slot 0 is the first slot after the stack and frame pointers, slot one the second, and so on.

The pop instruction pops a frame, and performs the limited copying collection described above, using the contents of the res register as a root.

The lnk instruction saves the current pc register into the 0 slot of the frame, and then jumps to the bytecode for the noun in the res register. (This may involve generating the bytecode, or it may be cached.) By doing so, it sets up the bytecode for the formula in res to return to the following instruction, with its result in res.

The don instruction accomplishes this return. It simply loads the 0 slot of the frame into the PC register.

The lnt instruction jumps to bytecode for a formula, as in lnk, but does not save the pc. It is intended for tail calls where the result of the called formula is the result of the current formula.

Thus any the bytecode for any formula compiled either as the initial computation or as the result of being invoked from the res register by lnk or lnt ends with either lnt or don.

The bytecode instructions are as follows:

Name Opcode Description
nop 0x0 Do nothing
axe[x] 0xC1 Look up an axis in the subject, place in the result
cel[x] 0x82 Allocate a cell, use the given frame slot as the head and the res register as the tail, writing the allocated cell to res.
con[x] 0xC3 Place a constant in the res register
cl? 0x4 Write 0 to res if res is a cell, 1 otherwise
inc 0x5 Increment res if res is an atom. Crash otherwise
eq?[x] 0x86 Write 0 to res if the nouns in the given frame slot and res are structurally equal
puh[x] 0x87 Push a frame with the given number of slots
pop 0x8 Pop a frame
put[x] 0x89 Write the noun in res to the given frame slot
get[x] 0x8A Read the noun in the given frame slot to res
sav[x] 0x8B Write the noun in sub to the given frame slot
reo[x] 0x8C Read the noun in the given frame slot to `sub
sub[x] 0xD Set the sub register equal to the res register (1)
ext[x] 0xE Allocate a cell, use the res register as the head and the sub register as the tail, writing the allocated cell to sub.
edt[x] 0xCF Overwrite the given axis in the noun in sub with the noun in res, placing the result in res.
br1[x] 0x90 or 0xD0 Skip over the given number of following instructions if res is 1, skip none if res is 0. Crash otherwise. (2)
bru[x] 0x91 or 0xD1 Skip over the given number of following instructions. (2)
lnt 0x12 Compile or read cache of bytecode for formula noun in res and jump
lnk 0x13 Compile or read cache of bytecode for formula noun in res, save pc to frame slot 0, and jump
don 0x14 Restore pc from frame slot 0
spy 0x15 Ask runtime to query its namespace with cell from res and place result in res.

Compiling formulae to bytecode

The bytecode compiler uses a "caller-save" convention for registers, including the program counter. This means that we must track whether it is necessary to save the program counter (tail position or not) and the subject register. Thus, the abstract procedure for compiling Nock formulae to bytecode is specified with two additional bits of input.

One bit, when 0, specifies that the subject need not be saved prior to modification. This is true either in tail position (since when the code is finished its subject will no longer be needed) or when the code will be immediately followed by a sub or reo instruction which will overwrite the subject. When this bit is 1, it specifies that the subject should be saved prior to modification, and thus any overwriting of the subject should be preceded by a sav instruction and followed eventually by a reo instruction.

The other bit, when 0, specifies that this code is being generated for "tail position", meaning that its result will be the final result either of an external invocation or a dynamic call. Code is tail position when it is the code generated for the outermost formula of a compilation, or it is a recursive invocation to compile a subformula with no additional instructions appended after the recursive invocation.

For instance, when compiling Nock 7, if the Nock 7 is in tail position, then firstly, no frame is pushed as no registers need to be saved. The first subformula is compiled with bits 0,1 specifying that the subject may be mangled. A sub instruction is appended, and then the second subformula is compiled with bits 0,0, specifying both that the subject may be mangled and that the second subformula is in tail position. The result of this subcomputation will be the result of the outer computation, with no additional steps.

When compiling Nock 2 or Nock 9 in tail position, we take care that any frames pushed are popped (necessary for 2 as we must save the first subformula's result while we compute the second), properly set the sub and res registers, and then generate an lnt instruction which ends the code for our current outer formula. Thus the result of our outer formula will be the result of whichever formula was invoked by the Nock 2 or Nock 9.

When compiling Nock 2 or Nock 9 in a non-tail position, we must push a frame to ensure that frame slot 0 is available for the lnk instruction to save the return in, and pop this frame only after control returns from the invoked formula. If it is necessary that the subject should not be mangled, it is the caller's responsibility (i.e. the responsibility of the code generated for Nock 2 or Nock 9) to save the subject in this frame and restore it after control returns.

The following is the complete definition of the abstract function for compiling Nock formulae to this bytecode. A ; represents concatenation of bytecode sequences. Note that when this function is applied to produce bytecode for an external computation of a Nock formula, or a formula invoked by Nock 2 or Nock 9, it is invoked with both extra bits true/unset/0.

The len[] function gives the offset which when provided to br1 or bru would skip over the input instruction sequence.

Nock S T bc[s t n]
[[b c] d] * 0 puh[1]; bc[1 1 [b c]]; put[0]; bc[s 1 d]; cel[0]; pop; don
[[b c] d] * 1 puh[1]; bc[1 1 [b c]]; put[0]; bc[s 1 d]; cel[0]; pop
[0 b] * 0 axe[b]; don
[0 b] * 1 axe[b]
[1 c] * 0 con[c]; don
[1 c] * 1 con[c]
[2 b c] 0 0 puh[1]; bc[1 1 [c]]; put[0]; bc[0 1 b]; sub; get[0]; pop; lnt COMMENT: does this require we treat sub as a GC root? probably?
[2 b c] 0 1 puh[2]; bc[1 1 [c]]; put[1]; bc[0 1 b]; sub; get[0]; lnk; pop
[2 b c] 1 1 puh[3]; sav[2]; bc[0 1 c]; put[1]; reo[2]; bc[0 1 b]; sub; get[1]; lnk; reo[2]; pop
[3 b] 0 0 bc[0 1 b]; cl?; don
[3 b] * 1 bc[s 1 b]; cl?
[4 b] 0 0 bc[0 1 b]; inc; don
[4 b] * 1 bc[s 1 b]; inc; don
[5 b c] 0 0 puh[1]; bc[1 1 b]; put[0]; bc[0 1 c]; eq?[0]; pop; don
[5 b c] * 1 puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; eq?[0]; pop
`[6 b c d] 0 0 bc[1 1 b]; br1[len[bc[0 0 c]]]; bc[0 0 c]; bc[0 0 d]
`[6 b c d] * 1 bc[1 1 b]; br1[len[bc[s 1 c]; bru[len[bc[s 1 d]]]]]; bc[s 1 c]; bru[len[bc[s 1 d]]]; bc[s 1 d]
[7 b c] 0 * bc[0 1 b]; sub; bc[0 t c]
[7 b c] 1 1 puh[1]; sav[0]; bc[0 1 b]; sub; bc[0 1 c]; reo[0]; pop
[8 b c] 0 * bc[1 1 b]; ext; bc[0 t c]
[8 b c] 1 1 puh[1]; sav[0]; bc[0 1 b]; reo[0]; ext; bc[0 1 c]; reo[0]; pop
[9 b c] 0 0 bc[0 1 c]; sub; axe[b]; lnt
[9 b c] 0 1 puh[1]; bc[0 1 c]; sub; axe[b]; lnk; pop
[9 b c] 1 1 puh[2]; sav[1]; bc[0 1 c]; sub; axe[b]; lnk; reo[1]; pop
[10 [b c] d] 0 0 puh[1]; bc[1 1 d]; put[0]; bc[0 1 c]; reo[0]; edt[b]; pop; don
[10 [b c] d] 0 1 puh[1]; bc[1 1 d]; put[0]; bc[0 1 c]; reo[0]; edt[b]; pop
[10 [b c] d] 1 1 puh[2]; sav[1]; bc[0 1 d]; put[0]; reo[1]; bc[0 1 c]; reo[0]; edt[b]; reo[1]; pop
[11 [b c] d] * * bc[1 1 c]; hns[b]; hnd; bc[s t d]
[11 b c] * * hns[b]; bc[s t c]
[12 b c] 0 0 puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; cel[0]; pop; spy; don
[12 b c] * 1 puh[1]; bc[1 1 b]; put[0]; bc[s 1 c]; cel[0]; pop; spy