# Noun Representation

A noun is represented as a word, which may be a pointer or reference to other memory.

Semantically, nouns are either atoms or cells, where a cell is an ordered pair of nouns.

A noun-word is a word sized to the machine architecture (e.g. 64 bits on x86_64, 32 bits on ARMv)
which, possibly together with other memory to which it points, represents a noun.

A noun-word with a 1 as the least significant bit is a direct atom. The atom value can be obtained as a machine word
by right-shifting the noun-word by 1 bit.

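The direct-atom encoding can be illustrated with a minimal Rust sketch. The names `DIRECT_TAG`, `DIRECT_MAX`, `direct_from_value` and `direct_to_value` are hypothetical and chosen for illustration only. The sketch assumes `usize` stands in for the machine word, so a direct atom can hold values of at most (machine bits - 1) bits.

```rust
/// Hypothetical constant: the tag bit that marks a direct atom.
const DIRECT_TAG: usize = 0b1;

/// Largest value that fits in a direct atom, since one bit is spent on the tag.
const DIRECT_MAX: usize = usize::MAX >> 1;

/// Pack a value into a direct-atom noun-word.
/// Returns None when the value needs all the bits of the machine word.
fn direct_from_value(value: usize) -> Option<usize> {
    if value > DIRECT_MAX {
        None
    } else {
        Some((value << 1) | DIRECT_TAG)
    }
}

/// Recover the atom value by right-shifting by 1 bit.
/// Returns None when the noun-word is not a direct atom.
fn direct_to_value(word: usize) -> Option<usize> {
    if word & DIRECT_TAG == DIRECT_TAG {
        Some(word >> 1)
    } else {
        None
    }
}
```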
A noun-word with 10 as the least significant bits is an indirect atom. By masking off the 2 least significant bits,
this noun-word can be converted to a (4 byte aligned) pointer, which points to a length-tagged array.
That is, the machine word pointed to by the pointer specifies the length of the array in bytes,
and is immediately followed in memory by a byte array containing the full indirect atom.

A noun-word with 100 as the least significant bits is a pointer to a cell. By masking off the 3 least significant bits,
this noun-word can be converted to an (8 byte aligned) pointer, which points to a cell in memory.

A noun-word with 1000 as the least significant bits is a hash reference. The remaining bits hold a (machine bits - 4) bit
unsigned integer hash of a noun, used to reference the noun in the hash memory arena.

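The four tag patterns can be distinguished by testing progressively wider masks, starting from the least significant bit. The following is a minimal Rust sketch of that check. The `NounTag` enum and the function names are hypothetical. The `Unknown` variant is an assumption covering words whose four low bits are all zero, a case this document does not define.

```rust
/// Hypothetical classification of a noun-word by its low tag bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NounTag {
    DirectAtom,
    IndirectAtom,
    Cell,
    HashRef,
    /// Low four bits are 0000; not described in this document.
    Unknown,
}

/// Inspect the least significant bits to decide what a noun-word is.
fn classify(word: usize) -> NounTag {
    if word & 0b1 == 0b1 {
        NounTag::DirectAtom
    } else if word & 0b11 == 0b10 {
        NounTag::IndirectAtom
    } else if word & 0b111 == 0b100 {
        NounTag::Cell
    } else if word & 0b1111 == 0b1000 {
        NounTag::HashRef
    } else {
        NounTag::Unknown
    }
}

/// Address of the length word of an indirect atom (2 tag bits cleared).
fn indirect_address(word: usize) -> Option<usize> {
    (classify(word) == NounTag::IndirectAtom).then(|| word & !0b11)
}

/// Address of the head of a cell (3 tag bits cleared).
fn cell_address(word: usize) -> Option<usize> {
    (classify(word) == NounTag::Cell).then(|| word & !0b111)
}

/// The (machine bits - 4) bit hash carried by a hash reference.
fn hash_value(word: usize) -> Option<usize> {
    (classify(word) == NounTag::HashRef).then(|| word >> 4)
}
```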
| Noun-word bits (MSB on the left, LSB on the right)               | Noun type       |
|------------------------------------------------------------------|-----------------|
|`XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX1`| Direct atom     |
|`XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX10`| Indirect atom   |
|`XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX100`| Pointer to cell |
|`XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX1000`| Hash reference  |

## Indirect atoms

An indirect atom is a pointer to a machine word, which is immediately followed in memory by an array of bytes.
The machine pointer is obtained by masking off the 2 least significant bits of the noun-word.
The machine word specifies the size of the atom in bytes, and the byte array stores the atom in little-endian order.

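The layout of an indirect atom can be sketched in Rust without raw pointers by modelling the pointed-to memory as a byte slice. All function names here are hypothetical. The sketch assumes the length word is stored in the machine's native byte order, which this document does not state, while the atom bytes themselves are little-endian as described.

```rust
use std::mem::size_of;

/// Lay out an indirect atom: a machine-word length (in bytes) followed
/// immediately by the atom's bytes in little-endian order.
fn encode_indirect(atom_le: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(size_of::<usize>() + atom_le.len());
    // Assumption: the length word uses native byte order.
    buf.extend_from_slice(&atom_le.len().to_ne_bytes());
    buf.extend_from_slice(atom_le);
    buf
}

/// Given the memory an indirect atom points to, return the atom's bytes.
/// Returns None if the memory is too short for the header or the payload.
fn decode_indirect(mem: &[u8]) -> Option<&[u8]> {
    let word = size_of::<usize>();
    let mut header = [0u8; size_of::<usize>()];
    header.copy_from_slice(mem.get(..word)?);
    let len = usize::from_ne_bytes(header);
    mem.get(word..word.checked_add(len)?)
}

/// Fold little-endian atom bytes into an integer, for atoms of up to 16 bytes.
fn le_bytes_to_u128(bytes: &[u8]) -> Option<u128> {
    if bytes.len() > 16 {
        return None;
    }
    let mut buf = [0u8; 16];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(u128::from_le_bytes(buf))
}
```

For example, the atom 2^64 does not fit in a 64-bit direct atom. Under this sketch it would be stored as a length word of 9 followed by eight zero bytes and a final `0x01` byte.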
## Cells

A cell is a pointer to a noun-word, which is immediately followed in memory by another noun-word.
The machine pointer is obtained by masking off the 3 least significant bits of the cell noun-word.
The pointed-to noun-word is the head of the cell, and the immediately adjacent noun-word is the tail.

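As a rough illustration, the cell layout can be modelled in Rust with a two-word struct aligned to 8 bytes, so that the three low bits of its address are free to carry the tag. The `Cell` struct, the constants, and both functions are hypothetical names for this sketch and are not taken from the implementation.

```rust
/// Hypothetical in-memory cell: head noun-word, then tail noun-word.
/// 8-byte alignment keeps the low 3 bits of the address zero.
#[repr(C, align(8))]
struct Cell {
    head: usize,
    tail: usize,
}

const CELL_TAG: usize = 0b100;
const CELL_MASK: usize = 0b111;

/// Build a cell noun-word by tagging the cell's address.
fn cell_word(cell: &Cell) -> usize {
    (cell as *const Cell as usize) | CELL_TAG
}

/// Read the head and tail noun-words through a cell noun-word.
///
/// Safety: if `word` carries the cell tag, the address it encodes must
/// point to a live `Cell`.
unsafe fn cell_parts(word: usize) -> Option<(usize, usize)> {
    if word & CELL_MASK != CELL_TAG {
        return None;
    }
    // Mask off the 3 tag bits to recover the machine pointer.
    let ptr = (word & !CELL_MASK) as *const Cell;
    let cell = unsafe { &*ptr };
    Some((cell.head, cell.tail))
}
```

In this sketch a cell of the direct atoms 42 and 7 would hold the noun-words `(42 << 1) | 1` and `(7 << 1) | 1` as its head and tail.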
## Hash reference

New Mars maintains a HAMT (hash array mapped trie) which can store nouns which are large(1) and/or have metadata such as bytecode or a jet reference. An entry in the hashtable consists of metadata and a noun-word, which, if it is a pointer, references memory mapped in a memory arena separate from the stack allocator. Each noun in the hashtable is stored in contiguous memory,
unless it references another whole noun in the hashtable, which it may do by a hash reference.

Hash references may be replaced in the stack arena by other noun-words on lookup, and then operated on.
This can result in noun-words which point into a memory arena at a location other than the top of a stored
noun's tree. This is permissible on the stack, but not permissible in the HAMT. Therefore,
when ejecting a noun from the stack arena to the hashtable, any subnouns referenced this way must be
copied to their own HAMT entries and replaced with hash references.

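The ejection rule can be sketched with a toy model in Rust. This is not the New Mars implementation: a standard `HashMap` stands in for the HAMT, `DefaultHasher` stands in for whatever hash is actually used, the `Noun` enum replaces tagged machine words, and hash collisions are ignored. The `Foreign` variant is a hypothetical stand-in for a noun-word that points into the middle of a noun owned by another entry. The sketch also assumes a 64-bit word, so a hash reference carries 60 bits.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Toy noun model (hypothetical, for illustration only).
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
enum Noun {
    Atom(u64),
    Cell(Box<Noun>, Box<Noun>),
    /// Pointer into a noun that lives in some other entry's memory.
    Foreign(Rc<Noun>),
    /// Reference to a whole noun stored under this hash.
    HashRef(u64),
}

/// Stand-in for the HAMT: a plain map from hash to noun.
type Table = HashMap<u64, Noun>;

/// Hash a noun, keeping (64 - 4) bits to leave room for the tag.
fn noun_hash(noun: &Noun) -> u64 {
    let mut hasher = DefaultHasher::new();
    noun.hash(&mut hasher);
    hasher.finish() >> 4
}

/// Copy a noun, giving every `Foreign` subnoun its own table entry and
/// replacing the pointer with a hash reference.
fn seal(noun: &Noun, table: &mut Table) -> Noun {
    match noun {
        Noun::Atom(_) | Noun::HashRef(_) => noun.clone(),
        Noun::Cell(head, tail) => {
            Noun::Cell(Box::new(seal(head, table)), Box::new(seal(tail, table)))
        }
        Noun::Foreign(sub) => Noun::HashRef(eject(sub, table)),
    }
}

/// Store a noun as its own entry and return the hash that refers to it.
fn eject(noun: &Noun, table: &mut Table) -> u64 {
    let sealed = seal(noun, table);
    let key = noun_hash(&sealed);
    table.insert(key, sealed);
    key
}
```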
***TODO*** describe implementation and details of HAMT and stack arena