# u3: noun processing in C.

`u3` is the C library that makes Urbit work. If it wasn't called
`u3`, it might be called `libnoun` - it's a library for making
and storing nouns.

What's a noun? A noun is either a cell or an atom. A cell is an
ordered pair of any two nouns. An atom is an unsigned integer of
any size.

To the C programmer, this is not a terribly complicated data
structure, so why do you need a library for it?

One: nouns have a well-defined computation kernel, Nock, whose
spec fits on a page and gzips to 340 bytes. But the only
arithmetic operation in Nock is increment. So it's nontrivial
to compute both efficiently and correctly.

Two: `u3` is designed to support "permanent computing," ie, a
single-level store which is transparently snapshotted. This
implies a specialized memory-management model, etc, etc.

(Does `u3` depend on the higher levels of Urbit, Arvo and Hoon?
Yes and no. `u3` expects you to load something shaped like an
Arvo kernel, and use it as an event-processing function. But you
don't need to use this feature if you don't want, and your kernel
doesn't have to be Arvo proper - just Arvo-compatible. Think of
`u3` as the BIOS and Arvo as the boot kernel. And there are no
dependencies at all between Hoon the language and `u3`.)

## c3: C in Urbit

Under `u3` is the simple `c3` layer, which is just how we write C
in Urbit.

When writing C in u3, please of course follow the conventions of
the code around you as regards indentation, etc. It's especially
important that every function have a header comment, even if it
says nothing interesting.

But some of our idiosyncrasies go beyond convention. Yes, we've
done awful things to C. Here's what we did and why we did it.

### c3: integer types

First, it's generally acknowledged that underspecified integer
types are C's worst disaster. C99 fixed this, but the `stdint`
types are wordy and annoying. We've replaced them with:

    /* Good integers.
    */
      typedef uint64_t c3_d;    // double-word
      typedef int64_t  c3_ds;   // signed double-word
      typedef uint32_t c3_w;    // word
      typedef int32_t  c3_ws;   // signed word
      typedef uint16_t c3_s;    // short
      typedef int16_t  c3_ss;   // signed short
      typedef uint8_t  c3_y;    // byte
      typedef int8_t   c3_ys;   // signed byte
      typedef uint8_t  c3_b;    // bit

      typedef uint8_t  c3_t;    // boolean
      typedef uint8_t  c3_o;    // loobean
      typedef uint8_t  c3_g;    // 5-bit atom for a 32-bit log.
      typedef uint32_t c3_l;    // little; 31-bit unsigned integer
      typedef uint32_t c3_m;    // mote; also c3_l; LSB first a-z 4-char string.

    /* Bad integers.
    */
      typedef char      c3_c;   // does not match int8_t or uint8_t
      typedef int       c3_i;   // int - really bad
      typedef uintptr_t c3_p;   // pointer-length uint - really really bad
      typedef intptr_t  c3_ps;  // pointer-length int - really really bad

Some of these need explanation. A loobean is a Nock boolean -
Nock, for mysterious reasons, uses 0 as true (always say "yes")
and 1 as false (always say "no").

Nock and/or Hoon cannot tell the difference between a short atom
and a long one, but at the `u3` level every atom under `2^31` is
direct. The `c3_l` type is useful to annotate this. A `c3_m` is
a *mote* - a string of up to 4 characters in a `c3_l`, least
significant byte first. A `c3_g` should be a 5-bit atom. Of
course, C cannot enforce these constraints, only document them.

Use the "bad" - ie, poorly specified - integer types only when
interfacing with external code that expects them.

An enormous number of motes are defined in `i/c/motes.h`. There
is no reason to delete motes that aren't being used, or even to
modularize the definitions. Keep them alphabetical, though.

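As a rough illustration of the mote encoding described above, the
sketch below packs four characters into a 32-bit word, least
significant byte first, and unpacks them again. The names `mote4()`
and `mote_unpack()` are hypothetical helpers written for this
example; they are not the definitions in `i/c/motes.h`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_m;   // mote, as in the typedef list above
typedef uint8_t  c3_y;   // byte

/* mote4(): pack four characters, LSB first (hypothetical helper).
*/
static c3_m
mote4(c3_y a_y, c3_y b_y, c3_y c_y, c3_y d_y)
{
  //  a mote is also a c3_l, so the top bit must stay clear
  //
  assert( d_y < 0x80 );

  return ((c3_m)a_y)
       | ((c3_m)b_y << 8)
       | ((c3_m)c_y << 16)
       | ((c3_m)d_y << 24);
}

/* mote_unpack(): write a mote's characters into a 5-byte buffer.
*/
static void
mote_unpack(c3_m mot_m, char buf_c[5])
{
  int i;

  for ( i = 0; i < 4; i++ ) {
    buf_c[i] = (char)((mot_m >> (8 * i)) & 0xff);
  }
  buf_c[4] = 0;
}
```

Under this packing, `mote4('e', 'x', 'i', 't')` is `0x74697865`: the
first character lands in the low byte, so motes can be compared as
plain integers.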
### c3: variables and variable naming

The C3 style uses Hoon-style TLV variable names, with a
quasi-Hungarian syntax. This is weird, but works really well, as long
as what you're doing isn't hideously complicated. (Then it works
badly, but we shouldn't need anything hideous in u3.)

A TLV variable name is a random pronounceable three-letter
string, sometimes with some vague relationship to its meaning,
but usually not. Usually CVC (consonant-vowel-consonant) is a
good choice.

You should use TLVs much the way math people use Greek letters.
The same concept should in general get the same name across
different contexts. When you're working in a given area, you'll
tend to remember the binding from TLV to concept by sheer power
of associative memory. When you come back to it, it's not that
hard to relearn. And of course, when in doubt, comment it.

Variables take pseudo-Hungarian suffixes, matching in general the
suffix of the integer type:

    c3_w wor_w;     // 32-bit word

Unlike in standard Hungarian, there is no change for pointer
variables. C structure variables take a `_u` suffix.

### c3: loobeans

The code (from `defs.h`) tells the story:

    #     define c3y      0
    #     define c3n      1

    #     define _(x)        (c3y == (x))
    #     define __(x)       ((x) ? c3y : c3n)
    #     define c3a(x, y)   __(_(x) && _(y))
    #     define c3o(x, y)   __(_(x) || _(y))

In short, use `_()` to turn a loobean into a boolean, `__` to go
the other way. Use `!` as usual, `c3y` for yes and `c3n` for no,
`c3a` for and and `c3o` for or.

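A small sketch of these macros in use may help. The macro
definitions are copied from the excerpt above; the functions
`is_even()`, `both_even()` and `count_even()` are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t c3_o;    // loobean

#define c3y        0
#define c3n        1
#define _(x)       (c3y == (x))
#define __(x)      ((x) ? c3y : c3n)
#define c3a(x, y)  __(_(x) && _(y))
#define c3o(x, y)  __(_(x) || _(y))

/* is_even(): hypothetical predicate that answers with a loobean.
*/
static c3_o
is_even(uint32_t num_w)
{
  return __(0 == (num_w & 1));
}

/* both_even(): loobean "and" of two loobean answers.
*/
static c3_o
both_even(uint32_t a_w, uint32_t b_w)
{
  return c3a(is_even(a_w), is_even(b_w));
}

/* count_even(): use _() wherever C expects an ordinary boolean.
*/
static uint32_t
count_even(const uint32_t* buf_w, uint32_t len_w)
{
  uint32_t i_w, num_w = 0;

  for ( i_w = 0; i_w < len_w; i_w++ ) {
    if ( _(is_even(buf_w[i_w])) ) {
      num_w++;
    }
  }
  return num_w;
}
```

The mistake this guards against is writing `if ( is_even(x) )`,
which takes the branch exactly when the answer is "no". Because
`c3y` and `c3n` are `0` and `1`, `!` still flips one into the other.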
## u3: land of nouns

The division between `c3` and `u3` is that you could theoretically
imagine using `c3` as just a generic C environment. Anything to do
with nouns is in `u3`.

### u3: a map of the system

There are two kinds of symbols in `u3`: regular and irregular.
Regular symbols follow this pattern:

    prefix    purpose                        .h        .c
    -------------------------------------------------------
    u3a_      allocation                     i/n/a.h   n/a.c
    u3e_      persistence                    i/n/e.h   n/e.c
    u3h_      hashtables                     i/n/h.h   n/h.c
    u3i_      noun construction              i/n/i.h   n/i.c
    u3j_      jet control                    i/n/j.h   n/j.c
    u3m_      system management              i/n/m.h   n/m.c
    u3n_      nock computation               i/n/n.h   n/n.c
    u3r_      noun access (error returns)    i/n/r.h   n/r.c
    u3t_      profiling                      i/n/t.h   n/t.c
    u3v_      arvo                           i/n/v.h   n/v.c
    u3x_      noun access (error crashes)    i/n/x.h   n/x.c
    u3z_      memoization                    i/n/z.h   n/z.c
    u3k[a-g]  jets (transfer, C args)        i/j/k.h   j/[a-g]/*.c
    u3q[a-g]  jets (retain, C args)          i/j/q.h   j/[a-g]/*.c
    u3w[a-g]  jets (retain, nock core)       i/j/w.h   j/[a-g]/*.c

Irregular symbols always start with `u3` and obey no other rules.
They're defined in `i/n/aliases.h`. Finally, `i/all.h` includes
all these headers (fast compilers, yay) and is all you need to
program in `u3`.

### u3: noun internals

A noun is a `u3_noun` - currently defined as a 32-bit `c3_w`.

If your `u3_noun` is less than `(1 << 31)`, it's a direct atom.
Every unsigned integer between `0` and `0x7fffffff` inclusive is
its own noun.

If bit `31` is set in a `u3_noun` and bit `30` is `1` the noun
is an indirect cell. If bit `31` is set and bit `30` is `0` the
noun is an indirect atom. Bits `29` through `0` are a word
pointer into the loom - see below. The structures are:

    typedef struct {
      c3_w mug_w;
      c3_w len_w;
      c3_w buf_w[0];    // actually [len_w]
    } u3a_atom;

    typedef struct {
      c3_w    mug_w;
      u3_noun hed;
      u3_noun tel;
    } u3a_cell;

The only thing that should be mysterious here is `mug_w`, which
is a 31-bit lazily computed nonzero short hash (FNV currently,
soon Murmur3). If `mug_w` is 0, the hash is not yet computed.
We also hijack this field for various hacks, such as saving the
new address of a noun when copying over.

Also, the value `0xffffffff` is `u3_none`, which is never a valid
noun. Use the type `u3_weak` to express that a noun variable may
be `u3_none`.

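The bit layout above can be sketched as a small classifier. This is
an illustration only: `noun_kind`, `noun_classify()`,
`noun_loom_offset()` and `NOUN_NONE` are names invented for the
example, not `u3` symbols.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef c3_w     u3_noun;

#define NOUN_NONE 0xffffffffu   // stands in for u3_none

typedef enum {
  noun_direct_atom,
  noun_indirect_atom,
  noun_indirect_cell,
  noun_none
} noun_kind;

/* noun_classify(): decode the top two bits of a noun word.
*/
static noun_kind
noun_classify(u3_noun som)
{
  //  check the sentinel first: it has bits 31 and 30 set,
  //  so it would otherwise look like a cell
  //
  if ( NOUN_NONE == som )        return noun_none;
  if ( 0 == (som >> 31) )        return noun_direct_atom;
  if ( 0 != (som & (1u << 30)) ) return noun_indirect_cell;
  return noun_indirect_atom;
}

/* noun_loom_offset(): bits 29..0 of an indirect noun.
*/
static c3_w
noun_loom_offset(u3_noun som)
{
  assert( (0 != (som >> 31)) && (NOUN_NONE != som) );
  return som & 0x3fffffffu;
}
```

Note the ordering: because `0xffffffff` shares its top bits with an
indirect cell, any test for "is this a cell" has to rule out the
sentinel before trusting the tag bits.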
### u3: reference counts

The only really essential thing you need to know about `u3` is
how to handle reference counts. Everything else, you can skip
and just get to work.

u3 deals with reference-counted, immutable, acyclic nouns.
Unfortunately, we are not Apple and can't build reference
counting into your C compiler, so you need to count by hand.

Every allocated noun (or any allocation object, because our
allocator is general-purpose) contains a counter which counts the
number of references to it - typically variables with type
`u3_noun`. When this counter goes to 0, the noun is freed.

To tell `u3` that you've added a reference to a noun, call the
function `u3a_gain()` or its shorthand `u3k()`. (For your
convenience, this function returns its argument.) To tell `u3`
that you've destroyed a reference, call `u3a_lose()` or `u3z()`.

(If you screw up by decrementing the counter too much, `u3` will
dump core in horrible ways. If you screw up by incrementing it
too much, `u3` will leak memory. To check for memory leaks,
set the `bug_o` flag in `u3m_boot()` - eg, run `vere` with `-g`.
Memory leaks are difficult to debug - the best way to handle
leaks is just to revert to a version that didn't have them, and
look over your code again.)

(You can gain or lose a direct atom. It does nothing.)

### u3: reference protocols

*THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION.*

The key question when calling a C function in a refcounted world
is what the function will do to the noun refcounts - and, if the
function returns a noun, what it does to the return.

There are two semantic patterns, `transfer` and `retain`. In
`transfer` semantics, the caller "gives" a use count to the
callee, which "gives back" any return. For instance, if I have

    {
      u3_noun foo = u3i_string("foobar");
      u3_noun bar;

      bar = u3f_futz(foo);
      [...]
      u3z(bar);
    }

Suppose `u3f_futz()` has `transfer` semantics. At `[...]`, my
code holds one reference to `bar` and zero references to `foo` -
which has been freed, unless it's part of `bar`. My code now
owns `bar` and gets to work with it until it's done, at which
point a `u3z()` is required.

On the other hand, if `u3f_futz()` has `retain` semantics, we
need to write

    {
      u3_noun foo = u3i_string("foobar");
      u3_noun bar;

      bar = u3f_futz(foo);
      [...]
      u3z(foo);
    }

because calling `u3f_futz()` does not release our ownership of
`foo`, which we have to free ourselves.

But if we free `bar`, we are making a great mistake, because our
reference to it is not in any way registered in the memory
manager (which cannot track references in C variables, of
course). It is normal and healthy to have these uncounted
C references, but they must be treated with care.

The bottom line is that it's essential for the caller to know
the refcount semantics of any function which takes or returns a
noun. (In some unusual circumstances, different arguments or
returns in one function may be handled differently.)

Broadly speaking, as a design question, retain semantics are more
appropriate for functions which inspect or query nouns. For
instance, `u3h()` (which takes the head of a noun) retains, so
that we can traverse a noun tree without constantly incrementing
and decrementing.

Transfer semantics are more appropriate for functions which make
nouns, which is obviously what most functions do.

In general, though, in most places it's not worth thinking about
what your function does. There is a convention for it, which
depends on where it is, not what it does. Follow the convention.

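To make the two protocols concrete without depending on `u3`
itself, here is a toy model: heap boxes with a use count, one
function with transfer semantics and one with retain semantics.
Everything named `toy_*` is hypothetical; real code would use
`u3k()` and `u3z()` on nouns instead.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  unsigned use_w;   // reference count
  int      val_i;   // payload
} toy;

static int toy_live = 0;   // boxes currently allocated (leak check)

/* toy_new(): allocate a box holding one reference.
*/
static toy*
toy_new(int val_i)
{
  toy* box_u = malloc(sizeof(*box_u));

  assert( box_u );
  box_u->use_w = 1;
  box_u->val_i = val_i;
  toy_live++;
  return box_u;
}

/* toy_gain(): add a reference; returns its argument.
*/
static toy*
toy_gain(toy* box_u)
{
  box_u->use_w++;
  return box_u;
}

/* toy_lose(): drop a reference; free at zero.
*/
static void
toy_lose(toy* box_u)
{
  assert( box_u->use_w > 0 );
  if ( 0 == --box_u->use_w ) {
    free(box_u);
    toy_live--;
  }
}

/* toy_succ(): TRANSFER. Consumes the caller's reference to box_u
** and returns a fresh reference to a new box.
*/
static toy*
toy_succ(toy* box_u)
{
  toy* pro_u = toy_new(box_u->val_i + 1);

  toy_lose(box_u);
  return pro_u;
}

/* toy_peek(): RETAIN. Inspects box_u without touching its count.
*/
static int
toy_peek(toy* box_u)
{
  return box_u->val_i;
}

/* toy_demo(): keep foo alive across a transfer call by gaining first.
*/
static int
toy_demo(void)
{
  toy* foo_u = toy_new(41);
  toy* bar_u = toy_succ(toy_gain(foo_u));
  int  sum_i = toy_peek(foo_u) + toy_peek(bar_u);

  toy_lose(foo_u);
  toy_lose(bar_u);
  return sum_i;
}
```

The `toy_gain()` before the transfer call is the pattern to
remember: if the caller still needs a noun after handing it to a
transfer function, it gains a reference first and loses it later.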
### u3: reference conventions

The `u3` convention is that, unless otherwise specified, *all
functions have transfer semantics* - with the exception of the
prefixes: `u3r`, `u3x`, `u3z`, `u3q` and `u3w`. Also, within
jet directories `a` through `f` (but not `g`), internal functions
retain (for historical reasons).

If functions outside this set have retain semantics, they need to
be commented, both in the `.h` and `.c` file, with `RETAIN` in
all caps. Yes, it's this important.

### u3: system architecture

If you just want to tinker with some existing code, it might be
enough to understand the above. If not, it's probably worth
taking the time to look at `u3` as a whole.

`u3` is designed to work as a persistent event processor.
Logically, it computes a function of the form

    f(event, old state) -> (actions, new state)

Obviously almost any computing model - including, but not limited
to, Urbit - can be defined in this form. To create the illusion
of a computer that never loses state and never fails, we:

- log every event externally before it goes into u3.
- keep a single reference to a permanent state noun.
- can abort any event without damaging the permanent state.
- snapshot the permanent state periodically, and/or prune logs.

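The shape of this loop can be sketched with a deliberately trivial
state. In the toy below the "permanent state" is a running total, an
event is a number, and a negative event stands in for one that is
aborted. All names (`toy_state`, `toy_poke()`, `toy_replay()`) are
hypothetical; nothing here reflects how `u3` stores or logs events.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint64_t tot_d;          // the whole "permanent state"
} toy_state;

typedef struct {
  int       ok;            // 1 if the event completed, 0 if aborted
  uint64_t  out_d;         // "actions": here, just the new total
  toy_state nex;           // new state (equal to old state on abort)
} toy_result;

/* toy_poke(): f(event, old state) -> (actions, new state).
*/
static toy_result
toy_poke(int64_t eve_ds, toy_state old)
{
  toy_result res;

  if ( eve_ds < 0 ) {
    //  aborted event: the old state comes back untouched
    //
    res.ok    = 0;
    res.out_d = 0;
    res.nex   = old;
  }
  else {
    res.ok        = 1;
    res.nex.tot_d = old.tot_d + (uint64_t)eve_ds;
    res.out_d     = res.nex.tot_d;
  }
  return res;
}

/* toy_replay(): rebuild state from a snapshot plus a log suffix.
*/
static toy_state
toy_replay(const int64_t* log_ds, unsigned len_w, toy_state sat)
{
  unsigned i_w;

  for ( i_w = 0; i_w < len_w; i_w++ ) {
    toy_result res = toy_poke(log_ds[i_w], sat);

    if ( res.ok ) {
      sat = res.nex;
    }
  }
  return sat;
}
```

Because `toy_poke()` is a pure function of the event and the old
state, replaying the full log from an empty state and replaying a
log suffix from a snapshot arrive at the same result.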
### u3: the road model

`u3` uses a memory design which I'm sure someone has invented
somewhere before, because it's not very clever, but I've never
seen it anywhere in particular.

Every allocation starts with a solid block of memory, which `u3`
calls the `loom`. How do we allocate on the loom? You're
probably familiar with the Unix heap-stack design, in which the
stack grows downward and the heap (malloc arena) grows upward:

    0           brk                                         ffff
    |    heap    |                    stack                    |
    |------------#################################+++++++++++++|
    |                                             |            |
    0                                             sp        ffff

A road is a normal heap-stack system, except that the heap
and stack can point in *either direction*. Therefore, inside
a road, we can nest another road in the *opposite direction*.

When the opposite road completes, its heap is left on top of
the opposite road's stack. It's no more than the normal
behavior of a stack machine for all subcomputations to push
their results on the stack.

The performance tradeoff of "leaping" - reversing directions in
the road - is that if the outer computation wants to preserve the
results of the inner one, not just use them for temporary
purposes, it has to *copy them*.

This is a trivial cost in some cases, a prohibitive cost in
others. The upside, of course, is that all garbage accrued
in the inner computation is discarded at zero cost.

The goal of the road system is the ability to *layer* memory
models. If you are allocating on a road, you have no idea
how deep within a nested road system you are - in other words,
you have no idea exactly how durable your result may be.
But free space is never fragmented within a road.

Roads do not reduce the generality or performance of a memory
system, since even the most complex GC system can be nested
within a road at no particular loss of performance - a road
is just a block of memory.

Each road (`u3a_road` to be exact) uses four pointers: `rut` is
the bottom of the arena, `hat` the top of the arena, `mat` the
bottom of the stack, `cap` the top of the stack. (Bear in mind
that the road "stack" is not actually used as the C function-call
stack, though it probably should be.)

A "north" road has the stack high and the heap low:

    0           rut   hat                                   ffff
    |            |     |                                       |
    |~~~~~~~~~~~~-------##########################+++++++$~~~~~|
    |                                             |      |     |
    0                                            cap    mat ffff

A "south" road is the other way around:

    0           mat   cap                                   ffff
    |            |     |                                       |
    |~~~~~~~~~~~~$++++++##########################--------~~~~~|
    |                                             |      |     |
    0                                            hat    rut ffff

Legend: `-` is durable storage (heap); `+` is temporary storage
(stack); `~` is deep storage (immutable); `$` is the allocation
frame; `#` is free memory.

Pointer restrictions: pointers stored in `+` can point anywhere.
Of course, pointing to `#` (free memory) would be a bug.
Pointers in `-` can only point to `-` or `~`; pointers in `~`
only point to `~`.

To "leap" is to create a new inner road in the `###` free space,
but in the reverse direction, so that when the inner road
"falls" (terminates), its durable storage is left on the
temporary storage of the outer road.

`u3` keeps a global variable, `u3_Road` or its alias `u3R`, which
points to the current road. (If we ever run threads in inner
roads - see below - this will become a thread-local variable.)
Relative to `u3R`, `+` memory is called `junior` memory; `-`
memory is `normal` memory; `~` is `senior` memory.

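The pointer arithmetic behind roads and leaping can be sketched
with plain word offsets. This toy keeps only the four pointers named
above and a direction flag; it has no allocation frame, no free
lists and no refcounts, and the `toy_*` names are hypothetical, so
treat it as a picture of the geometry rather than of `u3a_road`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;

#define TOY_FULL 0xffffffffu   // returned when the road is out of space

typedef struct {
  int  nor;      // 1: north (heap low, stack high); 0: south
  c3_w rut_w;    // bottom of heap
  c3_w hat_w;    // top of heap
  c3_w mat_w;    // bottom of stack
  c3_w cap_w;    // top of stack
} toy_road;

/* toy_north(): an empty north road over words [low_w, hig_w).
*/
static toy_road
toy_north(c3_w low_w, c3_w hig_w)
{
  toy_road rod_u = { 1, low_w, low_w, hig_w, hig_w };

  assert( low_w <= hig_w );
  return rod_u;
}

/* toy_free(): words of free space between heap and stack.
*/
static c3_w
toy_free(const toy_road* rod_u)
{
  return rod_u->nor ? (rod_u->cap_w - rod_u->hat_w)
                    : (rod_u->hat_w - rod_u->cap_w);
}

/* toy_heap(): allocate len_w words of durable storage.
*/
static c3_w
toy_heap(toy_road* rod_u, c3_w len_w)
{
  if ( len_w > toy_free(rod_u) ) return TOY_FULL;

  if ( rod_u->nor ) {
    c3_w box_w = rod_u->hat_w;    // north heap grows upward
    rod_u->hat_w += len_w;
    return box_w;
  }
  rod_u->hat_w -= len_w;          // south heap grows downward
  return rod_u->hat_w;
}

/* toy_push(): allocate len_w words of temporary storage.
*/
static c3_w
toy_push(toy_road* rod_u, c3_w len_w)
{
  if ( len_w > toy_free(rod_u) ) return TOY_FULL;

  if ( rod_u->nor ) {
    rod_u->cap_w -= len_w;        // north stack grows downward
    return rod_u->cap_w;
  }
  else {
    c3_w box_w = rod_u->cap_w;    // south stack grows upward
    rod_u->cap_w += len_w;
    return box_w;
  }
}

/* toy_leap(): an inner road in the free space, reversed, so the
** inner heap starts where the outer stack ends.
*/
static toy_road
toy_leap(const toy_road* out_u)
{
  toy_road inn_u;

  inn_u.nor   = !out_u->nor;
  inn_u.rut_w = inn_u.hat_w = out_u->cap_w;
  inn_u.mat_w = inn_u.cap_w = out_u->hat_w;
  return inn_u;
}
```

In this model, the inner road's heap begins at the outer road's
`cap`, which is why whatever the inner road leaves behind sits right
on top of the outer stack, and why dropping the inner road costs
nothing.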
### u3: explaining the road model

But... why?

We're now ready to understand why the road system works so
logically with the event and persistence model.

The key is that *we don't update refcounts in senior memory.*
A pointer from an inner road to an outer road is not counted.
Also, the outermost, or `surface` road, is the only part of the
image that gets checkpointed.

So the surface road contains the entire durable state of `u3`.
When we process an event, or perform any kind of complicated or
interesting calculation, *we process it in an inner road*. If
its results are saved, they need to be copied.

Since processing in an inner road does not touch surface memory,
(a) we can leave the surface road in a read-only state and not
mark its pages dirty; (b) we can abort an inner calculation
without screwing up the surface; and (c) because inner results
are copied onto the surface, the surface doesn't get fragmented.

All of (a), (b) and (c) are needed for checkpointing to be easy.
It might be tractable otherwise, but easy is even better.

Moreover, while the surface is most definitely single-threaded,
we could easily run multiple threads in multiple inner roads
(as long as the threads don't have pointers into each other's
memory, which they obviously shouldn't).

Moreover, in future, we'll experiment more with adding road
control hints to the programmer's toolbox. Reference counting is
expensive. We hypothesize that in many - if not most - cases,
the programmer can identify procedural structures whose garbage
should be discarded in one step by copying the results. Then,
within the procedure, we can switch the allocator into `sand`
mode, and stop tracking references at all.

### u3: rules for C programming

There are two levels at which we program in C: (1) above the
interpreter; (2) within the interpreter or jets. These have
separate rules which need to be respected.

### u3: rules above the interpreter

In its relations with Unix, Urbit follows a strict rule of "call
me, I won't call you." We do of course call Unix system calls,
but only for the purpose of actually computing.

Above Urbit, you are in a normal C/Unix programming environment
and can call anything in or out of Urbit. Note that when using
`u3`, you're always on the surface road, which is not thread-safe
by default. Generally speaking, `u3` is designed to support
event-oriented, single-threaded programming.

If you need threads which create nouns, you could use
`u3m_hate()` and `u3m_love()` to run these threads in subroads.
You'd need to make the global road pointer, `u3R`, a thread-local
variable instead. This seems perfectly practical, but we haven't
done it because we haven't needed to.

### u3: rules within the interpreter

Within the interpreter, your code can run either in the surface
road or in a deep road. You can test this by testing

    (&u3H->rod_u == u3R)

ie: does the pier's home road equal the current road pointer?

Normally in this context you assume you're obeying the rules of
running on an inner road, ie, "deep memory." Remember, however,
that the interpreter *can* run on surface memory - but anything
you can do deep, you can do on the surface. The converse is by
no means the case.

In deep memory, think of yourself as if in a signal handler.
Your execution context is extremely fragile and may be terminated
without warning or cleanup at any time (for instance, by ^C).

For instance, you can't call `malloc` (or C++ `new`) in your C
code, because you don't have the right to modify data structures
at the global level, and will leave them in an inconsistent state
if your inner road gets terminated. (Instead, use our drop-in
replacements, `u3a_malloc()`, `u3a_free()`, `u3a_realloc()`.)

A good example is the different meaning of `c3_assert()` inside
and outside the interpreter. At either layer, you can use
regular `assert()`, which will just kill your process. On the
surface, `c3_assert()` will just... kill your process.

In deep execution, `c3_assert()` will issue an exception that
queues an error event, complete with trace stack, on the Arvo
event queue. Let's see how this happens.

### u3: exceptions

You produce an exception with

    /* u3m_bail(): bail out.  Does not return.
    **
    **  Bail motes:
    **
    **    %exit               ::  semantic failure
    **    %evil               ::  bad crypto
    **    %intr               ::  interrupt
    **    %fail               ::  execution failure
    **    %foul               ::  assert failure
    **    %need               ::  network block
    **    %meme               ::  out of memory
    **    %time               ::  timed out
    **    %oops               ::  assertion failure
    */
      c3_i
      u3m_bail(c3_m how_m);

Broadly speaking, there are two classes of exception: internal
and external. An external exception begins in a Unix signal
handler. An internal exception begins with a call to `longjmp()`
on the main thread.

There are also two kinds of exception: mild and severe. An
external exception is always severe. An internal exception is
normally mild, but some (like `c3__oops`, generated by
`c3_assert()`) are severe.

Either way, exceptions come with a stack trace. The `u3` nock
interpreter is instrumented to retain stack trace hints and
produce them as a printable `(list tank)`.

Mild exceptions are caught by the first virtualization layer and
returned to the caller, following the behavior of the Nock
virtualizer `++mock` (in `hoon.hoon`).

Severe exceptions, or mild exceptions at the surface, terminate
the entire execution stack at any depth and send the cumulative
trace back to the `u3` caller.

For instance, `vere` uses this trace to construct a `%crud`
event, which conveys our trace back toward the Arvo context where
it crashed. This lets any UI component anywhere, even on a
remote node, render the stacktrace as a consequence of the user's
action - even if its direct cause was (for instance) a Unix
SIGINT or SIGALRM.

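The internal-exception path, a bail that unwinds with `longjmp()` to
a handler installed by a "soft" wrapper, can be sketched in standard
C. This is a toy with a single handler and no stack trace;
`toy_bail()`, `toy_div()`, `toy_soft()` and `TOY_EXIT` are
hypothetical names, and the real `u3m_bail()` does considerably more.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

typedef uint32_t c3_m;

#define TOY_EXIT 0x74697865u     // "exit" packed LSB first, as a mote

static jmp_buf       toy_buf;    // where the nearest wrapper is waiting
static volatile c3_m toy_why_m;  // which mote the last bail carried

/* toy_bail(): record the mote and unwind.  Does not return.
*/
static void
toy_bail(c3_m how_m)
{
  toy_why_m = how_m;
  longjmp(toy_buf, 1);
}

/* toy_div(): a computation that fails semantically on zero.
*/
static uint32_t
toy_div(uint32_t a_w, uint32_t b_w)
{
  if ( 0 == b_w ) {
    toy_bail(TOY_EXIT);
  }
  return a_w / b_w;
}

/* toy_soft(): run toy_div() under a handler.  Produces 0 and a
** result on success, or the bail mote on failure.
*/
static c3_m
toy_soft(uint32_t a_w, uint32_t b_w, uint32_t* out_w)
{
  assert( out_w );

  if ( 0 == setjmp(toy_buf) ) {
    *out_w = toy_div(a_w, b_w);
    return 0;
  }
  return toy_why_m;
}
```

The wrapper's return value mirrors the convention described for
soft calls elsewhere in this document: zero means the computation
finished, anything else names the reason it did not.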
### u3: C structures on the loom

Normally, all data on the loom is nouns. Sometimes we break this
rule just a little, though - eg, in the `u3h` hashtables.

To point to non-noun C structs on the loom, we use a `u3_post`,
which is just a loom word offset. A macro lets us declare this
as if it was a pointer:

    typedef c3_w       u3_post;
    #define u3p(type)  u3_post

Some may regard this as clever, others as pointless. Anyway, use
`u3to()` and `u3of()` to convert to and from pointers.

When using C structs on the loom - generally a bad idea - make
sure anything which could be on the surface road is structurally
portable, eg, won't change size when the pointer size changes.
(Note also: we consider little-endian, rightly or wrongly, to
have won the endian wars.)

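A minimal sketch of the idea, assuming the loom is just an array of
32-bit words and a post is an index into it. The `toy_loom` array
and the `toy_to()` / `toy_of()` macros are stand-ins invented for
this example; they show what conversions like `u3to()` and `u3of()`
have to do, not how those are actually defined.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef c3_w     u3_post;
#define u3p(type)  u3_post

static c3_w toy_loom[1024];    // stand-in for the loom

/* toy_to(): post -> pointer.  toy_of(): pointer -> post.
*/
#define toy_to(type, pos)  ((type *)(void *)(toy_loom + (pos)))
#define toy_of(type, ptr)  ((u3p(type))((c3_w *)(void *)(ptr) - toy_loom))

/* toy_pair: a loom structure built only from fixed-width words
** and posts, so its size does not depend on the pointer size.
*/
typedef struct {
  c3_w              one_w;
  u3p(struct _pair) nex_p;     // post of another pair, or 0
} toy_pair;

/* toy_store(): fill in the pair at pos_p; produce its post again.
*/
static u3_post
toy_store(u3_post pos_p, c3_w one_w, u3_post nex_p)
{
  toy_pair* par_u = toy_to(toy_pair, pos_p);

  assert( (pos_p + 2) <= 1024 );
  par_u->one_w = one_w;
  par_u->nex_p = nex_p;
  return toy_of(toy_pair, par_u);
}
```

Storing `nex_p` as a post rather than as a `toy_pair*` is the point:
the structure keeps the same layout on any host, and stays valid if
the loom is mapped at a different address.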
## u3: API overview by prefix

Let's run through the `u3` modules one by one. All public
functions are commented, but the comments may be cryptic.

### u3m: main control

To start `u3`, run

    /* u3m_boot(): start the u3 system.
    */
      void
      u3m_boot(c3_o nuu_o, c3_o bug_o, c3_c* dir_c);

`nuu_o` is `c3y` (yes, `0`) if you're creating a new pier,
`c3n` (no, `1`) if you're loading an existing one. `bug_o`
is `c3y` if you want to test the garbage-collector, `c3n`
otherwise. `dir_c` is the directory for the pier files.

`u3m_boot()` expects an `urbit.pill` file to load the kernel
from. This is specified with the `-B` command-line option.

Any significant computation with nouns, certainly anything Turing
complete, should be run (a) virtualized and (b) in an inner road.
These are slightly different things, but at the highest level we
bundle them together for your convenience, in `u3m_soft()`:

    /* u3m_soft(): system soft wrapper.  unifies unix and nock errors.
    **
    **  Produces [%$ result] or [%error (list tank)].
    */
      u3_noun
      u3m_soft(c3_w sec_w, u3_funk fun_f, u3_noun arg);

`sec_w` is the number of seconds to time out the computation.
`fun_f` is a C function accepting `arg`.

The result of `u3m_soft()` is a cell whose head is an atom. If
the head is `%$` - ie, `0` - the tail is the result of
`fun_f(arg)`. Otherwise, the head is a `term` (an atom which is
an LSB first string), and the tail is a `(list tank)` (a list of
`tank` printables - see `++tank` in `hoon.hoon`). Error terms
should be the same as the exception terms above.

If you're confident that your computation won't fail, you can
use `u3m_soft_sure()`, `u3m_soft_slam()`, or `u3m_soft_nock()`
for C functions, Hoon function calls, and Nock invocations.
Caution - this returns just the result, and asserts globally.

All the `u3m_soft` functions above work *only on the surface*.
Within the surface, virtualize with `u3m_soft_run()`. Note that
this takes a `fly` (a namespace gate), thus activating the `11`
super-operator in the nock virtualizer, `++mock`. When actually
using the `fly`, call `u3m_soft_esc()`. Don't do either unless
you know what you're doing!

For descending into a subroad *without* Nock virtualization,
use `u3m_hate()` and `u3m_love()` respectively. Hating enters
a subroad; loving leaves it, copying out a product noun.

Other miscellaneous tools in `u3m`: `u3m_file()` loads a Unix
file as a Nock atom; `u3m_water()` measures the boundaries of
the loom in current use (ie, watermarks); and a variety of
prettyprinting routines, none perfect, are available, mainly for
debugging printfs: `u3m_pretty()`, `u3m_p()`, `u3m_tape()` and
`u3m_wall()`.

It's sometimes nice to run a mark-and-sweep garbage collector,
`u3m_grab()`, which collects the world from a list of roots,
and asserts if it finds any leaks or incorrect refcounts. This
tool is for debugging and long-term maintenance only; refcounts
should never err.

### u3j: jets

The jet system, `u3j`, is what makes `u3` and `nock` in any sense
a useful computing environment. Except perhaps `u3a` (there is
really no such thing as a trivial allocator, though `u3a` is
dumber than most) - `u3j` is the most interesting code in `u3`.

Let's consider the minor miracle of driver-to-battery binding
which lets `u3j` work - and decrement not be `O(n)` - without
violating the precisely defined semantics of pure Nock, *ever*.

It's easy to assume that jets represent an architectural coupling
between Hoon language semantics and Nock interpreter internals.
Indeed such a coupling would be wholly wrongtious and un-Urbit.
But the jet system is not Hoon-specific. It is specific to nock
runtime systems that use a design pattern we call a `core`.

#### u3j: core structure

A core is no more than a cell `[code data]`, in which a `code` is
either a Nock formula or a cell of `code`s, and `data` is anything.
In a proper core, the subject each formula expects is the core
itself.

Except for the arbitrary decision to make a core `[code data]`,
(or as we sometimes say, `[battery payload]`), instead of `[data
code]`, any high-level language transforming itself to Nock would
use this design.

So jets are in fact fully general. Broadly speaking, the jet
system works by matching a C *driver* to a battery. When the
battery is invoked with Nock operator `9`, it must be found in
associative memory and linked to its driver. Then we link the
formula axis of the operation (`a` in `[9 a b]`) to a specific
function in the driver.

To validate this jet binding, we need to know two things. One,
we need to know the C function actually is a perfect semantic
match for the Nock formula. This can be developed with driver
test flags, which work, and locked down with a secure formula
hash in the driver, which we haven't bothered with just yet.
(You could also try to develop a formal method for verifying
that C functions and Nock formulas are equivalent, but this is
a research problem for the future.)

Two, we need to validate that the payload is appropriate for the
battery. We should note that jets are a Nock feature and have no
reference to Hoon. A driver which relies on the Hoon type system
to only pair it with valid payloads is a broken driver, and
breaks the Nock compliance of the system as a whole. So don't.

Now, a casual observer might look at `[battery payload]` and
expect the simplest case of it to be `[formula subject]`. That
is: to execute a simple core whose battery is a single formula,
we compute

    nock(+.a -.a)

Then, naturally, when we go from Hoon or a high-level language
containing functions down to Nock, `[function arguments]` turns
into `[formula subject]`. This seems like an obvious design, and
we mention it only because it is *completely wrong*.

Rather, to execute a one-armed core like the above, we run

    nock(a -.a)

and the normal structure of a `gate`, which is simply Urbitese
for "function," is:

    [formula [sample context]]

where `sample` is Urbitese for "arguments" - and `context`, any
Lisper will at once recognize, is Urbitese for "environment."

To `slam` or call the gate, we simply replace the default sample
with the caller's data, then nock the formula on the entire gate.

What's in the context? Unlike in most dynamic languages, it is
not some secret system-level bag of tricks. Almost always it is
another core. This onion continues until at the bottom, there is
an atomic constant, which conventionally is the kernel version
number.

Thus a (highly desirable) `static` core is one of the form

    [battery constant]
    [battery static-core]

ie, a solid stack of nested libraries without any dynamic data.
The typical gate will thus be, for example,

    [formula [sample [battery battery battery constant]]]

but we would be most foolish to restrict the jet mechanism to
cores of this particular structure. We cannot constrain a
payload to be `[sample static-core]`, or even `[sample core]`.
Any such constraint would not be rich enough to handle Hoon,
let alone other languages.

#### u3j: jet state
|
|
|
|
There are two fundamental rules of computer science: (1) every
|
|
system is best understood through its state; (2) less state is
|
|
better than more state. Sadly, a pier has three different jet
|
|
state systems: `cold`, `warm` and `hot`. It needs all of them.
|
|
|
|
Hot state is associated with this particular Unix process. The
|
|
persistent pier is portable not just between process and process,
|
|
but machine and machine or OS and OS. The set of jets loaded
|
|
into a pier may itself change (in theory, though not in the
|
|
present implementation) during the lifetime of the process. Hot
|
|
state is a pure C data structure.
|
|
|
|
Cold state is associated with the logical execution history of
|
|
the pier. It consists entirely of nouns and ignores restarts.
|
|
|
|
Warm state contains all dependencies between cold and hot
|
|
state. It consists of C structures allocated on the loom.
|
|
|
|
Warm state is purely a function of cold and hot states, and
|
|
we can wipe and regenerate it at any time. On any restart where
|
|
the hot state might have changed, we clear the warm state
|
|
with `u3j_ream()`.
|
|
|
|
There is only one hot state, the global jet dashboard
|
|
`u3j_Dash` or `u3D` for short. In the present implementation,
|
|
`u3D` is a static structure not modified at runtime, except for
|
|
numbering itself on process initialization. This structure -
|
|
which embeds function pointers to all the jets - is defined
|
|
in `j/tree.c`. The data structures:
|
|
|
|
/* u3j_harm: driver arm.
|
|
*/
|
|
typedef struct _u3j_harm {
|
|
c3_c* fcs_c; // `.axe` or name
|
|
u3_noun (*fun_f)(u3_noun); // compute or 0 / semitransfer
|
|
c3_o ice; // perfect (don't test)
|
|
c3_o tot; // total (never punts)
|
|
c3_o liv; // live (enabled)
|
|
} u3j_harm;
|
|
|
|
/* u3j_core: C core driver.
|
|
*/
|
|
typedef struct _u3j_core {
|
|
c3_c* cos_c; // control string
|
|
struct _u3j_harm* arm_u; // blank-terminated static list
|
|
struct _u3j_core* dev_u; // blank-terminated static list
|
|
struct _u3j_core* par_u; // dynamic parent pointer
|
|
c3_l jax_l; // dynamic jet index
|
|
} u3j_core;
|
|
|
|
/* u3j_dash, u3j_Dash, u3D: jet dashboard singleton
*/
typedef struct _u3j_dash {
|
|
u3j_core* dev_u; // null-terminated static list
|
|
c3_l len_l; // ray_u filled length
|
|
c3_l all_l; // ray_u allocated length
|
|
u3j_core* ray_u; // dynamic driver array
|
|
} u3j_dash;
|
|
|
|
Warm and cold state is *per road*. In other words, as we nest
|
|
roads, we also nest jet state. The jet state in the road is:
|
|
|
|
struct { // jet dashboard
|
|
u3p(u3h_root) har_p; // warm state
|
|
u3_noun das; // cold state
|
|
} jed;
|
|
|
|
In case you understand Hoon, `das` (cold state) is a `++dash`,
|
|
and `har_p` (warm state) is a map from battery to `++calx`:
|
|
|
|
++ bane ,@tas :: battery name
|
|
++ bash ,@uvH :: label hash
|
|
++ bosh ,@uvH :: local battery hash
|
|
++ batt ,* :: battery
|
|
++ calf ::
|
|
$: jax=,@ud :: hot core index
|
|
hap=(map ,@ud ,@ud) :: axis/hot arm index
|
|
lab=path :: label as path
|
|
jit=* :: arbitrary data
|
|
== ::
|
|
++ calx (trel calf (pair bash cope) club) :: cached by battery
|
|
++ clog (pair cope (map batt club)) :: identity record
|
|
++ club (pair corp (map term nock)) :: battery pattern
|
|
++ cope (trel bane axis (each bash noun)) :: core pattern
|
|
++ core ,* :: core
|
|
++ corp (each core batt) :: parent or static
|
|
++ dash (map bash clog) :: jet system
|
|
|
|
The driver index `jax` in a `++calx` is an index into `ray_u` in the
|
|
dashboard - ie, a pointer into hot state. This is why the warm
|
|
state has to be reset when we reload the pier in a new process.
|
|
|
|
Why is jet state nested? Nock of course is a functional system,
|
|
so as we compute we don't explicitly create state. Jet state is
|
|
an exception to this principle (which works only because it can't
|
|
be semantically detected from Nock/Hoon) - but it can't violate
|
|
the fundamental rules of the allocation system.
|
|
|
|
For instance, when we're on an inner road, we can't allocate on
|
|
an outer road, or point from an outer road to an inner. So if we
|
|
learn something - like a mapping from battery to jet - in the
|
|
inner road, we have to keep it in the inner road.
|
|
|
|
Mitigating this problem, when we leave an inner road (with
|
|
`u3m_love()`), we call `u3j_reap()` to promote jet information in
|
|
the dying road. Reaping promotes anything we've learned about
|
|
any battery that either (a) already existed in the outer road, or
|
|
(b) is being saved to the outer road.
|
|
|
|
#### u3j: jet binding
|
|
|
|
Jet binding starts with a `%fast` hint. (In Hoon, this is
|
|
produced by the runes `~%`, for the general case, or `~/`
|
|
for simple functions.) To bind a jet, execute a formula of the
|
|
form:
|
|
|
|
[10 [%fast clue-formula] core-formula]
|
|
|
|
`core-formula` assembles the core to be jet-propelled.
|
|
`clue-formula` produces the hint information, or `++clue`,
with which we want to annotate the core.
|
|
|
|
A clue is a triple of name, parent, and hooks:
|
|
|
|
++ clue (trel chum nock (list (pair term nock)))
|
|
|
|
The name, or `++chum`, has a bunch of historical structure which
|
|
we don't need (cleaning these things up is tricky), but just gets
|
|
flattened into a term.
|
|
|
|
The parent axis is a nock formula, but always reduces to a simple
|
|
axis, which is the address of this core's *parent*. Consider
|
|
again an ordinary gate
|
|
|
|
[formula [sample context]]
|
|
|
|
Typically the `context` is itself a library core, which itself
|
|
has a jet binding. If so, the parent axis of this gate is `7`.
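
To make the axis arithmetic concrete, here is a minimal sketch of tree addressing in plain C. It does not use the real `u3` noun representation: `toy_noun` and `toy_slot()` are hypothetical names invented for this illustration. The only thing it assumes from Nock is the addressing rule (axis `1` is the whole noun, `2n` is the head of `n`, `2n+1` is the tail of `n`).

```c
#include <assert.h>
#include <stddef.h>

/* toy_noun: a cell when is_cell is set, otherwise an atom.
** Illustration only; this is not how u3 stores nouns.
*/
typedef struct toy_noun {
  int                    is_cell;
  unsigned long          atom;   /* valid when !is_cell */
  const struct toy_noun* head;   /* valid when is_cell  */
  const struct toy_noun* tail;   /* valid when is_cell  */
} toy_noun;

/* toy_slot(): fragment of non at axis axe, or NULL if there is none.
*/
const toy_noun*
toy_slot(unsigned long axe, const toy_noun* non)
{
  unsigned long bit;

  assert( non != NULL );
  if ( 0 == axe ) return NULL;

  /* find the highest set bit of axe */
  for ( bit = 1; (bit << 1) != 0 && (bit << 1) <= axe; bit <<= 1 ) { }

  /* walk the lower bits from high to low: 0 is head, 1 is tail */
  for ( bit >>= 1; bit != 0; bit >>= 1 ) {
    if ( !non->is_cell ) return NULL;
    non = (axe & bit) ? non->tail : non->head;
  }
  return non;
}
```

With a gate laid out as `[formula [sample context]]`, this gives the formula at axis `2`, the sample at `6`, and the context at `7`.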
|
|
|
|
If the parent is already bound - and the parent *must* be already
|
|
bound, in this road or a road containing it - we can hook this core
|
|
bottom-up into a tree hierarchy. Normally the child core is
|
|
produced by an arm of the parent core, so this is not a problem -
|
|
we wouldn't have the child if we hadn't already made the parent.
|
|
|
|
The clue also contains a list of *hooks*, named nock formulas on
|
|
the core. Usually these are arms, but they need not be. The
|
|
point is that we often want to call a core from C, in a situation
|
|
where we have no type or other source information. A common case
|
|
of this is a complex system in which we're mixing functions which
|
|
are jet-propelled with functions that aren't.
|
|
|
|
In any case, all the information in the `%fast` hint goes to
|
|
`u3j_mine()`, which registers the battery in cold state (`das` in
|
|
`jed` in `u3R`), then warm state (`har_p` in `jed`).
|
|
|
|
It's essential to understand that the `%fast` hint has to be,
|
|
well, fast - because we apply it whenever we build a core. For
|
|
instance, if the core is a Hoon gate - a function - we will call
|
|
`u3j_mine` every time the function is called.
|
|
|
|
#### u3j: the cold jet dashboard
|
|
|
|
For even more fun, the jet tree is not actually a tree of
|
|
batteries. It's a tree of battery *labels*, where a label is
|
|
an [axis term] path from the root of the tree. (At the root,
|
|
if the core pattern is always followed properly, is a core whose
|
|
payload is an atomic constant, conventionally the Hoon version.)
|
|
|
|
Under each of these labels, it's normal to have an arbitrary
|
|
number of different Nock batteries (not just multiple copies
|
|
of the same noun, a situation we *do* strive to avoid). For
|
|
instance, one might be compiled with debugging hints, one not.
|
|
|
|
We might even have changed the semantics of the battery without
|
|
changing the label - so long as those semantics don't invalidate
|
|
any attached driver.
|
|
|
|
Rather than a tree of batteries, then, the jet tree is a semantic
hierarchy. The root of the hierarchy is a constant, by convention
the Hoon kernel version, because any normal jet-propelled core has,
at the bottom of its onion of libraries, the standard kernel. Thus
if the core is
|
|
|
|
[foo-battery [bar-battery [moo-battery 164]]]
|
|
|
|
we can reverse the nesting to construct a hierarchical core
|
|
path. The static core
|
|
|
|
164/moo/bar/foo
|
|
|
|
extends the static core `164/moo/bar` by wrapping the `foo`
|
|
battery (ie, in Hoon, `|%`) around it. With the core above,
|
|
you can compute `foo` stuff, `bar` stuff, and `moo` stuff.
|
|
Rocket science, not.
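
As a rough illustration of "reversing the nesting", the sketch below builds the path string from the battery names in the order they nest in the core. `toy_label()` is a hypothetical helper written for this example, not part of `u3`, and it works on C strings rather than nouns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* toy_label(): write the label path of a static core into buf.
** bat_c lists battery names outermost first, as they nest in the
** core; the path lists them root first, so we walk bat_c backward.
** Produces 0 on success, -1 if buf is too small.
*/
int
toy_label(unsigned long rot,        /* root constant, eg 164 */
          const char**  bat_c,
          size_t        len,
          char*         buf,
          size_t        siz)
{
  int    ret;
  size_t use;

  assert( buf != NULL && siz > 0 );

  ret = snprintf(buf, siz, "%lu", rot);
  if ( ret < 0 || (size_t)ret >= siz ) return -1;

  while ( len-- > 0 ) {
    use = strlen(buf);
    ret = snprintf(buf + use, siz - use, "/%s", bat_c[len]);
    if ( ret < 0 || (size_t)ret >= siz - use ) return -1;
  }
  return 0;
}
```

Called with root `164` and the names `foo`, `bar`, `moo` (outermost first), it fills the buffer with `164/moo/bar/foo`.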
|
|
|
|
Not all cores are static, of course - they may contain live data,
|
|
like the sample in a gate (ie, argument to a function). Once
|
|
again, it's important to remember that we track jet bindings not
|
|
by the core, which may not be static, but by the battery, which
|
|
is always static.
|
|
|
|
(And if you're wondering how we can use a deep noun like a Nock
|
|
formula or battery as a key in a key-value table, remember
|
|
`mug_w`, the lazily computed short hash, in all boxed nouns.)
|
|
|
|
In any case, `das`, the dashboard, is a map from `bash` to jet
|
|
location record (`++clog`). A `clog` in turn contains two kinds
|
|
of information: the `++cope`, or per-location noun; and a map of
|
|
batteries to a per-battery `++club`.
|
|
|
|
The `cope` is a triple of `++bane` (battery name, right now just
|
|
a `term`); `++axis`, the axis, within *this* core, of the parent;
|
|
and `(each bash noun)`, which is either `[0 bash]` if the parent
|
|
is another core, or `[1 noun]`, for the constant noun (like
|
|
`164`) if there is no parent core.
|
|
|
|
A `bash` is just the noun hash (`++sham`) of a `cope`, which
|
|
uniquely expresses the battery's hierarchical location without
|
|
depending on the actual formulas.
|
|
|
|
The `club` contains a `++corp`, which we use to actually validate
|
|
the core. Obviously jet execution has to be perfectly compatible
|
|
with Nock. We search on the battery, but getting the battery
|
|
right is not enough - a typical battery is dependent on its
|
|
context. For example, your jet-propelled library function is
|
|
very likely to call `++dec` or other advanced kernel technology.
|
|
If you've replaced the kernel in your context with something
|
|
else, we need to detect this and not run the jet.
|
|
|
|
There are two cases for a jet-propelled core - either the entire
|
|
core is a static constant, or it isn't. Hence the definition
|
|
of `corp`:
|
|
|
|
++ corp (each core batt) :: parent or static
|
|
|
|
Ie, a `corp` is `[0 core]` or `[1 batt]`. If it's static -
|
|
meaning that the jet only works with one specific core, ie, the
|
|
parent axis of each location in the hierarchy is `3` - we can
|
|
validate with a single comparison. Otherwise, we have to recurse
|
|
upward by checking the parent.
|
|
|
|
Note that there is at present no way to force a jet to depend on
|
|
static *data*.
|
|
|
|
#### u3j: the warm jet dashboard
|
|
|
|
We don't use the cold state to match jets as we call them. We
|
|
use the cold state to register jets as we find them, and also to
|
|
rebuild the warm state after the hot state is reset.
|
|
|
|
What we actually use at runtime is the warm state, `jed->har_p`,
|
|
which is a `u3h` (built-in hashtable), allocated on the loom,
|
|
from battery to `++calx`.
|
|
|
|
A `calx` is a triple of a `++calf`, a `[bash cope]` cell, and a
|
|
`club`. The latter two come straight from cold state.
|
|
|
|
The `calf` contains warm data dependent on hot state. It's a
|
|
quadruple: of `jax`, the hot driver index (in `ray_u` in
|
|
`u3j_dash`); `hap`, a table from arm axis (ie, the axis of each
|
|
formula within the battery) to driver arm index (into `arm_u` in
|
|
`u3j_core`); `lab`, the complete label path; and `jit`, any
|
|
other dynamic data that may speed up execution.
|
|
|
|
We construct `hap`, when we create the calx, by iterating through
|
|
the arms registered in the `u3j_core`. Note the way a `u3j_harm`
|
|
declares itself, with the string `fcs_c` which can contain either
|
|
an axis or a name. Most jetted cores are of course gates, which
|
|
have one formula at one axis within the core: `fcs_c` is `".2"`.
|
|
|
|
But we do often have fast cores with more complex arm structure,
|
|
and it would be sad to have to manage their axes by hand. To use
|
|
an `fcs_c` with a named arm, it's sufficient to make sure the
|
|
name is bound to a formula `[0 axis]` in the hook table.
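
One way to picture the two forms of `fcs_c` is a small classifier that either extracts the axis or reports that the string is a name. This is only a sketch: `toy_fcs_axis()` is a hypothetical function, and the real parsing in `u3j` may differ in its details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* toy_fcs_axis(): if fcs_c has the form ".N", produce the axis N.
** Otherwise produce 0, meaning "this is a name; resolve it through
** the hook table".  (0 is never a valid axis.)
*/
unsigned long
toy_fcs_axis(const char* fcs_c)
{
  char*         end_c;
  unsigned long axe;

  assert( fcs_c != NULL );

  if ( '.' != fcs_c[0] ) return 0;
  if ( fcs_c[1] < '0' || fcs_c[1] > '9' ) return 0;

  axe = strtoul(fcs_c + 1, &end_c, 10);
  return ( '\0' == *end_c ) ? axe : 0;
}
```

A name such as `"get"` falls through to the hook table, where (per the text above) it should be bound to a formula of the form `[0 axis]`.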
|
|
|
|
`jit`, as its name suggests, is a stub where any sort of
|
|
optimization data computed on battery registration might go. To
|
|
use it, fill in the `_cj_jit()` function.
|
|
|
|
#### u3j: the hot dashboard
|
|
|
|
Now it should be easy to see how we actually invoke jets. Every
|
|
time we run a nock `9` instruction (pretty often, obviously), we
|
|
have a core and an axis. We pass these to `u3j_kick()`, which
|
|
will try to execute them.
|
|
|
|
Because nouns with a reference count of 1 are precious,
|
|
`u3j_kick()` has a tricky reference control definition. It
|
|
reserves the right to return `u3_none` in the case where there is
|
|
no driver, or the driver does not apply for this case; in this
|
|
case, it retains argument `cor`. If it succeeds, though, it
|
|
transfers `cor`.
|
|
|
|
`u3j_kick()` looks up the battery (always the head of the
core, of course) in the warm dashboard. If the battery is
registered, it looks up the axis in `hap` in the `calx`.
If either lookup fails, no driver jets this arm, and we return
`u3_none`.

Otherwise, the core matches a driver, and we call `fun_f` in the
matching `u3j_harm`. This obeys the same protocol as `u3j_kick()`:
it can refuse to function by returning `u3_none`, or consume the
noun.
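
The lookup sequence can be sketched in plain C as follows. Everything here is a stand-in invented for illustration: nouns are bare integers, the table keyed by battery is a linear array rather than a `u3h` hashtable, the driver receives a sample rather than a whole core, and `TOY_NONE` plays the role of `u3_none`. Reference counting is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_noun;
#define TOY_NONE (~0UL)              /* stand-in for u3_none */

typedef struct {
  unsigned long axe;                 /* arm axis in the core */
  toy_noun    (*fun_f)(toy_noun);    /* driver; NULL ends the list */
} toy_arm;

typedef struct {
  toy_noun       bat;                /* battery, here just an id */
  const toy_arm* arm_u;              /* arms known for this battery */
} toy_calx;

/* toy_dec(): example driver; declines (punts) on 0.
*/
toy_noun
toy_dec(toy_noun sam)
{
  return ( 0 == sam ) ? TOY_NONE : (sam - 1);
}

/* toy_kick(): try to run the arm at axe in a core with battery bat.
** Produces TOY_NONE if no driver applies; the caller then runs Nock.
*/
toy_noun
toy_kick(const toy_calx* war_u, size_t len,
         toy_noun bat, unsigned long axe, toy_noun sam)
{
  size_t         i;
  const toy_arm* arm_u;

  assert( war_u != NULL || 0 == len );

  for ( i = 0; i < len; i++ ) {
    if ( war_u[i].bat != bat ) continue;

    for ( arm_u = war_u[i].arm_u; arm_u->fun_f; arm_u++ ) {
      if ( arm_u->axe == axe ) return arm_u->fun_f(sam);
    }
    return TOY_NONE;   /* battery known, but no driver for this arm */
  }
  return TOY_NONE;     /* battery not registered */
}
```

The point of the shape is that all three failure modes (unknown battery, unjetted arm, driver declines) look the same to the caller, which simply falls back to interpreting the Nock.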
|
|
|
|
Besides the actual function pointer `fun_f`, we have some flags
|
|
in the `u3j_harm` which tell us how to call the arm function.
|
|
|
|
If `ice` is yes (`&`, `0`), the jet is known to be perfect and we
|
|
can just trust the product of `fun_f`. Otherwise, we need to run
|
|
*both* the Nock arm and `fun_f`, and compare their results.
|
|
|
|
(Note that while executing the C side of this test, we have to
|
|
set `ice` to yes; on the Nock side, we have to set `liv` to no.
|
|
Otherwise, many non-exponential functions become exponential.
|
|
When auto-testing jets in this way, the principle is that the
|
|
test is on the outermost layer of recursion.)
|
|
|
|
(Note also that anyone who multi-threads this execution
|
|
environment has a slight locking problem with these flags if arm
|
|
testing is multi-threaded.)
|
|
|
|
If `tot` is yes (`&`, `0`), the arm function is *total* and has
to return properly (though it can still return `u3_none`).
Otherwise, it is *partial* and can `u3m_bail()` out with
`c3__punt`. This feature has a cost: the jet runs in a subroad.
|
|
|
|
Finally, if `liv` is no (`|`, `1`), the jet is off and doesn't run.
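
A toy model of how `ice` and `liv` steer a call might look like the sketch below. It makes several assumptions that are not in `u3`: the flags are ordinary C truth values (nonzero is yes) rather than loobeans, `nok_f` stands in for interpreting the Nock arm, a mismatch merely sets a flag instead of bailing, and `tot` (which needs a subroad) is not modeled at all. All `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long toy_noun;
#define TOY_NONE (~0UL)              /* stand-in for u3_none */

typedef struct {
  toy_noun (*fun_f)(toy_noun);       /* C implementation */
  int        ice;                    /* nonzero: perfect, don't test */
  int        liv;                    /* nonzero: enabled */
} toy_harm;

/* toy_dec_slow(): decrement the way Nock must, by counting up.
** Must not be called with 0.
*/
toy_noun
toy_dec_slow(toy_noun sam)
{
  toy_noun i = 0;
  while ( (i + 1) != sam ) i++;
  return i;
}

/* toy_dec_fast(): a correct driver.  toy_dec_bad(): a buggy one.
*/
toy_noun toy_dec_fast(toy_noun sam) { return sam ? (sam - 1) : TOY_NONE; }
toy_noun toy_dec_bad(toy_noun sam)  { return sam - 2; }

/* toy_run(): run an arm, testing the driver unless it is ice.
** Sets *bad when driver and Nock disagree, and produces the Nock result.
*/
toy_noun
toy_run(const toy_harm* ham_u, toy_noun (*nok_f)(toy_noun),
        toy_noun sam, int* bad)
{
  toy_noun pro, ref;

  assert( ham_u != NULL && nok_f != NULL && bad != NULL );

  if ( !ham_u->liv ) return nok_f(sam);        /* jet is off */

  pro = ham_u->fun_f(sam);
  if ( TOY_NONE == pro ) return nok_f(sam);    /* driver declined */
  if ( ham_u->ice ) return pro;                /* trusted: no test */

  ref = nok_f(sam);                            /* run both, compare */
  if ( ref != pro ) { *bad = 1; return ref; }
  return pro;
}
```

Note what `ice` buys and costs: a trusted driver skips the slow Nock run entirely, so a buggy driver marked `ice` silently produces wrong answers.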
|
|
|
|
It should be easy to see how the tree of cores gets declared -
|
|
precisely, in `j/tree.c`. We declare the hierarchy as a tree
|
|
of `u3j_core` structures, each of which comes with a static list
|
|
of arms `arm_u` and sub-cores `dev_u`.
|
|
|
|
In `u3j_boot()`, we traverse the hierarchy, fill in parent
|
|
pointers `par_u`, and enumerate all `u3j_core` structures
|
|
into a single flat array `u3j_dash.ray_u`. Our hot state
|
|
then appears ready for action.
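
The traversal can be pictured with a small sketch. This is not the `u3j_boot()` source: `toy_core` is a cut-down, hypothetical version of `u3j_core`, the flat array holds pointers rather than structures, and the caller is assumed to have sized `ray_u` large enough for the whole tree.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_core {
  const char*      cos_c;   /* name; NULL marks the end of a list */
  struct toy_core* dev_u;   /* blank-terminated children, or NULL */
  struct toy_core* par_u;   /* filled in at boot */
  unsigned         jax_l;   /* filled in at boot */
} toy_core;

/* toy_boot(): number cor_u and its descendants depth-first, starting
** at jax_l; record each in ray_u; produce the next free index.
*/
unsigned
toy_boot(toy_core* cor_u, toy_core* par_u,
         toy_core** ray_u, unsigned jax_l)
{
  toy_core* dev_u;

  assert( cor_u != NULL && ray_u != NULL );

  cor_u->par_u = par_u;
  cor_u->jax_l = jax_l;
  ray_u[jax_l++] = cor_u;

  for ( dev_u = cor_u->dev_u; dev_u && dev_u->cos_c; dev_u++ ) {
    jax_l = toy_boot(dev_u, cor_u, ray_u, jax_l);
  }
  return jax_l;
}
```

After the walk, an index like `jax` in a `calx` is enough to find a driver, and each driver can find its parent without searching the tree.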
|
|
|
|
#### u3j: jet functions
|
|
|
|
At present, all drivers are compiled statically into `u3`. This is
|
|
not a long-term permanent solution or anything. However, it will
|
|
always be the case with a certain amount of core functionality.
|
|
|
|
For instance, there are some jet functions that we need to call
|
|
as part of loading the Arvo kernel - like `++cue` to unpack a
|
|
noun from an atom. And obviously it makes sense, when jets are
|
|
significant enough to compile into `u3`, to export their symbols
|
|
in headers and the linker.
|
|
|
|
There are three interface prefixes for standard jet functions:
|
|
`u3k`, `u3q`, and `u3w`. All jets have `u3w` interfaces; most
|
|
have `u3q`; some have `u3k`. Of course the actual logic is
|
|
shared.
|
|
|
|
`u3w` interfaces use the same protocol as `fun_f` above: the
|
|
caller passes the entire core, which is retained if the function
|
|
returns `u3_none`, transferred otherwise. Why? Again, use
|
|
counts of 1 are special and precious for performance hackers.
|
|
|
|
`u3q` interfaces break the core into C arguments, *retain* noun
arguments, and *transfer* noun returns. `u3k` interfaces take the
same C arguments (with more use of `u3_none` and other simple C
variations on the Hoon original), but *transfer* both arguments
and returns. Generally, `u3k` are most convenient for new code.
|
|
|
|
Following `u3k/q/w` is `[a-f]`, corresponding to the 6 logical
|
|
tiers of the kernel, or `g` for user-level jets. Another letter
|
|
is added for functions within subcores. The filename, under
|
|
`j/`, follows the tier and the function name.
|
|
|
|
For instance, `++add` is `u3wa_add(cor)`, `u3qa_add(a, b)`, or
|
|
`u3ka_add(a, b)`, in `j/a/add.c`. `++get` in `++by` is
|
|
`u3wdb_get(cor)`, `u3kdb_get(a, b)`, etc, in `j/d/by_get.c`.
|
|
|
|
For historical reasons, all internal jet code in `j/[a-f]`
|
|
*retains* noun arguments, and *transfers* noun results. Please
|
|
do not do this in new `g` jets! The new standard protocol is to
|
|
transfer both arguments and results.
|
|
|
|
### u3a: allocation functions
|
|
|
|
`u3a` allocates on the current road (`u3R`). Its internal
|
|
structures are uninteresting and typical of a naive allocator.
|
|
|
|
The two most-used `u3a` functions are `u3a_gain()` to add a
|
|
reference count, and `u3a_lose()` to release one (and free the
|
|
noun, if the use count is zero). For convenience, `u3a_gain()`
|
|
returns its argument. The pair are generally abbreviated with
|
|
the macros `u3k()` and `u3z()` respectively.
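
For readers who have not met this gain/lose discipline before, here is a stand-alone toy version. It is only a sketch: `toy_box`, `toy_make()`, `toy_gain()` and `toy_lose()` are hypothetical names, and it allocates with the C library `malloc()` because it lives outside `u3` (real nouns live on the loom and must not be allocated this way).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  unsigned      use_w;   /* reference count */
  unsigned long val;     /* payload */
} toy_box;

unsigned toy_live = 0;   /* boxes currently allocated */

/* toy_make(): new box with one reference, or NULL on failure.
*/
toy_box*
toy_make(unsigned long val)
{
  toy_box* box_u = malloc(sizeof(*box_u));

  if ( NULL == box_u ) return NULL;
  box_u->use_w = 1;
  box_u->val = val;
  toy_live++;
  return box_u;
}

/* toy_gain(): add a reference; produces its argument for convenience.
*/
toy_box*
toy_gain(toy_box* box_u)
{
  assert( box_u != NULL && box_u->use_w > 0 );
  box_u->use_w++;
  return box_u;
}

/* toy_lose(): release a reference; free the box at zero.
*/
void
toy_lose(toy_box* box_u)
{
  assert( box_u != NULL && box_u->use_w > 0 );
  if ( 0 == --box_u->use_w ) {
    free(box_u);
    toy_live--;
  }
}
```

Because gain produces its argument, it can be written inline where a reference is handed to a function that transfers, which is the same convenience the text describes for `u3k()`.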
|
|
|
|
Normally we create nouns through `u3i` functions, and don't call
|
|
the `u3a` allocators directly. But if you do:
|
|
|
|
One, there are *two* sets of allocators: the word-aligned
|
|
allocators and the fully-aligned (ie, malloc compatible)
|
|
allocators. For instance, on a typical OS X setup, malloc
|
|
produces 16-byte aligned results - needed for some SSE
|
|
instructions.
|
|
|
|
These allocators are *not compatible*. For 32-bit alignment
|
|
as used in nouns, call
|
|
|
|
/* u3a_walloc(): allocate storage measured in words.
|
|
*/
|
|
void*
|
|
u3a_walloc(c3_w len_w);
|
|
|
|
/* u3a_wfree(): free storage.
|
|
*/
|
|
void
|
|
u3a_wfree(void* lag_v);
|
|
|
|
/* u3a_wealloc(): word realloc.
|
|
*/
|
|
void*
|
|
u3a_wealloc(void* lag_v, c3_w len_w);
|
|
|
|
For full alignment, call:
|
|
|
|
/* u3a_malloc(): aligned storage measured in bytes.
|
|
*/
|
|
void*
|
|
u3a_malloc(size_t len_i);
|
|
|
|
/* u3a_realloc(): aligned realloc in bytes.
|
|
*/
|
|
void*
|
|
u3a_realloc(void* lag_v, size_t len_i);
|
|
|
|
/* u3a_realloc2(): gmp-shaped realloc.
|
|
*/
|
|
void*
|
|
u3a_realloc2(void* lag_v, size_t old_i, size_t new_i);
|
|
|
|
/* u3a_free(): free for aligned malloc.
|
|
*/
|
|
void
|
|
u3a_free(void* tox_v);
|
|
|
|
/* u3a_free2(): gmp-shaped free.
|
|
*/
|
|
void
|
|
u3a_free2(void* tox_v, size_t siz_i);
|
|
|
|
There is also a set of special-purpose allocators for building
|
|
atoms. When building atoms, please remember that it's incorrect
|
|
to have a high 0 word - the word length in the atom structure
|
|
must be strictly correct.
|
|
|
|
Of course, we don't always know how large our atom will be.
|
|
Therefore, the standard way of building large atoms is to
|
|
allocate a block of raw space with `u3a_slab()`, then chop off
|
|
the end with `u3a_malt()` (which does the measuring itself)
|
|
or `u3a_mint()` in case you've measured it yourself.
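
The "measuring" step amounts to trimming high zero words. A minimal sketch, assuming the buffer holds 32-bit words with the least significant word first, and using a hypothetical helper name `toy_measure()` that is not part of `u3a`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* toy_measure(): number of words actually needed for the value in
** buf_w, ie, len_w minus any high zero words.  Assumes buf_w[0] is
** the least significant word.
*/
size_t
toy_measure(const uint32_t* buf_w, size_t len_w)
{
  assert( buf_w != NULL || 0 == len_w );

  while ( len_w > 0 && 0 == buf_w[len_w - 1] ) {
    len_w--;
  }
  return len_w;
}
```

A buffer of all zero words measures as length 0 in this sketch; interior zero words are of course kept.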
|
|
|
|
Once again, *do not call `malloc()`* (or C++ `new`) within any
|
|
code that may be run within a jet. This will cause rare sporadic
|
|
corruption when we interrupt execution within a `malloc()`. We'd
|
|
just override the symbol, but `libuv` uses `malloc()` across
|
|
threads within its own synchronization primitives - for this to
|
|
work with `u3a_malloc()`, we'd have to introduce our own locks on
|
|
the surface-level road (which might be a viable solution).
|
|
|
|
### u3n: nock execution
|
|
|
|
The `u3n` routines execute Nock itself. On the inside, they have
|
|
a surprising resemblance to the spec proper (the only interesting
|
|
detail is how we handle tail-call elimination) and are, as one
|
|
would expect, quite slow. (There is no such thing as a fast tree
|
|
interpreter.)
|
|
|
|
There is only one Nock, but there are lots of ways to call it.
|
|
(Remember that all `u3n` functions *transfer* C arguments and
|
|
returns.)
|
|
|
|
The simplest interpreter, `u3n_nock_on(u3_noun bus, u3_noun fol)`
|
|
invokes Nock on `bus` (the subject) and `fol` (the formula).
|
|
(Why is it `[subject formula]`, not `[formula subject]`? The same
|
|
reason `0` is true and `1` is false.)
|
|
|
|
A close relative is `u3n_slam_on(u3_noun gat, u3_noun sam)`,
|
|
which slams a *gate* (`gat`) on a sample (`sam`). (In a normal
|
|
programming language which didn't talk funny,
|
|
`u3n_slam_on()` would call a function on an argument.) We could
|
|
write it most simply as:
|
|
|
|
u3_noun
|
|
u3n_slam_on(u3_noun gat, u3_noun sam)
|
|
{
|
|
u3_noun pro = u3n_nock_on
|
|
(u3nc(u3k(u3h(gat)),
|
|
u3nc(sam, u3k(u3t(u3t(gat))))),
|
|
u3k(u3h(gat)));
|
|
u3z(gat);
|
|
return pro;
|
|
}
|
|
|
|
Simpler is `u3n_kick_on(u3_noun gat)`, which slams a gate (or,
|
|
more generally, a *trap* - because sample structure is not even
|
|
needed here) without changing its sample:
|
|
|
|
u3_noun
|
|
u3n_kick_on(u3_noun gat)
|
|
{
|
|
return u3n_nock_on(gat, u3k(u3h(gat)));
|
|
}
|
|
|
|
The `_on` functions in `u3n` are all defined as pure Nock. But
|
|
actually, even though we say we don't extend Nock, we do. But we
|
|
don't. But we do.
|
|
|
|
Note that `u3` has a well-developed error handling system -
|
|
`u3m_bail()` to throw an exception, `u3m_soft_*` to catch one.
|
|
But Nock has no exception model at all. That's okay - all it
|
|
means is that if an `_on` function bails, the exception is an
|
|
exception in the caller.
|
|
|
|
However, `u3`'s exception handling happens to match a convenient
|
|
virtual super-Nock in `hoon.hoon`, the infamous `++mock`. Of
|
|
course, Nock is slow, and `mock` is Nock in Nock, so it is
|
|
(logically) super-slow. Then again, so is decrement.
|
|
|
|
With the power of `u3`, we nest arbitrary layers of `mock`
|
|
without any particular performance cost. Moreover, we simply
|
|
treat Nock proper as a special case of `mock`. (More precisely,
|
|
the internal VM loop is `++mink` and the error compiler is
|
|
`++mook`. But we call the whole sandbox system `mock`.)
|
|
|
|
The nice thing about `mock` functions is that (by executing
|
|
within `u3m_soft_run()`, which as you may recall uses a nested
|
|
road) they provide both exceptions and the namespace operator -
|
|
`.^` in Hoon, which becomes operator `11` in `mock`.
|
|
|
|
`11` requires a namespace function, or `fly`, which produces a
|
|
`++unit` - `~` (`0`) for no binding, or `[0 value]`. The sample
|
|
to a `fly` is a `++path`, just a list of text `span`.
|
|
|
|
`mock` functions produce a `++toon`. Fully elaborated:
|
|
|
|
++ noun ,* :: any noun
|
|
++ path (list ,@ta) :: namespace path
|
|
++ span ,@ta :: text-atom (ASCII)
|
|
++ toon $% [%0 p=noun] :: success
|
|
[%1 p=(list path)] :: blocking paths
|
|
[%2 p=(list tank)] :: stack trace
|
|
== ::
|
|
++ tank :: printable
|
|
$% [%leaf p=tape] :: flat text
|
|
$: %palm :: backstep list
|
|
p=[p=tape q=tape r=tape s=tape] :: mid cap open close
|
|
q=(list tank) :: contents
|
|
== ::
|
|
$: %rose :: straight list
|
|
p=[p=tape q=tape r=tape] :: mid open close
|
|
q=(list tank) :: contents
|
|
== ::
|
|
==
|
|
|
|
(Note that `tank` is overdesigned and due for replacement.)
|
|
|
|
What does a `toon` mean? Either your computation succeeded (`[0
noun]`), or could not finish because it blocked on one or more
|
|
global paths (`[1 (list path)]`), or it exited with a stack trace
|
|
(`[2 (list tank)]`).
|
|
|
|
Note that of all the `u3` exceptions, only `%exit` is produced
|
|
deterministically by the Nock definition. Therefore, only
|
|
`%exit` produces a `2` result. Any other argument to
|
|
`u3m_bail()` will unwind the virtualization stack all the way to
|
|
the top - or to be more exact, to `u3m_soft_top()`.
|
|
|
|
In any case, the simplest `mock` functions are `u3n_nock_un()`
|
|
and `u3n_slam_un()`. These provide exception control without
|
|
any namespace change, as you can see by the code:
|
|
|
|
/* u3n_nock_un(): produce .*(bus fol), as ++toon.
|
|
*/
|
|
u3_noun
|
|
u3n_nock_un(u3_noun bus, u3_noun fol)
|
|
{
|
|
u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0); // |=(a=* .^(a))
|
|
|
|
return u3n_nock_in(fly, bus, fol);
|
|
}
|
|
|
|
/* u3n_slam_un(): produce (gat sam), as ++toon.
|
|
*/
|
|
u3_noun
|
|
u3n_slam_un(u3_noun gat, u3_noun sam)
|
|
{
|
|
u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0); // |=(a=* .^(a))
|
|
|
|
return u3n_slam_in(fly, gat, sam);
|
|
}
|
|
|
|
The `fly` is added as the first argument to `u3n_nock_in()` and
|
|
`u3n_slam_in()`. Of course, logically, `fly` executes in the
|
|
caller's exception layer. (Maintaining this illusion is slightly
|
|
nontrivial.) Finally, `u3n_nock_an()` is a sandbox with a null
|
|
namespace.
|
|
|
|
### u3e: persistence
|
|
|
|
The only `u3e` function you should need to call is `u3e_save()`,
|
|
which saves the loom. As it can be restored on any platform,
|
|
please make sure you don't have any state in the loom that is
|
|
bound to your process or architecture - except for exceptions
|
|
like the warm jet state, which is actively purged on reboot.
|
|
|
|
### u3r: reading nouns (weak)
|
|
|
|
As befits accessors, which don't make anything, `u3r` noun reading
functions always retain their arguments and their returns. They
never bail; rather, when they don't work, they return a `u3_weak`
result (ie, `u3_none`).
|
|
|
|
Most of these functions are straightforward and do only what
|
|
their comments say. A few are interesting enough to discuss.
|
|
|
|
`u3r_at()` is the familiar tree fragment function, `/` from the
|
|
Nock spec. For taking complex nouns apart, `u3r_mean()` is a
|
|
relatively funky way of deconstructing nouns with a varargs list
|
|
of `axis`, `u3_noun *`. For cells, triples, etc, decompose with
|
|
`u3r_cell()`, `u3r_trel()`, etc. For the tagged equivalents, use
|
|
`u3r_pq()` and friends.
|
|
|
|
`u3r_sing(u3_noun a, u3_noun b)` (true if `a` and `b` are a
*single* noun) is interesting because it uses mugs to help it
out. Clearly, different nouns may have the same mug, but the
same noun cannot have two different mugs. It's important to
understand the performance characteristics of `u3r_sing()`:
the worst possible case is a comparison of duplicate nouns,
which have the same value but were created separately. In this
case, the whole tree is traversed.
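
The following sketch shows why duplicates are the expensive case. It is a toy, not the `u3r_sing()` source: `toy_noun` is a hypothetical structure, the hash is an arbitrary stand-in for the real mug, and the `toy_steps` counter exists only to make the cost visible. It assumes cached mugs are used only when both nouns already have one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_noun {
  uint32_t         mug_w;    /* cached hash; 0 if not yet computed */
  int              is_cell;
  uint32_t         atom;     /* valid when !is_cell */
  struct toy_noun* hed;      /* valid when is_cell  */
  struct toy_noun* tel;      /* valid when is_cell  */
} toy_noun;

unsigned long toy_steps = 0; /* node pairs compared structurally */

/* toy_mug(): lazily computed 31-bit nonzero hash (NOT the real mug).
*/
uint32_t
toy_mug(toy_noun* non)
{
  uint32_t mug_w;

  assert( non != NULL );
  if ( 0 != non->mug_w ) return non->mug_w;

  if ( !non->is_cell ) {
    mug_w = non->atom * 2654435761u;
  } else {
    mug_w = (toy_mug(non->hed) * 31u) ^ (toy_mug(non->tel) + 0x9e3779b9u);
  }
  mug_w &= 0x7fffffffu;
  if ( 0 == mug_w ) mug_w = 1;
  return non->mug_w = mug_w;
}

/* toy_sing(): 1 if a and b are the same noun, else 0.
*/
int
toy_sing(toy_noun* a, toy_noun* b)
{
  assert( a != NULL && b != NULL );

  if ( a == b ) return 1;                         /* same pointer */
  if ( a->mug_w && b->mug_w && a->mug_w != b->mug_w ) {
    return 0;                                     /* mugs differ  */
  }
  toy_steps++;
  if ( a->is_cell != b->is_cell ) return 0;
  if ( !a->is_cell ) return a->atom == b->atom;
  return toy_sing(a->hed, b->hed) && toy_sing(a->tel, b->tel);
}
```

Pointer equality and mismatched mugs both answer in constant time; only when the mugs agree and the pointers differ does the comparison have to descend, and for true duplicates it descends everywhere.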
|
|
|
|
`u3r_sung()` is a deeply funky and frightening version of
|
|
`u3r_sing()` that unifies pointers to the duplicate nouns it
|
|
finds, freeing the second copy. Obviously, do not use
|
|
`u3r_sung()` when you have live, but not reference counted, noun
|
|
references from C - if they match a noun with a refcount of 1
|
|
that gets freed, bad things happen.
|
|
|
|
It's important to remember that `u3r_mug()`, which produces a
|
|
31-bit, nonzero insecure hash, uses the `mug_w` slot in any boxed
|
|
noun as a lazy cache. There are a number of variants of
|
|
`u3r_mug()` that can get you out of building unneeded nouns.
|
|
|
|
### u3x: reading nouns (bail)
|
|
|
|
`u3x` functions are like `u3r` functions, but instead of
|
|
returning `u3_none` when (for instance) we try to take the head
|
|
of an atom, they bail with `%exit`. In other words, they do what
|
|
the same operation would do in Nock.
|
|
|
|
### u3h: hash tables.
|
|
|
|
We can of course use the Hoon `map` structure as an associative
|
|
array. This is a balanced treap and reasonably fast. However,
|
|
it's considerably inferior to a custom structure like a HAMT
|
|
(hash array-mapped trie). We use `u3_post` to allocate HAMT
|
|
structures on the loom.
|
|
|
|
(Our HAMT implements the classic Bagwell algorithm which depends
|
|
on the `gcc` standard directive `__builtin_popcount()`. On a CPU
|
|
which doesn't support popcount or an equivalent instruction, some
|
|
other design would probably be preferable.)
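
For context, the popcount trick at the heart of a Bagwell-style HAMT node looks roughly like this. The sketch assumes a 32-way node with a 32-bit occupancy bitmap and a dense array of slots, and it needs `gcc` or `clang` for `__builtin_popcount()`. The function names are hypothetical and are not taken from `u3h`.

```c
#include <assert.h>
#include <stdint.h>

/* toy_hamt_has(): is there an entry for 5-bit hash chunk inx (0-31)?
*/
int
toy_hamt_has(uint32_t map_w, unsigned inx)
{
  assert( inx < 32 );
  return (int)((map_w >> inx) & 1u);
}

/* toy_hamt_slot(): position of chunk inx in the dense slot array,
** ie, the number of occupied chunks below inx.
*/
unsigned
toy_hamt_slot(uint32_t map_w, unsigned inx)
{
  assert( inx < 32 );
  return (unsigned)__builtin_popcount(map_w & ((1u << inx) - 1u));
}
```

The node stores only as many slots as there are set bits, which is why a cheap population count matters so much to this design.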
|
|
|
|
There's no particular rocket science in the API. `u3h_new()`
|
|
creates a hashtable; `u3h_free()` destroys one; `u3h_put()`
|
|
inserts, `u3h_get()` retrieves. You can transform values in a
|
|
hashtable with `u3h_walk()`.
|
|
|
|
The only funky function is `u3h_gut()`, which unifies keys with
|
|
`u3r_sung()`. As with all cases of `u3r_sung()`, this must be
|
|
used with extreme caution.
|
|
|
|
### u3z: memoization
|
|
|
|
Connected to the `~+` rune in Hoon, via the Nock `%memo` hint,
|
|
the memoization facility is a general-purpose cache.
|
|
|
|
(It's also used for partial memoization, in which conservative
worklist algorithms (which would otherwise be exponential) memoize
everything in the subject *except* the worklist. This is used
heavily in the Hoon compiler jets (`j/f/*.c`). Unfortunately, it's
probably not possible to make this work so perfectly that it can't
be abused to violate Nock, so we'll probably remove it at a later
date, instead making `++ut` keep its own monadic cache.)
|
|
|
|
Each `u3z` function comes with a `c3_m` mote which disambiguates
|
|
the function mapping key to value. For Nock itself, use 0. For
|
|
extra speed, small tuples are split out in C; thus, find with
|
|
|
|
u3_weak u3z_find(c3_m, u3_noun);
|
|
u3_weak u3z_find_2(c3_m, u3_noun, u3_noun);
|
|
u3_weak u3z_find_3(c3_m, u3_noun, u3_noun, u3_noun);
|
|
u3_weak u3z_find_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun);
|
|
|
|
and save with
|
|
|
|
u3_noun u3z_save(c3_m, u3_noun, u3_noun);
|
|
u3_noun u3z_save_2(c3_m, u3_noun, u3_noun, u3_noun);
|
|
u3_noun u3z_save_3(c3_m, u3_noun, u3_noun, u3_noun, u3_noun);
|
|
u3_noun u3z_save_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun, u3_noun);
|
|
|
|
where the value is the last argument. To eliminate duplicate
|
|
nouns, there is also
|
|
|
|
u3_noun
|
|
u3z_uniq(u3_noun);
|
|
|
|
`u3z` functions retain keys and transfer values.
|
|
|
|
The `u3z` cache, built on `u3h` hashes, is part of the current
|
|
road, and goes away when it goes away. (In future, we may wish
|
|
to promote keys/values which outlive the road, as we do with jet
|
|
state.) There is no cache reclamation at present, so be careful.
|
|
|
|
### u3t: tracing and profiling.
|
|
|
|
TBD.
|
|
|
|
### u3v: the Arvo kernel
|
|
|
|
An Arvo kernel - or at least, a core that complies with the Arvo
|
|
interface - is part of the global `u3` state. What is an Arvo
|
|
core? Slightly pseudocoded:
|
|
|
|
++ arvo
|
|
|%
|
|
++ come |= [yen=@ ova=(list ovum) nyf=pone] :: 11
|
|
^- [(list ovum) _+>]
|
|
!!
|
|
++ keep |= [now=@da hap=path] :: 4
|
|
^- (unit ,@da)
|
|
!!
|
|
++ load |= [yen=@ ova=(list ovum) nyf=pane] :: 86
|
|
^- [(list ovum) _+>]
|
|
!!
|
|
++ peek |= [now=@da path] :: 87
|
|
^- (unit)
|
|
!!
|
|
++ poke |= [now=@da ovo=ovum] :: 42
|
|
^- [(list ovum) _+>]
|
|
!!
|
|
++ wish |= txt=@ta :: 20
|
|
^- *
|
|
!!
|
|
--
|
|
++ card ,[p=@tas q=*] :: typeless card
|
|
++ ovum ,[p=wire q=card] :: Arvo event
|
|
++ wire path :: event cause
|
|
|
|
This is the Arvo ABI in a very real sense. Arvo is a core with
|
|
these six arms. To use these arms, we hardcode the axis of the
|
|
formula (`11`, `4`, `86`, etc) into the C code that calls Arvo,
|
|
because otherwise we'd need type metadata - which we can get, by
|
|
calling Arvo.
|
|
|
|
It's important to understand the Arvo event/action structure, or
|
|
`++ovum`. An `ovum` is a `card`, which is any `[term noun]`
|
|
cell, and a `++wire`, a `path` which indicates the location of
|
|
the event. At the Unix level, the `wire` corresponds to a system
|
|
module or context. For input events, this is the module that
|
|
caused the event; for output actions, it's the module that
|
|
performs the action.
|
|
|
|
`++poke` sends Arvo an event `ovum`, producing a cell of action
|
|
ova and a new Arvo core.
|
|
|
|
`++peek` dereferences the Arvo namespace. It takes a date and a
|
|
key, and produces `~` (`0`) or `[~ value]`.
|
|
|
|
`++keep` asks Arvo the next time it wants to be woken up, for the
|
|
given `wire`. (This input will probably be eliminated in favor
|
|
of a single global timer.)
|
|
|
|
`++wish` compiles a string of Hoon source. While just a
|
|
convenience, it's a very convenient convenience.
|
|
|
|
`++come` and `++load` are used by Arvo to reset itself (more
|
|
precisely, to shift the Arvo state from an old kernel to a new
|
|
one); there is no need to call them from C.
|
|
|
|
Now that we understand the Arvo kernel interface, let's look at
|
|
the `u3v` API. As usual, all the functions in `u3v` are
|
|
commented, but unfortunately it's hard to describe this API as
|
|
clean at present. The problem is that `u3v` remains design
|
|
coupled to the old `vere` event handling code written for `u2`.
|
|
But let's describe the functions you should be calling, assuming
|
|
you're not writing the next event system. There are only two.
|
|
|
|
`u3v_wish(str_c)` wraps the `++wish` functionality in a cache
|
|
(which is read-only unless you're on the surface road).
|
|
|
|
`u3v_do()` uses `wish` to provide a convenient interface for
|
|
calling Hoon kernel functions by name. Even more conveniently,
|
|
we tend to call `u3v_do()` with these convenient aliases:
|
|
|
|
#define u3do(txt_c, arg) u3v_do(txt_c, arg)
|
|
#define u3dc(txt_c, a, b) u3v_do(txt_c, u3nc(a, b))
|
|
#define u3dt(txt_c, a, b, c) u3v_do(txt_c, u3nt(a, b, c))
|
|
#define u3dq(txt_c, a, b, c, d) u3v_do(txt_c, u3nq(a, b, c, d))
|
|
|