
# u3: noun processing in C.

`u3` is the C library that makes Urbit work.  If it weren't called
`u3`, it might be called `libnoun` - it's a library for making
and storing nouns.

What's a noun?  A noun is either a cell or an atom.  A cell is an
ordered pair of any two nouns.  An atom is an unsigned integer of
any size.

To the C programmer, this is not a terribly complicated data
structure, so why do you need a library for it?

One: nouns have a well-defined computation kernel, Nock, whose
spec fits on a page and gzips to 340 bytes.  But the only
arithmetic operation in Nock is increment.  So it's nontrivial
to compute both efficiently and correctly.

Two: `u3` is designed to support "permanent computing," ie, a
single-level store which is transparently snapshotted.  This
implies a specialized memory-management model, etc, etc.

(Does `u3` depend on the higher levels of Urbit, Arvo and Hoon?
Yes and no.  `u3` expects you to load something shaped like an
Arvo kernel, and use it as an event-processing function.  But you
don't need to use this feature if you don't want, and your kernel
doesn't have to be Arvo proper - just Arvo-compatible.  Think of
`u3` as the BIOS and Arvo as the boot kernel.  And there are no
dependencies at all between Hoon the language and `u3`.)

## c3: C in Urbit

Under `u3` is the simple `c3` layer, which is just how we write C
in Urbit.

When writing C in u3, please of course follow the conventions of
the code around you as regards indentation, etc.  It's especially
important that every function have a header comment, even if it
says nothing interesting.

But some of our idiosyncrasies go beyond convention.  Yes, we've
done awful things to C.  Here's what we did and why we did it.

### c3: integer types

First, it's generally acknowledged that underspecified integer
types are C's worst disaster.  C99 fixed this, but the `stdint`
types are wordy and annoying.  We've replaced them with:

    /* Good integers.
    */
      typedef uint64_t c3_d;  // double-word
      typedef int64_t c3_ds;  // signed double-word
      typedef uint32_t c3_w;  // word
      typedef int32_t c3_ws;  // signed word
      typedef uint16_t c3_s;  // short
      typedef int16_t c3_ss;  // signed short
      typedef uint8_t c3_y;   // byte
      typedef int8_t c3_ys;   // signed byte
      typedef uint8_t c3_b;   // bit

      typedef uint8_t c3_t;   // boolean
      typedef uint8_t c3_o;   // loobean
      typedef uint8_t c3_g;   // 5-bit atom for a 32-bit log.
      typedef uint32_t c3_l;  // little; 31-bit unsigned integer
      typedef uint32_t c3_m;  // mote; also c3_l; LSB first a-z 4-char string.

    /* Bad integers.
    */
      typedef char      c3_c; // does not match int8_t or uint8_t
      typedef int       c3_i; // int - really bad
      typedef uintptr_t c3_p; // pointer-length uint - really really bad
      typedef intptr_t c3_ps; // pointer-length int - really really bad

Some of these need explanation.  A loobean is a Nock boolean -
Nock, for mysterious reasons, uses 0 as true (always say "yes")
and 1 as false (always say "no").

Nock and/or Hoon cannot tell the difference between a short atom
and a long one, but at the `u3` level every atom under `2^31` is
direct.  The `c3_l` type is useful to annotate this.  A `c3_m` is
a *mote* - a string of up to 4 characters in a `c3_l`, least
significant byte first.  A `c3_g` should be a 5-bit atom.  Of
course, C cannot enforce these constraints, only document them.
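To make the LSB-first layout concrete, here is a minimal sketch that packs a short string into a mote.  `mote_of()` is a hypothetical helper for illustration, not a `u3` function; the real motes are defined in `i/c/motes.h`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;   //  word
typedef uint32_t c3_m;   //  mote: up to 4 chars, LSB first

/* mote_of(): pack up to 4 ASCII characters into a mote, with the
** first character in the least significant byte.  Hypothetical.
*/
static c3_m
mote_of(const char* str_c)
{
  c3_m mot_m = 0;
  c3_w i_w;

  for ( i_w = 0; (i_w < 4) && str_c[i_w]; i_w++ ) {
    //  7-bit ASCII keeps bit 31 clear, so the mote stays direct
    assert((unsigned char)str_c[i_w] < 0x80);
    mot_m |= ((c3_m)(unsigned char)str_c[i_w]) << (8 * i_w);
  }
  return mot_m;
}
```

For example, `mote_of("exit")` is `0x74697865`: `e` (`0x65`) lands in the low byte and `t` (`0x74`) in the high one.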

Use the "bad" - ie, poorly specified - integer types only when
interfacing with external code that expects them.

An enormous number of motes are defined in `i/c/motes.h`.  There
is no reason to delete motes that aren't being used, or even to
modularize the definitions.  Keep them alphabetical, though.

### c3: variables and variable naming

The C3 style uses Hoon-style TLV variable names, with a
quasi-Hungarian syntax.  This is weird, but works really well, as
long as what you're doing isn't hideously complicated.  (Then it
works badly, but we shouldn't need anything hideous in u3.)

A TLV variable name is a random pronounceable three-letter
string, sometimes with some vague relationship to its meaning,
but usually not.  Usually CVC (consonant-vowel-consonant) is a
good choice.

You should use TLVs much the way math people use Greek letters.
The same concept should in general get the same name across
different contexts.  When you're working in a given area, you'll
tend to remember the binding from TLV to concept by sheer power
of associative memory.  When you come back to it, it's not that
hard to relearn.  And of course, when in doubt, comment it.

Variables take pseudo-Hungarian suffixes, matching in general the
suffix of the integer type:

    c3_w wor_w;     //  32-bit word

Unlike in standard Hungarian, there is no change for pointer
variables.  C structure variables take a `_u` suffix.
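A short sketch of these conventions together, with hypothetical names: the TLV `tib` for a byte buffer, `_w` on words, `_y` on a byte pointer (pointers get no extra marker), `_u` on a struct variable, and a header comment on every function.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;   //  word
typedef uint8_t  c3_y;   //  byte

/* tib_buf: a hypothetical byte buffer.
*/
typedef struct {
  c3_w  len_w;           //  length in bytes
  c3_y* buf_y;           //  a pointer, but still just `_y`
} tib_buf;

/* tib_sum(): sum the bytes in a buffer.
*/
static c3_w
tib_sum(const tib_buf* tib_u)
{
  c3_w sum_w = 0;        //  running total, a word
  c3_w i_w;

  assert(tib_u);
  for ( i_w = 0; i_w < tib_u->len_w; i_w++ ) {
    sum_w += tib_u->buf_y[i_w];
  }
  return sum_w;
}
```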

### c3: loobeans

The code (from `defs.h`) tells the story:

    #     define c3y      0
    #     define c3n      1

    #     define _(x)        (c3y == (x))
    #     define __(x)       ((x) ? c3y : c3n)
    #     define c3a(x, y)   __(_(x) && _(y))
    #     define c3o(x, y)   __(_(x) || _(y))

In short, use `_()` to turn a loobean into a boolean, `__()` to
go the other way.  Use `!` as usual, `c3y` for yes and `c3n` for
no, `c3a` for and and `c3o` for or.
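A minimal sketch of the loobean macros in use, copying the definitions above so the block compiles on its own.  The functions `toy_even()`, `toy_both()` and `toy_half()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;   //  word
typedef uint8_t  c3_o;   //  loobean

//  Loobean definitions, as in the `defs.h` excerpt above.
#     define c3y      0
#     define c3n      1

#     define _(x)        (c3y == (x))
#     define __(x)       ((x) ? c3y : c3n)
#     define c3a(x, y)   __(_(x) && _(y))
#     define c3o(x, y)   __(_(x) || _(y))

/* toy_even(): hypothetical; is `val_w` even?  Produces a loobean.
*/
static c3_o
toy_even(c3_w val_w)
{
  return __(0 == (val_w % 2));      //  C boolean to loobean
}

/* toy_both(): hypothetical; are both words even?
*/
static c3_o
toy_both(c3_w a_w, c3_w b_w)
{
  return c3a(toy_even(a_w), toy_even(b_w));
}

/* toy_half(): hypothetical; halve even words, leave odd ones.
*/
static c3_w
toy_half(c3_w val_w)
{
  if ( _(toy_even(val_w)) ) {       //  loobean to C boolean
    return val_w / 2;
  }
  return val_w;
}
```

Note the direction: `__()` wraps a C condition to produce a loobean, and `_()` unwraps a loobean for `if`.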

## u3: land of nouns

The division between `c3` and `u3` is that you could theoretically
imagine using `c3` as just a generic C environment.  Anything to do
with nouns is in `u3`.

### u3: a map of the system

There are two kinds of symbols in `u3`: regular and irregular.
Regular symbols follow this pattern:

    prefix    purpose                      .h         .c
    -------------------------------------------------------
    u3a_      allocation                   i/n/a.h    n/a.c
    u3e_      persistence                  i/n/e.h    n/e.c
    u3h_      hashtables                   i/n/h.h    n/h.c
    u3i_      noun construction            i/n/i.h    n/i.c
    u3j_      jet control                  i/n/j.h    n/j.c
    u3m_      system management            i/n/m.h    n/m.c
    u3n_      nock computation             i/n/n.h    n/n.c
    u3r_      noun access (error returns)  i/n/r.h    n/r.c
    u3t_      profiling                    i/n/t.h    n/t.c
    u3v_      arvo                         i/n/v.h    n/v.c
    u3x_      noun access (error crashes)  i/n/x.h    n/x.c
    u3z_      memoization                  i/n/z.h    n/z.c
    u3k[a-g]  jets (transfer, C args)      i/j/k.h    j/[a-g]/*.c
    u3q[a-g]  jets (retain, C args)        i/j/q.h    j/[a-g]/*.c
    u3w[a-g]  jets (retain, nock core)     i/j/w.h    j/[a-g]/*.c

Irregular symbols always start with `u3` and obey no other rules.
They're defined in `i/n/aliases.h`.  Finally, `i/all.h` includes
all these headers (fast compilers, yay) and is all you need to
program in `u3`.

### u3: noun internals

A noun is a `u3_noun` - currently defined as a 32-bit `c3_w`.

If your `u3_noun` is less than `(1 << 31)`, it's a direct atom.
Every unsigned integer between `0` and `0x7fffffff` inclusive is
its own noun.

If bit `31` is set in a `u3_noun` and bit `30` is `1`, the noun
is an indirect cell.  If bit `31` is set and bit `30` is `0`, the
noun is an indirect atom.  Bits `29` through `0` are a word
pointer into the loom - see below.  The structures are:

    typedef struct {
      c3_w mug_w;
      c3_w len_w;
      c3_w buf_w[0];    //  actually [len_w]
    } u3a_atom;

    typedef struct {
      c3_w    mug_w;
      u3_noun hed;
      u3_noun tel;
    } u3a_cell;

The only thing that should be mysterious here is `mug_w`, which
is a 31-bit lazily computed nonzero short hash (FNV currently,
soon Murmur3).  If `mug_w` is 0, the hash is not yet computed.
We also hijack this field for various hacks, such as saving the
new address of a noun when copying over.
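To illustrate the lazy, nonzero hash, here is a sketch that computes a 31-bit hash on first use and caches it in `mug_w`.  It assumes 32-bit FNV-1a over the atom's bytes with a simple fold to 31 bits; `u3`'s actual mug function may mix differently, and `toy_atom`, `toy_fnv()` and `toy_mug()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t c3_w;

/* toy_atom: after u3a_atom above, but with a C99 flexible array
** member in place of the zero-length array.
*/
typedef struct {
  c3_w mug_w;      //  0 until the hash is computed
  c3_w len_w;      //  length in words
  c3_w buf_w[];    //  words, least significant first
} toy_atom;

/* toy_fnv(): 32-bit FNV-1a over the atom's bytes, LSB first, folded
** to 31 bits and kept nonzero.  Not u3's exact mug function.
*/
static c3_w
toy_fnv(const toy_atom* pug_u)
{
  c3_w has_w = 2166136261u;                    //  FNV offset basis
  c3_w i_w;

  for ( i_w = 0; i_w < (4 * pug_u->len_w); i_w++ ) {
    has_w ^= (pug_u->buf_w[i_w / 4] >> (8 * (i_w % 4))) & 0xff;
    has_w *= 16777619u;                        //  FNV prime
  }
  has_w = (has_w >> 31) ^ (has_w & 0x7fffffff);
  return ( 0 == has_w ) ? 0x7fffffff : has_w;  //  0 means "not yet"
}

/* toy_mug(): compute the hash on first use, then reuse it.
*/
static c3_w
toy_mug(toy_atom* pug_u)
{
  if ( 0 == pug_u->mug_w ) {
    pug_u->mug_w = toy_fnv(pug_u);
  }
  return pug_u->mug_w;
}
```

Because `0` is reserved for "not yet computed," the sketch substitutes another value whenever the fold produces `0`.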

Also, the value `0xffffffff` is `u3_none`, which is never a valid
noun.  Use the type `u3_weak` to express that a noun variable may
be `u3_none`.
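The tag bits can be decoded with a few shifts and masks.  The helpers below are a hypothetical sketch of the rules above, not `u3`'s own functions.  Note that `u3_none`, with both high bits set, would look like a cell, so code holding a `u3_weak` has to rule it out first.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef uint8_t  c3_o;
typedef c3_w     u3_noun;
typedef u3_noun  u3_weak;                //  may be u3_none

#define c3y      0
#define c3n      1
#define u3_none  ((u3_noun)0xffffffff)

/* toy_is_direct(): bit 31 clear, so the noun is its own value.
*/
static c3_o
toy_is_direct(u3_noun som)
{
  return ( 0 == (som >> 31) ) ? c3y : c3n;
}

/* toy_is_cell(): bits 31 and 30 both set.  Rule out u3_none first.
*/
static c3_o
toy_is_cell(u3_noun som)
{
  return ( 3 == (som >> 30) ) ? c3y : c3n;
}

/* toy_is_indirect_atom(): bit 31 set, bit 30 clear.
*/
static c3_o
toy_is_indirect_atom(u3_noun som)
{
  return ( 2 == (som >> 30) ) ? c3y : c3n;
}

/* toy_offset(): word offset into the loom, bits 29 through 0.
*/
static c3_w
toy_offset(u3_noun som)
{
  assert(c3n == toy_is_direct(som));
  return som & 0x3fffffff;
}

/* toy_weak_or(): substitute `def` if `wek` is u3_none.
*/
static u3_noun
toy_weak_or(u3_weak wek, u3_noun def)
{
  return ( u3_none == wek ) ? def : wek;
}
```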

### u3: reference counts

The only really essential thing you need to know about `u3` is
how to handle reference counts.  Everything else, you can skip
and just get to work.

u3 deals with reference-counted, immutable, acyclic nouns.
Unfortunately, we are not Apple and can't build reference
counting into your C compiler, so you need to count by hand.

Every allocated noun (or any allocation object, because our
allocator is general-purpose) contains a counter which counts the
number of references to it - typically variables with type
`u3_noun`.  When this counter goes to 0, the noun is freed.

To tell `u3` that you've added a reference to a noun, call the
function `u3a_gain()` or its shorthand `u3k()`.  (For your
convenience, this function returns its argument.)  To tell `u3`
that you've destroyed a reference, call `u3a_lose()` or `u3z()`.

(If you screw up by decrementing the counter too much, `u3` will
dump core in horrible ways.  If you screw up by incrementing it
too much, `u3` will leak memory.  To check for memory leaks,
set the `bug_o` flag in `u3m_boot()` - eg, run `vere` with `-g`.
Memory leaks are difficult to debug - the best way to handle
leaks is just to revert to a version that didn't have them, and
look over your code again.)

(You can gain or lose a direct atom.  It does nothing.)
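The counting rules can be modeled in a few lines.  This is a toy sketch, not `u3a`: `toy_box` stands in for a loom allocation, `toy_gain()` and `toy_lose()` mimic `u3k()` and `u3z()` (including returning the argument from a gain), and direct atoms, which need no counting, are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t c3_w;

/* toy_box: a hypothetical refcounted allocation.
*/
typedef struct {
  c3_w use_w;                  //  live references
  c3_w val_w;                  //  payload
} toy_box;

static c3_w toy_Freed_w = 0;   //  boxes freed so far

/* toy_make(): allocate; the caller owns the one reference.
*/
static toy_box*
toy_make(c3_w val_w)
{
  toy_box* box_u = malloc(sizeof(*box_u));

  assert(box_u);
  box_u->use_w = 1;
  box_u->val_w = val_w;
  return box_u;
}

/* toy_gain(): add a reference; returns its argument, like u3k().
*/
static toy_box*
toy_gain(toy_box* box_u)
{
  box_u->use_w++;
  return box_u;
}

/* toy_lose(): drop a reference and free at zero, like u3z().
*/
static void
toy_lose(toy_box* box_u)
{
  if ( 0 == --box_u->use_w ) {
    free(box_u);
    toy_Freed_w++;
  }
}
```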

### u3: reference protocols

*THIS IS THE MOST CRITICAL SECTION IN THE `u3` DOCUMENTATION.*

The key question when calling a C function in a refcounted world
is what the function will do to the noun refcounts - and, if the
function returns a noun, what it does to the return.

There are two semantic patterns, `transfer` and `retain`.  In
`transfer` semantics, the caller "gives" a use count to the
callee, which "gives back" any return.  For instance, if I have

    {
      u3_noun foo = u3i_string("foobar");
      u3_noun bar;

      bar = u3f_futz(foo);
      [...]
      u3z(bar);
    }

Suppose `u3f_futz()` has `transfer` semantics.  At `[...]`, my
code holds one reference to `bar` and zero references to `foo` -
which has been freed, unless it's part of `bar`.  My code now
owns `bar` and gets to work with it until it's done, at which
point a `u3z()` is required.

On the other hand, if `u3f_futz()` has `retain` semantics, we
need to write

    {
      u3_noun foo = u3i_string("foobar");
      u3_noun bar;

      bar = u3f_futz(foo);
      [...]
      u3z(foo);
    }

because calling `u3f_futz()` does not release our ownership of
`foo`, which we have to free ourselves.

But if we free `bar`, we are making a great mistake, because our
reference to it is not in any way registered in the memory
manager (which cannot track references in C variables, of
course).  It is normal and healthy to have these uncounted
C references, but they must be treated with care.

The bottom line is that it's essential for the caller to know
the refcount semantics of any function which takes or returns a
noun.  (In some unusual circumstances, different arguments or
returns in one function may be handled differently.)

Broadly speaking, as a design question, retain semantics are more
appropriate for functions which inspect or query nouns.  For
instance, `u3h()` (which takes the head of a noun) retains, so
that we can traverse a noun tree without constantly incrementing
and decrementing.

Transfer semantics are more appropriate for functions which make
nouns, which is obviously what most functions do.

In general, though, in most places it's not worth thinking about
what your function does.  There is a convention for it, which
depends on where it is, not what it does.  Follow the convention.

### u3: reference conventions

The `u3` convention is that, unless otherwise specified, *all
functions have transfer semantics* - with the exception of the
prefixes: `u3r`, `u3x`, `u3z`, `u3q` and `u3w`.  Also, within
jet directories `a` through `f` (but not `g`), internal functions
retain (for historical reasons).

If functions outside this set have retain semantics, they need to
be commented, both in the `.h` and `.c` file, with `RETAIN` in
all caps.  Yes, it's this important.
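Here is a toy sketch of the two protocols side by side, using a hypothetical refcounted `toy_box`.  `toy_peek()` retains and carries the `RETAIN` comment the convention calls for; `toy_succ()` has transfer semantics, consuming its argument and handing back a fresh reference.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t c3_w;

typedef struct {
  c3_w use_w;                  //  live references
  c3_w val_w;                  //  payload
} toy_box;

static c3_w toy_Freed_w = 0;   //  boxes freed so far

/* toy_make(): allocate; transfer, so the caller owns the result.
*/
static toy_box*
toy_make(c3_w val_w)
{
  toy_box* box_u = malloc(sizeof(*box_u));

  assert(box_u);
  box_u->use_w = 1;
  box_u->val_w = val_w;
  return box_u;
}

/* toy_lose(): drop a reference and free at zero.
*/
static void
toy_lose(toy_box* box_u)
{
  if ( 0 == --box_u->use_w ) {
    free(box_u);
    toy_Freed_w++;
  }
}

/* toy_peek(): RETAIN.  read the payload; no counts change.
*/
static c3_w
toy_peek(const toy_box* box_u)
{
  return box_u->val_w;
}

/* toy_succ(): transfer.  consumes `box_u`; the caller owns the
** returned box.
*/
static toy_box*
toy_succ(toy_box* box_u)
{
  toy_box* pro_u = toy_make(box_u->val_w + 1);

  toy_lose(box_u);             //  release the reference we were given
  return pro_u;
}
```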

### u3: system architecture

If you just want to tinker with some existing code, it might be
enough to understand the above.  If not, it's probably worth
taking the time to look at `u3` as a whole.

`u3` is designed to work as a persistent event processor.
Logically, it computes a function of the form

    f(event, old state) -> (actions, new state)

Obviously almost any computing model - including, but not limited
to, Urbit - can be defined in this form.  To create the illusion
of a computer that never loses state and never fails, we:

- log every event externally before it goes into u3;
- keep a single reference to a permanent state noun;
- can abort any event without damaging the permanent state; and
- snapshot the permanent state periodically, and/or prune logs.
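The steps above can be sketched with a toy processor whose state is a single word.  Everything here (`toy_pier`, `toy_step()`, `toy_play()`, and treating event `0` as invalid) is hypothetical; the point is only the order of operations: log first, compute on the side, and commit the new state only if the computation succeeds.  This sketch simply leaves failed events in the log.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef uint8_t  c3_o;
#define c3y 0
#define c3n 1

/* toy_pier: a hypothetical pier with a one-word permanent state.
*/
typedef struct {
  c3_w sat_w;          //  the single permanent state
  c3_w log_w[64];      //  the external event log
  c3_w len_w;          //  number of logged events
} toy_pier;

/* toy_step(): f(event, old state) -> (action, new state).  Pure:
** on failure (c3n) it writes nothing.  Event 0 is "invalid."
*/
static c3_o
toy_step(c3_w eve_w, c3_w old_w, c3_w* act_w, c3_w* new_w)
{
  if ( 0 == eve_w ) {
    return c3n;
  }
  *act_w = eve_w;      //  echo the event as the action
  *new_w = old_w + eve_w;
  return c3y;
}

/* toy_play(): log the event, compute, then commit on success only.
*/
static c3_o
toy_play(toy_pier* pir_u, c3_w eve_w, c3_w* act_w)
{
  c3_w new_w;

  assert(pir_u->len_w < 64);
  pir_u->log_w[pir_u->len_w++] = eve_w;       //  log before computing
  if ( c3n == toy_step(eve_w, pir_u->sat_w, act_w, &new_w) ) {
    return c3n;                               //  permanent state untouched
  }
  pir_u->sat_w = new_w;                       //  commit
  return c3y;
}
```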

### u3: the road model

`u3` uses a memory design which I'm sure someone has invented
somewhere before, because it's not very clever, but I've never
seen it anywhere in particular.

All allocation in `u3` happens within a solid block of memory,
which `u3` calls the `loom`.  How do we allocate on the loom?
You're probably familiar with the Unix heap-stack design, in which
the stack grows downward and the heap (malloc arena) grows upward:

    0           brk                                          ffff
    |   heap     |                                    stack    |
    |------------#################################+++++++++++++|
    |                                             |            |
    0                                             sp         ffff

A road is a normal heap-stack system, except that the heap
and stack can point in *either direction*.  Therefore, inside
a road, we can nest another road in the *opposite direction*.

When the inner road completes, its heap is left on top of the
outer road's stack.  This is no more than the normal behavior of
a stack machine, in which every subcomputation pushes its result
on the stack.

The performance tradeoff of "leaping" - reversing directions in
the road - is that if the outer computation wants to preserve the
results of the inner one, not just use them for temporary
purposes, it has to *copy them*.

This is a trivial cost in some cases, a prohibitive cost in
others.  The upside, of course, is that all garbage accrued
in the inner computation is discarded at zero cost.

The goal of the road system is the ability to *layer* memory
models.  If you are allocating on a road, you have no idea
how deep within a nested road system you are - in other words,
you have no idea exactly how durable your result may be.
But free space is never fragmented within a road.

Roads do not reduce the generality or performance of a memory
system, since even the most complex GC system can be nested
within a road at no particular loss of performance - a road
is just a block of memory.

Each road (`u3a_road` to be exact) uses four pointers: `rut` is
the bottom of the arena, `hat` the top of the arena, `mat` the
bottom of the stack, `cap` the top of the stack.  (Bear in mind
that the road "stack" is not actually used as the C function-call
stack, though it probably should be.)

A "north" road has the stack high and the heap low:

    0           rut   hat                                    ffff
    |            |     |                                       |
    |~~~~~~~~~~~~-------##########################+++++++$~~~~~|
    |                                             |      |     |
    0                                            cap    mat  ffff

A "south" road is the other way around:

    0           mat   cap                                    ffff
    |            |     |                                       |
    |~~~~~~~~~~~~$++++++##########################--------~~~~~|
    |                                             |      |     |
    0                                            hat    rut  ffff

Legend: `-` is durable storage (heap); `+` is temporary storage
(stack); `~` is deep storage (immutable); `$` is the allocation
frame; `#` is free memory.

Pointer restrictions: pointers stored in `+` can point anywhere.
Of course, pointing to `#` (free memory) would be a bug.
Pointers in `-` can only point to `-` or `~`; pointers in `~`
only point to `~`.

To "leap" is to create a new inner road in the `###` free space,
but in the reverse direction, so that when the inner road
"falls" (terminates), its durable storage is left on the
temporary storage of the outer road.

`u3` keeps a global variable, `u3_Road` or its alias `u3R`, which
points to the current road.  (If we ever run threads in inner
roads - see below - this will become a thread-local variable.)
Relative to `u3R`, `+` memory is called `junior` memory; `-`
memory is `normal` memory; `~` is `senior` memory.
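The pointer arithmetic of north and south roads can be sketched with word offsets into a loom.  This is a hypothetical model: `toy_road` is a plain C struct holding only the four pointers and a direction, and it models heap allocation but not stack pushes.  Leaping builds a reversed road in the free space; falling moves the outer `cap` so that the inner heap becomes outer stack.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef uint8_t  c3_o;
#define c3y 0
#define c3n 1

/* toy_road: a hypothetical road over word offsets into the loom.
*/
typedef struct {
  c3_o nor_o;        //  c3y if north: heap grows up, stack down
  c3_w rut_w;        //  bottom of the heap
  c3_w hat_w;        //  top of the heap
  c3_w mat_w;        //  bottom of the stack
  c3_w cap_w;        //  top of the stack
} toy_road;

/* toy_free(): words of free space between heap and stack.
*/
static c3_w
toy_free(const toy_road* rod_u)
{
  return ( c3y == rod_u->nor_o ) ? (rod_u->cap_w - rod_u->hat_w)
                                 : (rod_u->hat_w - rod_u->cap_w);
}

/* toy_heap(): allocate `len_w` durable words; produce the offset.
*/
static c3_w
toy_heap(toy_road* rod_u, c3_w len_w)
{
  assert(toy_free(rod_u) >= len_w);
  if ( c3y == rod_u->nor_o ) {
    rod_u->hat_w += len_w;             //  heap grows up
    return rod_u->hat_w - len_w;
  }
  rod_u->hat_w -= len_w;               //  heap grows down
  return rod_u->hat_w;
}

/* toy_leap(): an inner road filling the outer free space, reversed.
*/
static toy_road
toy_leap(const toy_road* out_u)
{
  toy_road inn_u;

  inn_u.nor_o = ( c3y == out_u->nor_o ) ? c3n : c3y;
  inn_u.rut_w = inn_u.hat_w = out_u->cap_w;    //  heap starts at outer stack
  inn_u.mat_w = inn_u.cap_w = out_u->hat_w;    //  stack starts at outer heap
  return inn_u;
}

/* toy_fall(): end the inner road; its heap joins the outer stack.
*/
static void
toy_fall(toy_road* out_u, const toy_road* inn_u)
{
  out_u->cap_w = inn_u->hat_w;
}
```

For example, after allocating 10 words on a north road over a 1024-word loom, a leap yields a south road whose heap starts at word 1024 and grows down; when it falls after allocating 4 words, the outer `cap` drops to 1020.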

### u3: explaining the road model

But... why?

We're now ready to understand why the road system works so
logically with the event and persistence model.

The key is that *we don't update refcounts in senior memory.*
A pointer from an inner road to an outer road is not counted.
Also, the outmost, or `surface` road, is the only part of the
image that gets checkpointed.

So the surface road contains the entire durable state of `u3`.
When we process an event, or perform any kind of complicated or
interesting calculation, *we process it in an inner road*.  If
its results are saved, they need to be copied.

Since processing in an inner road does not touch surface memory,
(a) we can leave the surface road in a read-only state and not
mark its pages dirty; (b) we can abort an inner calculation
without screwing up the surface; and (c) because inner results
are copied onto the surface, the surface doesn't get fragmented.

All of (a), (b) and (c) are needed for checkpointing to be easy.
It might be tractable otherwise, but easy is even better.

Moreover, while the surface is most definitely single-threaded,
we could easily run multiple threads in multiple inner roads
(as long as the threads don't have pointers into each other's
memory, which they obviously shouldn't).

Also, in the future, we'll experiment more with adding road
control hints to the programmer's toolbox.  Reference counting is
expensive.  We hypothesize that in many - if not most - cases,
the programmer can identify procedural structures whose garbage
should be discarded in one step by copying the results.  Then,
within the procedure, we can switch the allocator into `sand`
mode, and stop tracking references at all.

### u3: rules for C programming

There are two levels at which we program in C: (1) above the
interpreter; (2) within the interpreter or jets.  These have
separate rules which need to be respected.

### u3: rules above the interpreter

In its relations with Unix, Urbit follows a strict rule of "call
me, I won't call you."  We do of course call Unix system calls,
but only for the purpose of actually computing.

Above Urbit, you are in a normal C/Unix programming environment
and can call anything in or out of Urbit.  Note that when using
`u3`, you're always on the surface road, which is not thread-safe
by default.  Generally speaking, `u3` is designed to support
event-oriented, single-threaded programming.

If you need threads which create nouns, you could use
`u3m_hate()` and `u3m_love()` to run these threads in subroads.
You'd need to make the global road pointer, `u3R`, a thread-local
variable instead.  This seems perfectly practical, but we haven't
done it because we haven't needed to.

### u3: rules within the interpreter

Within the interpreter, your code can run either in the surface
road or in a deep road.  You can test this by testing

    (&u3H->rod_u == u3R)

ie: does the pier's home road equal the current road pointer?

Normally in this context you assume you're obeying the rules of
running on an inner road, ie, "deep memory."  Remember, however,
that the interpreter *can* run on surface memory - but anything
you can do deep, you can do on the surface.  The converse is by
no means the case.

In deep memory, think of yourself as if in a signal handler.
Your execution context is extremely fragile and may be terminated
without warning or cleanup at any time (for instance, by ^C).

For instance, you can't call `malloc` (or C++ `new`) in your C
code, because you don't have the right to modify data structures
at the global level, and will leave them in an inconsistent state
if your inner road gets terminated.  (Instead, use our drop-in
replacements, `u3a_malloc()`, `u3a_free()`, `u3a_realloc()`.)

A good example is the different meaning of `c3_assert()` inside
and outside the interpreter.  At either layer, you can use the
regular `assert()`, which will just kill your process.  On the
surface, `c3_assert()` will just... kill your process.

In deep execution, `c3_assert()` will issue an exception that
queues an error event, complete with trace stack, on the Arvo
event queue.   Let's see how this happens.

### u3: exceptions

You produce an exception with

    /* u3m_bail(): bail out.  Does not return.
    **
    **  Bail motes:
    **
    **    %exit               ::  semantic failure
    **    %evil               ::  bad crypto
    **    %intr               ::  interrupt
    **    %fail               ::  execution failure
    **    %foul               ::  assert failure
    **    %need               ::  network block
    **    %meme               ::  out of memory
    **    %time               ::  timed out
    **    %oops               ::  assertion failure
    */
      c3_i
      u3m_bail(c3_m how_m);

Broadly speaking, there are two classes of exception: internal
and external.  An external exception begins in a Unix signal
handler.  An internal exception begins with a call to longjmp()
on the main thread.

There are also two kinds of exception: mild and severe.  An
external exception is always severe.  An internal exception is
normally mild, but some (like `c3__oops`, generated by
`c3_assert()`) are severe.

Either way, exceptions come with a stack trace.  The `u3` nock
interpreter is instrumented to retain stack trace hints and
produce them as a printable `(list tank)`.

Mild exceptions are caught by the first virtualization layer and
returned to the caller, following the behavior of the Nock
virtualizer `++mock` (in `hoon.hoon`).

Severe exceptions, or mild exceptions at the surface, terminate
the entire execution stack at any depth and send the cumulative
trace back to the `u3` caller.
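The control flow of mild and severe exceptions can be sketched with `setjmp()` and `longjmp()`.  This is a hypothetical model, not `u3m_bail()`: each `toy_soft()` call pushes a layer, a mild bail lands in the innermost layer, and a severe bail unwinds to the outermost one, skipping every layer in between.  Trace stacks are not modeled.  The bail mote travels in a global because the C standard permits `setjmp()` only in a few expression forms, which rules out assigning its result.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef uint8_t  c3_o;
#define c3y 0
#define c3n 1

/* toy_trap: one hypothetical virtualization layer.
*/
typedef struct _toy_trap {
  jmp_buf           buf_u;        //  where to land
  struct _toy_trap* nex_u;        //  enclosing layer; 0 at the surface
} toy_trap;

static toy_trap* toy_Trap = 0;    //  innermost layer
static c3_w      toy_How_w = 0;   //  bail mote in flight

/* toy_bail(): raise `how_w`.  A mild error lands in the innermost
** layer; a severe one unwinds to the outermost.  Does not return.
*/
static void
toy_bail(c3_w how_w, c3_o sev_o)
{
  toy_trap* tap_u = toy_Trap;

  assert(tap_u);
  if ( c3y == sev_o ) {
    while ( tap_u->nex_u ) {
      tap_u = tap_u->nex_u;
    }
  }
  toy_How_w = how_w;
  longjmp(tap_u->buf_u, 1);
}

typedef c3_w (*toy_funk)(c3_w);

/* toy_soft(): run `fun_f(arg_w)` in a new layer.  Produces 0 and
** sets `*pro_w` on success, or the bail mote on failure.
*/
static c3_w
toy_soft(toy_funk fun_f, c3_w arg_w, c3_w* pro_w)
{
  toy_trap tap_u;

  tap_u.nex_u = toy_Trap;
  toy_Trap = &tap_u;
  if ( setjmp(tap_u.buf_u) ) {
    toy_Trap = tap_u.nex_u;       //  drop this layer and all inside it
    return toy_How_w;
  }
  *pro_w = fun_f(arg_w);
  toy_Trap = tap_u.nex_u;
  return 0;
}

/* toy_work(): example workload; 1 fails mildly, 2 severely.
*/
static c3_w
toy_work(c3_w arg_w)
{
  if ( 1 == arg_w ) toy_bail(0x6c696166, c3n);    //  %fail
  if ( 2 == arg_w ) toy_bail(0x73706f6f, c3y);    //  %oops
  return arg_w * 10;
}

/* toy_nest(): virtualize toy_work(), replacing a mild failure
** with a default of 7.
*/
static c3_w
toy_nest(c3_w arg_w)
{
  c3_w pro_w = 0;

  if ( 0 != toy_soft(toy_work, arg_w, &pro_w) ) {
    return 7;
  }
  return pro_w;
}
```

So `toy_soft(toy_nest, 1, ...)` succeeds with the default `7`, because the inner layer absorbs the mild `%fail`, while `toy_soft(toy_nest, 2, ...)` returns the `%oops` mote, because the severe error skips the inner layer.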

For instance, `vere` uses this trace to construct a `%crud`
event, which conveys our trace back toward the Arvo context where
it crashed.  This lets any UI component anywhere, even on a
remote node, render the stacktrace as a consequence of the user's
action - even if its direct cause was (for instance) a Unix
SIGINT or SIGALRM.

### u3: C structures on the loom

Normally, all data on the loom is nouns.  Sometimes we break this
rule just a little, though - eg, in the `u3h` hashtables.

To point to non-noun C structs on the loom, we use a `u3_post`,
which is just a loom word offset.  A macro lets us declare this
as if it was a pointer:

    typedef c3_w       u3_post;
    #define u3p(type)  u3_post

Some may regard this as clever, others as pointless.  Anyway, use
`u3to()` and `u3of()` to convert to and from pointers.
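A sketch of the offset-and-pointer round trip, assuming a small static array as the loom.  `toy_to()` and `toy_of()` are hypothetical stand-ins for `u3to()` and `u3of()`, whose real definitions may differ.  The struct uses only 32-bit fields, so its size does not depend on the pointer size.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;

typedef c3_w       u3_post;
#define u3p(type)  u3_post

/* A hypothetical 4096-word loom.  toy_to() and toy_of() play the
** roles of u3to() and u3of(); the real macros may differ.
*/
static c3_w toy_Loom[4096];

#define toy_to(type, off)  ((type*)(void*)(toy_Loom + (off)))
#define toy_of(ptr)        ((u3_post)((c3_w*)(void*)(ptr) - toy_Loom))

/* toy_pair: a C struct kept on the loom.  Every field is a 32-bit
** word, so the layout is the same on 32-bit and 64-bit hosts.
*/
typedef struct {
  c3_w      key_w;        //  payload
  u3p(c3_w) nex_p;        //  loom offset of a word, not a C pointer
} toy_pair;
```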

When using C structs on the loom - generally a bad idea - make
sure anything which could be on the surface road is structurally
portable, eg, won't change size when the pointer size changes.
(Note also: we consider little-endian, rightly or wrongly, to
have won the endian wars.)

## u3: API overview by prefix

Let's run through the `u3` modules one by one.  All public
functions are commented, but the comments may be cryptic.

### u3m: main control

To start `u3`, run

    /* u3m_boot(): start the u3 system.
    */
      void
      u3m_boot(c3_o nuu_o, c3_o bug_o, c3_c* dir_c);

`nuu_o` is `c3y` (yes, `0`) if you're creating a new pier,
`c3n` (no, `1`) if you're loading an existing one.  `bug_o`
is `c3y` if you want to test the garbage-collector, `c3n`
otherwise.  `dir_c` is the directory for the pier files.

`u3m_boot()` expects an `urbit.pill` file to load the kernel
from.  This is specified with the `-B` command-line option.

Any significant computation with nouns, certainly anything Turing
complete, should be run (a) virtualized and (b) in an inner road.
These are slightly different things, but at the highest level we
bundle them together for your convenience, in `u3m_soft()`:

    /* u3m_soft(): system soft wrapper.  unifies unix and nock errors.
    **
    **  Produces [%$ result] or [%error (list tank)].
    */
      u3_noun
      u3m_soft(c3_w sec_w, u3_funk fun_f, u3_noun arg);

`sec_w` is the number of seconds to time out the computation.
`fun_f` is a C function accepting `arg`.

The result of `u3m_soft()` is a cell whose head is an atom.  If
the head is `%$` - ie, `0` - the tail is the result of
`fun_f(arg)`.  Otherwise, the head is a `term` (an atom which is
an LSB first string), and the tail is a `(list tank)` (a list of
`tank` printables - see `++tank` in `hoon.hoon`).  Error terms
should be the same as the exception terms above.

If you're confident that your computation won't fail, you can
use `u3m_soft_sure()`, `u3m_soft_slam()`, or `u3m_soft_nock()`
for C functions, Hoon function calls, and Nock invocations.
Caution - these return just the result, and assert globally.

All the `u3m_soft` functions above work *only on the surface*.
Below the surface, virtualize with `u3m_soft_run()`.  Note that
this takes a `fly` (a namespace gate), thus activating the `11`
super-operator in the nock virtualizer, `++mock`.  When actually
using the `fly`, call `u3m_soft_esc()`.  Don't do either unless
you know what you're doing!

For descending into a subroad *without* Nock virtualization,
use `u3m_hate()` and `u3m_love()` respectively.  Hating enters
a subroad; loving leaves it, copying out a product noun.

Other miscellaneous tools in `u3m`: `u3m_file()` loads a Unix
file as a Nock atom;  `u3m_water()` measures the boundaries of
the loom in current use (ie, watermarks); and a variety of
prettyprinting routines, none perfect, are available, mainly for
debugging printfs: `u3m_pretty()`, `u3m_p()`, `u3m_tape()` and
`u3m_wall()`.

It's sometimes nice to run a mark-and-sweep garbage collector,
`u3m_grab()`, which collects the world from a list of roots,
and asserts if it finds any leaks or incorrect refcounts.  This
tool is for debugging and long-term maintenance only; refcounts
should never err.

### u3j: jets

The jet system, `u3j`, is what makes `u3` and `nock` in any sense
a useful computing environment.  Except perhaps `u3a` (there is
really no such thing as a trivial allocator, though `u3a` is
dumber than most) - `u3j` is the most interesting code in `u3`.

Let's consider the minor miracle of driver-to-battery binding
which lets `u3j` work - and decrement not be `O(n)` - without
violating the precisely defined semantics of pure Nock, *ever*.

It's easy to assume that jets represent an architectural coupling
between Hoon language semantics and Nock interpreter internals.
Indeed such a coupling would be wholly wrongtious and un-Urbit.
But the jet system is not Hoon-specific.  It is specific to Nock
runtime systems that use a design pattern we call a `core`.

#### u3j: core structure

A core is no more than a cell `[code data]`, in which a `code` is
either a Nock formula or a cell of `code`s, and `data` is anything.
In a proper core, the subject each formula expects is the core
itself.

Except for the arbitrary decision to make a core `[code data]`,
(or as we sometimes say, `[battery payload]`), instead of `[data
code]`, any high-level language transforming itself to Nock would
use this design.

So jets are in fact fully general.  Broadly speaking, the jet
system works by matching a C *driver* to a battery.  When the
battery is invoked with Nock operator `9`, it must be found in
associative memory and linked to its driver.  Then we link the
formula axis of the operation (`a` in `[9 a b]`) to a specific
function in the driver.
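The matching step can be sketched as a table lookup from a battery and an axis to a C function.  This is a hypothetical model, loosely after `u3j_harm` and `u3j_core`: a plain number stands in for the battery's identity (how `u3` actually recognizes batteries is not shown), a `NULL` result means "run the Nock formula instead," and payload validation, discussed next, is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t c3_w;
typedef c3_w (*toy_fun)(c3_w);

/* toy_arm: hypothetical driver arm: the formula axis it jets
** and the C function that replaces it.
*/
typedef struct {
  c3_w    axe_w;            //  `a` in [9 a b]; 0 ends the list
  toy_fun fun_f;
} toy_arm;

/* toy_driver: hypothetical driver bound to one battery.
*/
typedef struct {
  c3_w           bat_w;     //  stand-in for the battery's identity
  const toy_arm* arm_u;     //  list ending in a zero axis
} toy_driver;

/* toy_dec(): C decrement, O(1) instead of Nock's O(n).
*/
static c3_w
toy_dec(c3_w a_w)
{
  assert(a_w > 0);          //  decrementing 0 is an error
  return a_w - 1;
}

static const toy_arm    toy_dec_arms[] = { { 2, toy_dec }, { 0, NULL } };
static const toy_driver toy_Drivers[]  = { { 0xbeef, toy_dec_arms },
                                           { 0, NULL } };

/* toy_find(): C function for (battery, axis), or NULL to fall
** back to plain Nock.
*/
static toy_fun
toy_find(c3_w bat_w, c3_w axe_w)
{
  const toy_driver* dri_u;
  const toy_arm*    arm_u;

  for ( dri_u = toy_Drivers; dri_u->arm_u; dri_u++ ) {
    if ( bat_w != dri_u->bat_w ) continue;
    for ( arm_u = dri_u->arm_u; arm_u->axe_w; arm_u++ ) {
      if ( axe_w == arm_u->axe_w ) return arm_u->fun_f;
    }
  }
  return NULL;
}
```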

To validate this jet binding, we need to know two things.  One,
we need to know the C function actually is a perfect semantic
match for the Nock formula.  This can be developed with driver
test flags, which work, and locked down with a secure formula
hash in the driver, which we haven't bothered with just yet.
(You could also try to develop a formal method for verifying
that C functions and Nock formulas are equivalent, but this is
a research problem for the future.)

Two, we need to validate that the payload is appropriate for the
battery.  We should note that jets are a Nock feature and have no
reference to Hoon.  A driver which relies on the Hoon type system
to only pair it with valid payloads is a broken driver, and
breaks the Nock compliance of the system as a whole.  So don't.

Now, a casual observer might look at `[battery payload]` and
expect the simplest case of it to be `[formula subject]`.  That
is: to execute a simple core whose battery is a single formula,
we compute

    nock(+.a -.a)

Then, naturally, when we go from Hoon or a high-level language
containing functions down to Nock, `[function arguments]` turns
into `[formula subject]`.  This seems like an obvious design, and
we mention it only because it is *completely wrong*.

Rather, to execute a one-armed core like the above, we run

    nock(a -.a)

and the normal structure of a `gate`, which is simply Urbitese
for "function," is:

    [formula [sample context]]

where `sample` is Urbitese for "arguments" - and `context`, any
Lisper will at once recognize, is Urbitese for "environment."

To `slam` or call the gate, we simply replace the default sample
with the caller's data, then nock the formula on the entire gate.
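Tree addressing makes these layouts concrete: in a gate `[formula [sample context]]`, the formula is at axis `2`, the sample at `6` and the context at `7`.  The sketch below uses a hypothetical boxed noun (no refcounts, nothing freed), with atoms standing in for the formula and context, to implement axis lookup and to build the core a slam runs against; evaluating the formula itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t c3_w;

/* toy_noun: a hypothetical boxed noun; an atom if `hed_u` is 0.
** Nothing is refcounted or freed.
*/
typedef struct _toy_noun {
  c3_w              atm_w;     //  value, if an atom
  struct _toy_noun* hed_u;     //  head, if a cell
  struct _toy_noun* tel_u;     //  tail, if a cell
} toy_noun;

/* toy_atom(): make an atom.
*/
static toy_noun*
toy_atom(c3_w atm_w)
{
  toy_noun* non_u = calloc(1, sizeof(*non_u));

  assert(non_u);
  non_u->atm_w = atm_w;
  return non_u;
}

/* toy_cell(): make a cell.
*/
static toy_noun*
toy_cell(toy_noun* hed_u, toy_noun* tel_u)
{
  toy_noun* non_u = toy_atom(0);

  assert(hed_u && tel_u);
  non_u->hed_u = hed_u;
  non_u->tel_u = tel_u;
  return non_u;
}

/* toy_slot(): fragment at axis `axe_w`: 1 is the whole noun, and
** 2n and 2n+1 are the head and tail of axis n.  0 if absent.
*/
static toy_noun*
toy_slot(c3_w axe_w, toy_noun* non_u)
{
  toy_noun* par_u;

  assert(0 != axe_w);
  if ( 1 == axe_w ) {
    return non_u;
  }
  par_u = toy_slot(axe_w / 2, non_u);
  if ( !par_u || !par_u->hed_u ) {
    return 0;
  }
  return ( 0 == (axe_w % 2) ) ? par_u->hed_u : par_u->tel_u;
}

/* toy_slam(): the core a call runs against: the gate
** [formula [sample context]] with its sample (axis 6) replaced.
** Running the formula (axis 2) on it is not modeled.
*/
static toy_noun*
toy_slam(toy_noun* gat_u, toy_noun* sam_u)
{
  return toy_cell(toy_slot(2, gat_u),
                  toy_cell(sam_u, toy_slot(7, gat_u)));
}
```

The slammed core shares the gate's formula and context and differs only in its sample, which is why a slam is cheap.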

What's in the context?  Unlike in most dynamic languages, it is
not some secret system-level bag of tricks.  Almost always it is
another core.  This onion continues until, at the bottom, there is
an atomic constant, conventionally the kernel version number.

Thus a (highly desirable) `static` core is one of the form

    [battery constant]
    [battery static-core]

ie, a solid stack of nested libraries without any dynamic data.
The typical gate will thus be, for example,

    [formula [sample [battery battery battery constant]]]

but we would be most foolish to restrict the jet mechanism to
cores of this particular structure.  We cannot constrain a
payload to be `[sample static-core]`, or even `[sample core]`.
Any such constraint would not be rich enough to handle Hoon,
let alone other languages.

#### u3j: jet state

There are two fundamental rules of computer science: (1) every
system is best understood through its state; (2) less state is
better than more state.  Sadly, a pier has three different jet
state systems: `cold`, `warm` and `hot`.  It needs all of them.

Hot state is associated with this particular Unix process.  The
persistent pier is portable not just between process and process,
but machine and machine or OS and OS.  The set of jets loaded
into a pier may itself change (in theory, though not in the
present implementation) during the lifetime of the process.  Hot
state is a pure C data structure.

Cold state is associated with the logical execution history of
the pier.  It consists entirely of nouns and ignores restarts.

Warm state contains all dependencies between cold and hot
state.  It consists of C structures allocated on the loom.

Warm state is purely a function of cold and hot states, and
we can wipe and regenerate it at any time.  On any restart where
the hot state might have changed, we clear the warm state
with `u3j_ream()`.

There is only one hot state, the global jet dashboard
`u3j_Dash` or `u3D` for short.  In the present implementation,
u3D is a static structure not modified at runtime, except for
numbering itself on process initialization.  This structure -
which embeds function pointers to all the jets - is defined
in `j/tree.c`.  The data structures:

    /* u3j_harm: driver arm.
    */
      typedef struct _u3j_harm {
        c3_c*               fcs_c;            //  `.axe` or name
        u3_noun           (*fun_f)(u3_noun);  //  compute or 0 / semitransfer
        c3_o                ice;              //  perfect (don't test)
        c3_o                tot;              //  total (never punts)
        c3_o                liv;              //  live (enabled)
      } u3j_harm;

    /* u3j_core: C core driver.
    */
      typedef struct _u3j_core {
        c3_c*             cos_c;              //  control string
        struct _u3j_harm* arm_u;              //  blank-terminated static list
        struct _u3j_core* dev_u;              //  blank-terminated static list
        struct _u3j_core* par_u;              //  dynamic parent pointer
        c3_l              jax_l;              //  dynamic jet index
      } u3j_core;

    /* u3j_dash, u3j_Dash, u3D: jet dashboard singleton
    */
      typedef struct _u3e_dash {
        u3j_core* dev_u;                      //  null-terminated static list
        c3_l      len_l;                      //  ray_u filled length
        c3_l      all_l;                      //  ray_u allocated length
        u3j_core* ray_u;                      //  dynamic driver array
      } u3j_dash;

Warm and cold state is *per road*.  In other words, as we nest
roads, we also nest jet state.  The jet state in the road is:

      struct {                                //  jet dashboard
        u3p(u3h_root) har_p;                  //  warm state
        u3_noun       das;                    //  cold state
      } jed;

If you read Hoon, `das` (cold state) is a `++dash`,
and `har_p` (warm state) is a map from battery to `++calx`:

    ++  bane  ,@tas                                 ::  battery name
    ++  bash  ,@uvH                                 ::  label hash
    ++  bosh  ,@uvH                                 ::  local battery hash
    ++  batt  ,*                                    ::  battery
    ++  calf                                        ::
      $:  jax=,@ud                                  ::  hot core index
          hap=(map ,@ud ,@ud)                       ::  axis/hot arm index
          lab=path                                  ::  label as path
          jit=*                                     ::  arbitrary data
      ==                                            ::
    ++  calx  (trel calf (pair bash cope) club)     ::  cached by battery
    ++  clog  (pair cope (map batt club))           ::  identity record
    ++  club  (pair corp (map term nock))           ::  battery pattern
    ++  cope  (trel bane axis (each bash noun))     ::  core pattern
    ++  core  ,*                                    ::  core
    ++  corp  (each core batt)                      ::  parent or static
    ++  dash  (map bash clog)                       ::  jet system

The driver index `jax` in a `++calx` is an index into `ray_u` in the
dashboard - ie, a pointer into hot state.  This is why the warm
state has to be reset when we reload the pier in a new process.

Why is jet state nested?  Nock of course is a functional system,
so as we compute we don't explicitly create state.  Jet state is
an exception to this principle (which works only because it can't
be semantically detected from Nock/Hoon) - but it can't violate
the fundamental rules of the allocation system.

For instance, when we're on an inner road, we can't allocate on
an outer road, or point from an outer road to an inner.  So if we
learn something - like a mapping from battery to jet - in the
inner road, we have to keep it in the inner road.

To mitigate this problem, when we leave an inner road (with
`u3m_love()`), we call `u3j_reap()` to promote jet information in
the dying road.  Reaping promotes anything we've learned about
any battery that either (a) already existed in the outer road, or
(b) is being saved to the outer road.

#### u3j: jet binding

Jet binding starts with a `%fast` hint.  (In Hoon, this is
produced by the runes `~%`, for the general case, or `~/`
for simple functions.)  To bind a jet, execute a formula of the
form:

    [10 [%fast clue-formula] core-formula]

`core-formula` assembles the core to be jet-propelled.
`clue-formula` produces the hint information, a `++clue` (defined
below), with which we want to annotate the core.

A clue is a triple of name, parent, and hooks:

    ++  clue  (trel chum nock (list (pair term nock)))

The name, or `++chum`, carries some historical structure that we
don't need (cleaning these things up is tricky); in practice it
just gets flattened into a term.

The parent axis is a nock formula, but always reduces to a simple
axis, which is the address of this core's *parent*.  Consider
again an ordinary gate:

    [formula [sample context]]

Typically the `context` is itself a library core, which itself
has a jet binding.  If so, the parent axis of this gate is `7`.

If the parent is already bound - and the parent *must* be already
bound, in this road or a road containing it - we can hook this core
bottom-up into a tree hierarchy.  Normally the child core is
produced by an arm of the parent core, so this is not a problem -
we wouldn't have the child if we hadn't already made the parent.

The clue also contains a list of *hooks*, named nock formulas on
the core.  Usually these are arms, but they need not be.  The
point is that we often want to call a core from C, in a situation
where we have no type or other source information.  A common case
of this is a complex system in which we're mixing functions which
are jet-propelled with functions that aren't.

In any case, all the information in the `%fast` hint goes to
`u3j_mine()`, which registers the battery in cold state (`das` in
`jed` in `u3R`), then warm state (`har_p` in `jed`).

It's essential to understand that the `%fast` hint has to be,
well, fast - because we apply it whenever we build a core.  For
instance, if the core is a Hoon gate - a function - we will call
`u3j_mine` every time the function is called.

### u3j: the cold jet dashboard

For even more fun, the jet tree is not actually a tree of
batteries.  It's a tree of battery *labels*, where a label is
an [axis term] path from the root of the tree.  (At the root,
if the core pattern is always followed properly, is a core whose
payload is an atomic constant, conventionally the Hoon version.)

Under each of these labels, it's normal to have an arbitrary
number of different Nock batteries (not just multiple copies
of the same noun, a situation we *do* strive to avoid).  For
instance, one might be compiled with debugging hints, one not.

We might even have changed the semantics of the battery without
changing the label - so long as those semantics don't invalidate
any attached driver.

The jet tree, then, is a semantic hierarchy.  The root of the
hierarchy is a constant, by convention the Hoon kernel version,
because any normal jet-propelled core has, at the bottom of its
onion of libraries, the standard kernel.  Thus if the core is

    [foo-battery [bar-battery [moo-battery 164]]]

we can reverse the nesting to construct a hierarchical core
path.  The static core

    164/moo/bar/foo

extends the static core `164/moo/bar` by wrapping the `foo`
battery (ie, in Hoon, `|%`) around it.  With the core above,
you can compute `foo` stuff, `bar` stuff, and `moo` stuff.
Rocket science, not.
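To make the reversal concrete, here's a small sketch that builds
the label path from a list of battery names.  It assumes the names
are given outermost-first, as they appear in the noun, plus the
atomic constant at the bottom; `label_path()` and this
representation are hypothetical, not part of `u3`.

```c
#include <stdio.h>
#include <string.h>

/* label_path(): hypothetical sketch.  Given battery names listed
** outermost-first (as in [foo [bar [moo 164]]]) and the atomic
** constant at the bottom of the onion, write the hierarchical
** label path root-first into out, e.g. "164/moo/bar/foo".
*/
static void
label_path(const char* const* names, size_t n_names,
           unsigned long constant, char* out, size_t out_len)
{
  size_t used = (size_t)snprintf(out, out_len, "%lu", constant);

  /* Walk from the innermost battery outward, reversing the nesting;
  ** stop early if the output buffer is already full. */
  for ( size_t i = n_names; i > 0 && used < out_len; i-- ) {
    used += (size_t)snprintf(out + used, out_len - used, "/%s", names[i - 1]);
  }
}
```

Dropping the outermost name gives the path of the parent static
core, which is exactly the "extends" relationship described above.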

Not all cores are static, of course - they may contain live data,
like the sample in a gate (ie, argument to a function).  Once
again, it's important to remember that we track jet bindings not
by the core, which may not be static, but by the battery, which
is always static.

(And if you're wondering how we can use a deep noun like a Nock
formula or battery as a key in a key-value table, remember
`mug_w`, the lazily computed short hash, in all boxed nouns.)

In any case, `das`, the dashboard, is a map from `bash` to jet
location record (`++clog`).  A `clog` in turn contains two kinds
of information: the `++cope`, or per-location noun; and a map of
batteries to a per-battery `++club`.

The `cope` is a triple of `++bane` (battery name, right now just
a `term`); `++axis`, the axis, within *this* core, of the parent;
and `(each bash noun)`, which is either `[0 bash]` if the parent
is another core, or `[1 noun]`, for the constant noun (like
`164`) if there is no parent core.

A `bash` is just the noun hash (`++sham`) of a `cope`, which
uniquely expresses the battery's hierarchical location without
depending on the actual formulas.

The `club` contains a `++corp`, which we use to actually validate
the core.  Obviously jet execution has to be perfectly compatible
with Nock.  We search on the battery, but getting the battery
right is not enough - a typical battery is dependent on its
context.  For example, your jet-propelled library function is
very likely to call `++dec` or other advanced kernel technology.
If you've replaced the kernel in your context with something
else, we need to detect this and not run the jet.

There are two cases for a jet-propelled core - either the entire
core is a static constant, or it isn't.  Hence the definition
of `corp`:

    ++  corp  (each core batt)                ::  parent or static

Ie, a `corp` is `[0 core]` or `[1 batt]`.  If it's static -
meaning that the jet only works with one specific core, ie, the
parent axis of each location in the hierarchy is `3` - we can
validate with a single comparison.  Otherwise, we have to recurse
upward by checking the parent.

Note that there is at present no way to force a jet to depend on
static *data*.

### u3j: the warm jet dashboard

We don't use the cold state to match jets as we call them.  We
use the cold state to register jets as we find them, and also to
rebuild the warm state after the hot state is reset.

What we actually use at runtime is the warm state, `jed->har_p`,
which is a `u3h` (built-in hashtable), allocated on the loom,
from battery to `++calx`.

A `calx` is a triple of a `++calf`, a `[bash cope]` cell, and a
`club`.  The latter two come straight from cold state.

The `calf` contains warm data dependent on hot state.  It's a
quadruple of `jax`, the hot driver index (in `ray_u` in
`u3j_dash`); `hap`, a table from arm axis (ie, the axis of each
formula within the battery) to driver arm index (into `arm_u` in
`u3j_core`); `lab`, the complete label path; and `jit`, any
other dynamic data that may speed up execution.

We construct `hap`, when we create the calx, by iterating through
the arms registered in the `u3j_core`.  Note the way a `u3j_harm`
declares itself, with the string `fcs_c` which can contain either
an axis or a name.  Most jetted cores are of course gates, which
have one formula at one axis within the core: `fcs_c` is `".2"`,
the axis of the formula in `[formula [sample context]]`.

But we do often have fast cores with more complex arm structure,
and it would be sad to have to manage their axes by hand.  To use
an `fcs_c` with a named arm, it's sufficient to make sure the
name is bound to a formula `[0 axis]` in the hook table.
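A sketch of how an `fcs_c` might be resolved to an axis.  It
assumes a hypothetical flattened hook table in which each named
hook is already known to be the formula `[0 axis]`; `hook` and
`fcs_axis()` are illustrative names, not the real `u3j` code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* hook: hypothetical flattened hook-table entry, name -> axis,
** standing in for a hook bound to the formula [0 axis].
*/
typedef struct {
  const char* nam_c;
  uint32_t    axe_w;
} hook;

/* fcs_axis(): resolve ".axe" or a hook name to an arm axis.
** Returns 0 (never a valid Nock axis) if it can't resolve.
*/
static uint32_t
fcs_axis(const char* fcs_c, const hook* hok_u, size_t len)
{
  if ( '.' == fcs_c[0] ) {
    char*         end_c;
    unsigned long axe = strtoul(fcs_c + 1, &end_c, 10);

    /* Require at least one digit and nothing after the number. */
    return ( ('\0' == *end_c) && (end_c != fcs_c + 1) ) ? (uint32_t)axe : 0;
  }
  for ( size_t i = 0; i < len; i++ ) {
    if ( 0 == strcmp(fcs_c, hok_u[i].nam_c) ) {
      return hok_u[i].axe_w;
    }
  }
  return 0;
}
```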

`jit`, as its name suggests, is a stub where any sort of
optimization data computed on battery registration might go.  To
use it, fill in the `_cj_jit()` function.

### u3j: the hot dashboard

Now it should be easy to see how we actually invoke jets.  Every
time we run a nock `9` instruction (pretty often, obviously), we
have a core and an axis.  We pass these to `u3j_kick()`, which
will try to execute them.

Because nouns with a reference count of 1 are precious,
`u3j_kick()` has a tricky reference-counting protocol.  It
reserves the right to return `u3_none` in the case where there is
no driver, or the driver does not apply for this case; in this
case, it retains argument `cor`.  If it succeeds, though, it
transfers `cor`.

`u3j_kick()` searches for the battery (always the head of the
core, of course) in the jet dashboard - in practice, the warm
state `har_p`.  If the battery is registered, it searches for
the axis in `hap` in the `calx`.  If the axis is there, the core
matches a driver and the driver jets this arm.  If either lookup
fails, we return `u3_none`.

If both lookups succeed, we call `fun_f` in our `u3j_harm`.  This
obeys the same protocol as `u3j_kick()`; it can refuse to
function by returning `u3_none`, or consume the noun.
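Here's a toy model of that lookup chain in plain C.  It is not
`u3`: battery and core are opaque integers, the warm table is a
linear array instead of a `u3h` hashtable, reference counting is
ignored, and `NONE` stands in for `u3_none`.  All names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NONE UINT64_MAX   /* stand-in for u3_none */

typedef uint64_t (*arm_fn)(uint64_t cor);

/* Toy harm: an arm axis and its C driver. */
typedef struct { uint32_t axe_w; arm_fn fun_f; } toy_harm;

/* Toy calx: battery id plus its axis -> driver table (hap). */
typedef struct {
  uint64_t        bat;
  const toy_harm* hap_u;
  size_t          len;
} toy_calx;

/* toy_kick(): find the battery, then the axis, then call the driver.
** Any miss returns NONE, in which case the caller runs plain Nock.
*/
static uint64_t
toy_kick(const toy_calx* war_u, size_t len,
         uint64_t bat, uint32_t axe_w, uint64_t cor)
{
  for ( size_t i = 0; i < len; i++ ) {
    if ( war_u[i].bat != bat ) continue;
    for ( size_t j = 0; j < war_u[i].len; j++ ) {
      if ( war_u[i].hap_u[j].axe_w == axe_w ) {
        return war_u[i].hap_u[j].fun_f(cor);   /* may itself punt with NONE */
      }
    }
    return NONE;                               /* battery known, arm not jetted */
  }
  return NONE;                                 /* battery not registered */
}
```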

Besides the actual function pointer `fun_f`, we have some flags
in the `u3j_harm` which tell us how to call the arm function.

If `ice` is yes (`&`, `0`), the jet is known to be perfect and we
can just trust the product of `fun_f`.  Otherwise, we need to run
*both* the Nock arm and `fun_f`, and compare their results.

(Note that while executing the C side of this test, we have to
set `ice` to yes; on the Nock side, we have to set `liv` to no.
Otherwise, many non-exponential functions become exponential.
When auto-testing jets in this way, the principle is that the
test is on the outermost layer of recursion.)

(Note also that these flags are modified while an arm is being
tested, so anyone who multi-threads arm testing in this execution
environment has a slight locking problem.)

If `tot` is yes (`&`, `0`), the arm function is *total* and has
to return properly (though it can still return `u3_none`).
Otherwise, it is *partial* and can `u3m_bail()` out with
`c3__punt`.  This feature has a cost: the jet runs in a subroad.

Finally, if `liv` is no (`|`, `1`), the jet is off and doesn't run.
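A toy model of how `ice` and `liv` might steer a call, including
the flag-flipping test described above.  It assumes the jet and its
Nock reference are both plain C functions on integers;
`call_arm()` and `harm_flags` are hypothetical, and `tot` (which
would run a partial jet in a subroad) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define NONE UINT64_MAX                     /* stand-in for u3_none */

typedef uint64_t (*fn)(uint64_t);

typedef struct {
  bool ice;   /* perfect: trust the driver without testing */
  bool liv;   /* live: driver enabled */
} harm_flags;

/* call_arm(): run the jet if live; if not known perfect, also run the
** reference (Nock) version and report any mismatch through *bad.
** Returns NONE when the caller should fall back to plain Nock.
*/
static uint64_t
call_arm(harm_flags* har_u, fn jet_f, fn nok_f, uint64_t sam, bool* bad)
{
  *bad = false;
  if ( !har_u->liv ) return NONE;

  if ( har_u->ice ) return jet_f(sam);

  /* Testing mode: mark the C side perfect and the Nock side dead
  ** while each runs, so nested calls are not re-tested. */
  har_u->ice = true;
  uint64_t pro = jet_f(sam);
  har_u->ice = false;

  har_u->liv = false;
  uint64_t ref = nok_f(sam);
  har_u->liv = true;

  if ( NONE != pro && pro != ref ) *bad = true;
  return ref;                               /* Nock is authoritative */
}
```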

It should be easy to see how the tree of cores gets declared -
precisely, in `j/dash.c`.  We declare the hierarchy as a tree
of `u3j_core` structures, each of which comes with a static list
of arms `arm_u` and sub-cores `dev_u`.

In `u3j_boot()`, we traverse the hierarchy, fill in parent
pointers `par_u`, and enumerate all `u3j_core` structures
into a single flat array `u3j_dash.ray_u`.  Our hot state
then appears ready for action.
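A sketch of that traversal on cut-down copies of the `u3j_core`
fields (`cos_c`, `dev_u`, `par_u`, `jax_l`).  The
`toy_core`/`toy_dash` types, the fixed 64-entry array, and
`_toy_boot_list()` are hypothetical simplifications; the real
`u3j_boot()` may number cores in a different order.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct _toy_core {
  const char*        cos_c;   /* control string (name) */
  struct _toy_core*  dev_u;   /* blank-terminated list of sub-cores */
  struct _toy_core*  par_u;   /* parent, filled in during boot */
  uint32_t           jax_l;   /* index into ray_u, filled in during boot */
} toy_core;

typedef struct {
  toy_core* ray_u[64];        /* flat driver array (fixed size for brevity) */
  uint32_t  len_l;            /* filled length */
} toy_dash;

/* _toy_boot_list(): number each core in a blank-terminated list,
** record its parent, and recurse into its sub-cores (pre-order).
*/
static void
_toy_boot_list(toy_dash* das_u, toy_core* dev_u, toy_core* par_u)
{
  for ( ; dev_u && dev_u->cos_c; dev_u++ ) {
    if ( das_u->len_l >= 64 ) return;   /* toy capacity limit */
    dev_u->par_u = par_u;
    dev_u->jax_l = das_u->len_l;
    das_u->ray_u[das_u->len_l++] = dev_u;
    _toy_boot_list(das_u, dev_u->dev_u, dev_u);
  }
}
```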

### u3j: jet functions

At present, all drivers are compiled statically into `u3`.  This is
not a long-term permanent solution or anything.  However, it will
always be the case with a certain amount of core functionality.

For instance, there are some jet functions that we need to call
as part of loading the Arvo kernel - like `++cue` to unpack a
noun from an atom.  And obviously it makes sense, when jets are
significant enough to compile into `u3`, to export their symbols
in headers and the linker.

There are three interface prefixes for standard jet functions:
`u3k`, `u3q`, and `u3w`.  All jets have `u3w` interfaces; most
have `u3q`; some have `u3k`.  Of course the actual logic is
shared.

`u3w` interfaces use the same protocol as `fun_f` above: the
caller passes the entire core, which is retained if the function
returns `u3_none`, transferred otherwise.  Why?  Again, use
counts of 1 are special and precious for performance hackers.

`u3q` interfaces break the core into C arguments, *retain* noun
arguments, and *transfer* noun returns.  `u3k` interfaces are the
same, except with more use of `u3_none` and other simple C
variations on the Hoon original, but *transfer* both arguments
and returns.  Generally, `u3k` are most convenient for new code.
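A toy illustration of the retain/transfer difference, with a
reference-counted box standing in for a noun.  `q_add()` and
`k_add()` imitate the `u3q`/`u3k` conventions described above but
are hypothetical; plain `malloc()` appears only because this toy
runs outside `u3` - real jet code must not call it.

```c
#include <stdlib.h>

/* Toy boxed atom with a reference count; not the u3 box layout. */
typedef struct { unsigned use; unsigned long val; } box;

static box* mk(unsigned long val) {
  box* b = malloc(sizeof *b);
  if ( b ) { b->use = 1; b->val = val; }
  return b;
}

/* lose(): release one reference, freeing at zero (like u3z()). */
static void lose(box* b) { if ( b && 0 == --b->use ) free(b); }

/* q-style: retain (borrow) arguments, transfer the result. */
static box* q_add(box* a, box* b) { return mk(a->val + b->val); }

/* k-style: transfer arguments too, so the callee releases them. */
static box* k_add(box* a, box* b) {
  box* pro = q_add(a, b);
  lose(a);
  lose(b);
  return pro;
}
```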

Following `u3k/q/w` is `[a-f]`, corresponding to the 6 logical
tiers of the kernel, or `g` for user-level jets.  Another letter
is added for functions within subcores.  The filename, under
`j/`, follows the tier and the function name.

For instance, `++add` is `u3wa_add(cor)`, `u3qa_add(a, b)`, or
`u3ka_add(a, b)`, in `j/a/add.c`.  `++get` in `++by` is
`u3wdb_get(cor)`, `u3kdb_get(a, b)`, etc, in `j/d/by_get.c`.

For historical reasons, all internal jet code in `j/[a-f]`
*retains* noun arguments, and *transfers* noun results.  Please
do not do this in new `g` jets!  The new standard protocol is to
transfer both arguments and results.

### u3a: allocation functions

`u3a` allocates on the current road (u3R).  Its internal
structures are uninteresting and typical of a naive allocator.

The two most-used `u3a` functions are `u3a_gain()` to add a
reference count, and `u3a_lose()` to release one (and free the
noun, if the use count is zero).  For convenience, `u3a_gain()`
returns its argument.  The pair are generally abbreviated with
the macros `u3k()` and `u3z()` respectively.

Normally we create nouns through `u3i` functions, and don't call
the `u3a` allocators directly.  But if you do:

One, there are *two* sets of allocators: the word-aligned
allocators and the fully-aligned (ie, malloc compatible)
allocators.  For instance, on a typical OS X setup, malloc
produces 16-byte aligned results - needed for some SSE
instructions.

These allocators are *not compatible*.  For 32-bit alignment
as used in nouns, call:

    /* u3a_walloc(): allocate storage measured in words.
    */
      void*
      u3a_walloc(c3_w len_w);

    /* u3a_wfree(): free storage.
    */
      void
      u3a_wfree(void* lag_v);

    /* u3a_wealloc(): word realloc.
    */
      void*
      u3a_wealloc(void* lag_v, c3_w len_w);

For full alignment, call:

    /* u3a_malloc(): aligned storage measured in bytes.
    */
      void*
      u3a_malloc(size_t len_i);

    /* u3a_realloc(): aligned realloc in bytes.
    */
      void*
      u3a_realloc(void* lag_v, size_t len_i);

    /* u3a_realloc2(): gmp-shaped realloc.
    */
      void*
      u3a_realloc2(void* lag_v, size_t old_i, size_t new_i);

    /* u3a_free(): free for aligned malloc.
    */
      void
      u3a_free(void* tox_v);

    /* u3a_free2(): gmp-shaped free.
    */
      void
      u3a_free2(void* tox_v, size_t siz_i);

There is also a set of special-purpose allocators for building
atoms.  When building atoms, please remember that it's incorrect
to have a high 0 word - the word length in the atom structure
must be strictly correct.

Of course, we don't always know how large our atom will be.
Therefore, the standard way of building large atoms is to
allocate a block of raw space with `u3a_slab()`, then chop off
the end with `u3a_malt()` (which does the measuring itself)
or `u3a_mint()` in case you've measured it yourself.
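The measuring step can be sketched as trimming high zero words,
assuming (as a toy) that the slab is an array of 32-bit words
stored least-significant first.  `atom_len_w()` is a hypothetical
helper; it only computes the length and does not reallocate like
`u3a_malt()`.

```c
#include <stddef.h>
#include <stdint.h>

/* atom_len_w(): number of significant 32-bit words in a little-endian
** word buffer, ie, the length after dropping high zero words.
** A result of 0 means the value is zero.
*/
static size_t
atom_len_w(const uint32_t* buf_w, size_t len_w)
{
  while ( len_w > 0 && 0 == buf_w[len_w - 1] ) {
    len_w--;
  }
  return len_w;
}
```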

Once again, *do not call `malloc()`* (or C++ `new`) within any
code that may be run within a jet.  This will cause rare sporadic
corruption when we interrupt execution within a `malloc()`.  We'd
just override the symbol, but `libuv` uses `malloc()` across
threads within its own synchronization primitives - for this to
work with `u3a_malloc()`, we'd have to introduce our own locks on
the surface-level road (which might be a viable solution).

### u3n: nock execution

The `u3n` routines execute Nock itself.  On the inside, they have
a surprising resemblance to the spec proper (the only interesting
detail is how we handle tail-call elimination) and are, as one
would expect, quite slow.  (There is no such thing as a fast tree
interpreter.)

There is only one Nock, but there are lots of ways to call it.
(Remember that all `u3n` functions *transfer* C arguments and
returns.)

The simplest interpreter, `u3n_nock_on(u3_noun bus, u3_noun fol)`
invokes Nock on `bus` (the subject) and `fol` (the formula).
(Why is it `[subject formula]`, not `[formula subject]`?  The same
reason `0` is true and `1` is false.)

A close relative is `u3n_slam_on(u3_noun gat, u3_noun sam)`,
which slams a *gate* (`gat`) on a sample (`sam`).  (In a normal
programming language that didn't talk funny, `u3n_slam_on()`
would call a function on an argument.)  We could write it most
simply as:

    u3_noun
    u3n_slam_on(u3_noun gat, u3_noun sam)
    {
      u3_noun pro = u3n_nock_on
                      (u3nc(u3k(u3h(gat)),
                            u3nc(sam, u3k(u3t(u3t(gat))))),
                       u3k(u3h(gat)));
      u3z(gat);
      return pro;
    }

Simpler is `u3n_kick_on(u3_noun gat)`, which slams a gate (or,
more generally, a *trap* - because sample structure is not even
needed here) without changing its sample:

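    u3_noun
    u3n_kick_on(u3_noun gat)
    {
      //  Run the battery (head) against the whole core; u3k() takes
      //  the extra reference that u3n_nock_on() consumes for it.
      return u3n_nock_on(gat, u3k(u3h(gat)));
    }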

The `_on` functions in `u3n` are all defined as pure Nock.  But
actually, even though we say we don't extend Nock, we do.  But we
don't.  But we do.

Note that `u3` has a well-developed error handling system -
`u3m_bail()` to throw an exception, `u3m_soft_*` to catch one.
But Nock has no exception model at all.  That's okay - all it
means is that if an `_on` function bails, the exception is an
exception in the caller.

However, `u3`'s exception handling happens to match a convenient
virtual super-Nock in `hoon.hoon`, the infamous `++mock`.  Of
course, Nock is slow, and `mock` is Nock in Nock, so it is
(logically) super-slow.  Then again, so is decrement.

With the power of `u3`, we nest arbitrary layers of `mock`
without any particular performance cost.  Moreover, we simply
treat Nock proper as a special case of `mock`.  (More precisely,
the internal VM loop is `++mink` and the error compiler is
`++mook`.  But we call the whole sandbox system `mock`.)

The nice thing about `mock` functions is that (by executing
within `u3m_soft_run()`, which as you may recall uses a nested
road) they provide both exceptions and the namespace operator -
`.^` in Hoon, which becomes operator `11` in `mock`.

`11` requires a namespace function, or `fly`, which produces a
`++unit` - `~` (`0`) for no binding, or `[0 value]`.  The sample
to a `fly` is a `++path`, just a list of text `span`.

`mock` functions produce a `++toon`.  Fully elaborated:

    ++  noun  ,*                                      ::  any noun
    ++  path  (list ,@ta)                             ::  namespace path
    ++  span  ,@ta                                    ::  text-atom (ASCII)
    ++  toon  $%  [%0 p=noun]                         ::  success
                  [%1 p=(list path)]                  ::  blocking paths
                  [%2 p=(list tank)]                  ::  stack trace
              ==                                      ::
    ++  tank                                          ::  printable
              $%  [%leaf p=tape]                      ::  flat text
                  $:  %palm                           ::  backstep list
                      p=[p=tape q=tape r=tape s=tape] ::  mid cap open close
                      q=(list tank)                   ::  contents
                  ==                                  ::
                  $:  %rose                           ::  straight list
                      p=[p=tape q=tape r=tape]        ::  mid open close
                      q=(list tank)                   ::  contents
                  ==                                  ::
              ==

(Note that `tank` is overdesigned and due for replacement.)

What does a `toon` mean?  Either your computation succeeded (`[0
noun]`), or could not finish because it blocked on one or more
global paths (`[1 (list path)]`), or it exited with a stack trace
(`[2 (list tank)]`).

Note that of all the `u3` exceptions, only `%exit` is produced
deterministically by the Nock definition.  Therefore, only
`%exit` produces a `2` result.  Any other argument to
`u3m_bail()` will unwind the virtualization stack all the way to
the top - or to be more exact, to `u3m_soft_top()`.

In any case, the simplest `mock` functions are `u3n_nock_un()`
and `u3n_slam_un()`.  These provide exception control without
any namespace change, as you can see by the code:

    /* u3n_nock_un(): produce .*(bus fol), as ++toon.
    */
    u3_noun
    u3n_nock_un(u3_noun bus, u3_noun fol)
    {
      u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0);  //  |=(a=* .^(a))

      return u3n_nock_in(fly, bus, fol);
    }

    /* u3n_slam_un(): produce (gat sam), as ++toon.
    */
    u3_noun
    u3n_slam_un(u3_noun gat, u3_noun sam)
    {
      u3_noun fly = u3nt(u3nt(11, 0, 6), 0, 0);  //  |=(a=* .^(a))

      return u3n_slam_in(fly, gat, sam);
    }

The `fly` is added as the first argument to `u3n_nock_in()` and
`u3n_slam_in()`.  Of course, logically, `fly` executes in the
caller's exception layer.  (Maintaining this illusion is slightly
nontrivial.)  Finally, `u3n_nock_an()` is a sandbox with a null
namespace.

### u3e: persistence

The only `u3e` function you should need to call is `u3e_save()`,
which saves the loom.  As it can be restored on any platform,
please make sure you don't have any state in the loom that is
bound to your process or architecture.  The only exceptions are
things like the warm jet state, which is actively purged on reboot.

### u3r: reading nouns (weak)

As befits accessors that don't make anything, `u3r` noun-reading
functions always retain their arguments and their returns.  They
never bail; rather, when they don't work, they return a `u3_weak`
result.

Most of these functions are straightforward and do only what
their comments say.  A few are interesting enough to discuss.

`u3r_at()` is the familiar tree fragment function, `/` from the
Nock spec.  For taking complex nouns apart, `u3r_mean()` is a
relatively funky way of deconstructing nouns with a varargs list
of `axis`, `u3_noun *`.  For cells, triples, etc, decompose with
`u3r_cell()`, `u3r_trel()`, etc.  For the tagged equivalents, use
`u3r_pq()` and friends.
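To make the axis arithmetic concrete, here's a toy version of `/`
over a hand-rolled noun struct (hypothetical, not the `u3`
representation).  The bits of the axis below its leading 1, read
high to low, say whether to take the head (0) or the tail (1).
It returns `NULL` roughly where `u3r_at()` would fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy noun: an atom if hed/tel are NULL, otherwise a cell. */
typedef struct noun {
  const struct noun* hed;
  const struct noun* tel;
  uint32_t           atom;
} noun;

/* at(): Nock's / operator.  Returns NULL for axis 0 or when the
** path runs into an atom.
*/
static const noun*
at(uint64_t axe, const noun* som)
{
  if ( 0 == axe ) return NULL;

  int top = 63;
  while ( !((axe >> top) & 1) ) top--;   /* highest set bit is the root */

  for ( int i = top - 1; i >= 0; i-- ) {
    if ( NULL == som->hed ) return NULL; /* can't descend into an atom */
    som = ((axe >> i) & 1) ? som->tel : som->hed;
  }
  return som;
}
```

For an ordinary gate `[formula [sample context]]`, this gives axis
`2` for the formula, `6` for the sample and `7` for the context,
the usual parent axis.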

`u3r_sing(u3_noun a, u3_noun b)` (true if `a` and `b` are a
*single* noun) is interesting because it uses mugs to help it
out.  Clearly, different nouns may have the same mug, but equal
nouns cannot have different mugs.  It's important to
understand the performance characteristics of `u3r_sing()`:
the worst possible case is a comparison of duplicate nouns,
which have the same value but were created separately.  In this
case, the whole tree is traversed.
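A toy sketch of the mug shortcut, on a hand-rolled noun struct
with a lazily cached `mug_w` (0 meaning not yet computed).  The
hash function here is arbitrary and hypothetical, not
`u3r_mug()`; the point is only that different mugs prove
inequality, while separately built duplicates still force a full
walk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy noun with a lazily cached mug (0 = not yet computed). */
typedef struct toy {
  struct toy* hed;
  struct toy* tel;      /* both NULL for an atom */
  uint32_t    atom;
  uint32_t    mug_w;
} toy;

/* toy_mug(): 31-bit, nonzero, insecure structural hash, cached. */
static uint32_t
toy_mug(toy* a)
{
  if ( a->mug_w ) return a->mug_w;
  uint32_t h = a->hed ? (toy_mug(a->hed) * 31u) ^ toy_mug(a->tel)
                      : a->atom * 2654435761u;
  h &= 0x7fffffffu;
  a->mug_w = h ? h : 1;
  return a->mug_w;
}

/* toy_sing(): structural equality, using pointer identity and mugs
** to skip work.  Equal but separately built nouns still recurse all
** the way down, which is the expensive case.
*/
static bool
toy_sing(toy* a, toy* b)
{
  if ( a == b ) return true;
  if ( toy_mug(a) != toy_mug(b) ) return false;   /* different mugs: different nouns */
  if ( !a->hed || !b->hed ) {
    return !a->hed && !b->hed && a->atom == b->atom;
  }
  return toy_sing(a->hed, b->hed) && toy_sing(a->tel, b->tel);
}
```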

`u3r_sung()` is a deeply funky and frightening version of
`u3r_sing()` that unifies pointers to the duplicate nouns it
finds, freeing the second copy.  Obviously, do not use
`u3r_sung()` when you have live, but not reference counted, noun
references from C - if they match a noun with a refcount of 1
that gets freed, bad things happen.

It's important to remember that `u3r_mug()`, which produces a
31-bit, nonzero insecure hash, uses the `mug_w` slot in any boxed
noun as a lazy cache.  There are a number of variants of
`u3r_mug()` that can get you out of building unneeded nouns.

### u3x: reading nouns (bail)

`u3x` functions are like `u3r` functions, but instead of
returning `u3_none` when (for instance) we try to take the head
of an atom, they bail with `%exit`.  In other words, they do what
the same operation would do in Nock.

### u3h: hash tables.

We can of course use the Hoon `map` structure as an associative
array.  This is a balanced treap and reasonably fast.  However,
it's considerably inferior to a custom structure like an HAMT
(hash array-mapped trie).  We use `u3_post` to allocate HAMT
structures on the loom.

(Our HAMT implements the classic Bagwell algorithm, which depends
on the `gcc` builtin `__builtin_popcount()`.  On a CPU
which doesn't support popcount or an equivalent instruction, some
other design would probably be preferable.)
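For a sense of what popcount is for: in a Bagwell node, a 32-bit
bitmap records which of up to 32 children exist, and the children
are stored densely.  This hypothetical helper (not the `u3h` node
layout) finds a child's slot; it assumes `i < 32` and a gcc or
clang compiler.

```c
#include <stdint.h>

/* sparse_index(): Bagwell-style slot lookup for one HAMT node.
** map_w has bit i set if child i (0..31) is present; child i lives
** at the count of set bits below bit i.  Returns -1 if absent.
*/
static int
sparse_index(uint32_t map_w, unsigned i)
{
  uint32_t bit_w = (uint32_t)1 << i;

  if ( !(map_w & bit_w) ) return -1;
  return __builtin_popcount(map_w & (bit_w - 1));
}
```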

There's no particular rocket science in the API. `u3h_new()`
creates a hashtable; `u3h_free()` destroys one; `u3h_put()`
inserts, `u3h_get()` retrieves.  You can transform values in a
hashtable with `u3h_walk()`.

The only funky function is `u3h_gut()`, which unifies keys with
`u3r_sung()`.  As with all cases of `u3r_sung()`, this must be
used with extreme caution.

### u3z: memoization

Connected to the `~+` rune in Hoon, via the Nock `%memo` hint,
the memoization facility is a general-purpose cache.

(It's also used for partial memoization - a feature that'll
probably be removed, in which conservative worklist algorithms
(which would otherwise be exponential) memoize everything in the
subject *except* the worklist.  This is used heavily in the Hoon
compiler jets (`j/f/*.c`).  Unfortunately, it's probably not
possible to make this work so perfectly that it can't be abused
to violate Nock, so we'll probably remove it at a later date,
instead making `++ut` keep its own monadic cache.)

Each `u3z` function comes with a `c3_m` mote which disambiguates
the function mapping key to value.  For Nock itself, use 0.  For
extra speed, small tuples are split out in C; thus, find with

    u3_weak u3z_find(c3_m, u3_noun);
    u3_weak u3z_find_2(c3_m, u3_noun, u3_noun);
    u3_weak u3z_find_3(c3_m, u3_noun, u3_noun, u3_noun);
    u3_weak u3z_find_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun);

and save with

    u3_noun u3z_save(c3_m, u3_noun, u3_noun);
    u3_noun u3z_save_2(c3_m, u3_noun, u3_noun, u3_noun);
    u3_noun u3z_save_3(c3_m, u3_noun, u3_noun, u3_noun, u3_noun);
    u3_noun u3z_save_4(c3_m, u3_noun, u3_noun, u3_noun, u3_noun, u3_noun);

where the value is the last argument.  To eliminate duplicate
nouns, there is also

    u3_noun
    u3z_uniq(u3_noun);

`u3z` functions retain keys and transfer values.

The `u3z` cache, built on `u3h` hashes, is part of the current
road, and goes away when it goes away.  (In future, we may wish
to promote keys/values which outlive the road, as we do with jet
state.)  There is no cache reclamation at present, so be careful.
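A toy sketch of a mote-keyed cache with no reclamation, assuming
integer keys and values in a fixed open-addressed table.
`memo_find()` and `memo_save()` are hypothetical stand-ins for
`u3z_find()` and `u3z_save()`; they don't model noun ownership,
and returning the value from the save is an assumption based on
`u3z_save()`'s `u3_noun` return type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memo cache: open addressing over (mote, key) -> value.
** No reclamation: once the table is full, further saves are dropped.
*/
#define MEMO_SLOTS 64

typedef struct { bool ful; uint32_t mot; uint64_t key; uint64_t val; } memo_slot;
typedef struct { memo_slot sot_u[MEMO_SLOTS]; } memo;

static size_t memo_hash(uint32_t mot, uint64_t key) {
  return (size_t)(((key * 0x9E3779B97F4A7C15ull) ^ mot) % MEMO_SLOTS);
}

/* memo_find(): true and *val set on a hit for this mote and key. */
static bool memo_find(const memo* mem_u, uint32_t mot, uint64_t key, uint64_t* val) {
  size_t h = memo_hash(mot, key);
  for ( size_t n = 0; n < MEMO_SLOTS; n++ ) {
    const memo_slot* s = &mem_u->sot_u[(h + n) % MEMO_SLOTS];
    if ( !s->ful ) return false;
    if ( s->mot == mot && s->key == key ) { *val = s->val; return true; }
  }
  return false;
}

/* memo_save(): insert or overwrite, then hand the value back. */
static uint64_t memo_save(memo* mem_u, uint32_t mot, uint64_t key, uint64_t val) {
  size_t h = memo_hash(mot, key);
  for ( size_t n = 0; n < MEMO_SLOTS; n++ ) {
    memo_slot* s = &mem_u->sot_u[(h + n) % MEMO_SLOTS];
    if ( !s->ful || (s->mot == mot && s->key == key) ) {
      s->ful = true; s->mot = mot; s->key = key; s->val = val;
      break;
    }
  }
  return val;
}
```

The mote keeps two functions that happen to see the same key from
colliding, which is the role `c3_m` plays above.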

### u3t: tracing and profiling.

TBD.

### u3v: the Arvo kernel

An Arvo kernel - or at least, a core that compiles with the Arvo
interface - is part of the global `u3` state.  What is an Arvo
core?  Slightly pseudocoded:

    ++  arvo
      |%
      ++  come  |=  [yen=@ ova=(list ovum) nyf=pone]  ::  11
                ^-  [(list ovum) _+>]
                !!
      ++  keep  |=  [now=@da hap=path]                ::  4
                ^-  (unit ,@da)
                !!
      ++  load  |=  [yen=@ ova=(list ovum) nyf=pane]  ::  86
                ^-  [(list ovum) _+>]
                !!
      ++  peek  |=  [now=@da path]                    ::  87
                ^-  (unit)
                !!
      ++  poke  |=  [now=@da ovo=ovum]                ::  42
                ^-  [(list ovum) _+>]
                !!
      ++  wish  |=  txt=@ta                           ::  20
                ^-  *
                !!
      --
    ++  card  ,[p=@tas q=*]                           ::  typeless card
    ++  ovum  ,[p=wire q=card]                        ::  Arvo event
    ++  wire  path                                    ::  event cause

This is the Arvo ABI in a very real sense.  Arvo is a core with
these six arms.  To use these arms, we hardcode the axis of the
formula (`11`, `4`, `86`, etc) into the C code that calls Arvo,
because otherwise we'd need type metadata - which we can get, by
calling Arvo.

It's important to understand the Arvo event/action structure, or
`++ovum`.  An `ovum` is a `card`, which is any `[term noun]`
cell, and a `++wire`, a `path` which indicates the location of
the event.  At the Unix level, the `wire` corresponds to a system
module or context.  For input events, this is the module that
caused the event; for output actions, it's the module that
performs the action.

`++poke` sends Arvo an event `ovum`, producing a cell of action
ova and a new Arvo core.

`++peek` dereferences the Arvo namespace.  It takes a date and a
key, and produces `~` (`0`) or `[~ value]`.

`++keep` asks Arvo the next time it wants to be woken up, for the
given `wire`.  (This input will probably be eliminated in favor
of a single global timer.)

`++wish` compiles a string of Hoon source.  While just a
convenience, it's a very convenient convenience.

`++come` and `++load` are used by Arvo to reset itself (more
precisely, to shift the Arvo state from an old kernel to a new
one); there is no need to call them from C.

Now that we understand the Arvo kernel interface, let's look at
the `u3v` API.  As usual, all the functions in `u3v` are
commented, but unfortunately it's hard to describe this API as
clean at present.  The problem is that `u3v` remains design
coupled to the old `vere` event handling code written for `u2`.
But let's describe the functions you should be calling, assuming
you're not writing the next event system.  There are only two.

`u3v_wish(str_c)` wraps the `++wish` functionality in a cache
(which is read-only unless you're on the surface road).

`u3v_do()` uses `wish` to provide a convenient interface for
calling Hoon kernel functions by name.  Even more conveniently,
we tend to call `u3v_do()` with these convenient aliases:

    #define  u3do(txt_c, arg)         u3v_do(txt_c, arg)
    #define  u3dc(txt_c, a, b)        u3v_do(txt_c, u3nc(a, b))
    #define  u3dt(txt_c, a, b, c)     u3v_do(txt_c, u3nt(a, b, c))
    #define  u3dq(txt_c, a, b, c, d)  u3v_do(txt_c, u3nq(a, b, c, d))