u3: noun processing in C.
u3 is the C library that makes Urbit work. If it wasn't called u3, it might be called libnoun - it's a library for making and storing nouns.
What's a noun? A noun is either a cell or an atom. A cell is an ordered pair of any two nouns. An atom is an unsigned integer of any size.
To the C programmer, this is not a terribly complicated data structure, so why do you need a library for it?
One: nouns have a well-defined computation kernel, Nock, whose spec fits on a page and gzips to 340 bytes. But the only arithmetic operation in Nock is increment. So it's nontrivial to compute both efficiently and correctly.
Two: u3 is designed to support "permanent computing," ie, a single-level store which is transparently snapshotted. This implies a specialized memory-management model, etc, etc.
(Does u3 depend on the higher levels of Urbit, Arvo and Hoon? Yes and no. u3 expects you to load something shaped like an Arvo kernel, and use it as an event-processing function. But you don't need to use this feature if you don't want, and your kernel doesn't have to be Arvo proper - just Arvo-compatible. Think of u3 as the BIOS and Arvo as the boot kernel. And there are no dependencies at all between Hoon the language and u3.)
c3: C in Urbit
Under u3 is the simple c3 layer, which is just how we write C in Urbit.
When writing C in u3, please of course follow the conventions of the code around you as regards indentation, etc. It's especially important that every function have a header comment, even if it says nothing interesting.
But some of our idiosyncrasies go beyond convention. Yes, we've done awful things to C. Here's what we did and why we did it.
c3: integer types
First, it's generally acknowledged that underspecified integer
types are C's worst disaster. C99 fixed this, but the stdint
types are wordy and annoying. We've replaced them with:
/* Good integers.
*/
typedef uint64_t c3_d; // double-word
typedef int64_t c3_ds; // signed double-word
typedef uint32_t c3_w; // word
typedef int32_t c3_ws; // signed word
typedef uint16_t c3_s; // short
typedef int16_t c3_ss; // signed short
typedef uint8_t c3_y; // byte
typedef int8_t c3_ys; // signed byte
typedef uint8_t c3_b; // bit
typedef uint8_t c3_t; // boolean
typedef uint8_t c3_o; // loobean
typedef uint8_t c3_g; // 32-bit log - 0-31 bits
typedef uint32_t c3_l; // little; 31-bit unsigned integer
typedef uint32_t c3_m; // mote; also c3_l; LSB first a-z 4-char string.
/* Bad integers.
*/
typedef char c3_c; // does not match int8_t or uint8_t
typedef int c3_i; // int - really bad
typedef uintptr_t c3_p; // pointer-length uint - really really bad
typedef intptr_t c3_ps; // pointer-length int - really really bad
Some of these need explanation. A loobean is a Nock boolean - Nock, for mysterious reasons, uses 0 as true (always say "yes") and 1 as false (always say "no").
Nock and/or Hoon cannot tell the difference between a short atom and a long one, but at the u3 level every atom under 2^31 is direct. The c3_l type is useful to annotate this. A c3_m is a mote - a string of up to 4 characters in a c3_l, least significant byte first. A c3_g should be a 5-bit atom. Of course, C cannot enforce these constraints, only document them.
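To make "least significant byte first" concrete, here is a minimal sketch of packing a short string into a mote and unpacking it again. The helpers toy_mote() and toy_mote_str() are hypothetical names invented for this illustration; they are not part of c3, which defines its motes as constants in a header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* toy_mote(): pack up to 4 characters into a 32-bit word, LSB first.
** Hypothetical helper; characters past the fourth are ignored.
*/
static uint32_t
toy_mote(const char* str)
{
  uint32_t mot = 0;
  size_t   len = strlen(str);
  size_t   i;

  if ( len > 4 ) {
    len = 4;
  }
  for ( i = 0; i < len; i++ ) {
    mot |= (uint32_t)(unsigned char)str[i] << (8 * i);
  }
  return mot;
}

/* toy_mote_str(): unpack a mote into a NUL-terminated 5-byte buffer.
*/
static void
toy_mote_str(uint32_t mot, char out[5])
{
  int i;

  for ( i = 0; i < 4; i++ ) {
    out[i] = (char)((mot >> (8 * i)) & 0xff);
  }
  out[4] = 0;
}
```

Because every letter a-z is below 0x80, a 4-letter mote always fits in 31 bits, which is why it can also be treated as a c3_l. The empty string packs to 0.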
Use the "bad" - ie, poorly specified - integer types only when interfacing with external code that expects them.
An enormous number of motes are defined in i/c/motes.h. There is no reason to delete motes that aren't being used, or even to modularize the definitions. Keep them alphabetical, though.
c3: variables and variable naming
The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax. This is weird, but works really well, as long as what you're doing isn't hideous.
A TLV variable name is a random pronounceable three-letter string, sometimes with some vague relationship to its meaning, but usually not. Usually CVC (consonant-vowel-consonant) is a good choice.
You should use TLVs much the way math people use Greek letters. The same concept should in general get the same name across different contexts. When you're working in a given area, you'll tend to remember the binding from TLV to concept by sheer power of associative memory. When you come back to it, it's not that hard to relearn. And of course, when in doubt, comment it.
Variables take pseudo-Hungarian suffixes, matching in general the suffix of the integer type:
c3_w wor_w; // 32-bit word
Unlike in standard Hungarian, there is no change for pointer variables. C structure variables take a _u suffix.
c3: loobeans
The code (from defs.h) tells the story:
# define c3y 0
# define c3n 1
# define _(x) (c3y == (x))
# define __(x) ((x) ? c3y : c3n)
# define c3a(x, y) __(_(x) && _(y))
# define c3o(x, y) __(_(x) || _(y))
In short, use _() to turn a loobean into a boolean, __() to go the other way. Use ! as usual, c3y for yes and c3n for no, c3a for and and c3o for or.
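As a small usage sketch, the macros above can be exercised like this. The macro definitions are repeated from the listing so the block stands alone; toy_even() and toy_count_even() are hypothetical functions invented for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t c3_o;              /* loobean, as in the typedef list */

#define c3y        0
#define c3n        1
#define _(x)       (c3y == (x))
#define __(x)      ((x) ? c3y : c3n)
#define c3a(x, y)  __(_(x) && _(y))
#define c3o(x, y)  __(_(x) || _(y))

/* toy_even(): hypothetical predicate; produces a loobean, not a C boolean.
*/
static c3_o
toy_even(unsigned num)
{
  return __(0 == (num & 1));
}

/* toy_count_even(): count even entries, converting with _() to branch.
*/
static unsigned
toy_count_even(const unsigned* vec, unsigned len)
{
  unsigned i, sum = 0;

  for ( i = 0; i < len; i++ ) {
    if ( _(toy_even(vec[i])) ) {
      sum++;
    }
  }
  return sum;
}
```

The trap to avoid is writing `if ( toy_even(x) )` without the `_()`: since c3y is 0, a bare loobean tests as false in C exactly when the answer is yes.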
u3: land of nouns
The division between c3 and u3 is that you could theoretically imagine using c3 as just a generic C environment. Anything to do with nouns is in u3.
u3: a map of the system
There are two kinds of symbols in u3: regular and irregular.
Regular symbols follow this pattern:
prefix    purpose                        .h        .c
-------------------------------------------------------------
u3a_      allocation                     i/n/a.h   n/a.c
u3e_      persistence                    i/n/e.h   n/e.c
u3h_      hashtables                     i/n/h.h   n/h.c
u3i_      noun construction              i/n/i.h   n/i.c
u3j_      jet control                    i/n/j.h   n/j.c
u3m_      system management              i/n/m.h   n/m.c
u3n_      nock computation               i/n/n.h   n/n.c
u3r_      noun access (error returns)    i/n/r.h   n/r.c
u3t_      profiling                      i/n/t.h   n/t.c
u3v_      arvo                           i/n/v.h   n/v.c
u3x_      noun access (error crashes)    i/n/x.h   n/x.c
u3z_      memoization                    i/n/z.h   n/z.c
u3k[a-g]  jets (transfer, C args)        i/j/k.h   j/[a-g]/*.c
u3q[a-g]  jets (retain, C args)          i/j/q.h   j/[a-g]/*.c
u3w[a-g]  jets (retain, nock core)       i/j/w.h   j/[a-g]/*.c
Irregular symbols always start with u3 and obey no other rules. They're defined in i/n/u.h. Finally, i/all.h includes all these headers (fast compilers, yay) and is all you need to program in u3.
u3: noun internals
A noun is a u3_noun - currently defined as a 32-bit c3_w.
If your u3_noun is less than (1 << 31), it's a direct atom. Every unsigned integer between 0 and 0x7fffffff inclusive is its own noun.
If bit 31 is set in a u3_noun, bit 30 is always set - this bit is reserved. Bit 29 is 1 if the noun is a cell, 0 if it's an atom. Bits 28 through 0 are a word pointer into the loom - see below. The structures are:
typedef struct {
c3_w mug_w;
c3_w len_w;
c3_w buf_w[0]; // actually [len_w]
} u3a_atom;
typedef struct {
c3_w mug_w;
u3_noun hed;
u3_noun tel;
} u3a_cell;
The only thing that should be mysterious here is mug_w, which is a 31-bit lazily computed nonzero short hash (FNV currently, soon Murmur3). If mug_w is 0, the hash is not yet computed. We also hijack this field for various hacks, such as saving the new address of a noun when copying over.
Also, the value 0xffffffff is u3_none, which is never a valid noun. Use the type u3_weak to express that a noun variable may be u3_none.
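The bit layout described in this section can be sketched as a small classifier. This is an illustration of the layout as stated here, not u3's own code: toy_noun, toy_kind, toy_classify() and toy_offset() are hypothetical names, and the real library has its own tests and accessors.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_noun;         /* stand-in for u3_noun (a 32-bit word) */

#define TOY_NONE  0xffffffffu      /* stand-in for u3_none */

typedef enum {
  TOY_DIRECT,                      /* bit 31 clear: the word is the atom */
  TOY_INDIRECT,                    /* bit 31 set, bit 29 clear: boxed atom */
  TOY_CELL,                        /* bit 31 set, bit 29 set: boxed cell */
  TOY_IS_NONE                      /* not a noun at all */
} toy_kind;

/* toy_classify(): decode the tag bits of a noun word.
** The none value is checked first, since it also has bits 31 and 29 set.
*/
static toy_kind
toy_classify(toy_noun non)
{
  if ( TOY_NONE == non ) {
    return TOY_IS_NONE;
  }
  if ( 0 == (non >> 31) ) {
    return TOY_DIRECT;
  }
  return (non & (1u << 29)) ? TOY_CELL : TOY_INDIRECT;
}

/* toy_offset(): bits 28 through 0, the word offset into the loom.
** Only meaningful for indirect atoms and cells.
*/
static uint32_t
toy_offset(toy_noun non)
{
  return non & 0x1fffffffu;
}
```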
u3: reference counts
The only really essential thing you need to know about u3 is how to handle reference counts. Everything else, you can skip and just get to work.
u3 deals with reference-counted, immutable, acyclic nouns. Unfortunately, we are not Apple and can't build reference counting into your C compiler, so you need to count by hand.
Every allocated noun (or any allocation object, because our allocator is general-purpose) contains a counter which counts the number of references to it - typically variables with type u3_noun. When this counter goes to 0, the noun is freed.
To tell u3 that you've added a reference to a noun, call the function u3a_gain() or its shorthand u3k(). (For your convenience, this function returns its argument.) To tell u3 that you've destroyed a reference, call u3a_lose() or its shorthand u3z().
(If you screw up by decrementing the counter too much, u3 will dump core in horrible ways. If you screw up by incrementing it too much, u3 will leak memory. To check for memory leaks, set the bug_o flag in u3m_boot() - eg, run vere with -g. Memory leaks are difficult to debug - the best way to handle leaks is just to revert to a version that didn't have them, and look over your code again.)
(You can gain or lose a direct atom. It does nothing.)
u3: reference protocols
THIS IS THE MOST CRITICAL SECTION IN THE u3 DOCUMENTATION.
The key question when calling a C function in a refcounted world is what the function will do to the noun refcounts - and, if the function returns a noun, what it does to the return.
There are two semantic patterns, transfer and retain. In transfer semantics, the caller "gives" a use count to the callee, which "gives back" any return. For instance, if I have
{
u3_noun foo = u3i_string("foobar");
u3_noun bar;
bar = u3f_futz(foo);
[...]
u3z(bar);
}
Suppose u3f_futz() has transfer semantics. At [...], my code holds one reference to bar and zero references to foo - which has been freed, unless it's part of bar. My code now owns bar and gets to work with it until it's done, at which point a u3z() is required.
On the other hand, if u3f_futz() has retain semantics, we need to write
{
u3_noun foo = u3i_string("foobar");
u3_noun bar;
bar = u3f_futz(foo);
[...]
u3z(foo);
}
because calling u3f_futz() does not release our ownership of foo, which we have to free ourselves.
But if we free bar, we are making a great mistake, because our reference to it is not in any way registered in the memory manager (which cannot track references in C variables, of course). It is normal and healthy to have these uncounted C references, but they must be treated with care.
The bottom line is that it's essential for the caller to know the refcount semantics of any function which takes or returns a noun. (In some unusual circumstances, different arguments or returns in one function may be handled differently.)
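The two protocols can be modelled outside u3 with a toy refcounted box. Everything here is hypothetical (toy_box, toy_gain(), toy_lose(), toy_bump(), toy_peek() and the toy_live counter are invented for the sketch); it only mirrors the bookkeeping described above, with toy_gain() and toy_lose() standing in for u3k() and u3z().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  unsigned use;                    /* reference count */
  int      val;                    /* payload */
} toy_box;

static int toy_live = 0;           /* boxes currently allocated */

/* toy_new(): allocate a box; the caller owns the one reference.
*/
static toy_box*
toy_new(int val)
{
  toy_box* box = malloc(sizeof(*box));

  assert(box != NULL);
  box->use = 1;
  box->val = val;
  toy_live++;
  return box;
}

/* toy_gain(): add a reference; returns its argument for convenience.
*/
static toy_box*
toy_gain(toy_box* box)
{
  box->use++;
  return box;
}

/* toy_lose(): drop a reference; frees the box when the count reaches 0.
*/
static void
toy_lose(toy_box* box)
{
  assert(box->use > 0);
  if ( 0 == --box->use ) {
    free(box);
    toy_live--;
  }
}

/* toy_bump(): TRANSFER. Consumes the caller's reference to arg and
** returns a new box that the caller owns.
*/
static toy_box*
toy_bump(toy_box* arg)
{
  toy_box* pro = toy_new(arg->val + 1);

  toy_lose(arg);
  return pro;
}

/* toy_peek(): RETAIN. Inspects arg and leaves every count alone.
*/
static int
toy_peek(const toy_box* arg)
{
  return arg->val;
}
```

A caller that does `bar = toy_bump(foo)` must not touch foo again and must eventually toy_lose(bar). A caller of toy_peek(foo) still owns foo afterward. To keep using foo across a transfer call, pass toy_gain(foo) instead, which hands the callee a fresh reference.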
Broadly speaking, as a design question, retain semantics are more appropriate for functions which inspect or query nouns. For instance, u3h() (which takes the head of a noun) retains, so that we can traverse a noun tree without constantly incrementing and decrementing.
Transfer semantics are more appropriate for functions which make nouns, which is obviously what most functions do.
In general, though, in most places it's not worth thinking about what your function does. There is a convention for it, which depends on where it is, not what it is. Follow the convention.
u3: reference conventions
The u3 convention is that, unless otherwise specified, all functions have transfer semantics - with the exception of the prefixes: u3r, u3x, u3z, u3q and u3w. Also, within jet directories a through f (but not g), internal functions retain (for historical reasons).
If functions outside this set have retain semantics, they need to be commented, both in the .h and .c file, with RETAIN in all caps. Yes, it's this important.
u3: system architecture
If you just want to tinker with some existing code, it might be enough to understand the above. If not, it's probably worth taking the time to look at u3 as a whole.
u3 is designed to work as a persistent event processor. Logically, it computes a function of the form
f(event, old state) -> (actions, new state)
Obviously almost any computing model - including, but not limited to, Urbit - can be defined in this form. To create the illusion of a computer that never loses state and never fails, we:
- log every event externally before it goes into u3.
- keep a single reference to a permanent state noun.
- can abort any event without damaging the permanent state.
- snapshot the permanent state periodically, and/or prune logs.
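The shape of this loop can be sketched in a few lines of plain C. This is a toy model under stated assumptions, not how u3 or vere is implemented: the state is a single running sum, the log is a fixed array, and all names (toy_state, toy_pier, toy_poke(), toy_work(), toy_replay()) are hypothetical.

```c
#include <assert.h>

#define TOY_LOG_MAX 16

typedef struct { long sum; } toy_state;

typedef struct {
  int       ok;                    /* 1 if the event was accepted */
  long      out;                   /* the "action" produced */
  toy_state nex;                   /* the new state */
} toy_result;

typedef struct {
  long      log[TOY_LOG_MAX];      /* external event log */
  int       len;
  toy_state sat;                   /* the single permanent state */
} toy_pier;

/* toy_poke(): f(event, old state) -> (action, new state). Pure: it never
** writes through old. Negative events fail.
*/
static toy_result
toy_poke(long eve, const toy_state* old)
{
  toy_result res;

  res.ok  = (eve >= 0);
  res.nex = *old;
  if ( res.ok ) {
    res.nex.sum += eve;
  }
  res.out = res.nex.sum;
  return res;
}

/* toy_work(): log first, compute on the side, commit only on success.
*/
static int
toy_work(toy_pier* pir, long eve, long* out)
{
  toy_result res;

  if ( pir->len >= TOY_LOG_MAX ) {
    return 0;
  }
  pir->log[pir->len++] = eve;      /* log before the event goes in */
  res = toy_poke(eve, &pir->sat);
  if ( !res.ok ) {
    return 0;                      /* abort: permanent state untouched */
  }
  pir->sat = res.nex;              /* commit the new state */
  *out = res.out;
  return 1;
}

/* toy_replay(): rebuild the state from the log alone.
*/
static toy_state
toy_replay(const toy_pier* pir)
{
  toy_state sat = { 0 };
  int       i;

  for ( i = 0; i < pir->len; i++ ) {
    toy_result res = toy_poke(pir->log[i], &sat);

    if ( res.ok ) {
      sat = res.nex;
    }
  }
  return sat;
}
```

Since toy_poke() is deterministic and every event is logged before it is applied, replaying the log from an empty state reproduces the committed state. That property is what makes snapshots an optimization, not a requirement.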
u3: the road model
u3 uses a memory design which I'm sure someone has invented somewhere before, because it's not very clever, but I've never seen it anywhere in particular.
Every allocation starts with a solid block of memory, which u3 calls the loom. How do we allocate on the loom? You're probably familiar with the Unix heap-stack design, in which the stack grows downward and the heap (malloc arena) grows upward:
0           brk                                         ffff
|   heap     |                    stack                    |
|------------#################################+++++++++++++|
|                                             |            |
0                                             sp        ffff
A road is a normal heap-stack system, except that the heap and stack can point in either direction. Therefore, inside a road, we can nest another road in the opposite direction.
When the inner road completes, its heap is left on top of the outer road's stack. It's no more than the normal behavior of a stack machine for all subcomputations to push their results on the stack.
The performance tradeoff of "leaping" - reversing directions in the road - is that if the outer computation wants to preserve the results of the inner one, not just use them for temporary purposes, it has to copy them.
This is a trivial cost in some cases, a prohibitive cost in others. The upside, of course, is that all garbage accrued in the inner computation is discarded at zero cost.
The goal of the road system is the ability to layer memory models. If you are allocating on a road, you have no idea how deep within a nested road system you are - in other words, you have no idea exactly how durable your result may be. But free space is never fragmented within a road.
Roads do not reduce the generality or performance of a memory system, since even the most complex GC system can be nested within a road at no particular loss of performance - a road is just a block of memory.
Each road (u3a_road to be exact) uses four pointers: rut is the bottom of the arena, hat the top of the arena, mat the bottom of the stack, cap the top of the stack. (Bear in mind that the road "stack" is not actually used as the C function-call stack, though it probably should be.)
A "north" road has the stack high and the heap low:
0            rut    hat                                 ffff
|            |      |                                      |
|~~~~~~~~~~~~-------##########################+++++++$~~~~~|
|                                             |      |     |
0                                            cap   mat  ffff
A "south" road is the other way around:
0            mat    cap                                 ffff
|            |      |                                      |
|~~~~~~~~~~~~$++++++##########################--------~~~~~|
|                                             |      |     |
0                                            hat   rut  ffff
Legend: - is durable storage (heap); + is temporary storage (stack); ~ is deep storage (immutable); $ is the allocation frame; # is free memory.
Pointer restrictions: pointers stored in + can point anywhere. Pointers in - can only point to - or ~; pointers in ~ only point to ~.
To "leap" is to create a new inner road in the ### free space, but in the reverse direction, so that when the inner road "falls" (terminates), its durable storage is left on the temporary storage of the outer road.
u3 keeps a global variable, u3_Road or its alias u3R, which points to the current road. (If we ever run threads in inner roads - see below - this will become a thread-local variable.) Relative to u3R, + memory is called junior memory; - memory is normal memory; ~ is senior memory.
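The pointer arithmetic behind north roads, south roads and leaping can be sketched with word offsets standing in for real loom pointers. This is a toy model of the description above, not u3a's allocator: there are no boxes, free lists or refcounts, and toy_road, toy_north(), toy_room(), toy_heap(), toy_push() and toy_leap() are hypothetical names. Only the four field names are borrowed from the text.

```c
#include <assert.h>

#define TOY_FAIL 0xffffffffu       /* returned when the road is full */

typedef struct {
  unsigned rut, hat;               /* heap: bottom and top */
  unsigned mat, cap;               /* stack: bottom and top */
  int      nor;                    /* 1 = north (heap low), 0 = south */
} toy_road;

/* toy_north(): an empty north road covering offsets [lo, hi).
*/
static toy_road
toy_north(unsigned lo, unsigned hi)
{
  toy_road rod = { lo, lo, hi, hi, 1 };

  return rod;
}

/* toy_room(): free words between the heap top and the stack top.
*/
static unsigned
toy_room(const toy_road* rod)
{
  return rod->nor ? (rod->cap - rod->hat) : (rod->hat - rod->cap);
}

/* toy_heap(): take len words of durable storage; returns their offset.
*/
static unsigned
toy_heap(toy_road* rod, unsigned len)
{
  if ( len > toy_room(rod) ) {
    return TOY_FAIL;
  }
  if ( rod->nor ) {
    unsigned off = rod->hat;

    rod->hat += len;
    return off;
  }
  rod->hat -= len;
  return rod->hat;
}

/* toy_push(): take len words of temporary storage; returns their offset.
*/
static unsigned
toy_push(toy_road* rod, unsigned len)
{
  if ( len > toy_room(rod) ) {
    return TOY_FAIL;
  }
  if ( rod->nor ) {
    rod->cap -= len;
    return rod->cap;
  }
  else {
    unsigned off = rod->cap;

    rod->cap += len;
    return off;
  }
}

/* toy_leap(): build an inner road in the outer road's free space, facing
** the other way. The inner heap starts at the outer stack top, so whatever
** it leaves behind sits on top of the outer stack.
*/
static toy_road
toy_leap(const toy_road* out)
{
  toy_road inr;

  inr.nor = !out->nor;
  inr.rut = inr.hat = out->cap;
  inr.mat = inr.cap = out->hat;
  return inr;
}
```

Leaping from a north road gives a south road whose heap grows down from the outer cap and whose stack grows up from the outer hat. Discarding the inner road is just forgetting the struct, which is the "zero cost" garbage disposal described above.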
u3: explaining the road model
But... why?
We're now ready to understand why the road system works so logically with the event and persistence model.
The key is that we don't update refcounts in senior memory.
A pointer from an inner road to an outer road is not counted.
Also, the outermost, or surface road, is the only part of the image that gets checkpointed.
So the surface road contains the entire durable state of u3.
When we process an event, or perform any kind of complicated or
interesting calculation, we process it in an inner road. If
its results are saved, they need to be copied.
Since processing in an inner road does not touch surface memory, (a) we can leave the surface road in a read-only state and not mark its pages dirty; (b) we can abort an inner calculation without screwing up the surface; and (c) because inner results are copied onto the surface, the surface doesn't get fragmented.
All of (a), (b) and (c) are needed for checkpointing to be easy. It might be tractable otherwise, but easy is even better.
Moreover, while the surface is most definitely single-threaded, we could easily run multiple threads in multiple inner roads (as long as the threads don't have pointers into each others' memory, which they obviously shouldn't).
Moreover, in future, we'll experiment more with adding road
control hints to the programmer's toolbox. Reference counting is
expensive. We hypothesize that in many - if not most - cases,
the programmer can identify procedural structures whose garbage
should be discarded in one step by copying the results. Then,
within the procedure, we can switch the allocator into sand
mode, and stop tracking references at all.
u3: rules for C programming
There are two levels at which we program in C: (1) above the interpreter; (2) within the interpreter or jets. These have separate rules which need to be respected.
u3: rules above the interpreter
In its relations with Unix, Urbit follows a strict rule of "call me, I won't call you." We do of course call Unix system calls, but only for the purpose of actually computing.
Above Urbit, you are in a normal C/Unix programming environment and can call anything in or out of Urbit. Note that when using u3, you're always on the surface road, which is not thread-safe by default. Generally speaking, u3 is designed to support event-oriented, single-threaded programming.
If you need threads which create nouns, you could use u3m_hate() and u3m_love() to run these threads in subroads. You'd need to make the global road pointer, u3R, a thread-local variable instead. This seems perfectly practical, but we haven't done it because we haven't needed to.
u3: rules within the interpreter
Within the interpreter, your code can run either in the surface road or in a deep road. You can test this by testing
(&u3H->rod_u == u3R)
ie: does the pier's home road equal the current road pointer?
Normally in this context you assume you're obeying the rules of running on an inner road, ie, "deep memory." Remember, however, that the interpreter can run on surface memory - but anything you can do deep, you can do on the surface. The converse is by no means the case.
In deep memory, think of yourself as if in a signal handler. Your execution context is extremely fragile and may be terminated without warning or cleanup at any time (for instance, by ^C).
For instance, you can't call malloc (or C++ new) in your C code, because you don't have the right to modify data structures at the global level, and will leave them in an inconsistent state if your inner road gets terminated. (Instead, use our drop-in replacements, u3a_malloc(), u3a_free(), u3a_realloc().)
A good example is the different meaning of c3_assert() inside and outside the interpreter. At either layer, you can use regular assert(), which will just kill your process. On the surface, c3_assert() will just... kill your process.
In deep execution, c3_assert() will issue an exception that queues an error event, complete with trace stack, on the Arvo event queue. Let's see how this happens.
u3: exceptions
You produce an exception with
/* u3m_bail(): bail out. Does not return.
**
** Bail motes:
**
** %exit :: semantic failure
** %evil :: bad crypto
** %intr :: interrupt
** %fail :: execution failure
** %foul :: assert failure
** %need :: network block
** %meme :: out of memory
** %time :: timed out
** %oops :: assertion failure
*/
c3_i
u3m_bail(c3_m how_m);
Broadly speaking, there are two classes of exception: internal and external. An external exception begins in a Unix signal handler. An internal exception begins with a call to longjmp() on the main thread.
There are also two kinds of exception: mild and severe. An external exception is always severe. An internal exception is normally mild, but some (like c3__oops, generated by c3_assert()) are severe.
Either way, exceptions come with a stack trace. The u3 nock interpreter is instrumented to retain stack trace hints and produce them as a printable (list tank).
Mild exceptions are caught by the first virtualization layer and returned to the caller, following the behavior of the Nock virtualizer ++mock (in hoon.hoon).
Severe exceptions, or mild exceptions at the surface, terminate the entire execution stack at any depth and send the cumulative trace back to the u3 caller.
For instance, vere uses this trace to construct a %crud event, which conveys our trace back toward the Arvo context where it crashed. This lets any UI component anywhere, even on a remote node, render the stacktrace as a consequence of the user's action - even if its direct cause was (for instance) a Unix SIGINT or SIGALRM.
u3: API overview by prefix
Let's run through the u3 modules one by one. All public functions are commented, but the comments may be cryptic.
u3m: main control
To start u3, run
/* u3m_boot(): start the u3 system.
*/
void
u3m_boot(c3_o nuu_o, c3_o bug_o, c3_c* dir_c);
nuu_o is c3y (yes, 0) if you're creating a new pier, c3n (no, 1) if you're loading an existing one. bug_o is c3y if you want to test the garbage-collector, c3n otherwise. dir_c is the directory for the pier files.
u3m_boot() expects an urbit.pill file to load the kernel from. It will try first $dir/.urb.urbit.pill, then U3_LIB.
Any significant computation with nouns, certainly anything Turing complete, should be run (a) virtualized and (b) in an inner road. These are slightly different things, but at the highest level we bundle them together for your convenience, in u3m_soft():
/* u3m_soft(): system soft wrapper. unifies unix and nock errors.
**
** Produces [%$ result] or [%error (list tank)].
*/
u3_noun
u3m_soft(c3_w sec_w, u3_funk fun_f, u3_noun arg);
sec_w is the number of seconds to time out the computation. fun_f is a C function accepting arg.
The result of u3m_soft() is a cell whose head is an atom. If the head is %$ - ie, 0 - the tail is the result of fun_f(arg). Otherwise, the head is a term (an atom which is an LSB first string), and the tail is a (list tank) (a list of tank printables - see ++tank in hoon.hoon). Error terms should be the same as the exception terms above.
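The "bail, then catch and tag the result" pattern can be sketched with setjmp() and longjmp() from the C standard library. This is a toy under stated assumptions, not the real u3m_bail() or u3m_soft(): it works on plain ints where u3 works on nouns, has a single catch point with no nesting, and produces no stack trace, timeout or inner road. All toy_ names and TOY_EXIT are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical stand-in for the %exit mote: 4 characters, LSB first. */
#define TOY_EXIT  ( (uint32_t)'e'         | ((uint32_t)'x' << 8) | \
                   ((uint32_t)'i' << 16)  | ((uint32_t)'t' << 24) )

static jmp_buf  toy_buf;           /* single catch point: no nesting here */
static uint32_t toy_how = 0;       /* mote recorded by the latest bail */

/* toy_bail(): bail out with a nonzero mote. Does not return.
*/
static void
toy_bail(uint32_t how)
{
  assert(how != 0);
  toy_how = how;
  longjmp(toy_buf, 1);
}

/* A tagged result: how == 0 plays the role of %$ (success).
*/
typedef struct {
  uint32_t how;
  int      val;
} toy_pair;

/* toy_soft(): run fun(arg), turning any bail into a tagged result.
*/
static toy_pair
toy_soft(int (*fun)(int), int arg)
{
  toy_pair out;

  if ( 0 == setjmp(toy_buf) ) {
    out.how = 0;
    out.val = fun(arg);
  }
  else {
    out.how = toy_how;             /* arrived here via longjmp */
    out.val = 0;
  }
  return out;
}

/* toy_half(): example computation that bails on odd input.
*/
static int
toy_half(int num)
{
  if ( num & 1 ) {
    toy_bail(TOY_EXIT);
  }
  return num / 2;
}
```

A caller checks the head before trusting the tail: `toy_soft(toy_half, 10)` yields how 0 and val 5, while `toy_soft(toy_half, 7)` yields how TOY_EXIT and a meaningless val.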
If you're confident that your computation won't fail, you can use u3m_soft_sure(), u3m_soft_slam(), or u3m_soft_nock() for C functions, Hoon function calls, and Nock invocations respectively. Caution - these return just the result, and assert globally.
All the u3m_soft functions above work only on the surface. Below the surface, in an inner road, virtualize with u3m_soft_run(). Note that this takes a fly (a namespace gate), thus activating the 11 super-operator in the nock virtualizer, ++mock. When actually using the fly, call u3m_soft_esc(). Don't do either unless you know what you're doing!
For descending into a subroad without Nock virtualization, use u3m_hate() and u3m_love(). Hating enters a subroad; loving leaves it, copying out a product noun.
Other miscellaneous tools in u3m: u3m_file() loads a Unix file as a Nock atom; u3m_water() measures the boundaries of the loom in current use (ie, watermarks); and a variety of prettyprinting routines, none perfect, are available, mainly for debugging printfs: u3m_pretty(), u3m_p(), u3m_tape() and u3m_wall().
u3j: jets
The jet system, u3j, is what makes u3 and nock in any sense a useful computing environment. Except perhaps u3a (there is really no such thing as a trivial allocator, though u3a is dumber than most) - u3j is the most interesting code in u3.
Let's consider the minor miracle of jet binding which lets u3j work - and decrement not be O(n) - without violating the precisely defined semantics of pure Nock, ever.
It's easy to assume that jets represent an architectural coupling between Hoon language semantics and Nock interpreter internals. Indeed such a coupling would be wholly wrongtious and un-Urbit. But the jet system is not Hoon-specific. It is specific to nock runtime systems that use a design pattern we call a core.
u3j: core structure
A core is no more than a cell [code data], in which a code is either a Nock formula or a cell of codes, and data is anything. In a proper core, the subject each formula expects is the core itself.
Except for the arbitrary decision to make a core [code data] (or as we sometimes say, [battery payload]), instead of [data code], any high-level language transforming itself to Nock would use this design. So jets are in fact fully general.
Now, a casual observer might look at [battery payload] and expect the simplest case of it to be [formula subject]. That is: to execute a simple core whose battery is a single formula, we compute
nock(+.a -.a)
Then, naturally, when we go from Hoon or a high-level language containing functions down to Nock, [function arguments] turns into [formula subject]. This seems like an obvious design, and we mention it only because it is completely wrong.
Rather, to execute a one-armed core like the above, we run
nock(a -.a)
and the normal structure of a gate, which is simply Urbitese for "function," is:
[formula [sample context]]
where sample is Urbitese for "arguments" - and context, any Lisper will at once recognize, is Urbitese for "environment."
To slam or call the gate, we simply replace the default sample with the caller's data, then nock the formula on the entire gate.
What's in the context? Unlike in most dynamic languages, it is not some secret system-level bag of tricks. Almost always it is another core. This onion continues until at the bottom, there is an atomic constant, conventionally the kernel version number.
Thus a (highly desirable) static core is one of the form
[battery constant]
[battery static-core]
ie, a solid stack of nested libraries without any dynamic data. The typical gate will thus be, for example,
[formula [sample [battery battery battery constant]]]
but we would be most foolish to restrict the jet mechanism to cores of this particular structure. We cannot constrain a payload to be [sample static-core], or even [sample core]. Any such constraint would not be rich enough to handle Hoon, let alone other languages.
u3j: jet state
There are two fundamental rules of computer science: (1) every system is best understood through its state; (2) less state is better than more state. Sadly, a pier has three different jet state systems: cold, warm and hot. It needs all of them.
Hot state is associated with this particular Unix process. The persistent pier is portable not just between process and process, but machine and machine or OS and OS. The set of jets loaded into a pier may itself change (in theory, though not in the present implementation) during the lifetime of the process. Hot state is a pure C data structure.
Cold state is associated with the logical execution history of the pier. It consists entirely of nouns and ignores restarts.
Warm state contains all dependencies between cold and hot state. It consists of C structures allocated on the loom (with u3_post, ie, a word pointer relative to the loom).
Warm state is purely a function of cold and hot states, and we can wipe and regenerate it at any time. On any restart where the hot state might have changed, we clear the warm state with u3j_ream().
There is only one hot state, the global jet dashboard u3j_Dash or u3D for short. In the present implementation, u3D is a static structure not modified at runtime, except for numbering itself on process initialization. This structure - which embeds function pointers to all the jets - is defined in j/tree.c. The data structures:
/* u3j_harm: jet arm.
*/
typedef struct _u3j_harm {
c3_c* fcs_c; // `.axe` or name
u3_noun (*fun_f)(u3_noun); // compute or 0 / semitransfer
c3_o ice; // perfect (don't test)
c3_o tot; // total (never punts)
c3_o liv; // live (enabled)
} u3j_harm;
/* u3j_core: driver definition.
*/
typedef struct _u3j_core {
c3_c* cos_c; // control string
struct _u3j_harm* arm_u; // blank-terminated static list
struct _u3j_core* dev_u; // blank-terminated static list
struct _u3j_core* par_u; // dynamic parent pointer
c3_l jax_l; // dynamic jet index
} u3j_core;
/* u3e_dash, u3_Dash, u3D: jet dashboard singleton
*/
typedef struct _u3e_dash {
u3j_core* dev_u; // null-terminated static list
c3_l len_l; // dynamic array length
c3_l all_l; // allocated length
u3j_core* ray_u; // dynamic array by axis
} u3j_dash;
Warm and cold state is per road. In other words, as we nest roads, we also nest jet state. The jet state in the road is:
struct { // jet dashboard
u3p(u3h_root) har_p; // warm state
u3_noun das; // cold state
} jed;
In case you understand Hoon, das (cold state) is a ++dash, and har_p (warm state) is a map from battery to ++calx:
++ bane ,@tas :: battery name
++ bash ,@uvH :: ctx identity hash
++ bosh ,@uvH :: local battery hash
++ batt ,* :: battery
++ calx :: cached by battery
$: jax=,@ud :: jet index
pax=,@ud :: parent axis or 0
hap=(map ,@ud ,@ud) :: axis/jet
huc=(map term nock) :: name/tool
== ::
++ chum $? lef=term :: jet name
[std=term kel=@] :: kelvin version
[ven=term pro=term kel=@] :: vendor and product
[ven=term pro=term ver=@ kel=@] :: all of the above
== ::
++ clue (trel chum nock (list (pair term nock))) :: battery definition
++ clog (pair cope (map batt (map term nock))) :: identity record
++ cope (trel bane axis (each bash noun)) :: core pattern
++ dash :: jet system
$: sys=(map batt bash) :: battery/identity
haw=(map bash clog) :: identity/core
== ::
The jet index in a ++calx is an index into ray_u in the dashboard - ie, a pointer into hot state. This is why the warm state has to be reset when we reload the pier.
Why is jet state nested? Nock of course is a functional system, so as we compute we don't explicitly create state. Jet state is an exception to this principle (which works only because it can't be semantically detected from Nock/Hoon) - but it can't violate the fundamental rules of the allocation system.
For instance, when we're on an inner road, we can't allocate on an outer road, or point from an outer road to an inner. So if we learn something - like a mapping from battery to jet - in the inner road, we have to keep it in the inner road.
Mitigating this problem, when we leave an inner road (with u3m_love()), we call u3j_reap() to promote jet information in the dying road. Reaping promotes anything we've learned about any battery that either (a) already existed in the outer road, or (b) is being saved to the outer road.
u3j: jet binding
Jet binding starts with a %fast hint. (In Hoon, this is produced by the runes ~%, for the general case, or ~/ for simple functions.) To bind a jet, execute a formula of the form:
[10 [%fast clue-formula] core-formula]
core-formula assembles the core to be jet-propelled. clue-formula produces the hint information, or ++clue above, which we want to annotate it with.
A clue is a triple of name, parent, and hooks:
++ clue (trel chum nock (list (pair term nock)))
The name, or ++chum, has a bunch of historical structure which we don't need (cleaning these things up is tricky), but just gets flattened into a term.
The parent axis is a nock formula, but always reduces to a simple axis, which is the address of this core's parent. Consider again an ordinary gate
[formula [sample context]]
Typically the context is itself a library core, which itself has a jet binding. If so, the parent axis of this gate is 7.
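To see why the context of a gate sits at axis 7, recall Nock's tree addressing: axis 1 is the whole noun, axis 2 its head, axis 3 its tail, and in general the bits of the axis after the leading 1 are read from most to least significant, with 0 meaning head and 1 meaning tail. A minimal sketch over a toy noun type follows. The toy_noun struct and toy_at() are hypothetical and unrelated to u3's real noun representation or its axis functions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_noun {
  int                    is_cell;  /* 1 for a cell, 0 for an atom */
  unsigned               atom;     /* value, if an atom */
  const struct toy_noun* hed;      /* head, if a cell */
  const struct toy_noun* tel;      /* tail, if a cell */
} toy_noun;

/* toy_at(): fetch the subtree of non at axis axe, or NULL if the axis
** is 0 or runs through an atom.
*/
static const toy_noun*
toy_at(unsigned axe, const toy_noun* non)
{
  int dep = 0;

  if ( 0 == axe ) {
    return NULL;
  }
  /* dep becomes the index of the highest set bit of axe. */
  while ( 0 != (axe >> (dep + 1)) ) {
    dep++;
  }
  /* Walk the remaining bits, high to low: 0 = head, 1 = tail. */
  while ( dep-- > 0 ) {
    if ( !non->is_cell ) {
      return NULL;
    }
    non = ((axe >> dep) & 1) ? non->tel : non->hed;
  }
  return non;
}
```

For a gate [formula [sample context]], axis 2 is the formula, axis 3 is the payload [sample context], axis 6 (binary 110: tail, then head) is the sample, and axis 7 (binary 111: tail, then tail) is the context.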
If the parent is already bound - and the parent must be already bound, in this road or a road containing it - we can hook this core bottom-up into a tree hierarchy. Normally the child core is produced by an arm of the parent core, so this is not a problem - we wouldn't have the child if we hadn't already made the parent.
The clue also contains a list of hooks, named nock formulas on the core. Usually these are arms, but they need not be. The point is that we often want to call a core from C, in a situation where we have no type or other source information. A common case of this is a complex system in which we're mixing functions which are jet-propelled with functions that aren't.
In any case, all the information in the %fast hint goes to u3j_mine(), which registers the battery in cold state (das in jed in u3R), then warm state (har_p in jed).
It's essential to understand that the %fast hint has to be, well, fast - because we apply it whenever we build a core. For instance, if the core is a Hoon gate - a function - we will call u3j_mine() every time the function is called.
u3j: the cold jet dashboard
For even more fun, the jet tree is not actually a tree of batteries. Rather, multiple batteries may share any node in the jet tree. For instance, it's normal to have two equivalent Nock batteries at the same time in one pier: one battery compiled with debugging hints, one not.
Instead, the jet tree is a semantic hierarchy. The root of the hierarchy is a constant, by convention the Hoon kernel version, because any normal jet-propelled core has, at the bottom of its onion of libraries, the standard kernel. Thus if the core is
[foo-battery [bar-battery [moo-battery 164]]]
we can reverse the nesting to construct a hierarchical core path. The static core
164/moo/bar/foo
extends the static core 164/moo/bar by wrapping the foo battery (ie, in Hoon, |%) around it. With the core above, you can compute foo stuff, bar stuff, and moo stuff.
Rocket science, not.
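As a rough illustration of reversing the nesting, the sketch below builds the path string from a chain of parent links. The toy_node type and toy_path() are hypothetical; u3 does not store or print core paths this way.

```c
#include <assert.h>
#include <stdio.h>

/* Toy static core: a battery name plus either a parent or, at the
** root, a constant.  Hypothetical type, for illustration only.
*/
typedef struct toy_node {
  const char*            nam_c;   // battery name, eg "foo" (NULL at root)
  const struct toy_node* par_u;   // parent core (NULL at root)
  unsigned               rot_w;   // root constant, eg 164 (root only)
} toy_node;

/* toy_path(): write the hierarchical path of (nod_u) into (buf_c).
** Returns the bytes written, not counting the terminator, or -1 if
** the path does not fit in (len_i) bytes.
*/
int
toy_path(const toy_node* nod_u, char* buf_c, int len_i)
{
  int off_i, add_i;

  assert( (NULL != nod_u) && (NULL != buf_c) && (len_i > 0) );

  if ( NULL == nod_u->par_u ) {
    add_i = snprintf(buf_c, (size_t)len_i, "%u", nod_u->rot_w);
    return ( (add_i < 0) || (add_i >= len_i) ) ? -1 : add_i;
  }
  // the parent's path comes first, so the outermost battery is last
  off_i = toy_path(nod_u->par_u, buf_c, len_i);
  if ( off_i < 0 ) {
    return -1;
  }
  add_i = snprintf(buf_c + off_i, (size_t)(len_i - off_i),
                   "/%s", nod_u->nam_c);
  return ( (add_i < 0) || (add_i >= len_i - off_i) ) ? -1 : off_i + add_i;
}
```

Linking foo to bar to moo to a root holding 164 and calling toy_path() on foo fills the buffer with 164/moo/bar/foo.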
Not all cores are static, of course - they may contain live data, like the sample in a gate (ie, argument to a function). Once again, it's important to remember that we track jet bindings not by the core, which may not be static, but by the battery, which is always static.
(And if you're wondering how we can use a phat noun like a Nock formula or battery as a key in a key-value table, remember mug_w, the lazily computed short hash, in all boxed nouns.)
In any case, das, the dashboard, contains sys, a map from battery to battery identity hash (++bash), and haw, a map from bash to battery record (++clog).
A clog is a cell whose tail is a hook map, straight from the user's clue. The head is a ++cope, which is a triple of ++bane (battery name, right now just a term); ++axis, the axis, within this core, of the parent; and (each bash noun), which is either [0 bash] if the parent is another core, or [1 noun], for the constant noun (like 164) if there is no parent core.
A bash is just the noun hash (++sham) of a cope, which uniquely expresses the battery's self-declared hierarchical identity without depending on the actual battery code.
u3j: the warm jet dashboard
We don't use the cold state to match jets as we call them; we use the cold state to register jets as we find them, and also to rebuild the warm state after the hot state is reset.
What we actually use at runtime is the warm state, jed->har_p, which is a u3h (built-in hashtable), allocated on the loom, from battery to ++calx.
A calx is a quadruple of jax, the jet index, an index into ray_u in u3j_dash; pax, the parent axis (as in cope above); hap, a table from arm axis (ie, the axis of each formula within the battery) to jet arm index (into arm_u in u3j_core); and huc, the hook table (as in clog).
We construct hap, when we create the calx, by iterating through the arms registered in the u3j_core. Note the way a u3j_harm declares itself, with the string fcs_c which can contain either an axis or a name. Most jetted cores are of course gates, which have one formula at one axis within the core: fcs_c is ".2". But we do often have fast cores with more complex arm structure, and it would be sad to have to manage their axes by hand. To use an fcs_c with a named arm, it's sufficient to make sure the name is bound to a formula [0 axis] in the hook table.
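One way to read this: an fcs_c label is either a literal axis or a name to be resolved through the hooks. The following is a minimal sketch, assuming a leading '.' marks a literal decimal axis and that every named hook is a [0 axis] formula. toy_hook and toy_arm_axis() are hypothetical names, not the u3 implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy hook: a name bound to an axis, standing in for a hook whose
** formula is [0 axis].  Hypothetical type, for illustration only.
*/
typedef struct toy_hook {
  const char*   nam_c;
  unsigned long axe_l;
} toy_hook;

/* toy_arm_axis(): resolve an arm label to an axis, or 0 on failure.
**
** A label of '.' followed by digits is a literal axis; anything else
** is looked up by name in (huc_u), a table of (len_w) hooks.
*/
unsigned long
toy_arm_axis(const char* fcs_c, const toy_hook* huc_u, unsigned len_w)
{
  unsigned i_w;

  assert(NULL != fcs_c);

  if ( '.' == fcs_c[0] ) {
    char*         end_c;
    unsigned long axe_l;

    // require a digit right after the dot: rejects ".", ".-4", ". 4"
    if ( !isdigit((unsigned char)fcs_c[1]) ) {
      return 0;
    }
    axe_l = strtoul(fcs_c + 1, &end_c, 10);

    // reject trailing garbage such as ".4x"
    return ( '\0' != *end_c ) ? 0 : axe_l;
  }
  for ( i_w = 0; i_w < len_w; i_w++ ) {
    if ( 0 == strcmp(fcs_c, huc_u[i_w].nam_c) ) {
      return huc_u[i_w].axe_l;
    }
  }
  return 0;
}
```

Returning 0 for "not found" works here because 0 is never a valid axis.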
u3j: the hot dashboard
Now it should be easy to see how we actually invoke jets. Every time we run a nock 9 instruction (pretty often, obviously), we have a core and an axis. We pass these to u3j_kick(), which will try to execute them.
Because nouns with a reference count of 1 are precious, u3j_kick() has a tricky reference control definition. It reserves the right to return u3_none in the case where there is no jet, or the jet does not apply for this case; in this case, it does not consume its argument cor. Otherwise, it does.
u3j_kick() searches for the battery (always the head of the core, of course) in the warm dashboard. If the battery is registered, it searches for the axis in hap in the calx. If it exists, the core has a driver and the driver has a jet for this arm, which we can try to call. If not, we return u3_none.
Otherwise, we call fun_f in our u3j_harm. This obeys the same protocol as u3j_kick(); it can refuse to function by returning u3_none, in which case it does not consume the core.
Besides the actual function pointer fun_f, we have some flags in the u3j_harm which tell us how to call the jet.
If ice is yes (&, 0), the jet is known to be perfect and we can just trust the product of fun_f. Otherwise, we need to run both the Nock arm and fun_f, and compare their results.
(Note that while executing the C side of this test, we have to set ice to yes; on the Nock side, we have to set liv to no. Otherwise, many non-exponential functions become exponential. When auto-testing jets in this way, the principle is that the test is on the outermost layer of recursion.)
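The flag juggling can be sketched as follows. This is a toy model, not u3j_kick(): nouns are plain unsigned longs, TOY_NONE stands in for u3_none, and nok_f stands in for interpreting the arm's formula. What happens on a mismatch (here: count it and return the formula's answer) is an assumption made for the example; the text does not specify it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_YES   0            // loobean yes (&, 0)
#define TOY_NO    1            // loobean no (|, 1)
#define TOY_NONE  (~0UL)       // stand-in for u3_none

typedef unsigned long (*toy_fun)(unsigned long);

/* Toy jet arm.  Field names echo the text; the struct is hypothetical.
*/
typedef struct toy_harm {
  toy_fun fun_f;     // C implementation; may return TOY_NONE to decline
  int     ice;       // TOY_YES: trust fun_f without testing
  int     liv;       // TOY_NO: jet is switched off
} toy_harm;

/* toy_kick(): try the jet for one call; TOY_NONE means "run the Nock".
** (bad_w) counts disagreements between the jet and the formula.
*/
unsigned long
toy_kick(toy_harm* ham_u, toy_fun nok_f, unsigned long sam, unsigned* bad_w)
{
  unsigned long pro, ref;
  int           ice, liv;

  assert( (NULL != ham_u) && (NULL != ham_u->fun_f) );

  if ( TOY_NO == ham_u->liv ) {
    return TOY_NONE;
  }
  if ( TOY_YES == ham_u->ice ) {
    return ham_u->fun_f(sam);
  }
  // Untrusted jet: test only this outermost call.
  ice = ham_u->ice;
  liv = ham_u->liv;

  ham_u->ice = TOY_YES;            // C side: nested calls skip the test
  pro = ham_u->fun_f(sam);
  ham_u->ice = ice;

  if ( TOY_NONE == pro ) {
    return TOY_NONE;               // jet declined
  }
  ham_u->liv = TOY_NO;             // Nock side: nested calls skip the jet
  ref = nok_f(sam);
  ham_u->liv = liv;

  if ( pro != ref ) {
    *bad_w += 1;
    return ref;                    // assumption: prefer the formula
  }
  return pro;
}

/* Example arms: doubling by formula, by a partial jet, by a buggy jet.
*/
unsigned long toy_dub_nock(unsigned long a) { return a + a; }
unsigned long toy_dub_good(unsigned long a) { return (a > 1000) ? TOY_NONE : a << 1; }
unsigned long toy_dub_bad(unsigned long a)  { return a + a + 1; }
```

Because both flags are restored before returning, a recursive jet is checked once per top-level call rather than once per level of recursion.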
If tot is yes (&, 0), the jet is total and has to return properly (though it can still return u3_none). Otherwise, it is partial and can u3_cm_bail() out with c3__punt. This feature has a cost: the jet runs in a subroad.
Finally, if liv is no (|, 1), the jet is off and doesn't run.
It should be easy to see how the tree of cores gets declared - precisely, in j/dash.c. We declare the hierarchy as a tree of u3j_core structures, each of which comes with a static list of arms arm_u and sub-cores dev_u.
In u3j_boot(), we traverse the hierarchy, fill in parent pointers par_u, and enumerate all u3j_core structures into a single flat array u3j_dash.ray_u. Our hot state then appears ready for action.
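The traversal might look roughly like the sketch below. The toy_core struct borrows the field names dev_u and par_u from the text but is otherwise hypothetical (it omits arms, for one), and toy_boot() is not the real u3j_boot(); the NULL-name terminator for sub-core lists is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy driver node; hypothetical, for illustration only.
*/
typedef struct toy_core {
  const char*      cos_c;    // core name
  struct toy_core* dev_u;    // sub-cores, terminated by a NULL name
  struct toy_core* par_u;    // parent, filled in at boot
  unsigned         jax_w;    // index in the flat array, filled in at boot
} toy_core;

/* toy_boot(): walk the tree under (cop_u) depth-first, set parent
** pointers, and append each core to (ray_u).  (jax_w) is the next free
** index; the return value is the next free index after this subtree.
** The caller must size (ray_u) to hold every core.
*/
unsigned
toy_boot(toy_core* cop_u, toy_core* par_u, toy_core** ray_u, unsigned jax_w)
{
  assert( (NULL != cop_u) && (NULL != ray_u) );

  cop_u->par_u = par_u;
  cop_u->jax_w = jax_w;
  ray_u[jax_w++] = cop_u;

  if ( NULL != cop_u->dev_u ) {
    toy_core* kid_u;

    for ( kid_u = cop_u->dev_u; NULL != kid_u->cos_c; kid_u++ ) {
      jax_w = toy_boot(kid_u, cop_u, ray_u, jax_w);
    }
  }
  return jax_w;
}
```

Calling toy_boot(&root, NULL, ray, 0) on a static tree returns the total number of cores, and each core's jax_w is then its position in the flat array, which is the role the text gives to jax in a calx.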
u3j: jet functions
At present, all jets are compiled statically into u3. This is not a long-term permanent solution or anything. However, it will always be the case with a certain amount of core functionality.
For instance, there are some jet functions that we need to call as part of loading the Arvo kernel - like ++cue to unpack a noun from an atom. And obviously it makes sense, when jets are significant enough to compile into u3, to export their symbols in headers and the linker.
There are three interface prefixes for standard jet functions: u3k, u3q, and u3w. All jets have u3w interfaces; most have u3q; some have u3k. Of course the actual logic is shared.
u3w interfaces use the same protocol as fun_f above: the caller passes the entire core, which is retained if the function returns u3_none, transferred otherwise.
u3q interfaces break the core into C arguments, retain noun arguments, and transfer noun returns. u3k interfaces are the same, except with more use of u3_none and other simple C variations on the Hoon original, but transfer both arguments and returns. Generally, u3k are most convenient for new code.
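The difference between retaining and transferring arguments is easiest to see with a toy reference count. The sketch below is illustrative only: toy_atom, toy_new(), toy_lose(), toy_q_add() and toy_k_add() are hypothetical names, and layering the "k" style on top of the "q" style is just one plausible arrangement, not a claim about how any particular u3 jet is written.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted atom; hypothetical, for illustration only.
*/
typedef struct toy_atom {
  unsigned      use_w;   // reference count
  unsigned long val_l;   // value
} toy_atom;

static int toy_live_i = 0;   // atoms currently allocated

/* toy_new(): allocate an atom with one reference.
*/
toy_atom*
toy_new(unsigned long val_l)
{
  toy_atom* a = malloc(sizeof(*a));

  if ( NULL == a ) abort();
  a->use_w = 1;
  a->val_l = val_l;
  toy_live_i++;
  return a;
}

/* toy_lose(): drop one reference, freeing on the last.
*/
void
toy_lose(toy_atom* a)
{
  assert(a->use_w > 0);
  if ( 0 == --a->use_w ) {
    free(a);
    toy_live_i--;
  }
}

/* "q" style: RETAIN arguments, TRANSFER the result.
** The caller still owns (a) and (b) afterward.
*/
toy_atom*
toy_q_add(toy_atom* a, toy_atom* b)
{
  return toy_new(a->val_l + b->val_l);
}

/* "k" style: TRANSFER arguments and result.
** The caller gives up its references to (a) and (b).
*/
toy_atom*
toy_k_add(toy_atom* a, toy_atom* b)
{
  toy_atom* pro = toy_q_add(a, b);

  toy_lose(a);
  toy_lose(b);
  return pro;
}
```

With the transfer style, a caller can nest calls such as toy_k_add(toy_k_add(x, y), z) without naming and releasing the intermediate sum by hand.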
Following u3k/q/w is [a-f], corresponding to the 6 logical tiers of the kernel, or g for user-level jets. Another letter is added for functions within subcores. The filename, under j/, follows the tier and the function name.
For instance, ++add is u3wa_add(cor), u3qa_add(a, b), or u3ka_add(a, b), in j/a/add.c. ++get in ++by is u3wdb_get(cor), u3kdb_get(a, b), etc, in j/d/by_get.c.
For historical reasons, all internal jet code in j/[a-f] retains noun arguments, and transfers noun results. Please do not do this in new g jets! The new standard protocol is to transfer both arguments and results.