# u3: noun processing in C.

`u3` is the C library that makes Urbit work. If it wasn't called
`u3`, it might be called `libnoun` - it's a library for making
and storing nouns.

What's a noun? A noun is either a cell or an atom. A cell is an
ordered pair of any two nouns. An atom is an unsigned integer of
any size.

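To make the definition concrete, here is a minimal sketch of a noun as a C structure. Everything in it (`toy_noun`, `toy_atom`, `toy_cell`, `toy_count`) is hypothetical and is not how `u3` represents nouns. In particular, the atom is capped at 64 bits here, whereas a real atom can be any size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* toy_noun: hypothetical noun; NOT the u3 representation.
*/
typedef struct toy_noun {
  int              cel_i;   // nonzero for a cell, zero for an atom
  uint64_t         val_d;   // atom value (toy limit: 64 bits)
  struct toy_noun* hed_u;   // head, cells only
  struct toy_noun* tal_u;   // tail, cells only
} toy_noun;

/* toy_atom(): allocate an atom.
*/
static toy_noun*
toy_atom(uint64_t val_d)
{
  toy_noun* non_u = calloc(1, sizeof(*non_u));

  assert(non_u != NULL);
  non_u->val_d = val_d;
  return non_u;
}

/* toy_cell(): allocate a cell, an ordered pair of two nouns.
*/
static toy_noun*
toy_cell(toy_noun* hed_u, toy_noun* tal_u)
{
  toy_noun* non_u = calloc(1, sizeof(*non_u));

  assert(non_u != NULL);
  non_u->cel_i = 1;
  non_u->hed_u = hed_u;
  non_u->tal_u = tal_u;
  return non_u;
}

/* toy_count(): number of atoms in a noun.
*/
static uint64_t
toy_count(const toy_noun* non_u)
{
  return non_u->cel_i
       ? toy_count(non_u->hed_u) + toy_count(non_u->tal_u)
       : 1;
}
```

With these helpers, the noun `[1 2 3]` (which is the cell `[1 [2 3]]`) is `toy_cell(toy_atom(1), toy_cell(toy_atom(2), toy_atom(3)))`, and `toy_count()` on it counts three atoms. The sketch never frees anything; memory management is exactly the part `u3` exists to handle.
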
To the C programmer, this is not a terribly complicated data
structure, so why do you need a library for it?

One: nouns have a well-defined computation kernel, Nock, whose
spec fits on a page and gzips to 340 bytes. But the only
arithmetic operation in Nock is increment. So it's nontrivial
to compute both efficiently and correctly.

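To see why increment-only arithmetic is a problem, consider decrement. The following is a minimal sketch, in plain C rather than Nock, of what a naive computation has to do when the only arithmetic step it may take is adding one. `toy_dec` is a hypothetical name and not part of `u3`.

```c
#include <assert.h>
#include <stdint.h>

/* toy_dec(): decrement using only increment and equality.
*/
static uint64_t
toy_dec(uint64_t val_d)
{
  uint64_t cur_d = 0;

  assert(val_d != 0);               // 0 has no predecessor
  while ( (cur_d + 1) != val_d ) {
    cur_d = cur_d + 1;              // the only arithmetic step allowed
  }
  return cur_d;
}
```

`toy_dec(1000000)` takes about a million steps to do what one machine subtraction does. An efficient interpreter therefore has to recognize computations like this and substitute native code, without ever changing the result.
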
Two: `u3` is designed to support "permanent computing," i.e., a
single-level store which is transparently checkpointed. This
implies a specialized memory-management model, etc, etc.

(Does `u3` depend on the higher levels of Urbit, Arvo and Hoon?
Yes and no. It expects you to load something shaped like an Arvo
kernel, and use it as an event-processing function. But you
don't need to use this feature if you don't want to, and your
kernel can be anything you want.)

## c3: C in Urbit

Under `u3` is the simple `c3` layer, which is just how we write C
in Urbit.

When writing C in `u3`, please of course follow the conventions of
the code around you as regards indentation, etc. It's especially
important that every function have a header comment, even if it
says nothing interesting.

But some of our idiosyncrasies go beyond convention. Yes, we've
done awful things to C. Here's what we did and why we did it.

### c3: integer types

First, it's generally acknowledged that underspecified integer
types are C's worst disaster. C99 fixed this, but the `stdint`
types are wordy and annoying. We've replaced them with:

    /* Good integers.
    */
      typedef uint64_t c3_d;    // double-word
      typedef int64_t c3_ds;    // signed double-word
      typedef uint32_t c3_w;    // word
      typedef int32_t c3_ws;    // signed word
      typedef uint16_t c3_s;    // short
      typedef int16_t c3_ss;    // signed short
      typedef uint8_t c3_y;     // byte
      typedef int8_t c3_ys;     // signed byte
      typedef uint8_t c3_b;     // bit

      typedef uint8_t c3_t;     // boolean
      typedef uint8_t c3_o;     // loobean
      typedef uint8_t c3_g;     // 32-bit log - 0-31 bits
      typedef uint32_t c3_l;    // little; 31-bit unsigned integer
      typedef uint32_t c3_m;    // mote; also c3_l; LSB first a-z 4-char string.

    /* Bad integers.
    */
      typedef char c3_c;        // does not match int8_t or uint8_t
      typedef int c3_i;         // int - really bad
      typedef uintptr_t c3_p;   // pointer-length uint - really really bad
      typedef intptr_t c3_ps;   // pointer-length int - really really bad

Some of these need explanation. A loobean is a Nock boolean -
Nock, for mysterious reasons, uses 0 as true (always say "yes")
and 1 as false (always say "no").

Nock and/or Hoon cannot tell the difference between a short atom
and a long one, but at the `u3` level every atom under `2^31` is
direct (stored in the noun word itself rather than allocated).
The `c3_l` type is useful to annotate this. A `c3_m` is
a mote - a string of up to 4 characters in a `c3_l`, least
significant byte first. A `c3_g` should be a 5-bit atom. Of
course, C cannot enforce these constraints, only document them.

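As an illustration of the mote layout and the 31-bit limit, here is a small sketch. The helpers `toy_mote` and `toy_is_direct` are hypothetical names made up for this example; they are not the macros `c3` itself uses to define motes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* toy_mote(): pack up to 4 characters, first character in the low byte.
*/
static uint32_t
toy_mote(const char* str_c)
{
  size_t   len_i = strlen(str_c);
  uint32_t mot_w = 0;
  size_t   i_i;

  assert(len_i <= 4);               // a mote holds at most 4 characters
  for ( i_i = 0; i_i < len_i; i_i++ ) {
    mot_w |= (uint32_t)(unsigned char)str_c[i_i] << (8 * i_i);
  }
  return mot_w;
}

/* toy_is_direct(): 1 if the value is under 2^31, 0 otherwise.
*/
static int
toy_is_direct(uint64_t val_d)
{
  return val_d < ((uint64_t)1 << 31);
}
```

For example, `toy_mote("abc")` is `0x636261`: `'a'` (0x61) lands in the least significant byte. Because lowercase letters are all below 0x80, any 4-character a-z mote stays under `2^31`, which is why a mote can also be treated as a `c3_l`.
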
Use the "bad" - i.e., poorly specified - integer types only when
interfacing with external code that expects them.

An enormous number of motes are defined in `i/c/motes.h`. There
is no reason to delete motes that aren't being used, or even to
modularize the definitions. Keep them alphabetical, though.

### c3: variables and variable naming

The `c3` style uses Hoon-style TLV variable names, with a
quasi-Hungarian syntax. This is weird, but works really well, as
long as what you're doing isn't hideous.

A TLV variable name is a random pronounceable three-letter
string, sometimes with some vague relationship to its meaning,
but usually not. Usually CVC (consonant-vowel-consonant) is a
good choice.

You should use TLVs much the way math people use Greek letters.
The same concept should in general get the same name across
different contexts. When you're working in a given area, you'll
tend to remember the binding from TLV to concept by sheer power
of associative memory. When you come back to it, it's not that
hard to relearn. And of course, when in doubt, comment it.

Variables take pseudo-Hungarian suffixes, matching in general the
suffix of the integer type:

    c3_w wor_w;     // 32-bit word

Unlike in true Hungarian, there is no change for pointer
variables. Structure variables take a `_u` suffix.

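Put together, a function written in this style might look like the sketch below. The two typedefs are copied from the integer type list; `toy_buf` and `toy_sum` are hypothetical names invented for the example, and the layout is one reading of the conventions rather than a rule taken from the codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t c3_w;      // word
typedef uint8_t  c3_y;      // byte

/* toy_buf: hypothetical byte buffer.
*/
typedef struct {
  c3_w  len_w;              // length in bytes
  c3_y* byt_y;              // bytes; a pointer keeps the plain _y suffix
} toy_buf;

/* toy_sum(): sum the bytes of a buffer.
*/
static c3_w
toy_sum(const toy_buf* buf_u)
{
  c3_w sum_w = 0;
  c3_w i_w;

  assert(buf_u != NULL);
  for ( i_w = 0; i_w < buf_u->len_w; i_w++ ) {
    sum_w += buf_u->byt_y[i_w];
  }
  return sum_w;
}
```

Note the header comment on every definition, the three-letter names (`len`, `byt`, `buf`, `sum`), the suffix that repeats the type (`_w`, `_y`), and `_u` on the structure variable.
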
### c3: loobeans

The code (from `defs.h`) tells the story:

    # define c3y 0
    # define c3n 1

    # define _(x) (c3y == (x))
    # define __(x) ((x) ? c3y : c3n)
    # define c3a(x, y) __(_(x) && _(y))
    # define c3o(x, y) __(_(x) || _(y))

In short, use `_()` to turn a loobean into a boolean, `__()` to go
the other way. Use `!` as usual, `c3y` for yes and `c3n` for no,
`c3a` for AND and `c3o` for OR.

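A short usage sketch follows. The macros and the `c3_o` typedef are repeated from the text so the example compiles on its own; `toy_is_even` and `toy_describe` are hypothetical functions, not part of `c3`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t c3_o;       // loobean

#define c3y 0
#define c3n 1

#define _(x)       (c3y == (x))
#define __(x)      ((x) ? c3y : c3n)
#define c3a(x, y)  __(_(x) && _(y))
#define c3o(x, y)  __(_(x) || _(y))

/* toy_is_even(): c3y if val_w is even, c3n otherwise.
*/
static c3_o
toy_is_even(uint32_t val_w)
{
  return __(0 == (val_w & 1));      // C boolean in, loobean out
}

/* toy_describe(): 2 if both are even, 1 if exactly one is, 0 if neither.
*/
static int
toy_describe(uint32_t one_w, uint32_t two_w)
{
  c3_o one_o = toy_is_even(one_w);
  c3_o two_o = toy_is_even(two_w);

  assert(one_o <= 1 && two_o <= 1);
  if ( _(c3a(one_o, two_o)) ) {     // loobean in, C boolean out
    return 2;
  }
  return _(c3o(one_o, two_o)) ? 1 : 0;
}
```

The trap to avoid is writing `if ( one_o )`: since `c3y` is 0, a bare loobean in a C condition means the opposite of what it says. Always wrap it in `_()`.
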
## u3: introduction to the noun world

The division between `c3` and `u3` is that you could theoretically
imagine using `c3` as just a generic C environment. Anything to do
with nouns is in `u3`.

### u3: a map of the system

There are two kinds of symbols in `u3`: regular and irregular.
They all start with `u3`, but the regular names follow this
pattern:

    prefix    purpose                        header
    ---------------------------------------------------
    u3a_      allocation
    u3e_      persistence
    u3h_      hashtables
    u3i_      noun construction
    u3j_      jet control
    u3m_      system management
    u3n_      nock computation
    u3r_      noun access (error returns)
    u3t_      profiling
    u3v_      arvo
    u3x_      noun access (error crashes)
    u3z_      memoization

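The difference between the two noun access families, "error returns" versus "error crashes", can be sketched as follows. This is an illustration only: `toy_noun`, `toy_r_head` and `toy_x_head` are hypothetical, and `abort()` stands in for whatever the real `u3x_` functions do when they fail, which this document does not specify.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define c3y 0                       // loobean yes
#define c3n 1                       // loobean no

/* toy_noun: hypothetical noun; NOT the u3 representation.
*/
typedef struct toy_noun {
  int                    cel_i;     // nonzero for a cell
  uint64_t               val_d;     // atom value
  const struct toy_noun* hed_u;     // head, cells only
  const struct toy_noun* tal_u;     // tail, cells only
} toy_noun;

/* toy_r_head(): "error returns" style. Write the head of a cell
** to *out_u and produce c3y, or produce c3n if non_u is an atom.
*/
static uint8_t
toy_r_head(const toy_noun* non_u, const toy_noun** out_u)
{
  assert(non_u != NULL);
  if ( !non_u->cel_i ) {
    return c3n;
  }
  *out_u = non_u->hed_u;
  return c3y;
}

/* toy_x_head(): "error crashes" style. Produce the head of a cell,
** or crash if non_u is an atom.
*/
static const toy_noun*
toy_x_head(const toy_noun* non_u)
{
  const toy_noun* hed_u;

  if ( c3n == toy_r_head(non_u, &hed_u) ) {
    abort();                        // stand-in for the real failure path
  }
  return hed_u;
}
```

The first style suits code that can do something sensible with a malformed noun; the second keeps the caller short when a malformed noun simply means the computation has failed.
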
u3 deals with reference-counted, immutable, acyclic nouns. 90%
of what you need to know to program in u3 is just how to get your
refcounts right.

    /** Prefix definitions:
    ***
    *** u3a_: fundamental allocators.
    *** u3c_: constants.
    *** u3e_: checkpointing.
    *** u3h_: HAMT hash tables.
    *** u3i_: noun constructors
    *** u3j_: jets.
    *** u3k*: direct jet calls (modern C convention)
    *** u3m_: system management etc.
    *** u3n_: nock interpreter.
    *** u3o_: fundamental macros.
    *** u3q*: direct jet calls (archaic C convention)
    *** u3r_: read functions which never bail out.
    *** u3s_: structures and definitions.
    *** u3t_: tracing.
    *** u3w_: direct jet calls (core noun convention)
    *** u3x_: read functions which do bail out.
    *** u3v_: arvo specific structures.
    *** u3z_: memoization.
    ***
    *** u3r_, u3x_, u3z_ functions use retain conventions; the caller
    ***   retains ownership of passed-in nouns, the callee preserves
    ***   ownership of returned nouns.
    ***
    *** Unless documented otherwise, all other functions use transfer
    ***   conventions; the caller logically releases passed-in nouns,
    ***   the callee logically releases returned nouns.
    ***
    *** In general, exceptions to the transfer convention all occur
    ***   when we're using a noun as a key.
    **/

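The retain and transfer conventions are easier to see with a toy reference count. The sketch below uses a made-up `toy_box` type in place of a noun, and all of its functions (`toy_make`, `toy_gain`, `toy_lose`, `toy_peek`, `toy_take`) are hypothetical. It only shows what happens to passed-in arguments; it is not the `u3` allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* toy_box: hypothetical refcounted value standing in for a noun.
*/
typedef struct {
  unsigned use_w;                   // reference count
  unsigned val_w;                   // payload
} toy_box;

/* toy_make(): allocate a box with one reference, owned by the caller.
*/
static toy_box*
toy_make(unsigned val_w)
{
  toy_box* box_u = calloc(1, sizeof(*box_u));

  assert(box_u != NULL);
  box_u->use_w = 1;
  box_u->val_w = val_w;
  return box_u;
}

/* toy_gain(): add a reference.
*/
static toy_box*
toy_gain(toy_box* box_u)
{
  box_u->use_w++;
  return box_u;
}

/* toy_lose(): drop a reference; free the box when none remain.
*/
static void
toy_lose(toy_box* box_u)
{
  assert(box_u->use_w > 0);
  if ( 0 == --box_u->use_w ) {
    free(box_u);
  }
}

/* toy_peek(): RETAIN convention. The caller still owns box_u afterward.
*/
static unsigned
toy_peek(const toy_box* box_u)
{
  return box_u->val_w;
}

/* toy_take(): TRANSFER convention. The caller's reference is consumed.
*/
static unsigned
toy_take(toy_box* box_u)
{
  unsigned val_w = box_u->val_w;

  toy_lose(box_u);
  return val_w;
}
```

After `toy_peek(box_u)` the caller may keep using `box_u`. After `toy_take(box_u)` it may not, unless it called `toy_gain(box_u)` first to hold a second reference. Getting this bookkeeping right at every call site is what "getting your refcounts right" amounts to.
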
The best way to introduce `u3` is with a simple map of the Urbit
build directory:

    g/                  u3 implementation
    g/a.c               allocation
    g/e.c               persistence
    g/h.c               hashtables
    g/i.c               noun construction
    g/j.c               jet control
    g/m.c               master state
    g/n.c               nock execution
    g/r.c               noun access, error returns
    g/t.c               tracing/profiling
    g/v.c               arvo kernel
    g/x.c               noun access, error crashes
    g/z.c               memoization/caching
    i/                  all includes
    i/v                 vere systems headers
    i/g                 u3 headers (matching g/ names)
    i/c                 c3 headers
    i/c/defs.h          miscellaneous c3 macros
    i/c/motes.h         symbolic constants
    i/c/portable.h      portability definitions
    i/c/types.h         c3 types
    i/j                 jet headers
    i/j/k.h             jet interfaces (transfer, args)
    i/j/q.h             jet interfaces (retain, args)
    i/j/w.h             jet interfaces (retain, core)
    j/                  jet code
    j/dash.c            jet structures
    j/1                 tier 1 jets: basic math
    j/2                 tier 2 jets: lists
    j/3                 tier 3 jets: bit twiddling
    j/4                 tier 4 jets: containers
    j/5                 tier 5 jets: misc
    j/6                 tier 6 jets: hoon
    v/                  vere systems code
    outside/            all external bundled code