shrub/Spec/u3.md
C. Guy Yarvin 8c8fa11104 Rename.
2014-11-06 11:13:57 -08:00

8.7 KiB

u3: noun processing in C.

u3 is the C library that makes Urbit work. If it wasn't called u3, it might be called libnoun - it's a library for making and storing nouns.

What's a noun? A noun is either a cell or an atom. A cell is an ordered pair of any two nouns. An atom is an unsigned integer of any size.

To the C programmer, this is not a terribly complicated data structure, so why do you need a library for it?

One: nouns have a well-defined computation kernel, Nock, whose spec fits on a page and gzips to 340 bytes. But the only arithmetic operation in Nock is increment. So it's nontrivial to compute both efficiently and correctly.

Two: u3 is designed to support "permanent computing," ie, a single-level store which is transparently checkpointed. This implies a specialized memory-management model, etc, etc.

(Does u3 depend on the higher levels of Urbit, Arvo and Hoon? Yes and no. It expects you to load something shaped like an Arvo kernel, and use it as an event-processing function. But you don't need to use this feature if you don't want, and your kernel can be anything you want.)

c3: C in Urbit

Under u3 is the simple c3 layer, which is just how we write C in Urbit.

When writing C in u3, please of course follow the conventions of the code around you as regards indentation, etc. It's especially important that every function have a header comment, even if it says nothing interesting.

But some of our idiosyncrasies go beyond convention. Yes, we've done awful things to C. Here's what we did and why we did.

c3: integer types

First, it's generally acknowledged that underspecified integer types are C's worst disaster. C99 fixed this, but the stdint types are wordy and annoying. We've replaced them with:

/* Good integers.
*/
  typedef uint64_t c3_d;  // double-word
  typedef int64_t c3_ds;  // signed double-word
  typedef uint32_t c3_w;  // word
  typedef int32_t c3_ws;  // signed word
  typedef uint16_t c3_s;  // short
  typedef int16_t c3_ss;  // signed short
  typedef uint8_t c3_y;   // byte
  typedef int8_t c3_ys;   // signed byte
  typedef uint8_t c3_b;   // bit

  typedef uint8_t c3_t;   // boolean
  typedef uint8_t c3_o;   // loobean
  typedef uint8_t c3_g;   // 32-bit log - 0-31 bits
  typedef uint32_t c3_l;  // little; 31-bit unsigned integer
  typedef uint32_t c3_m;  // mote; also c3_l; LSB first a-z 4-char string.

/* Bad integers.
*/
  typedef char      c3_c; // does not match int8_t or uint8_t
  typedef int       c3_i; // int - really bad
  typedef uintptr_t c3_p; // pointer-length uint - really really bad
  typedef intptr_t c3_ps; // pointer-length int - really really bad

Some of these need explanation. A loobean is a Nock boolean - Nock, for mysterious reasons, uses 0 as true (always say "yes") and 1 as false (always say "no").

Nock and/or Hoon cannot tell the difference between a short atom and a long one, but at the u3 level every atom under 2^31 is direct. The c3_l type is useful to annotate this. A c3_m is a mote - a string of up to 4 characters in a c3_l, least significant byte first. A c3_g should be a 5-bit atom. Of course, C cannot enforce these constraints, only document them.

Use the "bad" - ie, poorly specified - integer types only when interfacing with external code that expects them.

An enormous number of motes are defined in i/c/motes.h. There is no reason to delete motes that aren't being used, or even to modularize the definitions. Keep them alphabetical, though.

c3: variables and variable naming

The C3 style uses Hoon style TLV variable names, with a quasi Hungarian syntax. This is weird, but works really well, as long as what you're doing isn't hideous.

A TLV variable name is a random pronounceable three-letter string, sometimes with some vague relationship to its meaning, but usually not. Usually CVC (consonant-vowel-consonant) is a good choice.

You should use TLVs much the way math people use Greek letters. The same concept should in general get the same name across different contexts. When you're working in a given area, you'll tend to remember the binding from TLV to concept by sheer power of associative memory. When you come back to it, it's not that hard to relearn. And of course, when in doubt, comment it.

Variables take pseudo-Hungarian suffixes, matching in general the suffix of the integer type:

c3_w wor_w;     //  32-bit word

Unlike in true Hungarian, there is no change for pointer variables. Structure variables take a _u suffix;

c3: loobeans

The code (from defs.h) tells the story:

#     define c3y      0
#     define c3n      1

#     define _(x)        (c3y == (x))
#     define __(x)       ((x) ? c3y : c3n)
#     define c3a(x, y)   __(_(x) && _(y))
#     define c3o(x, y)   __(_(x) || _(y))

In short, use _() to turn a loobean into a boolean, __ to go the other way. Use ! as usual, c3y for yes and c3n for no, c3a for and and c3o for or.

u3: introduction to the noun world

The division between c3 and u3 is that you could theoretically imagine using c3 as just a generic C environment. Anything to do with nouns is in u3.

u3: a map of the system

There are two kinds of symbols in u3: regular and irregular. They all start with u3, but the regular names follow this pattern:

prefix   purpose                      header
---------------------------------------------------
u3a_    allocation
u3e_    persistence
u3h_    hashtables
u3i_    noun construction
u3j_    jet control
u3m_    system management
u3n_    nock computation
u3r_    noun access (error returns)
u3t_    profiling
u3v_    arvo
u3x_    noun access (error crashes)
u3z_    memoization

u3 deals with reference-counted, immutable, acyclic nouns. 90% of what you need to know to program in u3 is just how to get your refcounts right.

/**  Prefix definitions:
***
***  u3a_: fundamental allocators.
***  u3c_: constants.
***  u3e_: checkpointing.
***  u3h_: HAMT hash tables.
***  u3i_: noun constructors
***  u3j_: jets.
***  u3k*: direct jet calls (modern C convention)
***  u3m_: system management etc.
***  u3n_: nock interpreter.
***  u3o_: fundamental macros.
***  u3q*: direct jet calls (archaic C convention)
***  u3r_: read functions which never bail out.
***  u3s_: structures and definitions.
***  u3t_: tracing.
***  u3w_: direct jet calls (core noun convention)
***  u3x_: read functions which do bail out.
***  u3v_: arvo specific structures.
***  u3z_: memoization.
***
***  u3_cr_, u3_cx_, u3_cz_ functions use retain conventions; the caller
***  retains ownership of passed-in nouns, the callee preserves 
***  ownership of returned nouns.
***
***  Unless documented otherwise, all other functions use transfer 
***  conventions; the caller logically releases passed-in nouns, 
***  the callee logically releases returned nouns.
***
***  In general, exceptions to the transfer convention all occur
***  when we're using a noun as a key.
**/

The best way to introduce u3 is with a simple map of the Urbit build directory:

g/                  u3 implementation
  g/a.c               allocation
  g/e.c               persistence
  g/h.c               hashtables
  g/i.c               noun construction
  g/j.c               jet control
  g/m.c               master state
  g/n.c               nock execution
  g/r.c               noun access, error returns
  g/t.c               tracing/profiling
  g/v.c               arvo kernel
  g/x.c               noun access, error crashes
  g/z.c               memoization/caching
i/                  all includes
  i/v               vere systems headers
  i/g               u3 headers (matching g/ names)
  i/c               c3 headers
    i/c/defs.h        miscellaneous c3 macros
    i/c/motes.h       symbolic constants
    i/c/portable.h    portability definitions
    i/c/types.h       c3 types
  i/j               jet headers
    i/j/k.h           jet interfaces (transfer, args)
    i/j/q.h           jet interfaces (retain, args)
    i/j/w.h           jet interfaces (retain, core)
j/                  jet code
  j/dash.c            jet structures
  j/1                 tier 1 jets: basic math
  j/2                 tier 2 jets: lists
  j/3                 tier 3 jets: bit twiddling
  j/4                 tier 4 jets: containers
  j/5                 tier 5 jets: misc
  j/6                 tier 6 jets: hoon
v/                  vere systems code
outside/            all external bundled code