# u3: noun processing in C.

`u3` is the C library that makes Urbit work. If it wasn't called
`u3`, it might be called `libnoun` - it's a library for making
and storing nouns.

What's a noun? A noun is either a cell or an atom. A cell is an
ordered pair of any two nouns. An atom is an unsigned integer of
any size.

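To make the definition concrete, here is a minimal sketch of a noun as
a plain C tagged union. It is not how `u3` represents nouns: the
`noun` type and the `make_atom`, `make_cell` and `count_atoms` helpers
are hypothetical names invented for this illustration, and atoms are
capped at 64 bits to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical noun type, for illustration only; u3's real layout
** differs. Atoms are capped at 64 bits here, while real atoms are
** unbounded.
*/
typedef struct noun {
  int is_cell;                     /* 1: cell, 0: atom */
  union {
    uint64_t atom;                 /* atom value */
    struct {
      struct noun* head;           /* first noun of the pair */
      struct noun* tail;           /* second noun of the pair */
    } cell;
  } u;
} noun;

/* make_atom(): allocate an atom holding (val).
*/
static noun*
make_atom(uint64_t val)
{
  noun* non = malloc(sizeof(*non));

  assert(non != NULL);
  non->is_cell = 0;
  non->u.atom = val;
  return non;
}

/* make_cell(): allocate the ordered pair [head tail].
*/
static noun*
make_cell(noun* head, noun* tail)
{
  noun* non = malloc(sizeof(*non));

  assert(non != NULL);
  non->is_cell = 1;
  non->u.cell.head = head;
  non->u.cell.tail = tail;
  return non;
}

/* count_atoms(): count the atoms (leaves) of a noun.
*/
static uint64_t
count_atoms(const noun* non)
{
  if ( !non->is_cell ) {
    return 1;
  }
  return count_atoms(non->u.cell.head) + count_atoms(non->u.cell.tail);
}
```

With these helpers, the noun `[1 [2 3]]` is
`make_cell(make_atom(1), make_cell(make_atom(2), make_atom(3)))`.
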
To the C programmer, this is not a terribly complicated data
structure, so why do you need a library for it?

One: nouns have a well-defined computation kernel, Nock, whose
spec fits on a page and gzips to 340 bytes. But the only
arithmetic operation in Nock is increment. So it's nontrivial
to compute both efficiently and correctly.

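To see why increment-only arithmetic is a problem, consider
decrement. The sketch below is plain C, not Nock, and `inc` and
`dec_slow` are hypothetical names. It restricts itself to increment
and an equality test, and has to count up from zero to find the
predecessor, so it takes a number of steps proportional to the
argument.

```c
#include <assert.h>
#include <stdint.h>

/* inc(): the only arithmetic primitive this sketch allows itself.
*/
static uint64_t
inc(uint64_t a)
{
  return a + 1;
}

/* dec_slow(): decrement (a) using only inc() and equality.
**
**   Counts up from 0 until the successor of the counter equals (a).
**   (*steps) receives the number of loop iterations taken.
*/
static uint64_t
dec_slow(uint64_t a, uint64_t* steps)
{
  uint64_t b = 0;

  assert(a != 0);                  /* 0 has no predecessor */
  *steps = 0;

  while ( inc(b) != a ) {
    b = inc(b);
    (*steps)++;
  }
  return b;
}
```

Calling `dec_slow(1000, &steps)` returns 999 after 999 iterations,
where native subtraction would take a single instruction. Computing
the same answer without paying that cost is the hard part.
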
Two: `u3` is designed to support "permanent computing," i.e., a
single-level store which is transparently checkpointed. This
implies a specialized memory-management model.

Does `u3` depend on the higher levels of Urbit, Arvo and Hoon?
Yes and no. It expects you to load something shaped like an Arvo
kernel, and use it as an event-processing function. But you
don't need to use this feature if you don't want to, and your
kernel can be anything you want.

## u3: files and directory

The best way to introduce `u3` is with a simple map of the Urbit
build directory - folding things we don't care about right now:

    g/                u3 implementation
    g/a.c             allocation
    g/e.c             persistence
    g/h.c             hashtables
    g/i.c             noun construction
    g/j.c             jet control
    g/m.c             master state
    g/n.c             nock execution
    g/r.c             noun access, error returns
    g/t.c             tracing/profiling
    g/v.c             arvo kernel
    g/x.c             noun access, error crashes
    g/z.c             memoization/caching
    i/                all includes
    i/v               vere systems headers
    i/g               u3 headers (matching g/ names)
    i/c               c3 headers
    i/c/defs.h        miscellaneous c3 macros
    i/c/motes.h       symbolic constants
    i/c/portable.h    portability definitions
    i/c/types.h       c3 types
    j/                jets
    j/dash.c          jet structures
    j/1               tier 1 jets: basic math
    j/2               tier 2 jets: lists
    j/3               tier 3 jets: bit twiddling
    j/4               tier 4 jets: containers
    j/5               tier 5 jets: misc
    j/6               tier 6 jets: hoon
    v/                vere systems code
    outside/          all external bundled code

(The `v/` code is part of `vere`, which uses `u3` to run Urbit.)

## c3: C in the Urbit environment

When writing C code in `u3`, please of course follow the
conventions of the code around you as regards indentation, etc.
It's especially important that every function have a header
comment, even if it says nothing interesting.

But some of our idiosyncrasies go beyond convention. Yes, we've
done awful things to C. Here's what we did and why we did it.

### c3: integer types

First, it's generally acknowledged that underspecified integer
types are C's worst disaster. C99 fixed this, but the `stdint`
types are wordy and annoying. We've replaced them with:

    /* Good integers.
    */
      typedef uint64_t c3_d;    // double-word
      typedef int64_t  c3_ds;   // signed double-word
      typedef uint32_t c3_w;    // word
      typedef int32_t  c3_ws;   // signed word
      typedef uint16_t c3_s;    // short
      typedef int16_t  c3_ss;   // signed short
      typedef uint8_t  c3_y;    // byte
      typedef int8_t   c3_ys;   // signed byte
      typedef uint8_t  c3_b;    // bit

      typedef uint8_t  c3_t;    // boolean
      typedef uint8_t  c3_o;    // loobean
      typedef uint8_t  c3_g;    // 32-bit log - 0-31 bits
      typedef uint32_t c3_l;    // little; 31-bit unsigned integer
      typedef uint32_t c3_m;    // mote; also c3_l; LSB first a-z 4-char string.

    /* Bad integers.
    */
      typedef char      c3_c;   // does not match int8_t or uint8_t
      typedef int       c3_i;   // int - really bad
      typedef uintptr_t c3_p;   // pointer-length uint - really really bad
      typedef intptr_t  c3_ps;  // pointer-length int - really really bad

Some of these need explanation. A loobean is a Nock boolean -
Nock, for mysterious reasons, uses 0 as true (always say "yes")
and 1 as false (always say "no").

Nock and/or Hoon cannot tell the difference between a short atom
and a long one, but at the `u3` level every atom under `2^31` is
direct. The `c3_l` type is useful to annotate this. A `c3_m` is
a mote - a string of up to 4 characters in a `c3_l`, least
significant byte first. A `c3_g` should be a 5-bit atom. Of
course, C cannot enforce these constraints, only document them.

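As a rough illustration of the mote layout and the 31-bit limit, the
sketch below packs a short string into a `c3_m`, least significant
byte first, and checks whether a value is under `2^31`. The helpers
`mote_pack` and `fits_direct` are hypothetical names made up for this
example, not part of c3; only the typedefs come from the list above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t c3_d;    /* double-word */
typedef uint32_t c3_w;    /* word */
typedef uint8_t  c3_y;    /* byte */
typedef uint32_t c3_m;    /* mote */
typedef char     c3_c;    /* C character, for external strings */

/* mote_pack(): hypothetical helper; pack up to 4 characters into
** a mote, least significant byte first.
*/
static c3_m
mote_pack(const c3_c* str_c)
{
  c3_w len_w = (c3_w)strlen(str_c);
  c3_m mot_m = 0;
  c3_w i_w;

  assert(len_w <= 4);
  for ( i_w = 0; i_w < len_w; i_w++ ) {
    /* character i lands in byte i, counting from the low end */
    mot_m |= (c3_m)(c3_y)str_c[i_w] << (8 * i_w);
  }
  return mot_m;
}

/* fits_direct(): hypothetical helper; 1 if (val_d) is under 2^31.
*/
static int
fits_direct(c3_d val_d)
{
  return val_d < ((c3_d)1 << 31);
}
```

For example, `mote_pack("abc")` is `0x636261`: `'a'` (`0x61`) sits in
the lowest byte. Because a-z characters all have the top bit clear,
even a 4-character mote stays under `2^31`, which is why a mote can
also be treated as a `c3_l`.
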
Use the "bad" - i.e., poorly specified - integer types only when
interfacing with external code that expects them.

An enormous number of motes are defined in `i/c/motes.h`. There
is no reason to delete motes that aren't being used, or even to
modularize the definitions. Keep them alphabetical, though.

### c3: variables and variable naming

The c3 style uses Hoon-style TLV variable names, with a
quasi-Hungarian syntax. This is weird, but works really well, as
long as what you're doing isn't hideous.

A TLV variable name is a random pronounceable three-letter
string, sometimes with some vague relationship to its meaning,
but usually not. Usually CVC (consonant-vowel-consonant) is a
good choice.

You should use TLVs much the way math people use Greek letters.
The same concept should in general get the same name across
different contexts. When you're working in a given area, you'll
tend to remember the binding from TLV to concept by sheer power
of associative memory. When you come back to it, it's not that
hard to relearn. And of course, when in doubt, comment it.

Variables take pseudo-Hungarian suffixes, matching in general the
suffix of the integer type:

    c3_w wor_w;     // 32-bit word

Unlike in true Hungarian, there is no change for pointer
variables. Structure variables take a `_u` suffix.

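A short sketch of how these rules might look together. The `my_buf`
structure and the `sum_bytes` function are hypothetical, invented for
this example; only the suffix rules themselves come from the text
above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;    /* word */
typedef uint8_t  c3_y;    /* byte */

/* my_buf: hypothetical byte buffer, for illustration only.
*/
typedef struct {
  c3_w  len_w;            /* length in bytes: _w for a c3_w */
  c3_y* byt_y;            /* pointer to bytes: still _y, no pointer mark */
} my_buf;

/* sum_bytes(): add up the bytes in (buf_u).
*/
static c3_w
sum_bytes(const my_buf* buf_u)
{
  c3_w sum_w = 0;
  c3_w i_w;

  assert(buf_u != 0);
  for ( i_w = 0; i_w < buf_u->len_w; i_w++ ) {
    sum_w += buf_u->byt_y[i_w];
  }
  return sum_w;
}
```

Here `buf_u` carries the `_u` suffix because it refers to a
structure, and `byt_y` keeps the `_y` suffix of the type it points to.
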
### c3: loobeans

The code (from `defs.h`) tells the story:

    #   define c3y      0
    #   define c3n      1

    #   define _(x)        (c3y == (x))
    #   define __(x)       ((x) ? c3y : c3n)
    #   define c3a(x, y)   __(_(x) && _(y))
    #   define c3o(x, y)   __(_(x) || _(y))

In short, use `_()` to turn a loobean into a boolean, `__()` to go
the other way. Use `!` as usual, `c3y` for yes and `c3n` for no,
`c3a` for "and" and `c3o` for "or".

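A small usage sketch, repeating the macros so that it stands alone.
The functions `is_even` and `small_even` are hypothetical examples,
not part of c3. The point to notice is that a loobean must pass
through `_()` before it reaches a C conditional: since `c3y` is 0, a
bare `if ( is_even(x) )` would take the branch on "no".

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t c3_w;    /* word */
typedef uint8_t  c3_o;    /* loobean */

#define c3y        0
#define c3n        1

#define _(x)       (c3y == (x))
#define __(x)      ((x) ? c3y : c3n)
#define c3a(x, y)  __(_(x) && _(y))
#define c3o(x, y)  __(_(x) || _(y))

/* is_even(): hypothetical example; yes iff (val_w) is even.
*/
static c3_o
is_even(c3_w val_w)
{
  /* the C comparison is a boolean; __() turns it into a loobean */
  return __(0 == (val_w & 1));
}

/* small_even(): hypothetical example; yes iff (val_w) is even
** and less than 10.
*/
static c3_o
small_even(c3_w val_w)
{
  return c3a(is_even(val_w), __(val_w < 10));
}

/* count_even(): count the even values below (max_w).
*/
static c3_w
count_even(c3_w max_w)
{
  c3_w num_w = 0;
  c3_w i_w;

  for ( i_w = 0; i_w < max_w; i_w++ ) {
    /* _() turns the loobean back into a C boolean for the test */
    if ( _(is_even(i_w)) ) {
      num_w++;
    }
  }
  return num_w;
}
```

Negation needs no macro because `!c3y` is `c3n` and `!c3n` is `c3y`.
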
## u3: introduction to the noun world