urbit/pub/docs/dev/nock/reference.md
Galen Wolfe-Pauly bb495844be doc -> docs
2015-10-20 10:51:45 -07:00

23 KiB

Nock Reference

Nouns

1  ::    A noun is an atom or a cell.
2  ::    An atom is a natural number.
3  ::    A cell is an ordered pair of nouns.

An atom is a natural number - ie, an unsigned integer. Nock does not limit the size of atoms, or know what an atom means.

For instance, the atom 97 might mean the number 97, or it might mean the letter a (ASCII 97). A very large atom might be the number of grains of sand on the beach - or it might be a GIF of your children playing on the beach. Typically when we represent strings or files as atoms, the first byte is the low byte. But even this is just a convention. An atom is an atom.

A cell is an ordered pair of any two nouns - cell or atom. We group cells with square brackets:

[1 1] 
[34 45] 
[[3 42] 12] 
[[1 0] [0 [1 99]]]

Nouns are the dumbest data model ever. Nouns make JSON look like XML and XML look like ASN.1. It may also remind you of Lisp's S-expressions - you can think of nouns as "S-expressions without the S."

To be exact, a noun is an S-expression, except that classic S-expressions have multiple atom types ("S" is for "symbol"). Since Nock is designed to be used with a higher-level type system (such as Hoon's), it does not need low-level types. An atom is just an unsigned integer of any size.

For instance, it's common to represent strings (or even whole text files) as atoms, arranging them LSB first - so "foo" becomes 0x6f6f66. How do we know to print this as "foo", not 0x6f6f66? We need external information - such as a Hoon type. Similarly, other common atomic types - signed integers, floating point, etc — are all straightforward to map into atoms.

It's also important to note that, unlike Lisp, Nock cannot create cyclical data structures. It is normal and common for nouns in a Nock runtime system to have acyclic structure - shared subtrees. But there is no Nock computation that can make a child point to its parent. One consequence: Nock has no garbage collector. (Nor can dag structure be detected, as with Lisp eq.)

There is also no single syntax for nouns. If you have nouns you have Nock; if you have Nock you have Hoon; if you have Hoon, you can write whatever parser you like.


The Nock Function

5  ::    nock(a)           *a

Nock is a pure (stateless) function from noun to noun. In our pseudocode (and only in our pseudocode) we express this with the prefix operator *.

A Nock program is given meaning by a process of reduction. To compute nock(x), where x is any noun, we step through the rules from the top down, find the first left-hand side that matches x, and reduce it to the right-hand side (in more mathematical notation, we might write line 5 as nock(a) -> *a, a style we do use in documentation)

When we use variable names, like a, in the pseudocode spec, we simply mean that the rule fits for any noun a.

So nock(x) is *x, for any noun x. And how do we reduce *x? Looking up, we see that lines 23 through 39 reduce *x - for different patterns of x.

Normally a in nock(a) is a cell [s f], or

[subject formula] 

Intuitively, the formula is your function and the subject is its argument. Hoon, or any other high-level language built on Nock, will build its own function calling convention which does not map directly to *[subject formula].


Bracket Grouping

6  ::    [a b c]           [a [b c]]

Brackets associate to the right.

So instead of writing

[2 [6 7]]
[2 [6 [14 15]]]
[2 [6 [[28 29] [30 31]]]]
[2 [6 [[28 29] [30 [62 63]]]]]

we can write

[2 6 7]
[2 6 14 15]
[2 6 [28 29] 30 31]
[2 6 [28 29] 30 62 63]

While this notational convenience is hardly rocket science, it's surprising how confusing it can be, especially if you have a Lisp background. Lisp's "S-expressions" are very similar to nouns, except that Lisp has multiple types of atom, and Lisp's syntax automatically adds list terminators to groups.

For those with Lisp experience, it's important to note that Nock and Hoon use tuples or "improper lists" much more heavily than Lisp. The list terminator, normally 0, is never automatic. So the Lisp list

(a b c)

becomes the Nock noun

[a b c 0]

which is equivalent to

[a [b [c 0]]]

Note that we can and do use unnecessary brackets anyway, for emphasis.


Axiomatic Functions

8  ::    ?[a b]           0
9  ::    ?a               1
10 ::    +[a b]           +[a b]
11 ::    +a               1 + a
12 ::    =[a a]           0
13 ::    =[a b]           1

Here we define three of Nock's four axiomatic functions: Cell-test, Increment and Equals (the fourth axiomatic function, called Address, is defined in lines 16 through 21). These functions are just pseudocode, not actual Nock syntax, and are only used to define the behaviour of certain Nock operators.

We should note that in Nock and Hoon, 0 (pronounced "yes") is true, and 1 ("no") is false. This convention is the opposite of old-fashioned booleans, so we try hard to say "yes" and "no" instead of "true" and "false."


Cell-test ?

? (pronounced "wut") tests whether is a noun is a cell. Again, 0 means "yes", 1 means "no":

8  ::    ?[a b]           0
9  ::    ?a               1

Increment +

+ (pronounced "lus") adds 1 to an atom:

10 ::    +[a b]           +[a b]
11 ::    +a               1 + a

Equals =

= (pronounced "tis") tests a cell for equality. 0 means "yes", 1 means "no":

12 ::    =[a a]           0
13 ::    =[a b]           1
14 ::    =a               =a

Testing an atom for equality makes no sense and logically fails to terminate.

Because + works only for atoms, whereas = works only for cells, the error rules match first for + and last for =.


Noun Address

16 ::    /[1 a]           a
17 ::    /[2 a b]         a
18 ::    /[3 a b]         b
19 ::    /[(a + a) b]     /[2 /[a b]]
20 ::    /[(a + a + 1) b] /[3 /[a b]]
21 ::    /a               /a

We define a noun as a binary tree - where each node branches to a left and right child - and assign an address, or axis, to every element in the tree. The root of the tree is /1. The left child (head) of every node at /a is /2a; the right child (tail) is /2a+1. (Writing (a + a) is just a clever way to write 2a, while minimizing the set of pseudocode forms.)

1 is the root. The head of every axis n is 2n; the tail is 2n+1. Thus a simple tree:

     1
  2      3
4   5  6   7
         14 15

If the value of every leaf is its tree address, this tree is

[[4 5] [6 14 15]]

Let's use the example [[97 2] [1 42 0]]. So

/[1 [97 2] [1 42 0]] -> [[97 2] [1 42 0]]

because /1 is the root of the tree, ie, the whole noun.

Its left child (head) is /2 (i.e. (1 + 1)):

/[2 [97 2] [1 42 0]] -> [97 2]

And its right child (tail) is /3 (i.e. (1 + 1 + 1)):

/[3 [97 2] [1 42 0]] -> [1 42 0]

And delving into /3, we see /(3 + 3) and /(3 + 3 + 1):

/[6 [97 2] [1 42 0]] -> 1
/[7 [97 2] [1 42 0]] -> [42 0]

It's also fun to build nouns in which every atom is its own axis:

1
[2 3]
[2 6 7]
[[4 5] 6 7]
[[4 5] 6 14 15]
[[4 5] [12 13] 14 15]
[[4 [10 11]] [12 13] 14 15]
[[[8 9] [10 11]] [12 13] 14 30 31]

Distribution

23 ::    *[a [b c] d]      [*[a b c] *[a d]]

The practical domain of the Nock function is always a cell. When a is an atom, *a, or nock(a), is always an error. Conventionally, Nock proper is always a function of a cell. The head of this cell is the subject, the tail is the formula

[subject formula]

and the result of passing a noun through the Nock function is called the product.

nock([subject formula]) => product

or

*[subject formula] => product

The subject is your data and the formula is your code. And the product is your code's output.

Notice that a in the Nock rules is always the subject, except line 39, which is a crash default for malformed nouns that do not evaluate.

A formula is always a cell. But when we look inside that cell, we see two basic kinds of formulas:

[operator operands]
[formula-x formula-y]

An operator is always an atom (0 through 10). A formula is always a cell. Line 23 distinguishes these forms:

23 ::    *[a [b c] d]     [*[a b c] *[a d]]

If your code contains multiple formulas, the subject will distribute over those formulas.

In other words, if you have two Nock formulas x and y, a formula that computes the pair of them is just the cell [x y].

*[subject [x y]]  -> [*[subject x] *[subject y]]

No atom is a valid formula, and every formula that does not use line 23 has an atomic head.

Suppose you have two formulas f and g, each of which computes some function of the subject s. You can then construct the formula h as [f g]; and h(s) = [f(s) g(s)].

For example:

*[[19 42] [0 3] 0 2]

The subject s is [19 42]. The formula h is [[0 3] 0 2].

*[s h]

The head of h is f, which is [0 3]. The tail of h is g, which is [0 2].

*[s [f g]]

by the distribution rule, *[s [f g]] is

[*[s f] *[s g]]

or

[*[[19 42] [0 3]] *[[19 42] 0 2]]

*[s f] is f(s) and produces 42. *[s g] is g(s) and produces 19.

Since h(s) is [f(s) g(s)], h(s) is [42 19]:

*[[19 42] [0 3] 0 2] -> [42 19]

Operator 0: Axis

25 ::    *[a 0 b]         /[b a]

Operator 0 is Nock's tree address or axis operator, using the tree addressing structure defined in lines 16 through 20. *[a 0 b] simply returns the value of the part of a at axis b. For any subject a, the formula [0 b] produces /[b a].

For example,

*[[19 42] 0 3] -> /[3 19 42] -> 42.

Operator 1: Just

26 ::    *[a 1 b]          b

1 is the constant, or Just operator. It produces its operand b without reference to the subject.

For example,

*[42 1 57] -> 57

Operator 1 is named Just because it produces "just" its operand.


Operator 2: Fire

27 ::    *[a 2 b c]       *[*[a b] *[a c]]

Operator 2 is the Fire operator, which brings us the essential magic of recursion. Given the formula [2 b c], b is a formula for generating a new subject; c is a formula for generating a new formula. To compute *[a 2 b c], we evaluate both b and c against the current subject a.

A common use of Fire is to evaluate data inside the subject as code.

For example:

*[[[40 43] [4 0 1]] [2 [0 4] [0 3]]] -> 41

*[[[40 43] [4 0 1]] [2 [0 5] [0 3]]] -> 44

Operator 2 is called Fire because it "fires" Nock formulas at its (possibly modified) subject.


Operator 3: Depth

28 ::    *[a 3 b]         ?*[a b]

Operator 3 applies the Cell-test function defined in lines 8 and 9 to the product of *[a b].

Operator 3 is called Depth because it tests the "depth" of a noun. Cell-test properly refers to the pseudocode function ?.


Operator 4: Bump

29 ::    *[a 4 b]         +*[a b]

Operator 4 applies the Increment function defined in lines 10 and 11 to the product of *[a b].

Operator 4 is called Bump because it "bumps" the atomic product *[a b] up by 1. Increment properly refers to the pseudocode function +.


Operator 5: Same

30 ::    *[a 5 b]         =*[a b]

Operator 5 applies the Equals function defined in lines 12, 13 and 14 to the product of *[a b].

Operator 5 is called the Same operator, because it tests if the head and tail of the product of *[a b] are the same. "Equals" properly refers to the pseudocode function "=".


Operator 6: If

32 ::    *[a 6 b c d]     *[a 2 [0 1] 2 [1 c d] [1 0] 2 [1 2 3] [1 0] 4 4 b]

Operator 6 is a primitive known to every programmer - If. Its operands, a test formula b, a then formula c and an else formula d.

If the test b applied to the subject evaluates to 0 ("yes"),

*[a b] -> 0

then If produces the result of c, the then formula, applied to the subject,

*[a c]

Else, if applying the test to the subject produces 1 ("no"),

*[a b] -> 1

Operator 6 produces the result of d, the else formula, applied to the subject,

*[a d]

If *[a b] produces any value other than 0 ("yes") or 1 ("no"), If crashes.

Let's examine the internals of Operator 6:

If could have been defined as a built-in pseudocode function, like Increment:

::    $[0 b c]         b
::    $[1 b c]         c

Then Operator 6 could have been restated quite compactly:

::    *[a 6 b c d]     *[a $[*[a b] c d]]

However, this is unnecessary complexity. If can be written as a macro using the primitive operators:

32 ::    *[a 6 b c d]     *[a 2 [0 1] 2 [1 c d] [1 0] 2 [1 2 3] [1 0] 4 4 b]

Reducing the right-hand side (an excellent exercise for the reader) produces:

*[a *[[c d] [0 *[[2 3] [0 ++*[a b]]]]]]

Which is the reduced pseudocode form of Operator 6.

Additionally, we could simplify the semantics of If, at the expense of breaking the system, by creating a macro that works as if and only if *[a b] produces either 0 or 1.

This simpler If would be:

::    *[a 6 b c d]    *[a 2 [0 1] 2 [1 c d] [1 0] [4 4 b]]

which reduces to:

*[a *[[c d] [0 ++*[a b]]]]

Let's examine the internals of this macro with a test example:

*[[40 43] [6 [3 0 1] [4 0 2] [4 0 1]]]

Fitting this to the reduced form:

*[[40 43] *[[4 0 2] [4 0 3]] [0 ++*[[40 43] [3 0 1]]]]]

Our test:

*[[40 43] [3 0 1]] 

produces a 0,

*[[40 43] *[[[4 0 2] [4 0 3]] [0 ++0]]]]

which gets incremented twice

*[[40 43] *[[[4 0 2] [4 0 3]] [0 2]]]

and is used as an axis to select the head of 4 0 2] [4 0 3

*[[40 43] [4 0 2]]

which increments 40 to produce 41. Had the test produced a "no" instead of a "yes", If would have incremented the tail of the subject instead of the head.

The real If is only slightly more complicated:

::    *[a 6 b c d]     *[a *[[c d] [0 *[[2 3] [0 ++*[a b]]]]]]

There is an extra step in the real If to prevent unexpected behaviour if the test produces a value other than 0 ("yes") or 1 ("no"). The real If will crash if this happens and the naive If may not (the reader will find it a useful exercise to figure out why).

It's worth noting that practical, compiler-generated Nock never does anything as funky as these Operator 6 macro internals.


Operator 7: Compose

33 ::    *[a 7 b c]       *[a 2 b 1 c]

Operator 7 implements function composition. To use a bit of math notation, if we define the formula [7 b c] as a function d(x):

d(a) == c(b(a))

This is apparent from the reduced pseudocode form of Operator 7:

*[*[a b] c]

As an example,

*[[42 44] [7 [4 0 3] [3 0 1]]] -> 1

The above sequentially applies the formulas [4 0 3] and [3 0 1] to our subject [42 44], first incrementing the head, then testing the depth.


Operator 8: Push

34 ::    *[a 8 b c]       *[a 7 [[7 [0 1] b] 0 1] c]

Operator 8 pushes a new noun onto our subject, not at all unlike pushing a new variable onto the stack.

The internals of Operator 8 are similar to Operator 7, except that the subject for c is not simply the product of b, but the ordered pair of the product of b and the original subject.

This is apparent from the reduced pseudocode form of Operator 8:

*[[*[a b] a] c]

Operator 8 evaluates the formula c with the cell of *[a b] and the original subject a. In math notation, if we define [8 b c] as d(x):

d(a) == c([b(a) a])

Suppose, for the purposes of the formula c, we need not just the subject a, but some intermediate noun computed from the subject that will be useful in the calculation of c. Operator 8 applies formula c to a new subject that is the pair of the intermediate value and the old subject.

In higher-level languages that compile to Nock, variable declarations are likely to generate an Operator 8, because the variable is computed against the present subject, and used in a calculation which depends both on the original subject and the new variable.


Op 9: Call

35 ::    *[a 9 b c]       *[a 7 c 2 [0 1] 0 b]

Operator 9 is the Call operator and is used for calling and applying formulas inside noun structures called cores

A noun can contain both data and code. By convention, all interesting flow control in Nock is done with cores, which are cells whose head is code (containing one or more formulas) and whose tail is data (possibly containing other cores):

[code data]

All flow structures in other languages not built on Nock correspond to cores. Functions and/or closures are cores, objects are cores, modules are cores, even loops are cores (Nock, of course, does not have a loop operator).

The head of a core is called the battery and the tail is called the payload:

[battery payload]

The payload of a core is any useful data needed for computation.

The battery of a core is a noun containing one or more arms, which are formulas whose subject is the entire core.

For example, in the case of the battery containing three arms:

[arm1 [arm2 arm3]]

Where arm1 is at axis /2 in the battery, arm2 is at /6, and arm3 is at /7. The axes will differ depending on the number of arms.

Of course, because the subject of an arm is the entire core, an arm can invoke itself (or any other arm in the battery). Hence, it can loop. And this is what a loop is - the simplest of cores.

Also, by convention, we have terms for different kinds of cores, depending on their structure:

A trap is a core whose battery contains a single arm. By convention, this arm is called $ (pronounced 'buc'), which is the empty-name. All traps have the following structure:

[$ payload]

A door is a core with a payload of the form [sample context]. The sample is dynamic data (such as the arguments of a function) and the context is any data that might be useful (such as other cores, variables, or the entire kernel of your OS). All doors have the following structure:

[battery [sample context]]

A gate is a core that is both a door and a trap. Gates are the Nock equivalent of lambdas or functions. All gates have the structure:

[$ [sample context]]

Operator 9 constructs a core and activates it by calling one of its arms.

Looking reduced pseudocode form of Call:

*[*[a c] *[*[a c] 0 b]]

Call applies a formula c to the subject a, where *[a c] produces a core. Call then calls an arm b of the core produced by *[a c] and reflexively applies it to the same core.


Op 10: Hint

36 ::    *[a 10 [b c] d]  *[a 8 c 7 [0 3] d]
37 ::    *[a 10 b c]      *[a c]

Operator 10 serves a hint to the interpreter.

If b is an atom and c is a formula, the formula [10 b c] appears to be equivalent to c. Likewise if [b c] is a cell, [10 [b c] d] appears to be equivalent to d. Operator 10 is actually a hint operator. The b or [b c]is discarded information - it is not used, formally, in the computation. It may help the interpreter compute the expression more efficiently, however.

Every Nock computes the same result - but not all at the same speed. What hints are supported? What do they do? Hints are a higher-level convention which do not, and should not, appear in the Nock spec. Some are defined in Hoon. Indeed, a naive Nock interpreter not optimized for Hoon will run Hoon quite poorly. When it gets the product, however, the product will be right. (Why is the c in [b c] computed? Because c could crash. A correct Nock cannot simply ignore it, and treat both variants of 10 as equivalent.)


Op 11: Retrieve

operator 11 retrieves data from the global namespace at the path it is given.

nock 11 is an additional nock operation provided by arvo to provide access to the global namespace. It's provided by arvo, so even though it's not technically a part of the nock, any code that arvo runs has access to it. When nock running on top of arvo hits nock 11, it either produces a value or blocks. The most common usage is with clay, where ^(%cx /path/to/file) will produce the referred-to-file.

Crash default

39 ::    *a               *a

The nock function is defined for every noun, but on many nouns it does nothing useful. For instance, if a is an atom, *a reduces to... *a. In theory, this means that Nock spins forever in an infinite loop. In other words, Nock produces no result - and in practice, your interpreter will stop.

Another way to see this is that Nock has "crash-only" semantics. There is no exception mechanism. The only way to catch Nock errors is to simulate Nock in a higher-level virtual Nock - which, in fact, we do all the time. A simulator (or a practical low-level interpreter) can report, out of band, that Nock would not terminate. It cannot recognize all infinite loops, of course (cf. Halting problem), but it can recognize common and obvious ones, which is usually sufficient in practice.