2015-02-18 06:03:21 +03:00

Nock Reference
==============
Nouns
-----

    1  ::    A noun is an atom or a cell.
    2  ::    An atom is a natural number.
    3  ::    A cell is an ordered pair of nouns.

An **atom** is a natural number - i.e., an unsigned integer. Nock does not
limit the size of **atoms**, or know what an **atom** means.

For instance, the **atom** `97` might mean the number `97`, or it might
mean the letter `a` (ASCII 97). A very large **atom** might be the
number of grains of sand on the beach - or it might be a GIF of your
children playing on the beach. Typically when we represent strings or
files as **atoms**, the first byte is the low byte. But even this is
just a convention. An **atom** is an **atom**.

A **cell** is an ordered pair of any two **nouns** - **cell** or
**atom**. We group cells with square brackets:

    [1 1]
    [34 45]
    [[3 42] 12]
    [[1 0] [0 [1 99]]]

*Nouns* are the dumbest data model ever. **Nouns** make JSON look like
XML and XML look like ASN.1. It may also remind you of Lisp's
S-expressions - you can think of nouns as "S-expressions without the S."
To be exact, a *noun* is an S-expression, except that classic
S-expressions have multiple **atom** types ("S" is for "symbol"). Since
Nock is designed to be used with a higher-level type system (such as
Hoon's), it does not need low-level types. An **atom** is just an
unsigned integer of any size.
For instance, it's common to represent strings (or even whole text
files) as atoms, arranging them LSB first - so "foo" becomes `0x6f6f66`.
How do we know to print this as "foo", not `0x6f6f66`? We need external
information - such as a Hoon type. Similarly, other common atomic
types (signed integers, floating point, etc.) are all straightforward
to map into **atoms**.
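
Here is a small Python sketch of the LSB-first convention. It packs text into an atom and unpacks it again, with a Python `int` standing in for an atom. The helper names and the UTF-8 encoding are assumptions made for illustration, not part of Nock or Hoon.

```python
def text_to_atom(text: str) -> int:
    """Pack text into an atom, first byte in the low position (LSB first).

    Hypothetical helper; UTF-8 is an assumed encoding.
    """
    return int.from_bytes(text.encode("utf-8"), "little")


def atom_to_text(atom: int) -> str:
    """Read an atom back as text. Nothing in the atom says it *is* text."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("utf-8")


print(hex(text_to_atom("foo")))   # 0x6f6f66
print(text_to_atom("a"))          # 97
print(atom_to_text(0x6f6f66))     # foo
```

An atom carries no length, so trailing zero bytes do not survive the round trip. That is one more piece of information the external type has to supply.
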

It's also important to note that, unlike Lisp, Nock cannot create
cyclical data structures. It is normal and common for **nouns** in a
Nock runtime system to share subtrees, giving them acyclic (DAG)
structure. But there is no Nock computation that can make a child point
to its parent. One consequence: Nock has no garbage collector. (Nor can
DAG structure be detected, as with Lisp's `eq`.)

There is also no single syntax for **nouns**. If you have **nouns** you
have Nock; if you have Nock you have Hoon; if you have Hoon, you can
write whatever parser you like.

------------------------------------------------------------------------
The Nock Function
-----------------

    5  ::    nock(a)          *a

Nock is a pure (stateless) function from **noun** to **noun**. In our
pseudocode (and only in our pseudocode) we express this with the prefix
operator `*`.

A Nock program is given meaning by a process of reduction. To compute
`nock(x)`, where `x` is any **noun**, we step through the rules from the
top down, find the first left-hand side that matches `x`, and reduce it
to the right-hand side. (In more mathematical notation, we might write
line 5 as `nock(a) -> *a`, a style we do use in documentation.)

When we use variable names, like `a`, in the pseudocode spec, we simply
mean that the rule fits for any **noun** `a`.

So `nock(x)` is `*x`, for any **noun** `x`. And how do we reduce `*x`?
Looking through the spec, we see that lines 23 through 39 reduce `*x`,
for different patterns of `x`.

Normally `a` in `nock(a)` is a **cell** `[s f]`, or

    [subject formula]

Intuitively, the formula is your function and the subject is its
argument. Hoon, or any other high-level language built on Nock, will
build its own function calling convention which does not map directly to
`*[subject formula]`.

------------------------------------------------------------------------
Bracket Grouping
----------------

    6  ::    [a b c]          [a [b c]]

Brackets associate to the right.

So instead of writing

    [2 [6 7]]
    [2 [6 [14 15]]]
    [2 [6 [[28 29] [30 31]]]]
    [2 [6 [[28 29] [30 [62 63]]]]]

we can write

    [2 6 7]
    [2 6 14 15]
    [2 6 [28 29] 30 31]
    [2 6 [28 29] 30 62 63]

While this notational convenience is hardly rocket science, it's
surprising how confusing it can be, especially if you have a Lisp
background. Lisp's "S-expressions" are very similar to **nouns**, except
that Lisp has multiple types of **atom**, and Lisp's syntax
automatically adds list terminators to groups.

For those with Lisp experience, it's important to note that Nock and
Hoon use tuples or "improper lists" much more heavily than Lisp. The
list terminator, normally 0, is never automatic. So the Lisp list

    (a b c)

becomes the Nock **noun**

    [a b c 0]

which is equivalent to

    [a [b [c 0]]]

Note that we can and do use unnecessary brackets anyway, for emphasis.
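
The grouping rule is easy to mechanize. This Python sketch reads bracket notation into nested 2-tuples and folds each group of more than two nouns to the right. The `parse` helper is hypothetical and accepts decimal atoms only.

```python
import re


def parse(text: str):
    """Parse bracket notation into nested pairs, grouping to the right."""
    tokens = re.findall(r"\[|\]|\d+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "[":
            return int(token)                # an atom
        items = []
        while tokens[pos] != "]":
            items.append(read())
        pos += 1                             # consume "]"
        if len(items) < 2:
            raise ValueError("a cell needs at least two nouns")
        noun = items[-1]                     # [a b c] == [a [b c]]:
        for item in reversed(items[:-1]):    # fold from the right
            noun = (item, noun)
        return noun

    noun = read()
    if pos != len(tokens):
        raise ValueError("trailing input after noun")
    return noun


print(parse("[2 6 [28 29] 30 62 63]") == parse("[2 [6 [[28 29] [30 [62 63]]]]]"))
print(parse("[1 2 3 0]"))   # the "list" (1 2 3) with an explicit terminator
```
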

------------------------------------------------------------------------
Axiomatic Functions
-------------------

    8  ::    ?[a b]           0
    9  ::    ?a               1
    10 ::    +[a b]           +[a b]
    11 ::    +a               1 + a
    12 ::    =[a a]           0
    13 ::    =[a b]           1

Here we define three of Nock's four axiomatic functions: **Cell-test**,
**Increment** and **Equals** (the fourth axiomatic function, called
**Address**, is defined in lines 16 through 21). These functions are
just pseudocode, not actual Nock syntax, and are only used to define the
behaviour of certain Nock operators.

We should note that in Nock and Hoon, `0` (pronounced "yes") is true,
and `1` ("no") is false. This convention is the opposite of
old-fashioned booleans, so we try hard to say "yes" and "no" instead of
"true" and "false."

------------------------------------------------------------------------
### Cell-test `?`
`?` (pronounced "wut") tests whether a **noun** is a **cell**. Again,
`0` means "yes", `1` means "no":

    8  ::    ?[a b]           0
    9  ::    ?a               1

------------------------------------------------------------------------
### Increment `+`
`+` (pronounced "lus") adds `1` to an **atom**:

    10 ::    +[a b]           +[a b]
    11 ::    +a               1 + a

------------------------------------------------------------------------
### Equals `=`
`=` (pronounced "tis") tests whether the **head** and **tail** of a
**cell** are equal. `0` means "yes", `1` means "no":

    12 ::    =[a a]           0
    13 ::    =[a b]           1
    14 ::    =a               =a

Testing an **atom** for equality makes no sense and logically fails to
terminate.

Because `+` works only for **atoms**, whereas `=` works only for
**cells**, the error rules match first for `+` and last for `=`.
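
The three functions translate almost directly into Python. This is a sketch for illustration, not a reference implementation. Atoms are `int`s and cells are 2-tuples. The rules that loop forever (lines 10 and 14) raise a hypothetical `NockCrash` exception instead, which is how a practical interpreter would report them.

```python
class NockCrash(Exception):
    """Raised where the spec would loop forever (hypothetical name)."""


def is_cell(noun) -> bool:
    return isinstance(noun, tuple)


def wut(noun) -> int:
    """Cell-test, lines 8-9: 0 ("yes") for a cell, 1 ("no") for an atom."""
    return 0 if is_cell(noun) else 1


def lus(noun) -> int:
    """Increment, lines 10-11: the cell rule matches first and never ends."""
    if is_cell(noun):
        raise NockCrash("+ of a cell")
    return 1 + noun


def tis(noun) -> int:
    """Equals, lines 12-14: compare head and tail; an atom never ends."""
    if not is_cell(noun):
        raise NockCrash("= of an atom")
    head, tail = noun
    return 0 if head == tail else 1   # tuple == is structural equality


print(wut((1, 2)), wut(7))                  # 0 1
print(lus(41))                              # 42
print(tis(((1, 2), (1, 2))), tis((5, 6)))   # 0 1
```
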

------------------------------------------------------------------------
Noun Address
------------

    16 ::    /[1 a]           a
    17 ::    /[2 a b]         a
    18 ::    /[3 a b]         b
    19 ::    /[(a + a) b]     /[2 /[a b]]
    20 ::    /[(a + a + 1) b] /[3 /[a b]]
    21 ::    /a               /a

We define a **noun** as a binary tree - where each node branches to a
left and right child - and assign an address, or **axis**, to every
element in the tree. The root of the tree is `/1`. The left child
(**head**) of every node at `/a` is `/2a`; the right child (**tail**) is
`/2a+1`. (Writing `(a + a)` is just a clever way to write `2a`, while
minimizing the set of pseudocode forms.)

Thus, labeling each node with its **axis**, a simple tree looks like this:

              1
         2         3
       4    5    6    7
                    14  15

If the value of every leaf is its tree address, this tree is

    [[4 5] [6 14 15]]

Let's use the example `[[97 2] [1 42 0]]`. So

    /[1 [97 2] [1 42 0]] -> [[97 2] [1 42 0]]

because `/1` is the root of the tree, i.e., the whole **noun**.

Its left child (**head**) is `/2` (i.e. `(1 + 1)`):

    /[2 [97 2] [1 42 0]] -> [97 2]

And its right child (**tail**) is `/3` (i.e. `(1 + 1 + 1)`):

    /[3 [97 2] [1 42 0]] -> [1 42 0]

And delving into `/3`, we see `/(3 + 3)` and `/(3 + 3 + 1)`:

    /[6 [97 2] [1 42 0]] -> 1
    /[7 [97 2] [1 42 0]] -> [42 0]
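
Lines 16 through 21 can be sketched as a recursive Python function. The name `slot`, the tuple representation of cells, and raising an exception for line 21 are all assumptions made for illustration. Both `a + a` and `a + a + 1` have parent `a`, so the sketch finds the parent node first. It then takes the parent's head for an even axis or its tail for an odd axis.

```python
class NockCrash(Exception):
    """Stands in for line 21, /a -> /a, which never terminates."""


def slot(axis: int, noun):
    """Compute /[axis noun] following lines 16-21."""
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:                          # line 16: /[1 a] -> a
        return noun
    parent = slot(axis // 2, noun)         # lines 19-20: axis is 2a or 2a+1
    if not isinstance(parent, tuple):      # lines 17-18 need a cell here
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]                # even: head (line 17), odd: tail (line 18)


example = ((97, 2), (1, (42, 0)))          # [[97 2] [1 42 0]]
for axis in (1, 2, 3, 6, 7):
    print(axis, slot(axis, example))
```
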

It's also fun to build nouns in which every atom is its own axis:

    1
    [2 3]
    [2 6 7]
    [[4 5] 6 7]
    [[4 5] 6 14 15]
    [[4 5] [12 13] 14 15]
    [[4 [10 11]] [12 13] 14 15]
    [[[8 9] [10 11]] [12 13] 14 30 31]
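
Each step in that sequence replaces an atom `n` with the cell `[2n 2n+1]`, which keeps every atom equal to its own axis. The Python sketch below rebuilds the sequence that way and checks the property. The helper names are hypothetical, and cells are 2-tuples.

```python
def split(noun, target: int):
    """Replace the atom `target` with the cell [2*target 2*target+1]."""
    if isinstance(noun, tuple):
        return (split(noun[0], target), split(noun[1], target))
    return (2 * target, 2 * target + 1) if noun == target else noun


def self_addressed(noun, axis: int = 1) -> bool:
    """True if every atom in `noun` equals its own axis."""
    if isinstance(noun, tuple):
        head, tail = noun
        return self_addressed(head, 2 * axis) and self_addressed(tail, 2 * axis + 1)
    return noun == axis


steps = [1]
for target in (1, 3, 2, 7, 6, 5):
    steps.append(split(steps[-1], target))
steps.append(split(split(steps[-1], 4), 15))   # the last line splits two leaves

for noun in steps:
    print(noun, self_addressed(noun))
```
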

------------------------------------------------------------------------
Distribution
------------

    23 ::    *[a [b c] d]     [*[a b c] *[a d]]

The practical domain of the Nock function is always a **cell**. When `a`
is an **atom**, `*a`, or `nock(a)`, is always an error. Conventionally,
Nock proper is always a function of a **cell**. The **head** of this
**cell** is the **subject**, the **tail** is the **formula**

    [subject formula]

and the result of passing a **noun** through the Nock function is called
the **product**.

    nock([subject formula]) => product

or

    *[subject formula] => product

The **subject** is your data and the **formula** is your code. And the
**product** is your code's output.

Notice that `a` in the Nock rules is always the **subject**, except line
39, which is a crash default for malformed **nouns** that do not
evaluate.

A **formula** is always a **cell**. But when we look inside that
**cell**, we see two basic kinds of **formulas**:

    [operator operands]
    [formula-x formula-y]

An **operator** is always an **atom** (`0` through `10`), while
`formula-x`, being a **formula**, is always a **cell**. Line 23
distinguishes these forms:

    23 ::    *[a [b c] d]     [*[a b c] *[a d]]

If your code contains multiple **formulas**, the **subject** will
distribute over those **formulas**.

In other words, if you have two Nock **formulas** `x` and `y`, a
**formula** that computes the pair of them is just the **cell** `[x y]`.

    *[subject [x y]] -> [*[subject x] *[subject y]]

No **atom** is a valid **formula**, and every **formula** that does not
use line 23 has an atomic **head**.

Suppose you have two **formulas** `f` and `g`, each of which computes
some function of the **subject** `s`. You can then construct the
**formula** `h` as `[f g]`; and `h(s) = [f(s) g(s)]`.

For example:

    *[[19 42] [0 3] 0 2]

The **subject** `s` is `[19 42]`. The **formula** `h` is `[[0 3] 0 2]`.

    *[s h]

The **head** of `h` is `f`, which is `[0 3]`. The **tail** of `h` is
`g`, which is `[0 2]`.

    *[s [f g]]

By the distribution rule, `*[s [f g]]` is

    [*[s f] *[s g]]

or

    [*[[19 42] [0 3]] *[[19 42] 0 2]]

`*[s f]` is `f(s)` and produces `42`. `*[s g]` is `g(s)` and produces
`19`.

Since `h(s)` is `[f(s) g(s)]`, `h(s)` is `[42 19]`:

    *[[19 42] [0 3] 0 2] -> [42 19]
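
The distribution rule, together with **Operators 0** and **1** (lines 25 and 26, described below), is enough to run this example. The Python sketch below is a toy model under these assumptions: atoms are `int`s, cells are 2-tuples, and a crash is a hypothetical `NockCrash` exception. Every other operator is deliberately left out.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    """/[axis noun], lines 16-21."""
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def nock(subject, formula):
    """*[subject formula], covering only lines 23, 25 and 26."""
    if not isinstance(formula, tuple):
        raise NockCrash("an atom is never a formula")
    head, tail = formula
    if isinstance(head, tuple):                          # line 23: distribution
        return (nock(subject, head), nock(subject, tail))
    if head == 0:                                        # line 25: axis
        return slot(tail, subject)
    if head == 1:                                        # line 26: just
        return tail
    raise NockCrash(f"operator {head} is not covered by this sketch")


# *[[19 42] [0 3] 0 2], i.e. h = [f g] with f = [0 3] and g = [0 2]
print(nock((19, 42), ((0, 3), (0, 2))))                  # (42, 19)
```
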

------------------------------------------------------------------------
Operator 0: Axis
----------------

    25 ::    *[a 0 b]         /[b a]

**Operator 0** is Nock's tree address or **axis** operator, using the
tree addressing structure defined in lines 16 through 20. `*[a 0 b]`
simply returns the value of the part of `a` at **axis** `b`. For any
subject `a`, the formula `[0 b]` produces `/[b a]`.

For example,

    *[[19 42] 0 3] -> /[3 19 42] -> 42

------------------------------------------------------------------------
Operator 1: Just
----------------

    26 ::    *[a 1 b]         b

`1` is the constant, or **Just operator**. It produces its operand `b`
without reference to the **subject**.

For example,

    *[42 1 57] -> 57

**Operator 1** is named **Just** because it produces "just" its operand.

------------------------------------------------------------------------
Operator 2: Fire
----------------

    27 ::    *[a 2 b c]       *[*[a b] *[a c]]

**Operator 2** is the **Fire operator**, which brings us the essential
magic of recursion. Given the **formula** `[2 b c]`, `b` is a
**formula** for generating a new **subject**; `c` is a **formula** for
generating a new **formula**. To compute `*[a 2 b c]`, we evaluate both
`b` and `c` against the current **subject** `a`, and then evaluate the
new **formula** `*[a c]` against the new **subject** `*[a b]`.

A common use of **Fire** is to evaluate data inside the **subject** as
code.

For example:

    *[[[40 43] [4 0 1]] [2 [0 4] [0 3]]] -> 41
    *[[[40 43] [4 0 1]] [2 [0 5] [0 3]]] -> 44

In the first line, `[0 4]` selects `40` as the new **subject** and
`[0 3]` selects `[4 0 1]` as the new **formula**, which increments `40`.

**Operator 2** is called **Fire** because it "fires" Nock **formulas**
at its (possibly modified) **subject**.
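
Here is a Python sketch of **Fire**. It extends a toy evaluator just far enough to run the two examples above, supporting distribution and **Operators 0**, **1**, **2** and **4** only. The tuple representation and `NockCrash` are assumptions made for illustration.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    """Split a cell into head and tail; an atom here is a crash."""
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def nock(subject, formula):
    op, arg = pair(formula)
    if isinstance(op, tuple):                     # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                   # line 25
        return slot(arg, subject)
    if op == 1:                                   # line 26
        return arg
    if op == 2:                                   # line 27: Fire
        b, c = pair(arg)
        new_subject = nock(subject, b)
        new_formula = nock(subject, c)
        return nock(new_subject, new_formula)
    if op == 4:                                   # line 29
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    raise NockCrash(f"operator {op} is not covered by this sketch")


data = ((40, 43), (4, (0, 1)))                    # [[40 43] [4 0 1]]
print(nock(data, (2, ((0, 4), (0, 3)))))          # 41
print(nock(data, (2, ((0, 5), (0, 3)))))          # 44
```
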

------------------------------------------------------------------------
Operator 3: Depth
-----------------

    28 ::    *[a 3 b]         ?*[a b]

**Operator 3** applies the **Cell-test** function defined in lines 8 and
9 to the product of `*[a b]`.

**Operator 3** is called **Depth** because it tests the "depth" of a
noun. **Cell-test** properly refers to the pseudocode function `?`.

------------------------------------------------------------------------
Operator 4: Bump
----------------

    29 ::    *[a 4 b]         +*[a b]

**Operator 4** applies the **Increment** function defined in lines 10
and 11 to the product of `*[a b]`.

**Operator 4** is called **Bump** because it "bumps" the atomic product
`*[a b]` up by 1. **Increment** properly refers to the pseudocode
function `+`.

------------------------------------------------------------------------
Operator 5: Same
----------------

    30 ::    *[a 5 b]         =*[a b]

**Operator 5** applies the **Equals** function defined in lines 12, 13
and 14 to the product of `*[a b]`.

**Operator 5** is called the **Same** operator, because it tests if the
head and tail of the product of `*[a b]` are the same. **Equals**
properly refers to the pseudocode function `=`.
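
**Operators 3**, **4** and **5** just apply `?`, `+` and `=` to a computed product. The Python sketch below uses the same toy assumptions: `int` atoms, 2-tuple cells, a hypothetical `NockCrash` for crashes, and only the operators needed here. Note in the last two examples how the distribution rule builds the cell that **Same** compares.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def nock(subject, formula):
    if not isinstance(formula, tuple):
        raise NockCrash("an atom is never a formula")
    op, arg = formula
    if isinstance(op, tuple):                      # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                    # line 25
        return slot(arg, subject)
    if op == 1:                                    # line 26
        return arg
    if op in (3, 4, 5):
        product = nock(subject, arg)
        if op == 3:                                # line 28: Depth, ?*[a b]
            return 0 if isinstance(product, tuple) else 1
        if op == 4:                                # line 29: Bump, +*[a b]
            if isinstance(product, tuple):
                raise NockCrash("+ of a cell")
            return product + 1
        if not isinstance(product, tuple):         # line 30: Same, =*[a b]
            raise NockCrash("= of an atom")
        return 0 if product[0] == product[1] else 1
    raise NockCrash(f"operator {op} is not covered by this sketch")


print(nock((42, 44), (3, (0, 1))))            # 0: the subject is a cell
print(nock(41, (4, (0, 1))))                  # 42
print(nock(42, (5, ((0, 1), (0, 1)))))        # 0: [42 42] has equal halves
print(nock(42, (5, ((0, 1), (4, (0, 1))))))   # 1: [42 43] does not
```
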

------------------------------------------------------------------------
Operator 6: If
--------------

    32 ::    *[a 6 b c d]     *[a 2 [0 1] 2 [1 c d] [1 0] 2 [1 2 3] [1 0] 4 4 b]

**Operator 6** is a primitive known to every programmer - **If**. Its
operands are a **test formula** `b`, a **then formula** `c` and an
**else formula** `d`.

If the **test** `b` applied to the **subject** evaluates to `0` ("yes"),

    *[a b] -> 0

then **If** produces the result of `c`, the **then formula**, applied to
the **subject**,

    *[a c]

Else, if applying the **test** to the **subject** produces `1` ("no"),

    *[a b] -> 1

**Operator 6** produces the result of `d`, the **else formula**, applied
to the **subject**,

    *[a d]

If `*[a b]` produces any value other than `0` ("yes") or `1` ("no"),
**If** crashes.

Let's examine the internals of **Operator 6**:

**If** could have been defined as a built-in pseudocode function, like
**Increment**:

    :: $[0 b c]       b
    :: $[1 b c]       c

Then **Operator 6** could have been restated quite compactly:

    :: *[a 6 b c d]   *[a $[*[a b] c d]]

However, this is unnecessary complexity. **If** can be written as a
macro using the primitive operators:

    32 ::    *[a 6 b c d]     *[a 2 [0 1] 2 [1 c d] [1 0] 2 [1 2 3] [1 0] 4 4 b]

Reducing the right-hand side (an excellent exercise for the reader)
produces:

    *[a *[[c d] [0 *[[2 3] [0 ++*[a b]]]]]]

which is the reduced pseudocode form of **Operator 6**.

Additionally, we could simplify the semantics of **If**, at the expense
of breaking the system, by creating a macro that works correctly if and
only if `*[a b]` produces either `0` or `1`.

This simpler **If** would be:

    :: *[a 6 b c d]   *[a 2 [0 1] 2 [1 c d] [1 0] [4 4 b]]

which reduces to:

    *[a *[[c d] [0 ++*[a b]]]]

Let's examine the internals of this macro with a test example:

    *[[40 43] [6 [3 0 1] [4 0 2] [4 0 3]]]

Fitting this to the reduced form:

    *[[40 43] *[[[4 0 2] [4 0 3]] [0 ++*[[40 43] [3 0 1]]]]]

Our test:

    *[[40 43] [3 0 1]]

produces a `0`,

    *[[40 43] *[[[4 0 2] [4 0 3]] [0 ++0]]]

which gets incremented twice

    *[[40 43] *[[[4 0 2] [4 0 3]] [0 2]]]

and is used as an axis to select the head of `[[4 0 2] [4 0 3]]`

    *[[40 43] [4 0 2]]

which increments `40` to produce `41`. Had the **test** produced a "no"
instead of a "yes", **If** would have incremented the **tail** of the
subject instead of the **head**.

The real **If** is only slightly more complicated:

    :: *[a 6 b c d]   *[a *[[c d] [0 *[[2 3] [0 ++*[a b]]]]]]

There is an extra step in the real **If** to prevent unexpected
behaviour if the test produces a value other than 0 ("yes") or 1 ("no").
The real **If** will crash if this happens and the naive **If** may not
(the reader will find it a useful exercise to figure out why).

It's worth noting that practical, compiler-generated Nock never does
anything as funky as these **Operator 6** macro internals.
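
To check the reductions above mechanically, the following Python sketch implements **Operators 0** through **5** directly. It implements **Operator 6** only by expanding it into the line 32 macro, with the naive macro alongside for comparison. The toy representation, the `build` helper and `NockCrash` are assumptions made for illustration.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def build(*items):
    """Right-associate items into a noun; a Python list is a bracket group."""
    def conv(x):
        return build(*x) if isinstance(x, list) else x
    if len(items) == 1:
        return conv(items[0])
    return (conv(items[0]), build(*items[1:]))


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def real_if(b, c, d):
    """Line 32, built literally."""
    return build(2, [0, 1], 2, [1, c, d], [1, 0], 2, [1, 2, 3], [1, 0], 4, 4, b)


def naive_if(b, c, d):
    """The simpler macro, only safe when the test yields 0 or 1."""
    return build(2, [0, 1], 2, [1, c, d], [1, 0], [4, 4, b])


def nock(subject, formula, expand_if=real_if):
    def run(s, f):
        return nock(s, f, expand_if)

    op, arg = pair(formula)
    if isinstance(op, tuple):                       # line 23
        return (run(subject, op), run(subject, arg))
    if op == 0:                                     # line 25
        return slot(arg, subject)
    if op == 1:                                     # line 26
        return arg
    if op == 2:                                     # line 27
        b, c = pair(arg)
        return run(run(subject, b), run(subject, c))
    if op in (3, 4, 5):                             # lines 28-30
        product = run(subject, arg)
        if op == 3:
            return 0 if isinstance(product, tuple) else 1
        if op == 4:
            if isinstance(product, tuple):
                raise NockCrash("+ of a cell")
            return product + 1
        head, tail = pair(product)
        return 0 if head == tail else 1
    if op == 6:                                     # expand the macro, then run it
        b, rest = pair(arg)
        c, d = pair(rest)
        return run(subject, expand_if(b, c, d))
    raise NockCrash(f"operator {op} is not covered by this sketch")


a = (40, 43)
yes_case = build(6, [3, 0, 1], [4, 0, 2], [4, 0, 3])   # [3 0 1]: is a a cell?
no_case = build(6, [3, 0, 2], [4, 0, 2], [4, 0, 3])    # [3 0 2]: is /2 a cell?
print(nock(a, yes_case), nock(a, yes_case, naive_if))   # 41 41
print(nock(a, no_case), nock(a, no_case, naive_if))     # 44 44
try:
    nock(a, build(6, [1, 2], [4, 0, 2], [4, 0, 3]))    # the test yields 2
except NockCrash as err:
    print("real If crashes:", err)
```

How the naive macro behaves when the test yields `2` is left as the exercise posed above.
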

------------------------------------------------------------------------
Operator 7: Compose
-------------------

    33 ::    *[a 7 b c]       *[a 2 b 1 c]

**Operator 7** implements function composition. To use a bit of math
notation, if we define the formula `[7 b c]` as a function `d(x)`:

    d(a) == c(b(a))

This is apparent from the reduced pseudocode form of **Operator 7**:

    *[*[a b] c]

As an example,

    *[[42 44] [7 [4 0 3] [3 0 1]]] -> 1

The above sequentially applies the **formulas** `[4 0 3]` and `[3 0 1]`
to our subject `[42 44]`. It first increments the **tail** (`44`
becomes `45`), then tests the **depth** of the result. Since `45` is an
**atom**, the test produces `1` ("no").
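
**Operator 7** needs no new machinery: an interpreter can rewrite it into **Operator 2** exactly as line 33 says. Here is a Python sketch under the usual toy assumptions: `int` atoms, 2-tuple cells, a hypothetical `NockCrash`, and only the operators this example needs.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def nock(subject, formula):
    op, arg = pair(formula)
    if isinstance(op, tuple):                     # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                   # line 25
        return slot(arg, subject)
    if op == 1:                                   # line 26
        return arg
    if op == 2:                                   # line 27
        b, c = pair(arg)
        return nock(nock(subject, b), nock(subject, c))
    if op == 3:                                   # line 28
        return 0 if isinstance(nock(subject, arg), tuple) else 1
    if op == 4:                                   # line 29
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    if op == 7:                                   # line 33: rewrite to [2 b 1 c]
        b, c = pair(arg)
        return nock(subject, (2, (b, (1, c))))
    raise NockCrash(f"operator {op} is not covered by this sketch")


step = nock((42, 44), (4, (0, 3)))                      # 45
print(step, nock(step, (3, (0, 1))))                    # 45 1
print(nock((42, 44), (7, ((4, (0, 3)), (3, (0, 1))))))  # 1
```
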

------------------------------------------------------------------------
Operator 8: Push
----------------

    34 ::    *[a 8 b c]       *[a 7 [[7 [0 1] b] 0 1] c]

**Operator 8** pushes a new **noun** onto our **subject**, not at all
unlike pushing a new variable onto the stack.

The internals of **Operator 8** are similar to **Operator 7**, except
that the **subject** for `c` is not simply the **product** of `b`, but
the ordered pair of the **product** of `b` and the original **subject**.
This is apparent from the reduced pseudocode form of **Operator 8**:

    *[[*[a b] a] c]

**Operator 8** evaluates the **formula** `c` with the **cell** of
`*[a b]` and the original **subject** `a`. In math notation, if we
define `[8 b c]` as `d(x)`:

    d(a) == c([b(a) a])

Suppose, for the purposes of the **formula** `c`, we need not just the
**subject** `a`, but some intermediate **noun** computed from the
**subject** that will be useful in the calculation of `c`. **Operator
8** applies **formula** `c` to a new **subject** that is the pair of the
intermediate value and the old **subject**.

In higher-level languages that compile to Nock, variable declarations
are likely to generate an **Operator 8**, because the variable is
computed against the present **subject**, and used in a calculation
which depends both on the original **subject** and the new variable.
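
The Python sketch below expands **Operator 8** through line 34 into **Operators 7** and **2**, so the push is never special-cased. It is a toy evaluator with `int` atoms, 2-tuple cells and a hypothetical `NockCrash`, covering only the operators these examples need.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def nock(subject, formula):
    op, arg = pair(formula)
    if isinstance(op, tuple):                     # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                   # line 25
        return slot(arg, subject)
    if op == 1:                                   # line 26
        return arg
    if op == 2:                                   # line 27
        b, c = pair(arg)
        return nock(nock(subject, b), nock(subject, c))
    if op == 4:                                   # line 29
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    if op == 7:                                   # line 33
        b, c = pair(arg)
        return nock(subject, (2, (b, (1, c))))
    if op == 8:                                   # line 34: [7 [[7 [0 1] b] 0 1] c]
        b, c = pair(arg)
        return nock(subject, (7, (((7, ((0, 1), b)), (0, 1)), c)))
    raise NockCrash(f"operator {op} is not covered by this sketch")


push = (4, (0, 1))                                # b = [4 0 1]: compute 42 + 1
print(nock(42, (8, (push, (0, 1)))))              # (43, 42): new subject [43 42]
print(nock(42, (8, (push, (4, (0, 2))))))         # 44: use the pushed value
print(nock(42, (8, (push, (0, 3)))))              # 42: the old subject is at /3
```
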

------------------------------------------------------------------------
Operator 9: Call
----------------

    35 ::    *[a 9 b c]       *[a 7 c 2 [0 1] 0 b]

**Operator 9** is the **Call** operator and is used for calling and
applying **formulas** inside **noun** structures called **cores**.

A **noun** can contain both data and code. By convention, all
interesting flow control in Nock is done with **cores**, which are
**cells** whose **head** is code (containing one or more **formulas**)
and whose **tail** is data (possibly containing other **cores**):

    [code data]

The flow structures that other languages provide as built-ins all
correspond to **cores** in Nock. Functions and/or closures are
**cores**, objects are **cores**, modules are **cores**, even loops are
**cores** (Nock, of course, does not have a loop operator).

The head of a **core** is called the **battery** and the tail is called
the **payload**:

    [battery payload]

The **payload** of a core is any useful data needed for computation.
The **battery** of a **core** is a **noun** containing one or more
**arms**, which are **formulas** whose **subject** is the entire
**core**.

For example, in the case of a **battery** containing three **arms**:

    [arm1 [arm2 arm3]]

`arm1` is at **axis** `/2` in the **battery**, `arm2` is at `/6`, and
`arm3` is at `/7`. Relative to the whole **core** `[battery payload]`,
the same **arms** are at `/4`, `/10` and `/11`. The **axes** will differ
depending on the number of **arms**.

Of course, because the **subject** of an **arm** is the entire **core**,
an **arm** can invoke itself (or any other **arm** in the **battery**).
Hence, it can loop. And this is what a loop is - the simplest of
**cores**.

Also, by convention, we have terms for different kinds of **cores**,
depending on their structure.

A **trap** is a **core** whose battery contains a single **arm**. By
convention, this arm is called `$` (pronounced "buc"), which is the
empty name. All **traps** have the following structure:

    [$ payload]

A **door** is a **core** with a **payload** of the form
`[sample context]`. The **sample** is dynamic data (such as the
arguments of a function) and the **context** is any data that might be
useful (such as other **cores**, variables, or the entire kernel of your
OS). All **doors** have the following structure:

    [battery [sample context]]

A **gate** is a **core** that is both a **door** and a **trap**.
**Gates** are the Nock equivalent of lambdas or functions. All **gates**
have the structure:

    [$ [sample context]]

**Operator 9** constructs a **core** and activates it by calling one of
its **arms**.

Looking at the reduced pseudocode form of **Call**:

    *[*[a c] *[*[a c] 0 b]]

**Call** applies a **formula** `c` to the **subject** `a`, where
`*[a c]` produces a **core**. **Call** then takes the **arm** at
**axis** `b` of that **core** and reflexively applies it to the same
**core**.
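
Here is a Python sketch of **Call**, expanded through line 35 into **Operators 7** and **2**. It uses a small hypothetical core `[[arm1 arm2 arm3] 41]` whose arms sit at `/4`, `/10` and `/11` of the core. One arm calls another with **Operator 9** on the same core. The toy representation and `NockCrash` are assumptions made for illustration.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def nock(subject, formula):
    op, arg = pair(formula)
    if isinstance(op, tuple):                     # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                   # line 25
        return slot(arg, subject)
    if op == 1:                                   # line 26
        return arg
    if op == 2:                                   # line 27
        b, c = pair(arg)
        return nock(nock(subject, b), nock(subject, c))
    if op == 4:                                   # line 29
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    if op == 7:                                   # line 33
        b, c = pair(arg)
        return nock(subject, (2, (b, (1, c))))
    if op == 9:                                   # line 35: [7 c 2 [0 1] 0 b]
        b, c = pair(arg)
        return nock(subject, (7, (c, (2, ((0, 1), (0, b))))))
    raise NockCrash(f"operator {op} is not covered by this sketch")


# Hypothetical three-arm core [[arm1 arm2 arm3] 41]
arm1 = (4, (0, 3))          # [4 0 3]: increment the payload
arm2 = (9, (4, (0, 1)))     # [9 4 0 1]: call arm1 on this same core
arm3 = (0, 3)               # [0 3]: return the payload
core = ((arm1, (arm2, arm3)), 41)

print(nock(core, (9, (4, (0, 1)))))    # arm1, at /4 of the core
print(nock(core, (9, (10, (0, 1)))))   # arm2, at /10, calls arm1
print(nock(core, (9, (11, (0, 1)))))   # arm3, at /11
print(nock(0, (9, (4, (1, core)))))    # here c = [1 core] builds the core
```
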

------------------------------------------------------------------------
Operator 10: Hint
-----------------

    36 ::    *[a 10 [b c] d]  *[a 8 c 7 [0 3] d]
    37 ::    *[a 10 b c]      *[a c]

**Operator 10** serves a hint to the interpreter.

If `b` is an atom and `c` is a **formula**, the **formula** `[10 b c]`
appears to be equivalent to `c`. Likewise if `[b c]` is a **cell**,
`[10 [b c] d]` appears to be equivalent to `d`. **Operator 10** is
actually a hint operator. The `b` or `[b c]` is discarded information -
it is not used, formally, in the computation. It may help the
interpreter compute the expression more efficiently, however.

Every correct Nock interpreter computes the same result - but not all at
the same speed. What hints are supported? What do they do? Hints are a
higher-level convention that does not, and should not, appear in the
Nock spec. Some are defined in Hoon. Indeed, a naive Nock interpreter
not optimized for Hoon will run Hoon quite poorly. When it gets the
product, however, the product will be right. (Why is the `c` in `[b c]`
computed? Because `c` could crash. A correct Nock cannot simply ignore
it, and treat both variants of `10` as equivalent.)
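
The following Python sketch treats **Operator 10** the way a naive interpreter would: it drops the hint but, following line 36, still computes the hint formula `c`. The hint atom `99` is an arbitrary, hypothetical value. The toy representation and `NockCrash` are assumptions made for illustration.

```python
class NockCrash(Exception):
    """Stands in for a computation that never produces a product."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")
    return noun


def nock(subject, formula):
    op, arg = pair(formula)
    if isinstance(op, tuple):                     # line 23
        return (nock(subject, op), nock(subject, arg))
    if op == 0:                                   # line 25
        return slot(arg, subject)
    if op == 1:                                   # line 26
        return arg
    if op == 2:                                   # line 27
        b, c = pair(arg)
        return nock(nock(subject, b), nock(subject, c))
    if op == 4:                                   # line 29
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    if op == 7:                                   # line 33
        b, c = pair(arg)
        return nock(subject, (2, (b, (1, c))))
    if op == 8:                                   # line 34
        b, c = pair(arg)
        return nock(subject, (7, (((7, ((0, 1), b)), (0, 1)), c)))
    if op == 10:
        hint, d = pair(arg)
        if isinstance(hint, tuple):               # line 36: [8 c 7 [0 3] d]
            _, c = hint                           # c is computed, then discarded
            return nock(subject, (8, (c, (7, ((0, 3), d)))))
        return nock(subject, d)                   # line 37: an atomic hint is dropped
    raise NockCrash(f"operator {op} is not covered by this sketch")


HINT = 99   # hypothetical hint atom; this sketch ignores its meaning

print(nock(41, (10, (HINT, (4, (0, 1))))))             # line 37
print(nock(41, (10, ((HINT, (1, 0)), (4, (0, 1))))))   # line 36
try:
    nock(41, (10, ((HINT, (0, 2)), (4, (0, 1)))))      # c = [0 2] crashes on an atom
except NockCrash as err:
    print("crash:", err)
```
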

------------------------------------------------------------------------

Operator 11: Retrieve
---------------------

**Operator 11** retrieves data from the global namespace at the path it
is given.

Nock 11 is an additional operation provided by Arvo to give access to
the global namespace. Because Arvo provides it, any code that Arvo runs
has access to it, even though it is not technically part of Nock. When
Nock running on top of Arvo hits Nock 11, it either produces a value or
blocks. The most common usage is with Clay, where
`^(%cx /path/to/file)` will produce the referred-to file.

Crash default
-------------

    39 ::    *a               *a

The Nock function is defined for every **noun**, but on many **nouns**
it does nothing useful. For instance, if `a` is an **atom**, `*a`
reduces to... `*a`. In theory, this means that Nock spins forever in an
infinite loop. In other words, Nock produces no result - and in
practice, your interpreter will stop.

Another way to see this is that Nock has "crash-only" semantics. There
is no exception mechanism. The only way to catch Nock errors is to
simulate Nock in a higher-level virtual Nock - which, in fact, we do all
the time. A simulator (or a practical low-level interpreter) can report,
out of band, that Nock would not terminate. It cannot recognize all
infinite loops, of course (cf. the halting problem), but it can recognize
common and obvious ones, which is usually sufficient in practice.
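
A simulator that reports crashes out of band can be sketched in Python. The toy evaluator's crashes become a hypothetical `NockCrash` exception, and `run` turns both those and Python's `RecursionError` into an ordinary value. Treating the recursion limit as an infinite loop is an assumption of this sketch, not a claim about any particular interpreter. It catches obvious loops such as `*[loop loop]` below, which reduces to itself, but it would also reject a legitimately deep computation.

```python
class NockCrash(Exception):
    """A crash the sketch detects directly (hypothetical name)."""


def slot(axis, noun):
    if isinstance(axis, tuple) or axis < 1:
        raise NockCrash(f"bad axis {axis!r}")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise NockCrash(f"axis {axis} passes through an atom")
    return parent[axis % 2]


def pair(noun):
    if not isinstance(noun, tuple):
        raise NockCrash("expected a cell")   # an atom where a cell is required
    return noun


def nock(subject, formula):
    """Toy evaluator: distribution and Operators 0, 1, 2 and 4 only."""
    op, arg = pair(formula)
    if isinstance(op, tuple):
        return (nock(subject, op), nock(subject, arg))
    if op == 0:
        return slot(arg, subject)
    if op == 1:
        return arg
    if op == 2:
        b, c = pair(arg)
        return nock(nock(subject, b), nock(subject, c))
    if op == 4:
        product = nock(subject, arg)
        if isinstance(product, tuple):
            raise NockCrash("+ of a cell")
        return product + 1
    raise NockCrash(f"operator {op} is not covered by this sketch")


def run(subject, formula):
    """Evaluate, reporting failure out of band instead of dying."""
    try:
        return ("ok", nock(subject, formula))
    except NockCrash as err:
        return ("crash", str(err))
    except RecursionError:
        # Heuristic only: a deep but finite computation is misreported too.
        return ("crash", "recursion limit hit; probably an infinite loop")


loop = (2, ((0, 1), (0, 1)))        # [2 [0 1] 0 1]: *[loop loop] reduces to itself
print(run(41, (4, (0, 1))))         # ('ok', 42)
print(run(41, 7))                   # an atom is not a formula
print(run(loop, loop))              # caught by the recursion limit
```
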