---
title: Hoon 101.0: nouns, spans, and molds
sort: 0
next: true
---

# Hoon 101.0: nouns, spans, and molds

Hoon is a strict, higher-order typed pure-functional language.

Why Hoon? Typed functional languages are known for a pleasant
phenomenon: once your code compiles, it's quite likely to work.
But most typed functional languages depend conceptually on
advanced abstract math, and are difficult to understand without it.

Hoon is a typed FP language for the common street programmer.
Well-written Hoon is as concrete and data-oriented as possible.
The less functional magic you use, the better. But the magic is
there, mostly, if you need it.

The main disadvantage of Hoon is that its syntax and semantics
are unfamiliar. The syntax will remind all too many readers of
Perl, but like most human languages (and unlike Perl) it combines
a regular core structure with irregular variations. Its semantic
complexity is bounded by the fact that the compiler is only 2000
lines of Hoon (admittedly an expressive language). Most people's
experience is that Hoon is much easier to learn than it looks.
It does not look easy to learn, though!

But let's give it a try. One style point: we'll nest design
digressions in braces. If you see a {paragraph} or two,
assume it's of interest to language nerds only. These
digressions are "guaranteed not on the test."

> The name "Hoon" is from the Wallace Stevens poem, _Tea at the
> Palaz of Hoon_. It also means "hooligan" in Australian slang.

## Nouns: data made boring

A noun is an atom or a cell. An atom is any unsigned integer. A
cell is an ordered pair of nouns.

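To make the definition concrete, here is a minimal Python sketch of the noun data model. It assumes atoms are represented as non-negative Python `int`s and cells as 2-tuples; `is_atom` and `is_noun` are hypothetical helpers for illustration, not part of any Urbit library.

```python
from typing import Tuple, Union

# Modeling assumption: atom = non-negative int, cell = 2-tuple of nouns.
Noun = Union[int, Tuple["Noun", "Noun"]]


def is_atom(x: object) -> bool:
    """An atom is any unsigned integer (Python's bool is excluded on purpose)."""
    return isinstance(x, int) and not isinstance(x, bool) and x >= 0


def is_noun(x: object) -> bool:
    """A noun is an atom, or a cell: an ordered pair of nouns."""
    if is_atom(x):
        return True
    return isinstance(x, tuple) and len(x) == 2 and is_noun(x[0]) and is_noun(x[1])


print(is_noun(42))            # True: an atom
print(is_noun((42, (0, 7))))  # True: a cell whose tail is a cell
print(is_noun(-1))            # False: atoms are unsigned
print(is_noun((1, 2, 3)))     # False: a cell is a pair, not a triple
```

Note that there is nothing else to model: no strings, no floats, no tags.
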
The noun is an intentionally boring data model. Nouns don't have
cycles (although a noun implementation should take advantage of
their acyclic graph structure by storing shared subtrees only
once). Noun comparison is always by value (there is no way for
the programmer to test pointer equality). Nouns are strict; there
is no such thing as an infinite noun. And, of course, nouns are
immutable. There's basically no way to have any real fun with
nouns.

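What "take advantage of acyclic graph structure" can mean in practice is sketched below in Python, under the same int/2-tuple modeling assumption. The helpers `tree_cells`, `dag_cells` and `unshared` are hypothetical. The sketch builds a noun whose two halves are the same object at every level, counts its cells as a tree and as a shared DAG, and checks that an independently built copy still compares equal by value.

```python
def tree_cells(noun) -> int:
    """Cells counted as if every subtree were stored separately."""
    if isinstance(noun, int):
        return 0
    return 1 + tree_cells(noun[0]) + tree_cells(noun[1])


def dag_cells(noun) -> int:
    """Distinct cell objects actually in memory; a shared subtree counts once."""
    seen = set()

    def walk(n):
        if isinstance(n, int) or id(n) in seen:
            return
        seen.add(id(n))
        walk(n[0])
        walk(n[1])

    walk(noun)
    return len(seen)


def unshared(depth):
    """Build the same value as `shared` below, with every cell a fresh object."""
    if depth == 0:
        return tuple([1, 2])  # built at runtime, so each leaf is a new object
    return (unshared(depth - 1), unshared(depth - 1))


# Each level points at the previous level twice: a DAG, never a cycle.
shared = (1, 2)
for _ in range(10):
    shared = (shared, shared)

print(tree_cells(shared))      # 2047 cells in the tree view
print(dag_cells(shared))       # 11 cells actually stored
print(unshared(10) == shared)  # True: comparison is by value
```

Python's `is` operator can observe the sharing here; Hoon deliberately gives the programmer no equivalent, so sharing is purely an implementation optimization.
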
> Nouns are Lisp's S-expressions, minus a lot of hacks, tricks,
> and features that made sense 50 years ago. In particular,
> because atoms are not tagged (an atom can encode a string, for
> instance), nouns need a static type system. How do you print an
> atom if you don't know whether it's a string or a number?

## A type system for nouns

One obstacle to learning Hoon is that it has two quite distinct
concepts that might equally be called a "type." Worse, most
other typed functional languages are mathy and share a basically
mathematical concept of "type." Hoon does not have this concept
at all. We can't avoid using the T-word occasionally, but it has
no precise meaning in Hoon and can be extremely confusing.

Hoon's two kinds of "type" are `span` and `mold`. A span is both
a constructively defined set of nouns and a semantic convention
for interpreting the nouns in that set. A mold is a function
whose range is some useful span. A mold is always idempotent (for
any noun `x`, `f(x)` equals `f(f(x))`), and its domain is any
noun.

One way to explain this is that while a span is what most
languages call a "type," Hoon has no syntax for the programmer to
define a span directly. Instead, we use inference to define it
as the range of a mold function. This mold can also be used to
validate or normalize untrusted, untyped data, a common problem
in modern programming, because networks.

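As a rough illustration of "idempotent, with any noun as its domain," here is a hypothetical mold written in Python (same int/2-tuple noun model) that maps *every* noun to a cell of two atoms. What a real Hoon mold does with a noun that doesn't fit is not described here; this sketch simply assumes it substitutes the default atom `0`. Because the function is idempotent, its range is exactly its set of fixed points, so `mold(x) == x` doubles as a validity check on untrusted data.

```python
def atom_mold(noun):
    """Normalize any noun to an atom; a cell becomes the default 0 (assumed policy)."""
    return noun if isinstance(noun, int) else 0


def pair_of_atoms_mold(noun):
    """Hypothetical mold whose range is 'cells of two atoms'; total over all nouns."""
    if isinstance(noun, int):
        return (0, 0)  # an atom can't be a pair: fall back to the default pair
    head, tail = noun
    return (atom_mold(head), atom_mold(tail))


def in_span(mold, noun) -> bool:
    """For an idempotent mold, its range is exactly its set of fixed points."""
    return mold(noun) == noun


for x in [7, (1, 2), ((1, 2), 3), (4, (5, 6))]:
    y = pair_of_atoms_mold(x)
    assert pair_of_atoms_mold(y) == y  # idempotence: f(f(x)) == f(x)
    print(x, "->", y, "(valid)" if in_span(pair_of_atoms_mold, x) else "(normalized)")
```

The same function serves both jobs the paragraph mentions: applying it normalizes, and comparing its output with its input validates.
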
Hoon's inference algorithm is dumber than the unification
algorithms (Hindley-Milner) used in most typed functional
languages. Hoon thinks only forward, not backward. For example,
Haskell can infer the result type of a function from its argument
(forward), or the argument type from the result (backward).
Hoon can do the first but not the second.

So Hoon needs more manual typecasts, which you usually want
anyway for prosaic software-engineering reasons. Otherwise, its
type system does more or less the same job, including pattern
matching, genericity/typeclasses, etc.

{Sending a noun over the network is a good example of using a
mold to validate untrusted data. In a normal modern language, you
serialize and deserialize a data type by extending your type to
implement a serialization interface. In Hoon, any value is just a
noun, so we have one function (`jam`) that converts any noun to an
atom, and another (`cue`) that is its inverse. To validate, the
receiver runs its own mold on the cued noun, and we've sent typed
data over the network without any attack surface (except `jam`
and `cue`, which fit on a page). No custom serialization methods
are required, and the mold itself is never sent; protocol
agreement is out of band.}

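The round trip in this digression can be sketched in Python. `jam_sketch` and `cue_sketch` below are hypothetical stand-ins and are *not* Urbit's actual `jam`/`cue` bit format (the real encoding also uses back-references to shared subtrees); they only show that one generic, invertible noun-to-atom encoding covers every noun, with no per-type serialization code.

```python
def jam_sketch(noun) -> int:
    """Encode any noun (int atom or 2-tuple cell) as a single atom."""
    bits = []

    def emit(n):
        if isinstance(n, int):
            width = n.bit_length()
            bits.append(0)                # tag bit: atom
            bits.extend([1] * width)      # bit width in unary...
            bits.append(0)                # ...terminated by a 0
            bits.extend((n >> i) & 1 for i in range(width))  # value, LSB first
        else:
            bits.append(1)                # tag bit: cell
            emit(n[0])
            emit(n[1])

    emit(noun)
    # Pack the bit list into one integer, first bit in the least-significant position.
    return sum(b << i for i, b in enumerate(bits))


def cue_sketch(atom: int):
    """Inverse of jam_sketch: decode an atom back into a noun."""
    pos = 0

    def bit() -> int:
        nonlocal pos
        b = (atom >> pos) & 1
        pos += 1
        return b

    def read():
        if bit() == 1:                    # cell: decode head, then tail
            head = read()
            tail = read()
            return (head, tail)
        width = 0
        while bit() == 1:                 # read the unary width
            width += 1
        value = 0
        for i in range(width):
            value |= bit() << i
        return value

    return read()


message = (42, (0x2A, 420))
wire = jam_sketch(message)          # a single atom, ready to send
print(cue_sketch(wire) == message)  # True
```

On the receiving side, the decoded noun is still untrusted: it would next be passed through the receiver's own mold, as the digression describes.
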
## Let's make some nouns

Nouns aren't even slightly hard. Let's make a noun:

```
~tasfyn-partyv:dojo> 42
```

You'll see the expression you entered, then the resulting value:

```
> 42
42
```

Let's try a different value:

```
~tasfyn-partyv:dojo> 0x2a
```

You'll see:

```
> 0x2a
0x2a
```

`42` and `0x2a` are actually *the same noun*, because they're the
same number. But the dojo doesn't just have the noun to print: it
has a `[span noun]` cell (sometimes called a `vase`).

As you recall, a span defines a set of nouns and a semantic
interpretation. As sets, both spans here are "any number." But
semantically, `42` has a decimal span and `0x2a` a hexadecimal
one, so they print differently.

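Here is a small Python sketch of that idea: a vase modeled as a `(span, noun)` pair, with spans written as tuples like `("atom", "ud")`. The `group` and `render` helpers are hypothetical; they follow the dot-grouping this tutorial uses for printed numbers (`7.303.014` in decimal, `0x6f.6f66` in hex), and nothing else about the dojo's real printer is assumed.

```python
def group(digits: str, size: int) -> str:
    """Insert a dot every `size` digits, counting from the right."""
    chunks = []
    while len(digits) > size:
        chunks.insert(0, digits[-size:])
        digits = digits[:-size]
    chunks.insert(0, digits)
    return ".".join(chunks)


def render(vase) -> str:
    """Print a noun according to the span paired with it."""
    span, noun = vase
    if span == ("atom", "ud"):  # unsigned decimal, groups of 3
        return group(str(noun), 3)
    if span == ("atom", "ux"):  # unsigned hex, groups of 4
        return "0x" + group(format(noun, "x"), 4)
    raise ValueError(f"span not handled in this sketch: {span!r}")


decimal = (("atom", "ud"), 42)
hexadecimal = (("atom", "ux"), 42)
print(decimal[1] == hexadecimal[1])  # True: the same noun
print(render(decimal))               # 42
print(render(hexadecimal))           # 0x2a
```

The noun carries no information about how to print itself; every formatting decision comes from the span half of the pair.
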
{It's important to note that Hoon is a statically typed language.
We don't work with vases unless we're dynamically compiling code,
which is of course what we're doing here in the dojo. In Hoon,
dynamic type is static type plus runtime compilation.}

Let's make some cells. Try these on your own urbit:

```
~tasfyn-partyv:dojo> [42 0x2a]
~tasfyn-partyv:dojo> [42 [0x2a 420]]
~tasfyn-partyv:dojo> [42 0x2a 420]
```

We observe that cells associate right: `[a b c]` is just another
way of writing `[a [b c]]`.

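Right association means the flat `[a b c]` is only notation for nested pairs. This Python sketch (same int/2-tuple model; `cell` and `show` are hypothetical names) builds the nested form from a flat argument list, and prints nested cells back in the flattened bracket style the dojo uses.

```python
from functools import reduce


def cell(*items):
    """Build [a b c ...] as right-nested pairs (a, (b, (c, ...)))."""
    if len(items) < 2:
        raise ValueError("a cell needs at least two items")
    return reduce(lambda tail, head: (head, tail), reversed(items[:-1]), items[-1])


def show(noun) -> str:
    """Print a noun, flattening right-nested cells into [a b c] form."""
    if isinstance(noun, int):
        return str(noun)
    parts = []
    while isinstance(noun, tuple):
        parts.append(show(noun[0]))
        noun = noun[1]
    parts.append(show(noun))
    return "[" + " ".join(parts) + "]"


# 0x2A is just 42 here: this model has no spans, so it prints in decimal.
print(cell(42, 0x2A, 420) == (42, (0x2A, 420)))  # True
print(show(cell(42, 0x2A, 420)))                 # [42 42 420]
print(show(((42, 0x2A), 420)))                   # [[42 42] 420]
```

Only nesting in the tail can be flattened; a cell nested in the head keeps its own brackets.
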
{Lisp veterans beware: Hoon `[a b]` is Lisp `(a . b)`, and Lisp
`(a b)` is Hoon `[a b ~]` (`~` represents nil, with a value of
atom `0`). Lisp and Hoon are both pair-oriented languages down
below, but Lisp has a layer of sugar that makes it look
list-oriented. Hoon loves its "improper lists," i.e., tuples.}

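The Lisp comparison can be checked with a short Python sketch (same model, with `~` as the atom `0`; `to_hoon_list` and `from_hoon_list` are hypothetical helpers). A list is just right-nested cells whose innermost tail is `0`; a tuple such as `[1 2]` has no such terminator.

```python
NIL = 0  # Hoon's ~ is the atom 0


def to_hoon_list(items):
    """[a b c ~]: right-nested cells ending in nil, like Lisp (a b c)."""
    noun = NIL
    for item in reversed(items):
        noun = (item, noun)
    return noun


def from_hoon_list(noun):
    """Walk a null-terminated noun back into a Python list."""
    items = []
    while isinstance(noun, tuple):
        items.append(noun[0])
        noun = noun[1]
    if noun != NIL:
        raise ValueError(f"ends in {noun!r}, not ~: a tuple, not a list")
    return items


lisp_list = to_hoon_list([1, 2])  # Lisp (1 2)   == Hoon [1 2 ~]
dotted_pair = (1, 2)              # Lisp (1 . 2) == Hoon [1 2]
print(lisp_list)                  # (1, (2, 0))
print(from_hoon_list(lisp_list))  # [1, 2]
```

Calling `from_hoon_list(dotted_pair)` raises, because the pair's last tail is `2` rather than `~`.
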
## Looking at spans

What are these mysterious spans? We can see them with the `?`
prefix, which prints the span along with the result. Moving to
a more compact example format:

```
~tasfyn-partyv:dojo> ? 42
@ud
42
~tasfyn-partyv:dojo> ? 0x2a
@ux
0x2a
```

`@ud` and `@ux` stand for "unsigned decimal" and "unsigned hex,"
obviously.

{What is this span syntax? We only derive spans through
inference, so there's no parsing grammar for a span. We have to
be able to print spans, if only for debugging and diagnostics,
but the syntax is output-only. As in this case, it often looks
like the `mold` syntax, but the two are at opposite ends of the
type food chain.}

## Looking at spans, part 2

Good style in Hoon is concrete style. When we Hoon programmers
define an abstract semantic value in terms of a noun, we rarely
put a conceptual layer of abstraction between value and noun. We
think of the semantic value as an interpretation of the
concrete noun, and often we just think of the noun.

With the `?` command, we *do* use an abstract layer, by printing
our span noun in a custom syntax. But we can also look at the
noun directly, with the `??` command.

{Spans are an exception to the concrete principle, because we use
"manual laziness" to define recursive structures. A recursive
span contains Hoon code, which is evaluated to apply it. In
practice, it often contains the entire Urbit kernel, so you
wouldn't want to try to print it in the dojo. If you find
`??` taking a weirdly long time, this may have happened; just
press ^C.}

```
~tasfyn-partyv:dojo> ?? 42
[%atom %ud]
42
~tasfyn-partyv:dojo> ?? [42 0x2a]
[%cell [%atom %ud] [%atom %ux]]
[42 0x2a]
```

What is this `%atom` syntax? Is it a real noun? Can anyone
make one?

```
~tasfyn-partyv:dojo> %atom
%atom
~tasfyn-partyv:dojo> %foo
%foo
~tasfyn-partyv:dojo> [%foo %bar]
[%foo %bar]
```

What's the span of one of these symbols?

```
~tasfyn-partyv:dojo> ? %foo
%foo
%foo
~tasfyn-partyv:dojo> ?? %foo
[%cube 7.303.014 [%atom %tas]]
%foo
```

This takes a little bit of explaining. `7.303.014` is just the
Urbit (and German) way of writing the English number `7,303,014`,
or the Urbit hex number `0x6f.6f66`, or the string "foo" as an
unsigned integer with least-significant byte first.

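The arithmetic above is easy to check in Python, assuming a symbol's text is stored as its UTF-8 bytes read as an unsigned integer with the first character in the least-significant byte. The helper names `text_to_atom` and `atom_to_text` are hypothetical.

```python
def text_to_atom(text: str) -> int:
    """Read the UTF-8 bytes of `text` as an integer, least-significant byte first."""
    return int.from_bytes(text.encode("utf-8"), "little")


def atom_to_text(atom: int) -> str:
    """Inverse of text_to_atom, for atoms whose bytes are valid UTF-8."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("utf-8")


print(text_to_atom("foo"))       # 7303014
print(hex(text_to_atom("foo")))  # 0x6f6f66
print(atom_to_text(7303014))     # foo
```

Because "f" (`0x66`) is the first character, it lands in the lowest byte, which is why the hex form reads `6f6f66` rather than `666f6f`.
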
A `%cube` span is a constant: a set of one noun, the atom
`7.303.014`. But we still need to know how to print that noun.
In this case, it's an `[%atom %tas]`, i.e., a text symbol.

Cubes don't have to be symbols; in fact, we can take the
numbers we've just been using and make them constants:

```
~tasfyn-partyv:dojo> %42
%42
~tasfyn-partyv:dojo> ? %42
%42
%42
~tasfyn-partyv:dojo> ?? %42
[%cube 42 [%atom %ud]]
%42
```

## Our first mold

After seeing a few span examples, are we ready to describe the
set of all spans with a Hoon mold? Well, no, but let's try it
anyway. Ignore the syntax (which we'll explain later; this is a
tutorial, not a reference manual), and you'll get the idea:

```
++  span
  $%  [%atom p=@tas]
      [%cell p=span q=span]
      [%cube p=* q=span]
  ==
```

This mold is not the entire definition of `span`, just the cases
we've seen so far. In English, a valid span is either:

- a cell with head `%atom`, and tail some symbol;
- a cell with head `%cell`, and tail some pair of spans; or
- a cell with head `%cube`, and tail a noun-span pair.

The head of a span is essentially the tag in a variant record,
a pattern found in nearly every programming language. To use the
noun, we look at the head and then decide what to do with the
tail.

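The three cases, and the habit of looking at the head before touching the tail, can be sketched in Python as a validity check for span-shaped nouns. Assumptions: nouns are ints and 2-tuples; symbols such as `%atom` are encoded as text atoms with the first character in the low byte; `tas` and `is_span` are hypothetical names; and unlike a real mold, this check rejects bad input instead of normalizing it.

```python
def tas(symbol: str) -> int:
    """Encode a symbol like %atom as an atom, first character in the low byte."""
    return int.from_bytes(symbol.encode("ascii"), "little")


ATOM, CELL, CUBE = tas("atom"), tas("cell"), tas("cube")


def is_span(noun) -> bool:
    """Check a noun against the three span cases seen so far."""
    if not isinstance(noun, tuple):
        return False                 # every span is a cell: [tag payload]
    head, tail = noun
    if head == ATOM:                 # [%atom p=@tas]: tail is a symbol atom
        return isinstance(tail, int)
    if not isinstance(tail, tuple):
        return False                 # the other cases need a pair in the tail
    p, q = tail
    if head == CELL:                 # [%cell p=span q=span]
        return is_span(p) and is_span(q)
    if head == CUBE:                 # [%cube p=* q=span]: p may be any noun
        return is_span(q)
    return False                     # unknown tag


ud = (ATOM, tas("ud"))
print(is_span(ud))                                     # True: [%atom %ud]
print(is_span((CELL, (ud, (ATOM, tas("ux"))))))        # True: [%cell [%atom %ud] [%atom %ux]]
print(is_span((CUBE, (7303014, (ATOM, tas("tas"))))))  # True: the span of %foo
print(is_span((CELL, (ud, 42))))                       # False: 42 is not a span
```

Each branch reads the head first and only then decides how to take the tail apart, which is exactly the variant-record pattern described above.
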
{A conventional naming strategy for simple, self-explaining
structures is to name the legs of a tuple `p`, `q`, `r`, `s` and
`t`. If you get all the way to `t`, your noun is probably not
simple or self-explaining; meaningful names are recommended.}