urbit/pub/docs/dev/hoon/tutorial/2-syntax.md

393 lines
14 KiB
Markdown
Raw Normal View History

2015-09-30 00:36:58 +03:00
# Hoon 2: serious syntax
We've done a bunch of fun stuff on the command line. We know our
nouns. It's time to actually write some serious code -- in a
real source file.
## Building a simple generator
In Urbit there's a variety of source file roles, distinguished by
the magic paths they're loaded from: `/gen` for generators,
`/ape` for appliances, `/fab` for renderers, etc.
We'll start with a generator, the simplest kind of Urbit program.
### Create a sandbox desk
A desk is the Urbit equivalent of a `git` branch. We're just
playing around here and don't intend to soil our `%home` desk with
test files, so let's make a sandbox:
```
|merge %sandbox our %home
```
### Mount the sandbox
Your Urbit pier is in `~/tasfyn-partyv`, or at least mine is.
So we can get our code into Urbit, run the command
```
~tasfyn-partyv:dojo> |mount /=sandbox=/gen %gen
```
mounts the `/gen` folder from the `%sandbox` desk in your Unix
directory `~/tasfyn-partyv/gen`. The mount is a two-way sync,
like your Dropbox. When you edit a Unix file and save, your edit
is automatically committed as a change to `%sandbox`.
### Execute from the sandbox
The `%sandbox` desk obviously is merged from `%home`, so it
contains find all the default facilities you'd expect there.
Bear in mind, we didn't set it to auto-update when `%home`
is updated (that would be `|sync` instead of `|merge`).
So we're not roughing it when we set the dojo to load from
`%sandbox`:
```
[switch to %home]
```
### Write your builder
Let's build the simplest possible kind of generator, a builder.
With your favorite Unix text editor (there are Hoon modes for vim
and emacs), create the file `~/tasfyn-partyv/gen/test.hoon`.
Edit it into this:
```
:- %say |= * :- %noun
[%hello %world]
```
Get the spaces exactly right, please. Hoon is not in general a
whitespace-sensitive language, but the difference between one
space and two-or-more matters. And for the moment, think of
```
:- %say |= * :- %noun
```
as gibberish boilerplate at the start of a file, like `#include
"stdio.h"` at the start of a C program. Any of our old Hoon
constants would work in place of `[%hello %world`].
Now, run your builder:
```
~tasfyn-partyv:dojo/sandbox> +test
[%hello %world]
```
Obviously this is your first Hoon *program* per se.
## Hoon syntax 101
But what's up with this syntax?
### A syntactic apology
The relationship between ASCII and human programming languages
is like the relationship between the electric guitar and
rock-and-roll. If it doesn't have a guitar, it's not rock.
Some great rockers play three chords, like Johnny Ramone; some
shred it up, like Jimmy Page.
The two major families of ASCII-shredding languages are Perl and
the even more spectacular APL. (Using non-ASCII characters is
just a fail, but APL successors like J fixed this.) No one
has any right to rag on Larry Wall or Ken Iverson, but Hoon,
though it shreds, shreds very differently.
The philosophical case for a "metalhead" language is threefold.
One, human beings are much better at associating meaning with
symbols than they think they are. Two, a programming language is
a professional tool and not a plastic shovel for three-year-olds.
And three, the alternative to heavy metal is keywords. When you
use a keyword language, not only are you forcing the programmer
to tiptoe around a ridiculous maze of restricted words used and
reserved, you're expressing your program through two translation
steps: symbol->English and English->computation. When you shred,
you are going direct: symbol->computation. Especially in a pure
language, this creates a sense of "seeing the function" which no
keyword language can quite duplicate.
But any metalhead language you don't yet know is line noise.
Let's get you up to speed as fast as possible.
### A glyphic bestiary
A programming language needs to be not just read but said. But
no one wants to say "ampersand." Therefore, we've taken the
liberty of assigning three-letter names to all ASCII glyphs.
Some of these bindings are obvious and some aren't. You'll be
genuinely surprised at how easy they are to remember:
```
ace [1 space] dot . pan ]
bar | fas / pel )
bis \ gap [>1 space, nl] pid }
buc $ hax # ran >
cab _ ket ^ rep '
cen % lep ( sac ;
col : lit < tar *
com , lus + tec `
das - mat @ tis =
den " med & wut ?
dip { nap [ zap !
```
It's fun to confuse people by using these outside Urbit. A few
digraphs also have irregular sounds:
```
== stet
-- shed
++ slus
-> dart
-< dusk
+> lark
+< lush
```
### The shape of a twig
A twig, of course, is a noun. As usual, the easiest way to
explain both the syntax that compiles into that noun, and the
semantic meaning of the noun, is the noun's physical structure.
#### Autocons
A twig is always a cell, and any cell of twigs is a twig
producing a cell. As an homage to Lisp, we call this
"autocons." Where you'd write `(cons a b)` in Lisp, you write
`[a b]` in Hoon, and the shape of the twig follows.
The `???` prefix prints a twig as a noun instead of running it.
Let's see autocons in action:
```
~tasfyn-partyv:dojo/sandbox> ??? 42
[%dtzy %ud 42]
~tasfyn-partyv:dojo/sandbox> ??? 0x2a
[%dtzy %ux 42]
~tasfyn-partyv:dojo/sandbox> ??? [42 0xa]
[[%dtzy %ud 42] %dtzy %ux 42]
```
(As always, it may confuse *you* that this is the same noun as
`[[%dtzy %ud 42] [%dtzy %ux 42]]`, but it doesn't confuse Hoon.)
#### The stem-bulb pattern
If the head of your twig is a cell, it's an autocons. If the
head is an atom, it's an unpronounceable four-letter symbol like
the `%dtzy` above.
This is the same pattern as we see in the `span` mold -- a
variant record, essentially, in nouns. The head of one of these
cells is called the "stem." The tail is the "bulb." The shape
of the bulb is totally dependent on the value of the stem.
#### Runes and stems
A "rune" (a word intentionally chosen to annoy Go programmers) is
a digraph - a sequence of two ASCII glyphs. If you know C, you
know digraphs like `->` and `?:` and are used to reading them as
single characters.
2015-10-20 20:51:45 +03:00
In Hoon you can *say* them as words: "dasran" and "wattis"
respectively. In a metalhead language, if we had to say
2015-09-30 00:36:58 +03:00
"minus greater-than" and "question-colon", we'd just die.
Most twig stems are made from runes, by concatenating the glyph
names and removing the vowels. For example, the rune `=+`,
pronounced "tislus," becomes the stem `%tsls`. (Note that in
many noun implementations, this is a 31-bit direct value.)
(Some stems (like `%dtzy`) are not runes, simply because they
don't have regular-form syntax and don't need to use precious
ASCII real estate. They are otherwise no different.)
An important point to note about runes: they're organized. The
first glyph in the rune defines a category. For instance, runes
starting with `.` compute intrinsics; runes starting with `|`
produce cores; etc.
Another important point about runes: they come in two flavors,
"natural" (stems interpreted directly by the compiler) and
"synthetic" (macros, essentially).
(Language food fight warning: one advantage of Hoon over Lisp is
that all Hoon macros are inherently hygienic. Another advantage
is that Hoon has no (user-level) macros. In Hoon terms, nobody
gets to invent their own runes. A DSL is always and everywhere
a write-only language. Hoon shreds its ASCII pretty hard, but
the same squiggles mean the same things in everyone's code.)
#### Wide and tall regular forms
A good rune example is the simple rune `=+`, pronounced "tislus",
which becomes the stem `%tsls`. A `%tsls` twig has the shape
`[%tsls twig twig]`.
The very elegance of functional languages creates a visual
problem that imperative languages lack. An imperative language
has distinct statements (with side effects) and (usually pure)
expressions; it's natural that in most well-formatted code,
statements flow vertically down the screen, and expressions grow
horizontally across this. This interplay creates a natural and
relaxing shape on your screen.
In a functional language, there's no difference. The trivial
functional syntax is Lisp's, which has two major problems. One:
piles of expression terminators build up at the bottom of complex
functions. Two: the natural shape of code is diagonal. The more
complex a function, the more it wants to besiege the right
margin. The children of a node have to start to the right of its
parent, so the right margin bounds the tree depth.
Hoon does not completely solve these problems, but alleviates
them. In Hoon, there are actually two regular syntax forms for
most twig cases: "tall" and "wide" form. Tall twigs can contain
wide twigs, but not vice versa, so the visual shape of a program
is very like that of a statements-and-expressions language.
Also, in tall mode, most runes don't need terminators. Take
`=+`, for example. Since the parser knows to expect exactly
two twigs after the `=+` rune, it doesn't need any extra syntax
to tell it that it's done.
Let's try a wide `=+` in the dojo:
```
~tasfyn-partyv:dojo/sandbox> =+(planet=%world [%hello planet])
[%hello %world]
```
(`=+` seems to be some sort of variable declaration? Let's not
worry about it right now. We're on syntax.)
The wide syntax for a `=+` twig, or any binary rune: `(`, the
first subtwig, one space, the second subtwig, and `)`). To read
this twig out loud, you'd say:
```
tislus lap planet is cen world ace nep cen hello ace planet pen
pal
```
("tis" not in a rune gets contracted to "is".)
Let's try a tall `=+` in `test.hoon`:
```
:- %say |= * :- %noun
=+ planet=%world
[%hello planet]
```
The tall syntax for a `=+` twig, or any binary rune: the rune, at
least two spaces or one newline, the first subtwig, at least two
spaces or one newline, the second subtwig. Again, tall subtwigs
can be tall or wide; wide subtwigs have to be wide.
(Note that our boilerplate line is a bunch of tall runes on one
line, with two-space gaps. This is unusual but quite legal, and
not to be confused with the actual wide form.)
To read this twig out loud, you'd say:
```
tislus gap planet is cen world gap nep cen hello ace planet pen
```
#### Layout conventions
Should you use wide twigs or tall twigs? When? How? What
should your code look like? You're the artist. Except for the
difference between one space (`ace`) and more space (`gap`), the
parser doesn't care how you format your code. Hoon is not Go --
there are no fixed rules for doing it right.
However, the universal convention is to keep lines under 80
characters. Also, hard tab characters are illegal. And when in
doubt, make your code look like the kernel code.
##### Backstep indentation
Note that the "variable declaration" concept of `=+` (which is no
more a variable declaration than a Tasmanian tiger is a tiger)
works perfectly here. Because `[%hello planet]` -- despite being
a subtree of the the `=+` twig -- is at the same indent level.
So our code flows down the screen, not down and to the right, and
of course there are no superfluous terminators. It looks good,
and creates fewer hard-to-find syntax errors than you'd think.
This is called "backstep" indentation. Another example, using a
ternary rune that has a strange resemblance to C:
```
:- %say |= * :- %noun
=+ planet=%world
?: =(%world planet)
[%hello planet]
[%goodbye planet]
```
It's not always the case when backstepping that the largest
subtwig is at the bottom and loses no margin, but it often is.
And not all runes have tuple structure; some are n-ary, and use
the `==` terminator (again, pronounced "stet"):
```
:- %say |= * :- %noun
=+ planet=%world
?+ planet
[%unknown planet]
%world [%hello planet]
%ocean [%goodbye planet]
==
```
So we occasionally lose right-margin as we descend a deep twig.
But we can keep this lossage low with good layout design. The
goal is to keep the heavy twigs on the right, and Hoon tries as
hard as possible to help you with this.
For instance, `=+` ("tislus") is a binary rune: `=+(a b)`. In
most cases of `=+` the heavy twig is `b`, but sometimes it's `a`.
So we can use its friend the `=-` rune ("tisdas") to get the same
semantics with the right shape: `=-(b a)`.
#### Irregular forms
There are more regular forms than we've shown above, but not a
lot more. Hoon would be quite easy to learn if it was only its
regular forms. It wouldn't be as easy to read or use, though.
The learning curve is important, but not all-important.
Some stems (like the `%dtzy` constants above) obviously don't and
can't have any kind of regular form (which is why `%dtzy` is not
a real digraph rune). Many of the true runes have only regular
forms. But some have irregular forms. Irregular forms are
always wide, but there is no other constraint on their syntax.
We've already encountered one of the irregular forms: `foo=42`
from the last chapter, and `planet=%world` here. Let's unpack
this twig:
```
~tasfyn-partyv:dojo/sandbox> ?? %world
[%cube 431.316.168.567 %atom %tas]
%world
~tasfyn-partyv:dojo/sandbox> ??? %world
[%dtzz %tas 431.316.168.567]
```
Clearly, `%dtzz` is one of our non-regulars. But we can wrap it
with our irregular form:
```
~tasfyn-partyv:dojo/sandbox> ?? planet=%world
[%face %planet [%cube 431.316.168.567 %atom %tas]]
planet=%world
~tasfyn-partyv:dojo/sandbox> ??? planet=%world
[%ktts %planet %dtzz %tas 431.316.168.567]
```
Since `%ktts` is "kettis", ie, `^=`, this has to be the irregular
form of
```
~tasfyn-partyv:dojo/sandbox> ^=(planet %world)
planet=world
```
So if we wrote our example without this irregular form, it'd be
```
:- %say |= * :- %noun
=+ ^=(planet %world)
[%hello planet]
```
Or with a gratuitous use of tall form:
```
:- %say |= * :- %noun
=+ ^= planet %world
[%hello planet]
```
Now you know how to read Hoon! For fun, try to pronounce more of
the code on this page. Please don't laugh too hard at yourself.