added some extremely rough hoon reference

johncburnham 2014-08-21 14:09:52 -07:00
parent 87f8bc901f
commit 5222da66ad
6 changed files with 3141 additions and 0 deletions

File diff suppressed because it is too large


@ -0,0 +1,54 @@
Morphology
==========
Hoon is a statically typed language that compiles to Nock.
A type is a function whose domain is the set of all nouns and whose range is the set of all nouns that are members of that type.
The compilation process is as follows:

First, a runic expression is parsed into an abstract syntax tree, called a `twig`:

    expression => twig

A subject type is generated from the twig. This type describes the subject
of the Nock formula that the twig compiles to.

    twig => [subject-type twig]

The twig is then compiled into a Nock formula, and the type of the product of
the formula is inferred.

    [subject-type twig] => [product-type nock-formula]

As long as subject-type is a correct description of some subject, you can
take any twig and compile it against subject-type, producing a formula such
that

    *[subject formula]

is a product correctly described by product-type.
This works well enough that in Hoon there is no direct syntax for defining or
declaring a type. There is only a syntax for constructing twigs. Types are
always produced by inference.
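To make `*[subject formula]` concrete, here is a minimal Python sketch of a small subset of Nock evaluation. It is an illustration under stated assumptions, not part of the Hoon toolchain: nouns are modeled as Python ints (atoms) and 2-tuples (cells), only the axis (0), constant (1) and increment (4) operators plus the rule for a formula whose head is a cell are handled, and the names `slot` and `nock` are chosen for this example.

```python
def slot(axis, noun):
    """Fetch the sub-noun at tree address `axis` (1 is the whole noun)."""
    if axis < 1:
        raise ValueError("axis must be at least 1")
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    if not isinstance(parent, tuple):
        raise ValueError("axis reaches below an atom")
    return parent[axis % 2]  # even axis selects the head, odd the tail


def nock(subject, formula):
    """Evaluate *[subject formula] for a small subset of Nock."""
    if not isinstance(formula, tuple):
        raise ValueError("a formula must be a cell")
    op, arg = formula
    if isinstance(op, tuple):
        # A formula whose head is a cell is a pair of formulas;
        # the product is the cell of their two products.
        return (nock(subject, op), nock(subject, arg))
    if op == 0:  # [0 axis]: fetch part of the subject
        return slot(arg, subject)
    if op == 1:  # [1 constant]: ignore the subject
        return arg
    if op == 4:  # [4 formula]: increment an atom
        value = nock(subject, arg)
        if isinstance(value, tuple):
            raise ValueError("cannot increment a cell")
        return value + 1
    raise NotImplementedError(f"operator {op} is outside this sketch")


subject = (42, 7)      # described by a subject type "cell of two atoms"
formula = (4, (0, 2))  # the Nock formula [4 0 2]
print(nock(subject, formula))
```

Here the product is 43, an atom: the formula fetches the head of the subject and increments it. In the terms used above, "a cell of two atoms" plays the role of subject-type and "an atom" plays the role of product-type.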
Let's look at a simple example of the above proc
Hoon has 120 [XX count] digraph runes. The choice of glyph is not random. The
first glyph defines a semantic category (with some exceptions). These categories are:

    |  bar  core construction
    $  buc  tiles and tiling
    %  cen  invocations
    :  col  tuples
    .  dot  nock operators
    ^  ket  type conversions
    ;  sem  miscellaneous macros
    ~  sig  hints
    =  tis  compositions
    ?  wut  conditionals, booleans, tests
    !  zap  special operations
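As a small illustration of how the first glyph classifies a rune, the table above can be written as a lookup. This is a Python sketch only; `RUNE_CATEGORIES` and `rune_category` are names invented for this example, and the "with some exceptions" caveat above is not modeled.

```python
# First glyph -> (glyph name, semantic category), copied from the table above.
RUNE_CATEGORIES = {
    "|": ("bar", "core construction"),
    "$": ("buc", "tiles and tiling"),
    "%": ("cen", "invocations"),
    ":": ("col", "tuples"),
    ".": ("dot", "nock operators"),
    "^": ("ket", "type conversions"),
    ";": ("sem", "miscellaneous macros"),
    "~": ("sig", "hints"),
    "=": ("tis", "compositions"),
    "?": ("wut", "conditionals, booleans, tests"),
    "!": ("zap", "special operations"),
}


def rune_category(rune):
    """Return (glyph name, category) for a two-glyph rune."""
    if len(rune) != 2:
        raise ValueError("a rune is exactly two glyphs")
    if rune[0] not in RUNE_CATEGORIES:
        raise ValueError(f"{rune[0]!r} does not start a rune category")
    return RUNE_CATEGORIES[rune[0]]


print(rune_category("|="))
print(rune_category("?:"))
```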


@ -0,0 +1,100 @@
Orthography: Consensus Aesthetic
==========
The Hoon compiler enforces the syntactical correctness of the language, but it
does not, with some exceptions, enforce aesthetic standards. Many different styles
of Hoon are possible. However, given Hoon's runic syntax, it is remarkably easy
for the novice programmer to generate idiosyncratic, illegible code. Many other
languages that make heavy use of ASCII have a similar problem. Furthermore,
collaborative programming is made vastly easier by using a standard style
convention.
The Urbit source is written in a style of Hoon called the Consensus Aesthetic.
No patches to the Urbit source will be accepted unless they follow the Consensus Aesthetic.
The general rules of the Consensus Aesthetic are the following:
Character Restriction
---------------------
The horizontal tab character, \ht, ASCII 0x9, must never occur. This is
enforced by the compiler.
Lines and Comments
------------------
Lines must not exceed 80 columns in width and should not exceed 55 columns.
Blank lines (lines consisting entirely of whitespace) should not occur. For
visual separation of code, use empty comments.
Comments may appear on column 0, column 57 or inline at the same level of
indentation as the code.
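The mechanical rules above (no tabs, line width, no blank lines) are easy to check automatically. The following Python sketch is a hypothetical helper, not an existing Urbit tool; the function name `lint` and its report format are assumptions made for this example. It treats "must" rules as errors and "should" rules as warnings, and it does not check comment columns.

```python
MAX_COLUMNS = 80        # lines must not exceed this width
PREFERRED_COLUMNS = 55  # lines should not exceed this width


def lint(source):
    """Return a list of (line number, severity, message) tuples."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "error", "tab character"))
        if len(line) > MAX_COLUMNS:
            problems.append((number, "error", "wider than 80 columns"))
        elif len(line) > PREFERRED_COLUMNS:
            problems.append((number, "warning", "wider than 55 columns"))
        if line.strip() == "":
            # The style asks for an empty comment instead of a blank line.
            problems.append((number, "warning", "blank line"))
    return problems


for problem in lint("++  foo\n\n\t~\n"):
    print(problem)
```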
Indentation
-----------
Aesthetically, the act of programming is the act of formatting a big wall of
text. This canvas has a curious but essential property - it is indefinitely
tall, but finitely wide. The programmer's task as a visual designer is to
persuade code to flow down, not across.
The first law of Hoon indentation style is that all tall indentation is in
two-space increments. Single spaces are for wide only.
The second law of Hoon indentation is that everything in the kernel is good
indentation style. Or at least if it's not, it needs to be changed.
The third and most important law of Hoon indentation is that large twigs should
flow down and not across. Longer twigs should occur below shorter ones. Hoon
has several runes designed specifically to aid this task.
The right margin is a precious resource not to be wasted. It's this law, when
properly applied, that makes casual readers wonder if Hoon is a functional
language at all. It doesn't have a program counter, but it looks like it does -
at least when written right.
Naming Convention
-----------------
Names must follow one of the following naming conventions: Austere, Lapidary,
or Freehand.
In Austere Hoon, variables and arguments are named alphabetically with one
letter, a, b, c etc, in strict order of appearance in the text. This scheme is
only useful in the case of extremely regular and straightforward namespaces:
very short functions, for instance.
Austere arms must be gates or trays. Gate arms are three letters and try to
carry some mnemonic significance - for instance, ++dec. Tray arms are two
letters and try to resemble pronouns - for instance, ++by.
Austere structures must be short tuples, no wider than 5. The legs are named p,
q, r, s and/or t.
Conventional recursive structures use other standard names. The head of a list
is always i, the tail is always t. In a binary tree of nodes, the node is n,
the children l and r.
When in doubt, do not use Austere Hoon. In an ordinary context - not least
because Austere gates are easily mistaken for Lapidary variables - there should
be as few Austere arms as possible. And always remind yourself that Austere
Hoon makes it as hard as possible to refactor your code.
Lapidary Hoon is the ordinary style of most of Hoon and Arvo. In lapidary mode,
variables, arguments, attributes, etc, are three-letter strings, usually
consonant-vowel-consonant, generally meaningless. If the same string is used
more than once in the same file, it should be used for the same concept in some
sense, as often happens spontaneously in cutting and pasting. It would be nice
to have an editor with a macro that generated random unique TLV strings
automatically.
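As a rough idea of what such a macro could do, here is a Python sketch that draws unused consonant-vowel-consonant names at random. Everything in it is an assumption for illustration: the function name `fresh_names`, the choice of consonant set, and the idea of passing in the names already taken in a file.

```python
import random

CONSONANTS = "bcdfghjklmnprstvwz"  # an arbitrary choice for this sketch
VOWELS = "aeiou"


def fresh_names(count, taken=(), rng=None):
    """Return `count` distinct consonant-vowel-consonant names not in `taken`."""
    rng = rng or random.Random()
    used = set(taken)
    candidates = [
        first + vowel + last
        for first in CONSONANTS
        for vowel in VOWELS
        for last in CONSONANTS
        if first + vowel + last not in used
    ]
    if count > len(candidates):
        raise ValueError("not enough unused names")
    # sample() draws without replacement, so the names are unique.
    return rng.sample(candidates, count)


print(fresh_names(5, taken={"bar", "tis"}))
```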
Lapidary arms are always four letters. They may or may not be English words,
which may or may not mean anything relevant.
In Freehand Hoon, do whatever you want. Note that while uppercase is not
permitted in a symbol, - is, suggesting a generally Lisp-like state of gross
hyphenated disorder. F-mode is best used for top-layer software which nothing
else is based on; prototyping and casual coding; etc. Freehand Hoon is not an acceptable style for any code in the Urbit source proper, and is discouraged for production applications.


@ -0,0 +1,85 @@
Philosophy
==========
Hoon is a higher-order typed functional language that compiles itself to
Nock in 3400 lines of Hoon. If this number is accurate (it is), Hoon is very
expressive, or very simple, or both. (It's both.) The bad news is that it
really has nothing at all in common, either syntactically or semantically, with
anything you've used before.
By understanding the Nock tutorial, you've actually come closer than you realize to
knowing Hoon. Hoon is actually not much more than a fancy wrapper around Nock.
People who know C can think of Hoon as the C to Urbit's Nock - just a
sprinkling of syntax, wrapped around machine code and memory.
For instance, it's easy to imagine how instead of calculating tree axes by
hand, we could actually assign names to different parts of the tree - and those
names would stay the same as we pushed more data on the subject.
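A small Python sketch may make this concrete. It is a toy model, not how the Hoon compiler is implemented: nouns are Python ints and 2-tuples, a plain dict from names to axes stands in for the compile-time bookkeeping, and it assumes that new data is pushed as the head of a new cell with the old subject as its tail. The names `slot`, `peg`, `push` and `lookup` are chosen for this example.

```python
def slot(axis, noun):
    """Fetch the sub-noun at tree address `axis` (1 is the whole noun)."""
    if axis == 1:
        return noun
    parent = slot(axis // 2, noun)
    return parent[axis % 2]  # even axis selects the head, odd the tail


def peg(outer, inner):
    """Axis of `inner`, measured inside the sub-noun found at `outer`."""
    if inner == 1:
        return outer
    return 2 * peg(outer, inner // 2) + inner % 2


def push(value, name, subject, names):
    """Push `value` onto the subject and rebind every existing name."""
    # The old subject now lives at axis 3, so old axes shift under it.
    shifted = {old: peg(3, axis) for old, axis in names.items()}
    shifted[name] = 2  # the new value is the head of the new subject
    return (value, subject), shifted


def lookup(name, subject, names):
    return slot(names[name], subject)


subject, names = (10, 20), {"a": 2, "b": 3}
subject, names = push(30, "c", subject, names)
print(names)
print(lookup("a", subject, names), lookup("b", subject, names))
```

After the push, `a` and `b` sit at axes 6 and 7 instead of 2 and 3, but code that refers to them by name is unaffected. Only the bookkeeping changed, and that bookkeeping never appears in the subject itself.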
The way we're going to do this is by associating something called a type with
the subject. You may have heard of types before. Technically, Hoon is a
statically typed language, which just means that the type isn't a part of your
program: it's just a piece of data the compiler keeps around as it turns your
Hoon into Nock.
A lot of other languages use dynamic types, in which the type of a value is
carried along with the data as you use it. Even languages like Lisp, which are
nominally typeless, look rather typed from the Hoon perspective. For example, a
Lisp atom knows dynamically whether it's a symbol or an integer. A Hoon atom is
just a Nock atom, which is just a number. So without a static type, Hoon
doesn't even know how to print an atom properly.
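The printing problem is easy to demonstrate. In the Python sketch below, one and the same number is rendered three ways, and nothing in the number itself says which rendering is intended. The function names are invented for this example, the output formats are plain Python formatting rather than Hoon's own, and the text rendering assumes the first character is stored in the lowest byte.

```python
def as_decimal(atom):
    return str(atom)


def as_hex(atom):
    return "0x" + format(atom, "x")


def as_cord(atom):
    """Read the atom as ASCII text, first character in the lowest byte."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("ascii")


atom = 7303014
print(as_decimal(atom))
print(as_hex(atom))
print(as_cord(atom))
```

The three lines printed are `7303014`, `0x6f6f66` and `foo`. A static type attached to the atom at compile time is what lets a printer pick one of them.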
Most higher-order typed languages, Haskell and ML being prominent examples, use
something called the Hindley-Milner unification algorithm. Hoon uses its own
special sauce instead.
Why? There are two obvious problems with Hindley-Milner as a functional type
system, the main one being the wall of heavy mathematics that greets you
instantly when you google it. We have heard some claims that Hindley-Milner is
actually quite simple. We urge all such claimants to hie themselves to its
Wikipedia page, which they'll surely be able to relieve of its present alarming
resemblance to some string-theory paper in Physical Review D.
Nor is this in any way an anti-academic stance. Quite the contrary.
Frankly, OS guys really quite seldom find themselves in the math-department
lounge, cadging stray grants by shamelessly misrepresenting the CAP theorem as
a result in mathematics. It doesn't seem too much to expect the mathematicians
to reciprocate this basic academic courtesy.
Furthermore, besides the drawback that Hindley-Milner reeks of math and
programmers who love math are about as common as cats who love a bath - a
problem, but really only a marketing problem - Hindley-Milner has a genuine
product problem as well. It's too powerful.
Specifically, Hindley-Milner reasons both forward with evaluation, and backward
from constraints. Pretty unavoidable in any sort of unification algorithm,
obviously. But since the compiler has to think both forward and backward, and
the programmer has to predict what the compiler will do, the programmer has to
think backward as well.
Hoon's philosophy is that a language is a UI for programmers, and the basic
test of a UI is to be easy to use. It is impossible (for most programmers) to
learn a language properly unless they know what the compiler is doing, which in
practice means mentally stepping through the algorithms it uses (with the
exception of semantically neutral optimizations). Haskell is a hard language to
learn (for most programmers) because it's hard (for most programmers) to follow
what the Haskell compiler is thinking.
It's true that some programmers have an effective mathematical intuition that
lets them "see" algorithms without working through them step by step. But this
is a rare talent, we feel. And even those who have a talent don't always enjoy
exercising it.
If a thorough understanding of any language demands high-grade mathematical
intuition in its programmers, the language as a UI is like a doorway that makes
you duck if you're over 6 feet tall. The only reason to build such a doorway in
your castle is if you and all your friends are short, and only your enemies are
tall. Is this really the case here?
Although an inference algorithm that reasons only forward must and does require
a few more annotations from the programmer, the small extra burden on her
fingers is more than offset by the lighter load on her hippocampus.
Furthermore, programs also exist to be read. The modern code monkey is above
all things a replaceable part, and some of these annotations (which a smarter
algorithm might infer by steam) may annoy the actual author of the code but be
a lifesaver for her replacement.


@ -0,0 +1,226 @@
Phonology
=========
Glyphs
------
Hoon is a keyword-free language - any alphanumeric text in the program is part
of the program. Where other languages have reserved words, Hoon syntax uses
ASCII symbols, or glyphs. In normal English, many of these glyphs have
cumbersome multisyllabic names. As Hoon uses these glyphs heavily, it has its
own, more concise, naming scheme for them:

    ace  space    gal  <    per  )
    bar  |        gar  >    sel  [
    bas  \        hax  #    sem  ;
    buc  $        hep  -    ser  ]
    cab  _        kel  {    sig  ~
    cen  %        ker  }    soq  '
    col  :        ket  ^    tar  *
    com  ,        lus  +    tec  `
    doq  "        pam  &    tis  =
    dot  .        pat  @    wut  ?
    fas  /        pel  (    zap  !
A language is meant to be spoken. Even a programming language. Studies have
shown that even when we read silently, we activate the motor cortex that
controls our vocal cords. Even if we never speak these symbols, they're easier
to think if bound to simple sounds.
Mnemonic aids for memorizing the above glyphs can be found in the comments of section 2eF of the Urbit Source, which is reprinted here:
```
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
::                section 2eF, parsing (ascii)          ::
::
++  ace  (just ' ')     ::  spACE
++  bar  (just '|')     ::  vertical BAR
++  bas  (just '\\')    ::  Back Slash (escaped)
++  buc  (just '$')     ::  dollars BUCks
++  cab  (just '_')     ::  CABoose
++  cen  (just '%')     ::  perCENt
++  col  (just ':')     ::  COLon
++  com  (just ',')     ::  COMma
++  doq  (just '"')     ::  Double Quote
++  dot  (just '.')     ::  dot dot dot ...
++  fas  (just '/')     ::  Forward Slash
++  gal  (just '<')     ::  Greater Left
++  gar  (just '>')     ::  Greater Right
++  hax  (just '#')     ::  Hash
++  kel  (just '{')     ::  Curly Left
++  ker  (just '}')     ::  Curly Right
++  ket  (just '^')     ::  CareT
++  lus  (just '+')     ::  pLUS
++  hep  (just '-')     ::  HyPhen
++  pel  (just '(')     ::  Paren Left
++  pam  (just '&')     ::  AMPersand pampersand
++  per  (just ')')     ::  Paren Right
++  pat  (just '@')     ::  AT pat
++  sel  (just '[')     ::  Square Left
++  sem  (just ';')     ::  SEMicolon
++  ser  (just ']')     ::  Square Right
++  sig  (just '~')     ::  SIGnature squiggle
++  soq  (just '\'')    ::  Single Quote
++  tar  (just '*')     ::  sTAR
++  tec  (just '`')     ::  backTiCk
++  tis  (just '=')     ::  'tis tis, it is
++  wut  (just '?')     ::  wut, what?
++  zap  (just '!')     ::  zap! bang! crash!!
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
```
Digraph Glyphs: Runes
--------------------
The fundamental building block of Hoon is the digraph glyph, or rune. The choice of glyph is not random. The first glyph defines a semantic category. That is, all runes whose first glyph is `|` (`bar`) are conceptually related. See Morphology for details.
To pronounce a rune, concatenate the glyph names, stressing the first syllable
and softening the second vowel into a "schwa." Hence, to say `~&`, say
"sigpam." To say `|=`, say "bartis."
Punctuation Runes
----------------
The following runes are used as punctuation in Tall Form Hoon (See Syntax for details) and have mandatory special pronunciation:
-- hephep phep
+- lushep slep
++ luslus slus
== tistis stet
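The pronunciation rule is mechanical enough to sketch in Python: look up each glyph's name and concatenate, except for the four punctuation runes above, whose special names are mandatory. The names `GLYPH_NAMES`, `PUNCTUATION` and `pronounce` are invented for this example, and stress and vowel softening are of course not modeled.

```python
# Glyph -> name, copied from the glyph table in this file.
GLYPH_NAMES = {
    " ": "ace", "|": "bar", "\\": "bas", "$": "buc", "_": "cab",
    "%": "cen", ":": "col", ",": "com", '"': "doq", ".": "dot",
    "/": "fas", "<": "gal", ">": "gar", "#": "hax", "-": "hep",
    "{": "kel", "}": "ker", "^": "ket", "+": "lus", "&": "pam",
    "@": "pat", "(": "pel", ")": "per", "[": "sel", ";": "sem",
    "]": "ser", "~": "sig", "'": "soq", "*": "tar", "`": "tec",
    "=": "tis", "?": "wut", "!": "zap",
}

# Punctuation runes with mandatory special pronunciation.
PUNCTUATION = {"--": "phep", "+-": "slep", "++": "slus", "==": "stet"}


def pronounce(rune):
    """Return the spoken name of a two-glyph rune."""
    if len(rune) != 2:
        raise ValueError("a rune is exactly two glyphs")
    if rune in PUNCTUATION:
        return PUNCTUATION[rune]
    return GLYPH_NAMES[rune[0]] + GLYPH_NAMES[rune[1]]


for rune in ["~&", "|=", "?:", "++"]:
    print(rune, pronounce(rune))
```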
Wing Runes
---------
The following runes are used to access specific axes or wings in a noun. See Morphology. They have optional alternate phonology.
+< lusgal glus
+> lusgar gras
-< hepgal gelp
-> hepgar garp
Tile Runes
---------
The following runes comprise the set of "Tile Runes" and are used to generate
complex types (See Morphology for details). They have an optional alternate
phonology, which describes the tile they generate:
$% buccen kelp
$^ bucket herb
$: buccol tile
$= buctis bark
$& bucpam bush
$? bucwut fern
$| bucbar reed
The following glyphs are not runes, but are commonly used with tile runes to specify basic types. (See Morphology for details). In context, they have an optional alternate phonology:
@ "atom"
^ "cell"
* "noun"
? "bean"
~ "null"
Irregular Runes
--------------
The following glyphs have optional special pronunciation when they appear in
the irregular form of certain digraph runes. It is perfectly acceptable to
pronounce the characters, but some may find the alternate phonology useful,
especially in cases where multiple irregular forms occur in sequence.

    Irregular    Regular      Pronunciation
    ,p           $,(p)        "clam p"
    _p           $_(p)        "bunt p"
    p@q          $@(p q)      "whip p into q"
    !p           ?!(p)        "NOT p"
    &(p q)       ?&(p q)      "AND p q"
    |(p q)       ?|(p q)      "OR p q"
    ?(p q)       $?(p q)      "fern p q"
    `p`q         ^-(p q)      "cast p q"
    p=q          ^=(p q)      "p is q"
    ~[p q]       :~(p q)      "list p q"
    `[p]         [~ p]        "unit p"
    p^q          [p q]        "cell p q"
    [p q]        :*(p q)      "cell p q"
    +(p)         .+(p)        "bump p"
    =(p q)       .=(p q)      "equals p q"
    p:q          =<(p q)      "p of q"
    p(q r)       %=(p q r)    "toss p q r"
    (p q)        %-(p q)      "slam p q"
    ~(p q r)     %~(p q r)    "slug p q r"
Nouns
-----
Some nouns also have an alternate phonology:
& "yes"
%& "yes"
%.y "yes"
| "no"
%| "no"
%.n "no"
42 "forty-two"
0i42 "dec four two"
0x2e "hex two e"
0b10 "bin one zero"
0v3t "base thirty two three t"
0wA4 "base sixty-four big a four"
'foo' "cord foo"
"foo" "tape foo"
Example
-------
Take the following snippet of Hoon:

    ++  dec                             ::  decrement
      ~/  %dec
      |=  a=@
      ~|  %decrement-underflow
      ?<  =(0 a)
      =+  b=0
      |-  ^-  @
      ?:  =(a +(b))
        b
      $(b +(b))

Omitting the spaces and comments (which only a real purist would include), the
above is pronounced:

    slus dec
    sigfas cen-dec
    bartis A tis pat
    sigbar cen-decrement-underflow
    wutgal tis zero A
    tislus B tis zero
    barhep kethep pat
    wutcol tis A lus B
    B
    buc B lus B

Or using the alternate phonology:

    slus dec
    sigfas cen-dec
    bartis A is atom
    sigbar cen-decrement-underflow
    wutgal equals zero A
    tislus B is zero
    barhep kethep atom
    wutcol equals A lus B
    B
    buc B lus B
Which is very similar. The alternate phonology exists as a result of common
speech patterns observed amongst Hoon programmers in the wild. In any language
actually spoken by actual humans, laziness soon rounds off any rough edges.


@ -0,0 +1,130 @@
Syntax
======
Syntax: Twigs
------------
A twig is an abstract syntax tree (or AST). Everything the Hoon programmer types is parsed into a twig.
##Noun
All constant data in Hoon is
%dtzy
%dtzz
##Runes
Tall forms
When all is said and done, the programmer is formatting a big wall of text. This canvas has a curious but essential property - it is indefinitely tall, but finitely wide. We strongly encourage an 80-column standard.
So the programmer's task as a visual designer is to persuade her code to flow down, not across. The usual way to lay out a tree which does not fit on one line is to indent the subtrees and enclose them in parens, brackets or braces. Which might look like this (not Hoon syntax):

    ?: {
      &
      47
      52
    }
Hoon, like other functional languages, has very deep expression trees. In this simple, classic syntax model, a functional language develops huge piles of closing parens at the end of large blocks, which is manageable but ugly. Less manageably, as each subtree is indented to the right, the width of the text window bounds the depth of the expression tree.
Other languages skip the braces and parse whitespace, using indentation to express tree depth. This actually is valid (but ugly) Hoon:

    ?:
      &
      47
      52
This gets rid of the terminator problem, but keeps the width problem. And parsing whitespace is horrible. Whitespace in Hoon is not significant, though its presence or absence is. (Note also that hard TAB characters are zutiefst verboten).
Hoon notices a couple of things about this problem. First, most Hoon twigs have small constant fanout. A parser shouldn't need either significant whitespace or a terminator to figure out how many twigs follow ?: - the answer is always 3.
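The constant-fanout observation can be sketched in a few lines of Python. This is a toy parser over whitespace-separated tokens, not Hoon's real parser: the table `ARITY` lists only the three runes used in the examples in this section, and every other token is treated as a leaf.

```python
# Number of subtwigs each rune takes (only the runes used in this section).
ARITY = {"?:": 3, "?.": 3, "=+": 2}


def parse(tokens):
    """Parse a flat token list into a nested (rune, kid, ...) tuple."""
    twig, rest = _parse(list(tokens))
    if rest:
        raise ValueError(f"unexpected trailing tokens: {rest}")
    return twig


def _parse(tokens):
    if not tokens:
        raise ValueError("ran out of tokens")
    head, rest = tokens[0], tokens[1:]
    if head not in ARITY:
        return head, rest  # a leaf such as & or 47
    kids = []
    for _ in range(ARITY[head]):  # fixed fanout: no terminator needed
        kid, rest = _parse(rest)
        kids.append(kid)
    return (head, *kids), rest


print(parse("?: & 47 52".split()))
print(parse("?: & 47 ?: | 52 ?: & 97 =+ 35 b".split()))
```

Because each rune announces how many subtwigs follow, the parser never needs a closing brace or an indentation level to know where a twig ends.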
Second, our goal is to descend into a deep tree without losing right margin. With the backstep pattern

    ?:  &
      47
    52

or

    =+  a=3
    b

we step two spaces backward at each subtwig, till the last one is at the same indentation as its parent.
This preserves your right margin in one and only one case - where the bottom twig is the heaviest. For example, if we write

    ?:  &
      47
    ?:  |
      52
    ?:  &
      97
    =+  35
    b
we see a tree that flows neatly down the screen. It's obviously much nicer than, say (not Hoon syntax):

    ?: {
      &
      47
      ?: {
        |
        52
        ?: {
          &
          97
          =+ {
            35
            b
          }
        }
      }
    }
or any similar abortion. But its downward flow depends on the coincidence of the bottom twig being the heavy one:

    ?:  &
      ?:  |
        52
      ?:  &
        97
      =+  35
      b
    47
To handle this, Hoon has a reasonable selection of reverse hoons, which have the same semantics but inverse order. For instance, if ?: is "if," ?. (wutdot) is "unless":

    ?.  &
      47
    ?:  |
      52
    ?:  &
      97
    =+  35
    b
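The backstep layout can be expressed as a short Python sketch. It is an illustration, not a real Hoon formatter: twigs are nested tuples of the form `(rune, kid, ...)`, leaves are strings, and the sketch assumes the first subtwig is always a leaf that fits on the rune's own line. Each later subtwig is indented two spaces less than the one before it, so the last one lines up with its parent.

```python
def tall(twig, indent=0):
    """Return the lines of `twig` laid out in the backstep pattern."""
    pad = " " * indent
    if isinstance(twig, str):
        return [pad + twig]
    rune, *kids = twig
    if not isinstance(kids[0], str):
        raise ValueError("this sketch expects a leaf as the first subtwig")
    lines = [pad + rune + "  " + kids[0]]
    last = len(kids) - 1
    for position, kid in enumerate(kids[1:], start=1):
        # Step back two spaces per subtwig; the last one gets no extra indent.
        lines += tall(kid, indent + 2 * (last - position))
    return lines


bottom_heavy = ("?:", "&", "47",
                ("?:", "|", "52",
                 ("?:", "&", "97",
                  ("=+", "35", "b"))))
top_heavy = ("?:", "&",
             ("?:", "|", "52",
              ("?:", "&", "97",
               ("=+", "35", "b"))),
             "47")

print("\n".join(tall(bottom_heavy)))
print()
print("\n".join(tall(top_heavy)))
```

The first tree never moves right of two columns of indentation, while the second drifts rightward with every level. This is the difference the reverse forms exist to repair.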
Wide forms
Observe that in the tall syntax, there are always at least two spaces (or one newline) between tokens. Other than this, nothing requires anything to be tall. For instance, it is normal and only slightly aggressive to write:

    ?.  &  47
    ?:  |  52
    ?:  &  97
    =+  35
    b
But we could even go so far as:

    ?.  &  47  ?:  |  52  ?:  &  97  =+  35  b

Few would find this readable, which is why Hoon also has a wide syntax:

    ?.(& 47 ?:(| 52 ?:(& 97 =+(35 b))))
On a single line, the parentheses - while a parser could get away with skipping them - are needed to actually read the expression. The hoon attaches directly to the left paren (pel), and a double space or a newline is a syntax error.
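Rendering a twig in wide form is a one-line recursion, sketched here in Python under the same assumptions as before: twigs are nested tuples of the form `(rune, kid, ...)`, leaves are strings, and the function name `wide` is invented for this example. The rune attaches directly to the left paren and subtwigs are separated by single spaces.

```python
def wide(twig):
    """Return `twig` written in wide form on a single line."""
    if isinstance(twig, str):
        return twig
    rune, *kids = twig
    return rune + "(" + " ".join(wide(kid) for kid in kids) + ")"


twig = ("?.", "&", "47",
        ("?:", "|", "52",
         ("?:", "&", "97",
          ("=+", "35", "b"))))
print(wide(twig))
```

This prints `?.(& 47 ?:(| 52 ?:(& 97 =+(35 b))))`, matching the wide form shown above.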
The semantics of tall and wide syntax are identical, of course. The choice is entirely up to the programmer. Some languages can be formatted automatically - turning an abstract syntax tree into a tall, handsome Hoon file is an art form. We won't say a program could never do it - but it'd be work.
Wide forms are also nice because our immature and incomplete command-line shell can't process multi-line input.
Irregular forms
For a very large set of primitives, neither tall nor wide form is tight enough. If you go to ++scat in hoon.hoon, you can see them all, organized by initial character.
This isn't the place to go over the irregular forms directly - we'll introduce them when we talk about individual runes, or when we run into them and we can't go around.