Some more doc

This commit is contained in:
johncburnham 2014-08-26 10:19:08 -07:00
parent b180f779ea
commit a8931ecb19
5 changed files with 8474 additions and 172 deletions

View File

@ -0,0 +1,553 @@
Lexicon: Nouns
=======
Atom Syntax
----------
###Canonical Atom Odors
An odor is an atom format that specifies an atomic subtype.
```
@c UTF-32 codepoint
@d date
@da absolute date
@dr relative date (ie, timespan)
@f yes or no (inverse boolean)
@n nil
@p phonemic base
@r IEEE floating-point
@rd double precision (64 bits)
@rh half precision (16 bits)
@rq quad precision (128 bits)
@rs single precision (32 bits)
@s signed integer, sign bit low
@sb signed binary
@sd signed decimal
@sv signed base32
@sw signed base64
@sx signed hexadecimal
@t UTF-8 text (cord)
@ta ASCII text (span)
@tas ASCII symbol (term)
@u unsigned integer
@ub unsigned binary
@ud unsigned decimal
@uv unsigned base32
@uw unsigned base64
@ux unsigned hexadecimal
```
###Unsigned decimal, @ud
Hoon's unsigned decimal format is the normal Continental syntax. It differs
from the Anglo-American only in the use of periods, rather than commas, between
groups of 3:
~zod/try=> 19
19
~zod/try=> 1.024
1.024
An unsigned decimal not broken into groups is a syntax error. Also, whitespace
or even linebreaks can appear between the dot and the next group.
~zod/try=> 65. 536
65.536
###Unsigned hexadecimal, @ux
@ux has the same syntax as @ud, except that it's prefixed by 0x and uses groups
of four. Hex digits are lowercase only.
~zod/try=> 0x0
0x0
~zod/try=> `@ud`0x17
23
###Unsigned base64 and base32, @uv, @uw
The prefix is 0w for base64 and 0v for base32. The digits for @uw are, in
order: 0-9, a-z, A-Z, -, ~:
~zod/try=> `@ud`0w-
62
For @uv, the digits are 0-9, a-v.
Signed integers, @sd, @sx, @sw, @sv, @sb
Obviously, without finite-sized integers, the sign extension trick does not
work. A signed integer in Hoon is a different way to use atoms than an unsigned
integer; even for positive numbers, the signed integer cannot equal the
unsigned.
The prefix for a negative signed integer is a single - before the unsigned
syntax. The prefix for a positive signed integer is --. The sign bit is the low
bit:
~zod/try=> -1
-1
~zod/try=> --1
--1
~zod/try=> `@ud`-1
1
~zod/try=> `@ud`--1
2
~zod/try=> `@ud`-2
3
~zod/try=> `@ud`--2
4
~zod/try=> `@ux`-0x10
0x1f
~zod/try=> `@ux`--0x10
0x20
~zod/try=> `@ud`--0w-
124
~zod/try=> `@sw`124
--0w-
###Absolute date, @da
Urbit dates represent 128-bit chronological time, with 2^64 seconds from the
start of the universe to the end. 2^127 is 3:30:08 PM on December 5, AD 226,
for reasons not clear or relevant:
~zod/try=> `@da`(bex 127)
~226.12.5..15.30.08
~zod/try=> `@da`(dec (bex 127))
~226.12.5..15.30.07..ffff.ffff.ffff.ffff
The time of day and/or second fragment is optional:
~zod/try=> `@ux`~2013.12.7
0x8000.000d.2140.7280.0000.0000.0000.0000
~zod/try=> `@ux`~2013.12.7..15.30.07
0x8000.000d.2141.4c7f.0000.0000.0000.0000
~zod/try=> `@ux`~2013.12.7..15.30.07..1234
0x8000.000d.2141.4c7f.1234.0000.0000.0000
We also do BC:
~zod/try=> `@ux`~226-.12.5
0x7fff.fffc.afb1.b800.0000.0000.0000.0000
The semantics of the time system are that UGT (Urbit Galactic Time) is GMT/UTC
as of leap second 25. UGT is chronological and will never add leap seconds,
even if UTC continues this mistake. If a gap appears, it must be resolved in
the presentation layer, with timezones and other human curiosities.
See section 2cH of the source for more details.
###Relative date, @dr
It's also nice to have a syntax for basic time intervals:
~zod/try=> `@ux`~s1
0x1.0000.0000.0000.0000
~zod/try=> `@ux`~m1
0x3c.0000.0000.0000.0000
~zod/try=> (div ~m1 ~s1)
60
~zod/try=> (div ~h1 ~m1)
60
~zod/try=> (div ~h1 ~s1)
3.600
~zod/try=> (div ~d1 ~h1)
24
~zod/try=> `@da`(add ~2013.11.30 ~d1)
~2013.12.1
There are no @dr intervals under a second or over a day. Since the resolution
is so high, though, (div ~s1 1.000.000) produces a pretty accurate microsecond.
###Loobean, @f
A loobean, or just bean, is 0 or 1. 0 is yes, 1 is no:
~zod/try=> `@ud`.y
0
~zod/try=> `@ud`.n
1
###Nil, @n
Nil indicates an absence of information, as in a list terminator. The only
value is ~, 0.
~zod/try=> `@ud`~
0
###Unicode text, @t
@t is a sequence of UTF-8 bytes, LSB first - sometimes called a cord. For
lowercase numbers and letters, the canonical syntax is ~~text:
~zod/try=> ~~foo
'foo'
Note that the prettyprinter makes an unprincipled exception and prints the text
in a noncanonical format:
~zod/try=> `@ux`~~foo
0x6f.6f66
We want to be able to encode an arbitrary Unicode string as a single URL-safe
token, using no punctuation but .~-, in @t. Space is ., . is ~., ~ is ~~, - is
-:
~zod/try=> ~~foo.bar
'foo bar'
~zod/try=> ~~foo.bar~.baz~~moo-hoo
'foo bar.baz~moo-hoo'
For all other ASCII/Unicode characters, insert the Unicode codepoint in
lower-case hexadecimal, followed by .. For example, for U+2605 "BLACK STAR",
write:
~zod/try=> ~~foo~2605.bar
'foo★bar'
This UTF-32 codepoint is of course converted to UTF-8:
~zod/try=> `@ux`~~foo~2605.bar
0x72.6162.8598.e26f.6f66
###URL-safe ASCII text, @ta
@ta encodes the ASCII subset that all canonical atom syntaxes restrict
themselves to. The prefix is ~.. There are no escape sequences except ~~, which
means ~, and ~-, which means \_. - and . encode themselves. No other characters
besides numbers and lowercase letters need apply.
~zod/try=> `@t`~.foo
'foo'
~zod/try=> `@t`~.foo.bar
'foo.bar'
~zod/try=> `@t`~.foo~~bar
'foo~bar'
~zod/try=> `@t`~.foo~-bar
'foo_bar'
~zod/try=> `@t`~.foo-bar
'foo-bar'
A @ta atom is called a span.
###Codepoint, @c
Normally when we build atoms of Unicode text, we use a UTF-8 bytestream, LSB
first. But sometimes it's useful to build atoms of one or more UTF-32 words.
The codepoint syntax is the same as @t, except with a ~- prefix. Let's repeat
our examples, with hex display:
~zod/try=> `@ux`~-foo
0x6f.0000.006f.0000.0066
~zod/try=> `@ux`~-foo.bar
0x72.0000.0061.0000.0062.0000.0020.0000.006f.0000.006f.0000.0066
###Phonemic, @p
We've seen @p used for ships, of course. But it's not just for ships - it's for
any short number optimized for memorability, not for arithmetic. @p is great
for checksums, for instance.
That said, @p is subtly customized for the sociopolitical design of Urbit as a
digital republic. For example, one feature we don't want is the ability to see
at a glance which carrier and cruiser issued a destroyer. Consider the carrier
0x21:
~zod/try=> `@p`0x21
~mep
It issues 255 cruisers, including 0x4321:
~zod/try=> `@p`0x4321
~pasnut
Which issues 65.535 destroyers, including 0x8765.4321 and several successors:
~zod/try=> `@p`0x8765.4321
~famsyr-dirwes
~zod/try=> `@p`0x8766.4321
~lidlug-maprec
~zod/try=> `@p`0x8767.4321
~tidlus-roplen
~zod/try=> `@p`0x8768.4321
~lisnel-lonbet
Of course, anyone who can juggle bits can see that ~famsyr-dirwes is a close
cousin of ~lidlug-maprec. But she actually has to juggle bits to do it.
Obfuscation does not prevent calculated associations, just automatic ones.
But at the yacht level, we actually want to see a uniform 32-bit space of
yachts directly associated with the destroyer:
~zod/try=> `@p`0x9.8765.4321
~talfes-sibzod-famsyr-dirwes
~zod/try=> `@p`0xba9.8765.4321
~tacbep-ronreg-famsyr-dirwes
~zod/try=> `@p`0xd.cba9.8765.4321
~bicsub-ritbyt-famsyr-dirwes
~zod/try=> `@p`0xfed.cba9.8765.4321
~sivrep-hadfeb-famsyr-dirwes
###IPv4 and IPv6 addresses, @if, @is
Urbit lives atop IP and would be very foolish to not support a syntax for the
large atoms that are IPv4 and IPv6 addresses.
@if is the standard IPv4 syntax, prefixed with .:
~zod/try=> `@ux`.127.0.0.1
0x7f00.0001
@is is the same as @if, but with 8 groups of 4 hex digits:
~zod/try=> `@ux`.dead.beef.0.cafe.42.babe.dead.beef
0xdead.beef.0000.cafe.0042.babe.dead.beef
###Floating Point, @rs, @rd, @rq, @rh
The syntax for a single-precision float is the normal English syntax, with a . prefix:
.6.2832 :: τ as @rs
.-6.2832 :: -τ as @rs
.~6.2832 :: τ as @rd
.~-6.2832 :: -τ as @rd
.~~6.2832 :: τ as @rh
.~~~6.2832 :: τ as @rq
(Hoon is a Tauist language and promotes International Tau Day.)
###Transparent cell syntax
By adding _, we can encode arbitrary nouns in our safe subset. The prefix to a
canonical cell is ._; the separator is _; the terminator is __. Thus:
~zod/try=> ._3_4__
[3 4]
~zod/try=> :type; ._.127.0.0.1_._0x12_19___~tasfyn-partyv__
[.127.0.0.1 [0x12 19] ~tasfyn-partyv]
[@if [@ux @ud] @p]
Those who don't see utility in this strange feature have perhaps never needed
to jam a data structure into a URL.
###Opaque noun syntax
Speaking of jam, sometimes we really don't care what's inside our noun. Then,
the syntax to use is a variant of @uw prefixed by ~, which incorporates the
built-in jam and cue marshallers:
~zod/try=> (jam [3 4])
78.241
~zod/try=> `@uw`(jam [3 4])
0wj6x
~zod/try=> (cue 0wj6x)
[3 4]
~zod/try=> ~0wj6x
[3 4]
Noncanonical Syntax
--------------------
These are syntaxes for constants which don't fit the canonical character-set
constraints.
###Cubes, @tas
@tas, a term, is our most exclusive odor. The only characters permitted are
lowercase ASCII, - except as the first or last character, and 0-9 except as the
first character.
The syntax for @tas is the text itself, always preceded by %. This means a term
is always cubical. You can cast it to @tas if you like, but we just about
always want the cube:
~zod/try=> %dead-fish9
%dead-fish9
~zod/try=> -:!>(%dead-fish9)
[%cube p=271.101.667.197.767.630.546.276 q=[%atom p=%tas]]
The empty @tas has a special syntax, $:
~zod/try=> %$
%$
A term without % is not a constant, but a name:
~zod/try=> dead-fish9
! -find-limb.dead-fish9
! find-none
! exit
A common structure in Hoon is a noun with a cubical head and an arbitrary tail:
~zod/try=> [%foo 'bar']
[%foo 'bar']
This structure may be generated with the following irregular syntax:
~zod/try=> a/'bar'
[%a 'bar']
###Loobeans, @f
.y is a little cumbersome, so we can say & and |. The % prefix cubes as usual.
~zod/try=> `@ud`&
0
~zod/try=> `@ud`|
1
###Cords, @t
The canonical ~~ syntax for @t, while it has its place, is intolerable in a
number of ways - especially when it comes to escaping capitals. So @t is both
printed and parsed in a conventional-looking single-quote syntax:
~zod/try=> 'foo bar'
'foo bar'
~zod/try=> `@ux`'foo bar'
0x72.6162.206f.6f66
Escape ' with \:
~zod/try=> 'Foo \'bar'
'Foo \'bar'
~zod/try=> `@ux`'\''
0x27
###Strings
Text in Hoon is generally manipulated in two ways, depending on what you're
doing: as an atomic cord/span/term, or as a tape which is a list of bytes (not
codepoints).
To generate a tape, use double quotes:
~zod/try=> "foo"
"foo"
~zod/try=> `*`"foo"
[102 111 111 0]
We're getting off the constant reservation, but strings also interpolate with curly-braces:
~zod/try=> "hello {(weld "wor" "ld")} is a fun thing to say"
"hello world is a fun thing to say"
And they can be joined across space or lines with a .:
~zod/try=> "hello"."world"
"helloworld"
~zod/try=> "hello". "world"
"helloworld"
Lists
-----
A list in Hoon is a null-terminated tuple. See section 2bB for Hoon's List library.
~zod/try=> :type; `(list)`[3 4 ~]
~[3 4]
it(*)
The list type can be further specified by a subtype:
~zod/try=> :type; `(list ,@ud)`[3 4 ~]
~[3 4]
it(@ud)
The above example is a list of @ud, meaning the all values in the list must be of type @ud.
Polymorphic lists can be specified by constructing a more complex type:
~zod/try=> :type; `(list ?(@ud @ux))`[3 0xf ~]
~[3 15]
it({@ud @ux})
Not specifing a more complex type defaults to a list of raw nouns:
~zod/try=> :type; `(list)`[3 0xf ~]
~[3 15]
it(*)
Null-terminated tuples may be generated with the following syntax:
~zod/try=> ~[3 0xf]
[3 0xf ~]
~zod/try=> :type; ~[3 0xf]
[3 0xf ~]
[@ud @ux %~]
Note that this syntax is not automatically typed as a list, but may be cast as such:
~zod/try=> :type; `(list ?(@ux @ud))`~[3 0xf]
~[0x3 0xf]
it({@ux @ud})
Furthermore, a different syntax may be used to group the entire noun in the
head of a null terminated tuple:
~zod/try=> [3 0xf]~
[[3 0xf] ~]
This is often used to easily generate a list from a single noun.
~zod/try=> :type; `(list ,[@ud @ux])`[3 0xf]~
~[[3 0xf]]
it([@ud @ux])
The above is typed as a list of decimal hexadecimal pairs.
Units
-----
A Unit is Hoon's "maybe" type. As in, either some value or null. See section 2bA for Hoon's Unit library. Units are represented by a cell whose head is null and whose tail is some noun:
~zod/try=> `(unit ,@ud)`[~ 3]
[~ 3]
The unit type can be further specified by a subtype. The above example is a unit of @ud, meaning the optional value must be of type @ud.
~zod/try=> `(unit ,@ud)`[~ [3 3]]
! type-fail
! exit
~zod/try=> `(unit ,^)`[~ [3 3]]
[~ [3 3]]
For convenience, a null-headed noun may be specified with the following irregular syntax:
~zod/try=> `3
[~ 3]
~zod/try=> :type; `3
[~ 3]
[%~ @ud]
Note that this syntax is not automatically typed as a Unit, but may be cast as such:
~zod/try=> :type; `(unit)``3
[~ 3]
u(*)

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,117 +1,70 @@
Morphology Morphology
========== ==========
Hoon is a statically typed language that compiles to Nock. You can test out Hoon is a statically typed language that compiles to Nock.
most of the compilation routines in the REPL, so we'll point out some of those
as we go along. At the highest level, there is `++make`, which turns text into
nock.
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<') Types
[8 [1 0 0] [1 0 6] 0 1] -----
If you want to know how Hoon code is compiled, start your investigation in Types are nouns the compiler keeps around as it turns your Hoon into Nock.
`++make`. Another way to do this, of course, is with `!=`.
~hoclur-bicrel/try=> !=(|=([@ @ud] +<)) A type serve two purposes:
[8 [1 0 0] [1 0 6] 0 1]
The difference here is of course that `!=` requires you to put in the code 1. It defines a set of nouns. Any finite noun is either in this set, or not in
literally and syntactically correctly. So, for example, you can't use tall it.
form runes within a wide form usage of `!=`. `++make` may be called on literal
text, which may even be programmatically generated. Essentially, `++make` is
more general, but `!=` is convenient for learning and debugging.
At a high level, the compilation process is as follows: 2. it ascribes semantics to all nouns in this set. For example, a Hoon type
exports a semantic namespace.
First, a runic expression is parsed into an abstact syntax tree, called a
`twig`:
text => twig These are defined in Hoon as one of the ten kinds of type found in `++type`.
This can be tested at the command line with `++ream`: At its most general, the `noun` type contains all Hoon nouns. Everything in
Hoon is a noun. The `%void` type contains no values at all. The `%atom` type
contains only atoms. The `%cube` type is parameterized by a value, and it
contains only that single value. The other types are defined in the lexicon.
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<') Type inference in Hoon works well enough that there is no direct
[ %brts
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
q=[%cnts p=~[[%.y p=6]] q=~]
]
This is simply a parser -- no compilation happens here. The details of the
parser can be found in the syntax section of this doc.
Next, we get into the real meat of the compiler. In the compiler proper, which
is `++mint` in `++ut`, we take a twig and a subject type and turn it into a
product type and a blob of nock.
[subject-type twig] => [product-type nock-formula]
For example, we can call `++mint` on the twig we produced earlier with a
subject type of any noun (pardon the necessity for splitting the input line).
~hoclur-bicrel/try=> (~(mint ut %noun) %noun
[%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
[ p
[ %core
p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
q
[ p=%gold
q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
]
]
q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
]
Note that in the result we get first a type, which is a gold core with an ash
gate. The second part, though, is exactly the nock that was produced by
`++make`. It looks a little different because it has some faces on it, but it
is in fact the same.
The astute reader will notice that we casually starting referring to types
without defining them. We must rectify this heinous atrocity.
A "type" is simply a set of possible values, combined with a set of semantics
for operating on these values. These are defined in Hoon as one of the ten
kinds of type found in `++type`. At its most general, the `noun` type contains
all Hoon nouns. Everything in Hoon is a noun. The `%void` type contains no
values at all. The `%atom` type contains only atoms. The `%cube` type is
parameterized by a value, and it contains only that single value. The other
types are defined in the lexicon.
Our type inference algorithms work well enough that in Hoon there is no direct
syntax for defining or declaring a type. There is only a syntax for syntax for defining or declaring a type. There is only a syntax for
constructing twigs. Types are always produced by inference. Of course, since constructing twigs. Types are always produced by inference.
we often need to refer to types, there are a number of runes that are specfically
used for referring to types. These runes act on `tile`s.
What are the usual things we want to do with types? We sometimes want to test
if some value is of a particular type. We sometimes want to declare that a
gate takes a particular type of argument, or that it produces a particular type
of result. We sometimes want to coerce an untyped value into a type so that we
always know what type we are operating on. We could write twigs to do each of
these things for every type we wish to use, and our type inference algorithms
will figure out what type we're talking about each time.
It would work, but it would be miserable. Thankfully, there's an easier way. When resolving a face, for example, the axis that
ends up in the nock formula depends on the where the face is in the subject.
We only know this because faces are in the subject type. Thus, in `=> [p=42
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
this reason that we say that a type defines not only a set of values, but also
the semantics for operating on those values.
Tiles
-----
What are the usual things we want to do with types?
- Test if a noun is of a particular type.
- Create a blank default example of a type
- Coerce a noun into a type
It is possible to generate twigs for each of the above use cases from two It is possible to generate twigs for each of the above use cases from two
pieces of information: (1) a tile describing a type and (2) a reference to pieces of information: (1) a tile describing a type and (2) a reference to
which case we want to generate. This eliminates much of the tedious which case we want to generate.
boilerplate.
The first and most important thing to remember about tiles is that tiles are Tiles are not types.
not types. The second and nearly as important thing to remember about tiles
is that every tile has a unique "icon", which is an associated type. A tile
is not a type, but every tile corresponds to some type. But what is a tile?
A tile is a nice little representation of a type that may be acted upon and
reduced in several ways. Remember, types never show up in our code -- they are
always produced by inference. But sometimes it's convenient to have little
macros to do all the little things we usually want to do with types without
having to rewrite them for every type.
Formalizing the operations on a tile, there are exactly four. We will briefy Every tile has a unique associated type, or "icon".
describe them here, but they are documented more thoroughly elsewhere.
A tile is not a type, but every tile corresponds to some type.
Formalizing the operations on a tile, there are exactly four.
asdf
adfa
Bunting a tile is simply creating a blank default example of the tile's icon. Bunting a tile is simply creating a blank default example of the tile's icon.
This may seem to have limited usefulness, but this is actually the most common This may seem to have limited usefulness, but this is actually the most common
use of tiles. This is due to the the way in which we create, for example, use of tiles. This is due to the the way in which we create, for example,
@ -139,88 +92,139 @@ never directly used in any Hoon source code. Whipping is used internally by
clamming. clamming.
In summary, a tile is simply a convenient syntax for creating well-typed nouns. In summary, a tile is simply a convenient syntax for creating well-typed nouns.
Say that again, *a tile is simply a convenient syntax for creating well-typed
nouns.* A tile is not a tiwg, but tiles always are reduced statically in one A tile is not a twig, but tiles always are reduced statically in one
of four ways to a twig. `++tile` is a sort of intermediate representation of four ways to a twig.
between text and twigs that is often used when we're referring to types.
Returning from our digression about types and tiles to our discussion of the
compilation process, recall that `++mint` takes a subject-type and a twig and Type Inference
compiles it into a product-type and a blob of nock. --------------
Hoon is a higher-order typed functional language. Most languages in this class,
Haskell and ML being prominent examples, use something called the
Hindley-Milner unification algorithm. Hoon uses its own special sauce instead.
Hoon's philosophy is that a language is a UI for programmers, and the basic
test of a UI is its predictability. It is impossible (for most programmers)
to learn a language properly unless they know what the compiler is doing, which
in practice means mentally stepping through the algorithms it uses (with the
exception of semantically neutral optimizations). Haskell is a hard language to
learn (for most programmers) because it's hard (for most programmers) to follow
what the Haskell compiler is thinking.
Broadly speaking, type inference in Hoon has three general limitations as
compared to Hindley-Milner inference.
1. Hoon does not think backwards. For instance, it cannot infer a function's
argument type (or rather, a gate's sample type) from its body.
2. Hoon can infer through tail recursion, but not head recursion. It can check
head recursion, however, given an annotation.
3. The compiler catches most but not all divergent inference problems - i.e.
you can put the compiler into an infinite loop or exponential equivalent.
An interrupt will still show you your error location.
Although an inference algorithm that reasons only forward must and does require
a few more annotations from the programmer, the small extra burden on her
fingers is more than offset by the lighter load on her hippocampus.
Furthermore, programs also exist to be read. Some of these annotations (which a
smarter algorithm might infer by steam) may annoy the actual author of the code
but be a lifesaver for her replacement.
Our experience is that these limitations are minor annoyances at worst and
prudent restrictions at best. Your mileage may vary.
Type inference is a frightening problem, especially if you've been exposed to
a wall of math. Your only real problem in learning Hoon is to learn not to
fear it. Once you work past this reasonable but entirely unfounded fear of
inference, your Hoon experience will be simple, refreshing and delightful. So
first, let's talk through a few reassuring facts:
1. Type inference in Hoon never sees a tile. It operates exclusively on twigs.
All tiles and synthetic twigs are reduced to natural twigs for the inference
engine's benefit.
2. The semantics of Hoon are in ++ut in hoon.hoon, and nowhere else.
3. Within ++ut, all the semantics of Hoon are in the call graph of one arm: ++mint.
++mint has a case for every natural hoon. So do ++play and ++mull,
but their semantics are redundant with ++mint.
4. One leg in the sample of ++mint - gol - which looks for all the world like a
mechanism for backward inference, is not. It is semantically irrelevant and
only exists to get better localization on error reports.
5. ++mint is the gate that maps [type twig] to [type nock]:
[subject-type twig] => [product-type nock-formula]
When we have a type that describes the subject for the formula we're trying to
generate, as we generate that formula we want to also generate a type for the
product of that formula on that subject. As long as subject-type is a
correct description of some subject, you can take any twig and compile it
against subject-type, producing a formula such that *(subject formula) is a
product correctly described by product-type.
Compilation
------------
`++make` is a top-level function that turns text into nock.
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<')
[8 [1 0 0] [1 0 6] 0 1]
Another way to do this is with `!=`.
~hoclur-bicrel/try=> !=(|=([@ @ud] +<))
[8 [1 0 0] [1 0 6] 0 1]
`++make` is more general, in that it can be called on programmatically
generated text, but `!=` is convenient for learning and debugging.
The compilation process is as follows:
First, a runic expression is parsed into an abstact syntax tree, called a
`twig`:
text => twig
Parsing code into a `twig` can be done with `++ream`:
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<')
[ %brts
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
q=[%cnts p=~[[%.y p=6]] q=~]
]
Refer to the Syntax section for more detail on parsing.
The compiler proper, is `++mint` in `++ut`.
++mint takes a twig and a subject type and produces a product type and a nock formula.
[subject-type twig] => [product-type nock-formula] [subject-type twig] => [product-type nock-formula]
As we compile each twig, we compile it against a subject type. During the For example, we can call `++mint` on the twig we produced earlier with a
compilation of a twig, we obviously don't have access to the value of the subject type of any noun:
subject (else compilation would include running the code). We do, however,
have some guarantees about its value. We have its type.
Most runes don't change the subject type, but some do. In the most simple ~hoclur-bicrel/try=>
example, consider `=> p q`, which means to evaluate `q` with a subject of (~(mint ut %noun) %noun [%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
`p`. Here, the subject type of `q` is simply the type of `p`.
In a slightly more complicated example, consider `?: p q r`, which means [ p
simply to evaluate `q` if `p` is true, else evaluate `r`. When compiling `q`, [ %core
we get to assume in the subject type that `p` is true while when compiling `r`, p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
we get to assume in the subject type that `p` is false. This is used to great q
practical purpose, for example, when handling lists. If `p` is a test whether [ p=%gold
the list is empty (which would be `?=(~ l)`), then in `q` we can assume that q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
the list is empty while in `r` we know that the list has a head and a tail. r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
Without that test, if you attempt to refer to the head of the list, the ]
compiler will complain that it cannot verify there is even a head to which to ]
refer. Thus, the compiler will generate `find-limb` and `find-fork` errors. q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
The `find-limb` refers to its inability to find the head of the list while the ]
`find-fork` refers to the fact that it's looking in a fork type -- that is, it
believes that the values is of one of multiple types, and in at least one of
the constituent types there is no head. The error messages are described in
detail in the lexicon.
Recapping, when compiling Hoon code, we compile each individual twig against
the subject with which it will eventually be called. The product is a nock
formula and the type of value that it may produce. Thus, both the nock formula
and the product type may depend on the both the subject type and the twig in
question.
It's obvious that the product type will usually depend on the subject type, but
it's less obvious when the nock formula will depend on the type. It does,
however, happen at times. When resolving a face, for example, the axis that
ends up in the nock formula depends on the where the face is in the subject.
We only know this because faces are in the subject type. Thus, in `=> [p=42
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
this reason that we say that a type defines not only a set of values, but also
the semantics for operating on those values.
As long as some value is in the subject type, you can run it against the
produced nock formula as `*[subject formula]` and get a value in the product
type.
We've ignored one question thus far: it's all well and good once we've started
compiling, for we know the subject type. But what subject type do we start
with? We could, of course, put some restrictions on the subject type of
compiled Hoon code, but (1) there's no reason to do that, and (2) since this
will be returned as nock, and nock is untyped, the compiler cannot actually
make any guarantees about what subject it will be called with. Thus, we start
the compilation with a subject type of all nouns.
Hoon has 120 [XX count] digraph runes. The choice of glyph is not random. The
first defines a semantic category (with some exceptions). These categories are:
| bar core construction
$ buc tiles and tiling
% cen invocations
: col tuples
. dot nock operators
^ ket type conversions
; sem miscellaneous macros
~ sig hints
= tis compositions
? wut conditionals, booleans, tests
! zap special operations
Note that the head of the above is a type, which in this case is a gold core
with an ash gate. The second part, though, is (with a few labels, or faces added) the nock that was
produced by `++make`.