mirror of
https://github.com/urbit/shrub.git
synced 2024-12-13 16:03:36 +03:00
Some more doc
This commit is contained in:
parent
b180f779ea
commit
a8931ecb19
553
main/pub/src/doc/ref/hoon/lex-noun.md
Normal file
553
main/pub/src/doc/ref/hoon/lex-noun.md
Normal file
@ -0,0 +1,553 @@
|
||||
Lexicon: Nouns
|
||||
=======
|
||||
|
||||
Atom Syntax
|
||||
----------
|
||||
|
||||
|
||||
###Canonical Atom Odors
|
||||
|
||||
An odor is an atom format that specifies an atomic subtype.
|
||||
|
||||
```
|
||||
@c UTF-32 codepoint
|
||||
@d date
|
||||
@da absolute date
|
||||
@dr relative date (ie, timespan)
|
||||
@f yes or no (inverse boolean)
|
||||
@n nil
|
||||
@p phonemic base
|
||||
@r IEEE floating-point
|
||||
@rd double precision (64 bits)
|
||||
@rh half precision (16 bits)
|
||||
@rq quad precision (128 bits)
|
||||
@rs single precision (32 bits)
|
||||
@s signed integer, sign bit low
|
||||
@sb signed binary
|
||||
@sd signed decimal
|
||||
@sv signed base32
|
||||
@sw signed base64
|
||||
@sx signed hexadecimal
|
||||
@t UTF-8 text (cord)
|
||||
@ta ASCII text (span)
|
||||
@tas ASCII symbol (term)
|
||||
@u unsigned integer
|
||||
@ub unsigned binary
|
||||
@ud unsigned decimal
|
||||
@uv unsigned base32
|
||||
@uw unsigned base64
|
||||
@ux unsigned hexadecimal
|
||||
```
|
||||
|
||||
###Unsigned decimal, @ud
|
||||
|
||||
Hoon's unsigned decimal format is the normal Continental syntax. It differs
|
||||
from the Anglo-American only in the use of periods, rather than commas, between
|
||||
groups of 3:
|
||||
|
||||
~zod/try=> 19
|
||||
19
|
||||
~zod/try=> 1.024
|
||||
1.024
|
||||
|
||||
An unsigned decimal not broken into groups is a syntax error. Also, whitespace
|
||||
or even linebreaks can appear between the dot and the next group.
|
||||
|
||||
~zod/try=> 65. 536
|
||||
65.536
|
||||
|
||||
###Unsigned hexadecimal, @ux
|
||||
|
||||
@ux has the same syntax as @ud, except that it's prefixed by 0x and uses groups
|
||||
of four. Hex digits are lowercase only.
|
||||
|
||||
~zod/try=> 0x0
|
||||
0x0
|
||||
~zod/try=> `@ud`0x17
|
||||
23
|
||||
|
||||
###Unsigned base64 and base32, @uv, @uw
|
||||
|
||||
The prefix is 0w for base64 and 0v for base32. The digits for @uw are, in
|
||||
order: 0-9, a-z, A-Z, -, ~:
|
||||
|
||||
~zod/try=> `@ud`0w-
|
||||
62
|
||||
|
||||
For @uv, the digits are 0-9, a-v.
|
||||
|
||||
Signed integers, @sd, @sx, @sw, @sv, @sb
|
||||
|
||||
Obviously, without finite-sized integers, the sign extension trick does not
|
||||
work. A signed integer in Hoon is a different way to use atoms than an unsigned
|
||||
integer; even for positive numbers, the signed integer cannot equal the
|
||||
unsigned.
|
||||
|
||||
The prefix for a negative signed integer is a single - before the unsigned
|
||||
syntax. The prefix for a positive signed integer is --. The sign bit is the low
|
||||
bit:
|
||||
|
||||
~zod/try=> -1
|
||||
-1
|
||||
~zod/try=> --1
|
||||
--1
|
||||
~zod/try=> `@ud`-1
|
||||
1
|
||||
~zod/try=> `@ud`--1
|
||||
2
|
||||
~zod/try=> `@ud`-2
|
||||
3
|
||||
~zod/try=> `@ud`--2
|
||||
4
|
||||
~zod/try=> `@ux`-0x10
|
||||
0x1f
|
||||
~zod/try=> `@ux`--0x10
|
||||
0x20
|
||||
~zod/try=> `@ud`--0w-
|
||||
124
|
||||
~zod/try=> `@sw`124
|
||||
--0w-
|
||||
|
||||
###Absolute date, @da
|
||||
|
||||
Urbit dates represent 128-bit chronological time, with 2^64 seconds from the
|
||||
start of the universe to the end. 2^127 is 3:30:08 PM on December 5, AD 226,
|
||||
for reasons not clear or relevant:
|
||||
|
||||
~zod/try=> `@da`(bex 127)
|
||||
~226.12.5..15.30.08
|
||||
|
||||
~zod/try=> `@da`(dec (bex 127))
|
||||
~226.12.5..15.30.07..ffff.ffff.ffff.ffff
|
||||
|
||||
The time of day and/or second fragment is optional:
|
||||
|
||||
~zod/try=> `@ux`~2013.12.7
|
||||
0x8000.000d.2140.7280.0000.0000.0000.0000
|
||||
|
||||
~zod/try=> `@ux`~2013.12.7..15.30.07
|
||||
0x8000.000d.2141.4c7f.0000.0000.0000.0000
|
||||
|
||||
~zod/try=> `@ux`~2013.12.7..15.30.07..1234
|
||||
0x8000.000d.2141.4c7f.1234.0000.0000.0000
|
||||
|
||||
We also do BC:
|
||||
|
||||
~zod/try=> `@ux`~226-.12.5
|
||||
0x7fff.fffc.afb1.b800.0000.0000.0000.0000
|
||||
|
||||
The semantics of the time system are that UGT (Urbit Galactic Time) is GMT/UTC
|
||||
as of leap second 25. UGT is chronological and will never add leap seconds,
|
||||
even if UTC continues this mistake. If a gap appears, it must be resolved in
|
||||
the presentation layer, with timezones and other human curiosities.
|
||||
|
||||
See section 2cH of the source for more details.
|
||||
|
||||
###Relative date, @dr
|
||||
|
||||
It's also nice to have a syntax for basic time intervals:
|
||||
|
||||
~zod/try=> `@ux`~s1
|
||||
0x1.0000.0000.0000.0000
|
||||
|
||||
~zod/try=> `@ux`~m1
|
||||
0x3c.0000.0000.0000.0000
|
||||
|
||||
~zod/try=> (div ~m1 ~s1)
|
||||
60
|
||||
|
||||
~zod/try=> (div ~h1 ~m1)
|
||||
60
|
||||
|
||||
~zod/try=> (div ~h1 ~s1)
|
||||
3.600
|
||||
|
||||
~zod/try=> (div ~d1 ~h1)
|
||||
24
|
||||
|
||||
~zod/try=> `@da`(add ~2013.11.30 ~d1)
|
||||
~2013.12.1
|
||||
|
||||
There are no @dr intervals under a second or over a day. Since the resolution
|
||||
is so high, though, (div ~s1 1.000.000) produces a pretty accurate microsecond.
|
||||
|
||||
###Loobean, @f
|
||||
|
||||
A loobean, or just bean, is 0 or 1. 0 is yes, 1 is no:
|
||||
|
||||
~zod/try=> `@ud`.y
|
||||
0
|
||||
~zod/try=> `@ud`.n
|
||||
1
|
||||
|
||||
###Nil, @n
|
||||
|
||||
Nil indicates an absence of information, as in a list terminator. The only
|
||||
value is ~, 0.
|
||||
|
||||
~zod/try=> `@ud`~
|
||||
0
|
||||
|
||||
###Unicode text, @t
|
||||
|
||||
@t is a sequence of UTF-8 bytes, LSB first - sometimes called a cord. For
|
||||
lowercase numbers and letters, the canonical syntax is ~~text:
|
||||
|
||||
~zod/try=> ~~foo
|
||||
'foo'
|
||||
|
||||
Note that the prettyprinter makes an unprincipled exception and prints the text
|
||||
in a noncanonical format:
|
||||
|
||||
~zod/try=> `@ux`~~foo
|
||||
0x6f.6f66
|
||||
|
||||
We want to be able to encode an arbitrary Unicode string as a single URL-safe
|
||||
token, using no punctuation but .~-, in @t. Space is ., . is ~., ~ is ~~, - is
|
||||
-:
|
||||
|
||||
~zod/try=> ~~foo.bar
|
||||
'foo bar'
|
||||
~zod/try=> ~~foo.bar~.baz~~moo-hoo
|
||||
'foo bar.baz~moo-hoo'
|
||||
|
||||
For all other ASCII/Unicode characters, insert the Unicode codepoint in
|
||||
lower-case hexadecimal, followed by .. For example, for U+2605 "BLACK STAR",
|
||||
write:
|
||||
|
||||
~zod/try=> ~~foo~2605.bar
|
||||
'foo★bar'
|
||||
|
||||
This UTF-32 codepoint is of course converted to UTF-8:
|
||||
|
||||
~zod/try=> `@ux`~~foo~2605.bar
|
||||
0x72.6162.8598.e26f.6f66
|
||||
|
||||
###URL-safe ASCII text, @ta
|
||||
|
||||
@ta encodes the ASCII subset that all canonical atom syntaxes restrict
|
||||
themselves to. The prefix is ~.. There are no escape sequences except ~~, which
|
||||
means ~, and ~-, which means \_. - and . encode themselves. No other characters
|
||||
besides numbers and lowercase letters need apply.
|
||||
|
||||
~zod/try=> `@t`~.foo
|
||||
'foo'
|
||||
~zod/try=> `@t`~.foo.bar
|
||||
'foo.bar'
|
||||
~zod/try=> `@t`~.foo~~bar
|
||||
'foo~bar'
|
||||
~zod/try=> `@t`~.foo~-bar
|
||||
'foo_bar'
|
||||
~zod/try=> `@t`~.foo-bar
|
||||
'foo-bar'
|
||||
|
||||
A @ta atom is called a span.
|
||||
|
||||
###Codepoint, @c
|
||||
|
||||
Normally when we build atoms of Unicode text, we use a UTF-8 bytestream, LSB
|
||||
first. But sometimes it's useful to build atoms of one or more UTF-32 words.
|
||||
|
||||
The codepoint syntax is the same as @t, except with a ~- prefix. Let's repeat
|
||||
our examples, with hex display:
|
||||
|
||||
~zod/try=> `@ux`~-foo
|
||||
0x6f.0000.006f.0000.0066
|
||||
|
||||
~zod/try=> `@ux`~-foo.bar
|
||||
0x72.0000.0061.0000.0062.0000.0020.0000.006f.0000.006f.0000.0066
|
||||
|
||||
###Phonemic, @p
|
||||
|
||||
We've seen @p used for ships, of course. But it's not just for ships - it's for
|
||||
any short number optimized for memorability, not for arithmetic. @p is great
|
||||
for checksums, for instance.
|
||||
|
||||
That said, @p is subtly customized for the sociopolitical design of Urbit as a
|
||||
digital republic. For example, one feature we don't want is the ability to see
|
||||
at a glance which carrier and cruiser issued a destroyer. Consider the carrier
|
||||
0x21:
|
||||
|
||||
~zod/try=> `@p`0x21
|
||||
~mep
|
||||
|
||||
It issues 255 cruisers, including 0x4321:
|
||||
|
||||
~zod/try=> `@p`0x4321
|
||||
~pasnut
|
||||
|
||||
Which issues 65.535 destroyers, including 0x8765.4321 and several successors:
|
||||
|
||||
~zod/try=> `@p`0x8765.4321
|
||||
~famsyr-dirwes
|
||||
~zod/try=> `@p`0x8766.4321
|
||||
~lidlug-maprec
|
||||
~zod/try=> `@p`0x8767.4321
|
||||
~tidlus-roplen
|
||||
~zod/try=> `@p`0x8768.4321
|
||||
~lisnel-lonbet
|
||||
|
||||
Of course, anyone who can juggle bits can see that ~famsyr-dirwes is a close
|
||||
cousin of ~lidlug-maprec. But she actually has to juggle bits to do it.
|
||||
Obfuscation does not prevent calculated associations, just automatic ones.
|
||||
|
||||
But at the yacht level, we actually want to see a uniform 32-bit space of
|
||||
yachts directly associated with the destroyer:
|
||||
|
||||
~zod/try=> `@p`0x9.8765.4321
|
||||
~talfes-sibzod-famsyr-dirwes
|
||||
~zod/try=> `@p`0xba9.8765.4321
|
||||
~tacbep-ronreg-famsyr-dirwes
|
||||
~zod/try=> `@p`0xd.cba9.8765.4321
|
||||
~bicsub-ritbyt-famsyr-dirwes
|
||||
~zod/try=> `@p`0xfed.cba9.8765.4321
|
||||
~sivrep-hadfeb-famsyr-dirwes
|
||||
|
||||
###IPv4 and IPv6 addresses, @if, @is
|
||||
|
||||
Urbit lives atop IP and would be very foolish to not support a syntax for the
|
||||
large atoms that are IPv4 and IPv6 addresses.
|
||||
|
||||
@if is the standard IPv4 syntax, prefixed with .:
|
||||
|
||||
~zod/try=> `@ux`.127.0.0.1
|
||||
0x7f00.0001
|
||||
|
||||
@is is the same as @if, but with 8 groups of 4 hex digits:
|
||||
|
||||
~zod/try=> `@ux`.dead.beef.0.cafe.42.babe.dead.beef
|
||||
0xdead.beef.0000.cafe.0042.babe.dead.beef
|
||||
|
||||
|
||||
###Floating Point, @rs, @rd, @rq, @rh
|
||||
|
||||
The syntax for a single-precision float is the normal English syntax, with a . prefix:
|
||||
|
||||
.6.2832 :: τ as @rs
|
||||
.-6.2832 :: -τ as @rs
|
||||
.~6.2832 :: τ as @rd
|
||||
.~-6.2832 :: -τ as @rd
|
||||
.~~6.2832 :: τ as @rh
|
||||
.~~~6.2832 :: τ as @rq
|
||||
|
||||
(Hoon is a Tauist language and promotes International Tau Day.)
|
||||
|
||||
###Transparent cell syntax
|
||||
|
||||
By adding _, we can encode arbitrary nouns in our safe subset. The prefix to a
|
||||
canonical cell is ._; the separator is _; the terminator is __. Thus:
|
||||
|
||||
~zod/try=> ._3_4__
|
||||
[3 4]
|
||||
|
||||
~zod/try=> :type; ._.127.0.0.1_._0x12_19___~tasfyn-partyv__
|
||||
[.127.0.0.1 [0x12 19] ~tasfyn-partyv]
|
||||
[@if [@ux @ud] @p]
|
||||
|
||||
Those who don't see utility in this strange feature have perhaps never needed
|
||||
to jam a data structure into a URL.
|
||||
|
||||
###Opaque noun syntax
|
||||
|
||||
Speaking of jam, sometimes we really don't care what's inside our noun. Then,
|
||||
the syntax to use is a variant of @uw prefixed by ~, which incorporates the
|
||||
built-in jam and cue marshallers:
|
||||
|
||||
~zod/try=> (jam [3 4])
|
||||
78.241
|
||||
~zod/try=> `@uw`(jam [3 4])
|
||||
0wj6x
|
||||
~zod/try=> (cue 0wj6x)
|
||||
[3 4]
|
||||
~zod/try=> ~0wj6x
|
||||
[3 4]
|
||||
|
||||
Noncanonical Syntax
|
||||
--------------------
|
||||
|
||||
These are syntaxes for constants which don't fit the canonical character-set
|
||||
constraints.
|
||||
|
||||
###Cubes, @tas
|
||||
|
||||
@tas, a term, is our most exclusive odor. The only characters permitted are
|
||||
lowercase ASCII, - except as the first or last character, and 0-9 except as the
|
||||
first character.
|
||||
|
||||
The syntax for @tas is the text itself, always preceded by %. This means a term
|
||||
is always cubical. You can cast it to @tas if you like, but we just about
|
||||
always want the cube:
|
||||
|
||||
~zod/try=> %dead-fish9
|
||||
%dead-fish9
|
||||
|
||||
~zod/try=> -:!>(%dead-fish9)
|
||||
[%cube p=271.101.667.197.767.630.546.276 q=[%atom p=%tas]]
|
||||
|
||||
The empty @tas has a special syntax, $:
|
||||
|
||||
~zod/try=> %$
|
||||
%$
|
||||
|
||||
A term without % is not a constant, but a name:
|
||||
|
||||
~zod/try=> dead-fish9
|
||||
! -find-limb.dead-fish9
|
||||
! find-none
|
||||
! exit
|
||||
|
||||
A common structure in Hoon is a noun with a cubical head and an arbitrary tail:
|
||||
|
||||
~zod/try=> [%foo 'bar']
|
||||
[%foo 'bar']
|
||||
|
||||
This structure may be generated with the following irregular syntax:
|
||||
|
||||
~zod/try=> a/'bar'
|
||||
[%a 'bar']
|
||||
|
||||
###Loobeans, @f
|
||||
|
||||
.y is a little cumbersome, so we can say & and |. The % prefix cubes as usual.
|
||||
|
||||
~zod/try=> `@ud`&
|
||||
0
|
||||
~zod/try=> `@ud`|
|
||||
1
|
||||
|
||||
###Cords, @t
|
||||
|
||||
The canonical ~~ syntax for @t, while it has its place, is intolerable in a
|
||||
number of ways - especially when it comes to escaping capitals. So @t is both
|
||||
printed and parsed in a conventional-looking single-quote syntax:
|
||||
|
||||
~zod/try=> 'foo bar'
|
||||
'foo bar'
|
||||
~zod/try=> `@ux`'foo bar'
|
||||
0x72.6162.206f.6f66
|
||||
Escape ' with \:
|
||||
|
||||
~zod/try=> 'Foo \'bar'
|
||||
'Foo \'bar'
|
||||
~zod/try=> `@ux`'\''
|
||||
0x27
|
||||
|
||||
###Strings
|
||||
|
||||
Text in Hoon is generally manipulated in two ways, depending on what you're
|
||||
doing: as an atomic cord/span/term, or as a tape which is a list of bytes (not
|
||||
codepoints).
|
||||
|
||||
To generate a tape, use double quotes:
|
||||
|
||||
~zod/try=> "foo"
|
||||
"foo"
|
||||
~zod/try=> `*`"foo"
|
||||
[102 111 111 0]
|
||||
|
||||
We're getting off the constant reservation, but strings also interpolate with curly-braces:
|
||||
|
||||
~zod/try=> "hello {(weld "wor" "ld")} is a fun thing to say"
|
||||
"hello world is a fun thing to say"
|
||||
|
||||
And they can be joined across space or lines with a .:
|
||||
|
||||
~zod/try=> "hello"."world"
|
||||
"helloworld"
|
||||
~zod/try=> "hello". "world"
|
||||
"helloworld"
|
||||
|
||||
Lists
|
||||
-----
|
||||
|
||||
A list in Hoon is a null-terminated tuple. See section 2bB for Hoon's List library.
|
||||
|
||||
~zod/try=> :type; `(list)`[3 4 ~]
|
||||
~[3 4]
|
||||
it(*)
|
||||
|
||||
The list type can be further specified by a subtype:
|
||||
|
||||
~zod/try=> :type; `(list ,@ud)`[3 4 ~]
|
||||
~[3 4]
|
||||
it(@ud)
|
||||
|
||||
The above example is a list of @ud, meaning the all values in the list must be of type @ud.
|
||||
|
||||
Polymorphic lists can be specified by constructing a more complex type:
|
||||
|
||||
~zod/try=> :type; `(list ?(@ud @ux))`[3 0xf ~]
|
||||
~[3 15]
|
||||
it({@ud @ux})
|
||||
|
||||
Not specifing a more complex type defaults to a list of raw nouns:
|
||||
|
||||
~zod/try=> :type; `(list)`[3 0xf ~]
|
||||
~[3 15]
|
||||
it(*)
|
||||
|
||||
Null-terminated tuples may be generated with the following syntax:
|
||||
|
||||
~zod/try=> ~[3 0xf]
|
||||
[3 0xf ~]
|
||||
|
||||
~zod/try=> :type; ~[3 0xf]
|
||||
[3 0xf ~]
|
||||
[@ud @ux %~]
|
||||
|
||||
Note that this syntax is not automatically typed as a list, but may be cast as such:
|
||||
|
||||
~zod/try=> :type; `(list ?(@ux @ud))`~[3 0xf]
|
||||
~[0x3 0xf]
|
||||
it({@ux @ud})
|
||||
|
||||
Furthermore, a different syntax may be used to group the entire noun in the
|
||||
head of a null terminated tuple:
|
||||
|
||||
|
||||
~zod/try=> [3 0xf]~
|
||||
[[3 0xf] ~]
|
||||
|
||||
This is often used to easily generate a list from a single noun.
|
||||
|
||||
~zod/try=> :type; `(list ,[@ud @ux])`[3 0xf]~
|
||||
~[[3 0xf]]
|
||||
it([@ud @ux])
|
||||
|
||||
The above is typed as a list of decimal hexadecimal pairs.
|
||||
|
||||
Units
|
||||
-----
|
||||
|
||||
A Unit is Hoon's "maybe" type. As in, either some value or null. See section 2bA for Hoon's Unit library. Units are represented by a cell whose head is null and whose tail is some noun:
|
||||
|
||||
~zod/try=> `(unit ,@ud)`[~ 3]
|
||||
[~ 3]
|
||||
|
||||
The unit type can be further specified by a subtype. The above example is a unit of @ud, meaning the optional value must be of type @ud.
|
||||
|
||||
~zod/try=> `(unit ,@ud)`[~ [3 3]]
|
||||
! type-fail
|
||||
! exit
|
||||
|
||||
~zod/try=> `(unit ,^)`[~ [3 3]]
|
||||
[~ [3 3]]
|
||||
|
||||
For convenience, a null-headed noun may be specified with the following irregular syntax:
|
||||
|
||||
~zod/try=> `3
|
||||
[~ 3]
|
||||
|
||||
~zod/try=> :type; `3
|
||||
[~ 3]
|
||||
[%~ @ud]
|
||||
|
||||
Note that this syntax is not automatically typed as a Unit, but may be cast as such:
|
||||
|
||||
~zod/try=> :type; `(unit)``3
|
||||
[~ 3]
|
||||
u(*)
|
||||
|
||||
|
||||
|
||||
|
2521
main/pub/src/doc/ref/hoon/lex-rune.md
Normal file
2521
main/pub/src/doc/ref/hoon/lex-rune.md
Normal file
File diff suppressed because it is too large
Load Diff
2612
main/pub/src/doc/ref/hoon/lex-tile.md
Normal file
2612
main/pub/src/doc/ref/hoon/lex-tile.md
Normal file
File diff suppressed because it is too large
Load Diff
2612
main/pub/src/doc/ref/hoon/lex-twig.md
Normal file
2612
main/pub/src/doc/ref/hoon/lex-twig.md
Normal file
File diff suppressed because it is too large
Load Diff
@ -1,117 +1,70 @@
|
||||
Morphology
|
||||
==========
|
||||
|
||||
Hoon is a statically typed language that compiles to Nock. You can test out
|
||||
most of the compilation routines in the REPL, so we'll point out some of those
|
||||
as we go along. At the highest level, there is `++make`, which turns text into
|
||||
nock.
|
||||
Hoon is a statically typed language that compiles to Nock.
|
||||
|
||||
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<')
|
||||
[8 [1 0 0] [1 0 6] 0 1]
|
||||
Types
|
||||
-----
|
||||
|
||||
If you want to know how Hoon code is compiled, start your investigation in
|
||||
`++make`. Another way to do this, of course, is with `!=`.
|
||||
Types are nouns the compiler keeps around as it turns your Hoon into Nock.
|
||||
|
||||
~hoclur-bicrel/try=> !=(|=([@ @ud] +<))
|
||||
[8 [1 0 0] [1 0 6] 0 1]
|
||||
A type serve two purposes:
|
||||
|
||||
The difference here is of course that `!=` requires you to put in the code
|
||||
literally and syntactically correctly. So, for example, you can't use tall
|
||||
form runes within a wide form usage of `!=`. `++make` may be called on literal
|
||||
text, which may even be programmatically generated. Essentially, `++make` is
|
||||
more general, but `!=` is convenient for learning and debugging.
|
||||
1. It defines a set of nouns. Any finite noun is either in this set, or not in
|
||||
it.
|
||||
|
||||
At a high level, the compilation process is as follows:
|
||||
2. it ascribes semantics to all nouns in this set. For example, a Hoon type
|
||||
exports a semantic namespace.
|
||||
|
||||
First, a runic expression is parsed into an abstact syntax tree, called a
|
||||
`twig`:
|
||||
|
||||
text => twig
|
||||
These are defined in Hoon as one of the ten kinds of type found in `++type`.
|
||||
|
||||
This can be tested at the command line with `++ream`:
|
||||
At its most general, the `noun` type contains all Hoon nouns. Everything in
|
||||
Hoon is a noun. The `%void` type contains no values at all. The `%atom` type
|
||||
contains only atoms. The `%cube` type is parameterized by a value, and it
|
||||
contains only that single value. The other types are defined in the lexicon.
|
||||
|
||||
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<')
|
||||
[ %brts
|
||||
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
|
||||
q=[%cnts p=~[[%.y p=6]] q=~]
|
||||
]
|
||||
|
||||
This is simply a parser -- no compilation happens here. The details of the
|
||||
parser can be found in the syntax section of this doc.
|
||||
|
||||
Next, we get into the real meat of the compiler. In the compiler proper, which
|
||||
is `++mint` in `++ut`, we take a twig and a subject type and turn it into a
|
||||
product type and a blob of nock.
|
||||
|
||||
[subject-type twig] => [product-type nock-formula]
|
||||
|
||||
For example, we can call `++mint` on the twig we produced earlier with a
|
||||
subject type of any noun (pardon the necessity for splitting the input line).
|
||||
|
||||
~hoclur-bicrel/try=> (~(mint ut %noun) %noun
|
||||
[%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
|
||||
|
||||
[ p
|
||||
[ %core
|
||||
p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||
q
|
||||
[ p=%gold
|
||||
q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||
r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
|
||||
]
|
||||
]
|
||||
q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
|
||||
]
|
||||
|
||||
Note that in the result we get first a type, which is a gold core with an ash
|
||||
gate. The second part, though, is exactly the nock that was produced by
|
||||
`++make`. It looks a little different because it has some faces on it, but it
|
||||
is in fact the same.
|
||||
|
||||
The astute reader will notice that we casually starting referring to types
|
||||
without defining them. We must rectify this heinous atrocity.
|
||||
|
||||
A "type" is simply a set of possible values, combined with a set of semantics
|
||||
for operating on these values. These are defined in Hoon as one of the ten
|
||||
kinds of type found in `++type`. At its most general, the `noun` type contains
|
||||
all Hoon nouns. Everything in Hoon is a noun. The `%void` type contains no
|
||||
values at all. The `%atom` type contains only atoms. The `%cube` type is
|
||||
parameterized by a value, and it contains only that single value. The other
|
||||
types are defined in the lexicon.
|
||||
|
||||
Our type inference algorithms work well enough that in Hoon there is no direct
|
||||
Type inference in Hoon works well enough that there is no direct
|
||||
syntax for defining or declaring a type. There is only a syntax for
|
||||
constructing twigs. Types are always produced by inference. Of course, since
|
||||
we often need to refer to types, there are a number of runes that are specfically
|
||||
used for referring to types. These runes act on `tile`s.
|
||||
constructing twigs. Types are always produced by inference.
|
||||
|
||||
What are the usual things we want to do with types? We sometimes want to test
|
||||
if some value is of a particular type. We sometimes want to declare that a
|
||||
gate takes a particular type of argument, or that it produces a particular type
|
||||
of result. We sometimes want to coerce an untyped value into a type so that we
|
||||
always know what type we are operating on. We could write twigs to do each of
|
||||
these things for every type we wish to use, and our type inference algorithms
|
||||
will figure out what type we're talking about each time.
|
||||
|
||||
It would work, but it would be miserable. Thankfully, there's an easier way.
|
||||
When resolving a face, for example, the axis that
|
||||
ends up in the nock formula depends on the where the face is in the subject.
|
||||
We only know this because faces are in the subject type. Thus, in `=> [p=42
|
||||
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
|
||||
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
|
||||
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
|
||||
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
|
||||
this reason that we say that a type defines not only a set of values, but also
|
||||
the semantics for operating on those values.
|
||||
|
||||
|
||||
Tiles
|
||||
-----
|
||||
|
||||
What are the usual things we want to do with types?
|
||||
|
||||
- Test if a noun is of a particular type.
|
||||
|
||||
- Create a blank default example of a type
|
||||
|
||||
- Coerce a noun into a type
|
||||
|
||||
It is possible to generate twigs for each of the above use cases from two
|
||||
pieces of information: (1) a tile describing a type and (2) a reference to
|
||||
which case we want to generate. This eliminates much of the tedious
|
||||
boilerplate.
|
||||
which case we want to generate.
|
||||
|
||||
The first and most important thing to remember about tiles is that tiles are
|
||||
not types. The second and nearly as important thing to remember about tiles
|
||||
is that every tile has a unique "icon", which is an associated type. A tile
|
||||
is not a type, but every tile corresponds to some type. But what is a tile?
|
||||
A tile is a nice little representation of a type that may be acted upon and
|
||||
reduced in several ways. Remember, types never show up in our code -- they are
|
||||
always produced by inference. But sometimes it's convenient to have little
|
||||
macros to do all the little things we usually want to do with types without
|
||||
having to rewrite them for every type.
|
||||
Tiles are not types.
|
||||
|
||||
Formalizing the operations on a tile, there are exactly four. We will briefy
|
||||
describe them here, but they are documented more thoroughly elsewhere.
|
||||
Every tile has a unique associated type, or "icon".
|
||||
|
||||
A tile is not a type, but every tile corresponds to some type.
|
||||
|
||||
Formalizing the operations on a tile, there are exactly four.
|
||||
asdf
|
||||
|
||||
adfa
|
||||
Bunting a tile is simply creating a blank default example of the tile's icon.
|
||||
This may seem to have limited usefulness, but this is actually the most common
|
||||
use of tiles. This is due to the the way in which we create, for example,
|
||||
@ -139,88 +92,139 @@ never directly used in any Hoon source code. Whipping is used internally by
|
||||
clamming.
|
||||
|
||||
In summary, a tile is simply a convenient syntax for creating well-typed nouns.
|
||||
Say that again, *a tile is simply a convenient syntax for creating well-typed
|
||||
nouns.* A tile is not a tiwg, but tiles always are reduced statically in one
|
||||
of four ways to a twig. `++tile` is a sort of intermediate representation
|
||||
between text and twigs that is often used when we're referring to types.
|
||||
|
||||
A tile is not a twig, but tiles always are reduced statically in one
|
||||
of four ways to a twig.
|
||||
|
||||
Returning from our digression about types and tiles to our discussion of the
|
||||
compilation process, recall that `++mint` takes a subject-type and a twig and
|
||||
compiles it into a product-type and a blob of nock.
|
||||
|
||||
Type Inference
|
||||
--------------
|
||||
|
||||
Hoon is a higher-order typed functional language. Most languages in this class,
|
||||
Haskell and ML being prominent examples, use something called the
|
||||
Hindley-Milner unification algorithm. Hoon uses its own special sauce instead.
|
||||
|
||||
Hoon's philosophy is that a language is a UI for programmers, and the basic
|
||||
test of a UI is its predictability. It is impossible (for most programmers)
|
||||
to learn a language properly unless they know what the compiler is doing, which
|
||||
in practice means mentally stepping through the algorithms it uses (with the
|
||||
exception of semantically neutral optimizations). Haskell is a hard language to
|
||||
learn (for most programmers) because it's hard (for most programmers) to follow
|
||||
what the Haskell compiler is thinking.
|
||||
|
||||
|
||||
Broadly speaking, type inference in Hoon has three general limitations as
|
||||
compared to Hindley-Milner inference.
|
||||
|
||||
1. Hoon does not think backwards. For instance, it cannot infer a function's
|
||||
argument type (or rather, a gate's sample type) from its body.
|
||||
|
||||
2. Hoon can infer through tail recursion, but not head recursion. It can check
|
||||
head recursion, however, given an annotation.
|
||||
|
||||
3. The compiler catches most but not all divergent inference problems - i.e.
|
||||
you can put the compiler into an infinite loop or exponential equivalent.
|
||||
An interrupt will still show you your error location.
|
||||
|
||||
Although an inference algorithm that reasons only forward must and does require
|
||||
a few more annotations from the programmer, the small extra burden on her
|
||||
fingers is more than offset by the lighter load on her hippocampus.
|
||||
Furthermore, programs also exist to be read. Some of these annotations (which a
|
||||
smarter algorithm might infer by steam) may annoy the actual author of the code
|
||||
but be a lifesaver for her replacement.
|
||||
|
||||
Our experience is that these limitations are minor annoyances at worst and
|
||||
prudent restrictions at best. Your mileage may vary.
|
||||
|
||||
Type inference is a frightening problem, especially if you've been exposed to
|
||||
a wall of math. Your only real problem in learning Hoon is to learn not to
|
||||
fear it. Once you work past this reasonable but entirely unfounded fear of
|
||||
inference, your Hoon experience will be simple, refreshing and delightful. So
|
||||
first, let's talk through a few reassuring facts:
|
||||
|
||||
1. Type inference in Hoon never sees a tile. It operates exclusively on twigs.
|
||||
All tiles and synthetic twigs are reduced to natural twigs for the inference
|
||||
engine's benefit.
|
||||
|
||||
2. The semantics of Hoon are in ++ut in hoon.hoon, and nowhere else.
|
||||
|
||||
3. Within ++ut, all the semantics of Hoon are in the call graph of one arm: ++mint.
|
||||
++mint has a case for every natural hoon. So do ++play and ++mull,
|
||||
but their semantics are redundant with ++mint.
|
||||
|
||||
4. One leg in the sample of ++mint - gol - which looks for all the world like a
|
||||
mechanism for backward inference, is not. It is semantically irrelevant and
|
||||
only exists to get better localization on error reports.
|
||||
|
||||
5. ++mint is the gate that maps [type twig] to [type nock]:
|
||||
|
||||
[subject-type twig] => [product-type nock-formula]
|
||||
|
||||
When we have a type that describes the subject for the formula we're trying to
|
||||
generate, as we generate that formula we want to also generate a type for the
|
||||
product of that formula on that subject. As long as subject-type is a
|
||||
correct description of some subject, you can take any twig and compile it
|
||||
against subject-type, producing a formula such that *(subject formula) is a
|
||||
product correctly described by product-type.
|
||||
|
||||
|
||||
Compilation
|
||||
------------
|
||||
|
||||
`++make` is a top-level function that turns text into nock.
|
||||
|
||||
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<')
|
||||
[8 [1 0 0] [1 0 6] 0 1]
|
||||
|
||||
Another way to do this is with `!=`.
|
||||
|
||||
~hoclur-bicrel/try=> !=(|=([@ @ud] +<))
|
||||
[8 [1 0 0] [1 0 6] 0 1]
|
||||
|
||||
`++make` is more general, in that it can be called on programmatically
|
||||
generated text, but `!=` is convenient for learning and debugging.
|
||||
|
||||
The compilation process is as follows:
|
||||
|
||||
First, a runic expression is parsed into an abstact syntax tree, called a
|
||||
`twig`:
|
||||
|
||||
text => twig
|
||||
|
||||
Parsing code into a `twig` can be done with `++ream`:
|
||||
|
||||
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<')
|
||||
[ %brts
|
||||
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
|
||||
q=[%cnts p=~[[%.y p=6]] q=~]
|
||||
]
|
||||
|
||||
Refer to the Syntax section for more detail on parsing.
|
||||
|
||||
The compiler proper, is `++mint` in `++ut`.
|
||||
|
||||
++mint takes a twig and a subject type and produces a product type and a nock formula.
|
||||
|
||||
[subject-type twig] => [product-type nock-formula]
|
||||
|
||||
As we compile each twig, we compile it against a subject type. During the
|
||||
compilation of a twig, we obviously don't have access to the value of the
|
||||
subject (else compilation would include running the code). We do, however,
|
||||
have some guarantees about its value. We have its type.
|
||||
For example, we can call `++mint` on the twig we produced earlier with a
|
||||
subject type of any noun:
|
||||
|
||||
Most runes don't change the subject type, but some do. In the most simple
|
||||
example, consider `=> p q`, which means to evaluate `q` with a subject of
|
||||
`p`. Here, the subject type of `q` is simply the type of `p`.
|
||||
~hoclur-bicrel/try=>
|
||||
(~(mint ut %noun) %noun [%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
|
||||
|
||||
In a slightly more complicated example, consider `?: p q r`, which means
|
||||
simply to evaluate `q` if `p` is true, else evaluate `r`. When compiling `q`,
|
||||
we get to assume in the subject type that `p` is true while when compiling `r`,
|
||||
we get to assume in the subject type that `p` is false. This is used to great
|
||||
practical purpose, for example, when handling lists. If `p` is a test whether
|
||||
the list is empty (which would be `?=(~ l)`), then in `q` we can assume that
|
||||
the list is empty while in `r` we know that the list has a head and a tail.
|
||||
Without that test, if you attempt to refer to the head of the list, the
|
||||
compiler will complain that it cannot verify there is even a head to which to
|
||||
refer. Thus, the compiler will generate `find-limb` and `find-fork` errors.
|
||||
The `find-limb` refers to its inability to find the head of the list while the
|
||||
`find-fork` refers to the fact that it's looking in a fork type -- that is, it
|
||||
believes that the values is of one of multiple types, and in at least one of
|
||||
the constituent types there is no head. The error messages are described in
|
||||
detail in the lexicon.
|
||||
|
||||
Recapping, when compiling Hoon code, we compile each individual twig against
|
||||
the subject with which it will eventually be called. The product is a nock
|
||||
formula and the type of value that it may produce. Thus, both the nock formula
|
||||
and the product type may depend on the both the subject type and the twig in
|
||||
question.
|
||||
|
||||
It's obvious that the product type will usually depend on the subject type, but
|
||||
it's less obvious when the nock formula will depend on the type. It does,
|
||||
however, happen at times. When resolving a face, for example, the axis that
|
||||
ends up in the nock formula depends on the where the face is in the subject.
|
||||
We only know this because faces are in the subject type. Thus, in `=> [p=42
|
||||
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
|
||||
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
|
||||
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
|
||||
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
|
||||
this reason that we say that a type defines not only a set of values, but also
|
||||
the semantics for operating on those values.
|
||||
|
||||
As long as some value is in the subject type, you can run it against the
|
||||
produced nock formula as `*[subject formula]` and get a value in the product
|
||||
type.
|
||||
|
||||
We've ignored one question thus far: it's all well and good once we've started
|
||||
compiling, for we know the subject type. But what subject type do we start
|
||||
with? We could, of course, put some restrictions on the subject type of
|
||||
compiled Hoon code, but (1) there's no reason to do that, and (2) since this
|
||||
will be returned as nock, and nock is untyped, the compiler cannot actually
|
||||
make any guarantees about what subject it will be called with. Thus, we start
|
||||
the compilation with a subject type of all nouns.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Hoon has 120 [XX count] digraph runes. The choice of glyph is not random. The
|
||||
first defines a semantic category (with some exceptions). These categories are:
|
||||
|
||||
| bar core construction
|
||||
$ buc tiles and tiling
|
||||
% cen invocations
|
||||
: col tuples
|
||||
. dot nock operators
|
||||
^ ket type conversions
|
||||
; sem miscellaneous macros
|
||||
~ sig hints
|
||||
= tis compositions
|
||||
? wut conditionals, booleans, tests
|
||||
! zap special operations
|
||||
[ p
|
||||
[ %core
|
||||
p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||
q
|
||||
[ p=%gold
|
||||
q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||
r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
|
||||
]
|
||||
]
|
||||
q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
|
||||
]
|
||||
|
||||
Note that the head of the above is a type, which in this case is a gold core
|
||||
with an ash gate. The second part, though, is (with a few labels, or faces added) the nock that was
|
||||
produced by `++make`.
|
||||
|
Loading…
Reference in New Issue
Block a user