mirror of
https://github.com/urbit/shrub.git
synced 2024-12-14 01:08:54 +03:00
Some more doc
This commit is contained in:
parent
b180f779ea
commit
a8931ecb19
553
main/pub/src/doc/ref/hoon/lex-noun.md
Normal file
553
main/pub/src/doc/ref/hoon/lex-noun.md
Normal file
@ -0,0 +1,553 @@
|
|||||||
|
Lexicon: Nouns
|
||||||
|
=======
|
||||||
|
|
||||||
|
Atom Syntax
|
||||||
|
----------
|
||||||
|
|
||||||
|
|
||||||
|
###Canonical Atom Odors
|
||||||
|
|
||||||
|
An odor is an atom format that specifies an atomic subtype.
|
||||||
|
|
||||||
|
```
|
||||||
|
@c UTF-32 codepoint
|
||||||
|
@d date
|
||||||
|
@da absolute date
|
||||||
|
@dr relative date (ie, timespan)
|
||||||
|
@f yes or no (inverse boolean)
|
||||||
|
@n nil
|
||||||
|
@p phonemic base
|
||||||
|
@r IEEE floating-point
|
||||||
|
@rd double precision (64 bits)
|
||||||
|
@rh half precision (16 bits)
|
||||||
|
@rq quad precision (128 bits)
|
||||||
|
@rs single precision (32 bits)
|
||||||
|
@s signed integer, sign bit low
|
||||||
|
@sb signed binary
|
||||||
|
@sd signed decimal
|
||||||
|
@sv signed base32
|
||||||
|
@sw signed base64
|
||||||
|
@sx signed hexadecimal
|
||||||
|
@t UTF-8 text (cord)
|
||||||
|
@ta ASCII text (span)
|
||||||
|
@tas ASCII symbol (term)
|
||||||
|
@u unsigned integer
|
||||||
|
@ub unsigned binary
|
||||||
|
@ud unsigned decimal
|
||||||
|
@uv unsigned base32
|
||||||
|
@uw unsigned base64
|
||||||
|
@ux unsigned hexadecimal
|
||||||
|
```
|
||||||
|
|
||||||
|
###Unsigned decimal, @ud
|
||||||
|
|
||||||
|
Hoon's unsigned decimal format is the normal Continental syntax. It differs
|
||||||
|
from the Anglo-American only in the use of periods, rather than commas, between
|
||||||
|
groups of 3:
|
||||||
|
|
||||||
|
~zod/try=> 19
|
||||||
|
19
|
||||||
|
~zod/try=> 1.024
|
||||||
|
1.024
|
||||||
|
|
||||||
|
An unsigned decimal not broken into groups is a syntax error. Also, whitespace
|
||||||
|
or even linebreaks can appear between the dot and the next group.
|
||||||
|
|
||||||
|
~zod/try=> 65. 536
|
||||||
|
65.536
|
||||||
|
|
||||||
|
###Unsigned hexadecimal, @ux
|
||||||
|
|
||||||
|
@ux has the same syntax as @ud, except that it's prefixed by 0x and uses groups
|
||||||
|
of four. Hex digits are lowercase only.
|
||||||
|
|
||||||
|
~zod/try=> 0x0
|
||||||
|
0x0
|
||||||
|
~zod/try=> `@ud`0x17
|
||||||
|
23
|
||||||
|
|
||||||
|
###Unsigned base64 and base32, @uv, @uw
|
||||||
|
|
||||||
|
The prefix is 0w for base64 and 0v for base32. The digits for @uw are, in
|
||||||
|
order: 0-9, a-z, A-Z, -, ~:
|
||||||
|
|
||||||
|
~zod/try=> `@ud`0w-
|
||||||
|
62
|
||||||
|
|
||||||
|
For @uv, the digits are 0-9, a-v.
|
||||||
|
|
||||||
|
Signed integers, @sd, @sx, @sw, @sv, @sb
|
||||||
|
|
||||||
|
Obviously, without finite-sized integers, the sign extension trick does not
|
||||||
|
work. A signed integer in Hoon is a different way to use atoms than an unsigned
|
||||||
|
integer; even for positive numbers, the signed integer cannot equal the
|
||||||
|
unsigned.
|
||||||
|
|
||||||
|
The prefix for a negative signed integer is a single - before the unsigned
|
||||||
|
syntax. The prefix for a positive signed integer is --. The sign bit is the low
|
||||||
|
bit:
|
||||||
|
|
||||||
|
~zod/try=> -1
|
||||||
|
-1
|
||||||
|
~zod/try=> --1
|
||||||
|
--1
|
||||||
|
~zod/try=> `@ud`-1
|
||||||
|
1
|
||||||
|
~zod/try=> `@ud`--1
|
||||||
|
2
|
||||||
|
~zod/try=> `@ud`-2
|
||||||
|
3
|
||||||
|
~zod/try=> `@ud`--2
|
||||||
|
4
|
||||||
|
~zod/try=> `@ux`-0x10
|
||||||
|
0x1f
|
||||||
|
~zod/try=> `@ux`--0x10
|
||||||
|
0x20
|
||||||
|
~zod/try=> `@ud`--0w-
|
||||||
|
124
|
||||||
|
~zod/try=> `@sw`124
|
||||||
|
--0w-
|
||||||
|
|
||||||
|
###Absolute date, @da
|
||||||
|
|
||||||
|
Urbit dates represent 128-bit chronological time, with 2^64 seconds from the
|
||||||
|
start of the universe to the end. 2^127 is 3:30:08 PM on December 5, AD 226,
|
||||||
|
for reasons not clear or relevant:
|
||||||
|
|
||||||
|
~zod/try=> `@da`(bex 127)
|
||||||
|
~226.12.5..15.30.08
|
||||||
|
|
||||||
|
~zod/try=> `@da`(dec (bex 127))
|
||||||
|
~226.12.5..15.30.07..ffff.ffff.ffff.ffff
|
||||||
|
|
||||||
|
The time of day and/or second fragment is optional:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~2013.12.7
|
||||||
|
0x8000.000d.2140.7280.0000.0000.0000.0000
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~2013.12.7..15.30.07
|
||||||
|
0x8000.000d.2141.4c7f.0000.0000.0000.0000
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~2013.12.7..15.30.07..1234
|
||||||
|
0x8000.000d.2141.4c7f.1234.0000.0000.0000
|
||||||
|
|
||||||
|
We also do BC:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~226-.12.5
|
||||||
|
0x7fff.fffc.afb1.b800.0000.0000.0000.0000
|
||||||
|
|
||||||
|
The semantics of the time system are that UGT (Urbit Galactic Time) is GMT/UTC
|
||||||
|
as of leap second 25. UGT is chronological and will never add leap seconds,
|
||||||
|
even if UTC continues this mistake. If a gap appears, it must be resolved in
|
||||||
|
the presentation layer, with timezones and other human curiosities.
|
||||||
|
|
||||||
|
See section 2cH of the source for more details.
|
||||||
|
|
||||||
|
###Relative date, @dr
|
||||||
|
|
||||||
|
It's also nice to have a syntax for basic time intervals:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~s1
|
||||||
|
0x1.0000.0000.0000.0000
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~m1
|
||||||
|
0x3c.0000.0000.0000.0000
|
||||||
|
|
||||||
|
~zod/try=> (div ~m1 ~s1)
|
||||||
|
60
|
||||||
|
|
||||||
|
~zod/try=> (div ~h1 ~m1)
|
||||||
|
60
|
||||||
|
|
||||||
|
~zod/try=> (div ~h1 ~s1)
|
||||||
|
3.600
|
||||||
|
|
||||||
|
~zod/try=> (div ~d1 ~h1)
|
||||||
|
24
|
||||||
|
|
||||||
|
~zod/try=> `@da`(add ~2013.11.30 ~d1)
|
||||||
|
~2013.12.1
|
||||||
|
|
||||||
|
There are no @dr intervals under a second or over a day. Since the resolution
|
||||||
|
is so high, though, (div ~s1 1.000.000) produces a pretty accurate microsecond.
|
||||||
|
|
||||||
|
###Loobean, @f
|
||||||
|
|
||||||
|
A loobean, or just bean, is 0 or 1. 0 is yes, 1 is no:
|
||||||
|
|
||||||
|
~zod/try=> `@ud`.y
|
||||||
|
0
|
||||||
|
~zod/try=> `@ud`.n
|
||||||
|
1
|
||||||
|
|
||||||
|
###Nil, @n
|
||||||
|
|
||||||
|
Nil indicates an absence of information, as in a list terminator. The only
|
||||||
|
value is ~, 0.
|
||||||
|
|
||||||
|
~zod/try=> `@ud`~
|
||||||
|
0
|
||||||
|
|
||||||
|
###Unicode text, @t
|
||||||
|
|
||||||
|
@t is a sequence of UTF-8 bytes, LSB first - sometimes called a cord. For
|
||||||
|
lowercase numbers and letters, the canonical syntax is ~~text:
|
||||||
|
|
||||||
|
~zod/try=> ~~foo
|
||||||
|
'foo'
|
||||||
|
|
||||||
|
Note that the prettyprinter makes an unprincipled exception and prints the text
|
||||||
|
in a noncanonical format:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~~foo
|
||||||
|
0x6f.6f66
|
||||||
|
|
||||||
|
We want to be able to encode an arbitrary Unicode string as a single URL-safe
|
||||||
|
token, using no punctuation but .~-, in @t. Space is ., . is ~., ~ is ~~, - is
|
||||||
|
-:
|
||||||
|
|
||||||
|
~zod/try=> ~~foo.bar
|
||||||
|
'foo bar'
|
||||||
|
~zod/try=> ~~foo.bar~.baz~~moo-hoo
|
||||||
|
'foo bar.baz~moo-hoo'
|
||||||
|
|
||||||
|
For all other ASCII/Unicode characters, insert the Unicode codepoint in
|
||||||
|
lower-case hexadecimal, followed by .. For example, for U+2605 "BLACK STAR",
|
||||||
|
write:
|
||||||
|
|
||||||
|
~zod/try=> ~~foo~2605.bar
|
||||||
|
'foo★bar'
|
||||||
|
|
||||||
|
This UTF-32 codepoint is of course converted to UTF-8:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~~foo~2605.bar
|
||||||
|
0x72.6162.8598.e26f.6f66
|
||||||
|
|
||||||
|
###URL-safe ASCII text, @ta
|
||||||
|
|
||||||
|
@ta encodes the ASCII subset that all canonical atom syntaxes restrict
|
||||||
|
themselves to. The prefix is ~.. There are no escape sequences except ~~, which
|
||||||
|
means ~, and ~-, which means \_. - and . encode themselves. No other characters
|
||||||
|
besides numbers and lowercase letters need apply.
|
||||||
|
|
||||||
|
~zod/try=> `@t`~.foo
|
||||||
|
'foo'
|
||||||
|
~zod/try=> `@t`~.foo.bar
|
||||||
|
'foo.bar'
|
||||||
|
~zod/try=> `@t`~.foo~~bar
|
||||||
|
'foo~bar'
|
||||||
|
~zod/try=> `@t`~.foo~-bar
|
||||||
|
'foo_bar'
|
||||||
|
~zod/try=> `@t`~.foo-bar
|
||||||
|
'foo-bar'
|
||||||
|
|
||||||
|
A @ta atom is called a span.
|
||||||
|
|
||||||
|
###Codepoint, @c
|
||||||
|
|
||||||
|
Normally when we build atoms of Unicode text, we use a UTF-8 bytestream, LSB
|
||||||
|
first. But sometimes it's useful to build atoms of one or more UTF-32 words.
|
||||||
|
|
||||||
|
The codepoint syntax is the same as @t, except with a ~- prefix. Let's repeat
|
||||||
|
our examples, with hex display:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~-foo
|
||||||
|
0x6f.0000.006f.0000.0066
|
||||||
|
|
||||||
|
~zod/try=> `@ux`~-foo.bar
|
||||||
|
0x72.0000.0061.0000.0062.0000.0020.0000.006f.0000.006f.0000.0066
|
||||||
|
|
||||||
|
###Phonemic, @p
|
||||||
|
|
||||||
|
We've seen @p used for ships, of course. But it's not just for ships - it's for
|
||||||
|
any short number optimized for memorability, not for arithmetic. @p is great
|
||||||
|
for checksums, for instance.
|
||||||
|
|
||||||
|
That said, @p is subtly customized for the sociopolitical design of Urbit as a
|
||||||
|
digital republic. For example, one feature we don't want is the ability to see
|
||||||
|
at a glance which carrier and cruiser issued a destroyer. Consider the carrier
|
||||||
|
0x21:
|
||||||
|
|
||||||
|
~zod/try=> `@p`0x21
|
||||||
|
~mep
|
||||||
|
|
||||||
|
It issues 255 cruisers, including 0x4321:
|
||||||
|
|
||||||
|
~zod/try=> `@p`0x4321
|
||||||
|
~pasnut
|
||||||
|
|
||||||
|
Which issues 65.535 destroyers, including 0x8765.4321 and several successors:
|
||||||
|
|
||||||
|
~zod/try=> `@p`0x8765.4321
|
||||||
|
~famsyr-dirwes
|
||||||
|
~zod/try=> `@p`0x8766.4321
|
||||||
|
~lidlug-maprec
|
||||||
|
~zod/try=> `@p`0x8767.4321
|
||||||
|
~tidlus-roplen
|
||||||
|
~zod/try=> `@p`0x8768.4321
|
||||||
|
~lisnel-lonbet
|
||||||
|
|
||||||
|
Of course, anyone who can juggle bits can see that ~famsyr-dirwes is a close
|
||||||
|
cousin of ~lidlug-maprec. But she actually has to juggle bits to do it.
|
||||||
|
Obfuscation does not prevent calculated associations, just automatic ones.
|
||||||
|
|
||||||
|
But at the yacht level, we actually want to see a uniform 32-bit space of
|
||||||
|
yachts directly associated with the destroyer:
|
||||||
|
|
||||||
|
~zod/try=> `@p`0x9.8765.4321
|
||||||
|
~talfes-sibzod-famsyr-dirwes
|
||||||
|
~zod/try=> `@p`0xba9.8765.4321
|
||||||
|
~tacbep-ronreg-famsyr-dirwes
|
||||||
|
~zod/try=> `@p`0xd.cba9.8765.4321
|
||||||
|
~bicsub-ritbyt-famsyr-dirwes
|
||||||
|
~zod/try=> `@p`0xfed.cba9.8765.4321
|
||||||
|
~sivrep-hadfeb-famsyr-dirwes
|
||||||
|
|
||||||
|
###IPv4 and IPv6 addresses, @if, @is
|
||||||
|
|
||||||
|
Urbit lives atop IP and would be very foolish to not support a syntax for the
|
||||||
|
large atoms that are IPv4 and IPv6 addresses.
|
||||||
|
|
||||||
|
@if is the standard IPv4 syntax, prefixed with .:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`.127.0.0.1
|
||||||
|
0x7f00.0001
|
||||||
|
|
||||||
|
@is is the same as @if, but with 8 groups of 4 hex digits:
|
||||||
|
|
||||||
|
~zod/try=> `@ux`.dead.beef.0.cafe.42.babe.dead.beef
|
||||||
|
0xdead.beef.0000.cafe.0042.babe.dead.beef
|
||||||
|
|
||||||
|
|
||||||
|
###Floating Point, @rs, @rd, @rq, @rh
|
||||||
|
|
||||||
|
The syntax for a single-precision float is the normal English syntax, with a . prefix:
|
||||||
|
|
||||||
|
.6.2832 :: τ as @rs
|
||||||
|
.-6.2832 :: -τ as @rs
|
||||||
|
.~6.2832 :: τ as @rd
|
||||||
|
.~-6.2832 :: -τ as @rd
|
||||||
|
.~~6.2832 :: τ as @rh
|
||||||
|
.~~~6.2832 :: τ as @rq
|
||||||
|
|
||||||
|
(Hoon is a Tauist language and promotes International Tau Day.)
|
||||||
|
|
||||||
|
###Transparent cell syntax
|
||||||
|
|
||||||
|
By adding _, we can encode arbitrary nouns in our safe subset. The prefix to a
|
||||||
|
canonical cell is ._; the separator is _; the terminator is __. Thus:
|
||||||
|
|
||||||
|
~zod/try=> ._3_4__
|
||||||
|
[3 4]
|
||||||
|
|
||||||
|
~zod/try=> :type; ._.127.0.0.1_._0x12_19___~tasfyn-partyv__
|
||||||
|
[.127.0.0.1 [0x12 19] ~tasfyn-partyv]
|
||||||
|
[@if [@ux @ud] @p]
|
||||||
|
|
||||||
|
Those who don't see utility in this strange feature have perhaps never needed
|
||||||
|
to jam a data structure into a URL.
|
||||||
|
|
||||||
|
###Opaque noun syntax
|
||||||
|
|
||||||
|
Speaking of jam, sometimes we really don't care what's inside our noun. Then,
|
||||||
|
the syntax to use is a variant of @uw prefixed by ~, which incorporates the
|
||||||
|
built-in jam and cue marshallers:
|
||||||
|
|
||||||
|
~zod/try=> (jam [3 4])
|
||||||
|
78.241
|
||||||
|
~zod/try=> `@uw`(jam [3 4])
|
||||||
|
0wj6x
|
||||||
|
~zod/try=> (cue 0wj6x)
|
||||||
|
[3 4]
|
||||||
|
~zod/try=> ~0wj6x
|
||||||
|
[3 4]
|
||||||
|
|
||||||
|
Noncanonical Syntax
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
These are syntaxes for constants which don't fit the canonical character-set
|
||||||
|
constraints.
|
||||||
|
|
||||||
|
###Cubes, @tas
|
||||||
|
|
||||||
|
@tas, a term, is our most exclusive odor. The only characters permitted are
|
||||||
|
lowercase ASCII, - except as the first or last character, and 0-9 except as the
|
||||||
|
first character.
|
||||||
|
|
||||||
|
The syntax for @tas is the text itself, always preceded by %. This means a term
|
||||||
|
is always cubical. You can cast it to @tas if you like, but we just about
|
||||||
|
always want the cube:
|
||||||
|
|
||||||
|
~zod/try=> %dead-fish9
|
||||||
|
%dead-fish9
|
||||||
|
|
||||||
|
~zod/try=> -:!>(%dead-fish9)
|
||||||
|
[%cube p=271.101.667.197.767.630.546.276 q=[%atom p=%tas]]
|
||||||
|
|
||||||
|
The empty @tas has a special syntax, $:
|
||||||
|
|
||||||
|
~zod/try=> %$
|
||||||
|
%$
|
||||||
|
|
||||||
|
A term without % is not a constant, but a name:
|
||||||
|
|
||||||
|
~zod/try=> dead-fish9
|
||||||
|
! -find-limb.dead-fish9
|
||||||
|
! find-none
|
||||||
|
! exit
|
||||||
|
|
||||||
|
A common structure in Hoon is a noun with a cubical head and an arbitrary tail:
|
||||||
|
|
||||||
|
~zod/try=> [%foo 'bar']
|
||||||
|
[%foo 'bar']
|
||||||
|
|
||||||
|
This structure may be generated with the following irregular syntax:
|
||||||
|
|
||||||
|
~zod/try=> a/'bar'
|
||||||
|
[%a 'bar']
|
||||||
|
|
||||||
|
###Loobeans, @f
|
||||||
|
|
||||||
|
.y is a little cumbersome, so we can say & and |. The % prefix cubes as usual.
|
||||||
|
|
||||||
|
~zod/try=> `@ud`&
|
||||||
|
0
|
||||||
|
~zod/try=> `@ud`|
|
||||||
|
1
|
||||||
|
|
||||||
|
###Cords, @t
|
||||||
|
|
||||||
|
The canonical ~~ syntax for @t, while it has its place, is intolerable in a
|
||||||
|
number of ways - especially when it comes to escaping capitals. So @t is both
|
||||||
|
printed and parsed in a conventional-looking single-quote syntax:
|
||||||
|
|
||||||
|
~zod/try=> 'foo bar'
|
||||||
|
'foo bar'
|
||||||
|
~zod/try=> `@ux`'foo bar'
|
||||||
|
0x72.6162.206f.6f66
|
||||||
|
Escape ' with \:
|
||||||
|
|
||||||
|
~zod/try=> 'Foo \'bar'
|
||||||
|
'Foo \'bar'
|
||||||
|
~zod/try=> `@ux`'\''
|
||||||
|
0x27
|
||||||
|
|
||||||
|
###Strings
|
||||||
|
|
||||||
|
Text in Hoon is generally manipulated in two ways, depending on what you're
|
||||||
|
doing: as an atomic cord/span/term, or as a tape which is a list of bytes (not
|
||||||
|
codepoints).
|
||||||
|
|
||||||
|
To generate a tape, use double quotes:
|
||||||
|
|
||||||
|
~zod/try=> "foo"
|
||||||
|
"foo"
|
||||||
|
~zod/try=> `*`"foo"
|
||||||
|
[102 111 111 0]
|
||||||
|
|
||||||
|
We're getting off the constant reservation, but strings also interpolate with curly-braces:
|
||||||
|
|
||||||
|
~zod/try=> "hello {(weld "wor" "ld")} is a fun thing to say"
|
||||||
|
"hello world is a fun thing to say"
|
||||||
|
|
||||||
|
And they can be joined across space or lines with a .:
|
||||||
|
|
||||||
|
~zod/try=> "hello"."world"
|
||||||
|
"helloworld"
|
||||||
|
~zod/try=> "hello". "world"
|
||||||
|
"helloworld"
|
||||||
|
|
||||||
|
Lists
|
||||||
|
-----
|
||||||
|
|
||||||
|
A list in Hoon is a null-terminated tuple. See section 2bB for Hoon's List library.
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list)`[3 4 ~]
|
||||||
|
~[3 4]
|
||||||
|
it(*)
|
||||||
|
|
||||||
|
The list type can be further specified by a subtype:
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list ,@ud)`[3 4 ~]
|
||||||
|
~[3 4]
|
||||||
|
it(@ud)
|
||||||
|
|
||||||
|
The above example is a list of @ud, meaning the all values in the list must be of type @ud.
|
||||||
|
|
||||||
|
Polymorphic lists can be specified by constructing a more complex type:
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list ?(@ud @ux))`[3 0xf ~]
|
||||||
|
~[3 15]
|
||||||
|
it({@ud @ux})
|
||||||
|
|
||||||
|
Not specifing a more complex type defaults to a list of raw nouns:
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list)`[3 0xf ~]
|
||||||
|
~[3 15]
|
||||||
|
it(*)
|
||||||
|
|
||||||
|
Null-terminated tuples may be generated with the following syntax:
|
||||||
|
|
||||||
|
~zod/try=> ~[3 0xf]
|
||||||
|
[3 0xf ~]
|
||||||
|
|
||||||
|
~zod/try=> :type; ~[3 0xf]
|
||||||
|
[3 0xf ~]
|
||||||
|
[@ud @ux %~]
|
||||||
|
|
||||||
|
Note that this syntax is not automatically typed as a list, but may be cast as such:
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list ?(@ux @ud))`~[3 0xf]
|
||||||
|
~[0x3 0xf]
|
||||||
|
it({@ux @ud})
|
||||||
|
|
||||||
|
Furthermore, a different syntax may be used to group the entire noun in the
|
||||||
|
head of a null terminated tuple:
|
||||||
|
|
||||||
|
|
||||||
|
~zod/try=> [3 0xf]~
|
||||||
|
[[3 0xf] ~]
|
||||||
|
|
||||||
|
This is often used to easily generate a list from a single noun.
|
||||||
|
|
||||||
|
~zod/try=> :type; `(list ,[@ud @ux])`[3 0xf]~
|
||||||
|
~[[3 0xf]]
|
||||||
|
it([@ud @ux])
|
||||||
|
|
||||||
|
The above is typed as a list of decimal hexadecimal pairs.
|
||||||
|
|
||||||
|
Units
|
||||||
|
-----
|
||||||
|
|
||||||
|
A Unit is Hoon's "maybe" type. As in, either some value or null. See section 2bA for Hoon's Unit library. Units are represented by a cell whose head is null and whose tail is some noun:
|
||||||
|
|
||||||
|
~zod/try=> `(unit ,@ud)`[~ 3]
|
||||||
|
[~ 3]
|
||||||
|
|
||||||
|
The unit type can be further specified by a subtype. The above example is a unit of @ud, meaning the optional value must be of type @ud.
|
||||||
|
|
||||||
|
~zod/try=> `(unit ,@ud)`[~ [3 3]]
|
||||||
|
! type-fail
|
||||||
|
! exit
|
||||||
|
|
||||||
|
~zod/try=> `(unit ,^)`[~ [3 3]]
|
||||||
|
[~ [3 3]]
|
||||||
|
|
||||||
|
For convenience, a null-headed noun may be specified with the following irregular syntax:
|
||||||
|
|
||||||
|
~zod/try=> `3
|
||||||
|
[~ 3]
|
||||||
|
|
||||||
|
~zod/try=> :type; `3
|
||||||
|
[~ 3]
|
||||||
|
[%~ @ud]
|
||||||
|
|
||||||
|
Note that this syntax is not automatically typed as a Unit, but may be cast as such:
|
||||||
|
|
||||||
|
~zod/try=> :type; `(unit)``3
|
||||||
|
[~ 3]
|
||||||
|
u(*)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
2521
main/pub/src/doc/ref/hoon/lex-rune.md
Normal file
2521
main/pub/src/doc/ref/hoon/lex-rune.md
Normal file
File diff suppressed because it is too large
Load Diff
2612
main/pub/src/doc/ref/hoon/lex-tile.md
Normal file
2612
main/pub/src/doc/ref/hoon/lex-tile.md
Normal file
File diff suppressed because it is too large
Load Diff
2612
main/pub/src/doc/ref/hoon/lex-twig.md
Normal file
2612
main/pub/src/doc/ref/hoon/lex-twig.md
Normal file
File diff suppressed because it is too large
Load Diff
@ -1,117 +1,70 @@
|
|||||||
Morphology
|
Morphology
|
||||||
==========
|
==========
|
||||||
|
|
||||||
Hoon is a statically typed language that compiles to Nock. You can test out
|
Hoon is a statically typed language that compiles to Nock.
|
||||||
most of the compilation routines in the REPL, so we'll point out some of those
|
|
||||||
as we go along. At the highest level, there is `++make`, which turns text into
|
|
||||||
nock.
|
|
||||||
|
|
||||||
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<')
|
Types
|
||||||
[8 [1 0 0] [1 0 6] 0 1]
|
-----
|
||||||
|
|
||||||
If you want to know how Hoon code is compiled, start your investigation in
|
Types are nouns the compiler keeps around as it turns your Hoon into Nock.
|
||||||
`++make`. Another way to do this, of course, is with `!=`.
|
|
||||||
|
|
||||||
~hoclur-bicrel/try=> !=(|=([@ @ud] +<))
|
A type serve two purposes:
|
||||||
[8 [1 0 0] [1 0 6] 0 1]
|
|
||||||
|
|
||||||
The difference here is of course that `!=` requires you to put in the code
|
1. It defines a set of nouns. Any finite noun is either in this set, or not in
|
||||||
literally and syntactically correctly. So, for example, you can't use tall
|
it.
|
||||||
form runes within a wide form usage of `!=`. `++make` may be called on literal
|
|
||||||
text, which may even be programmatically generated. Essentially, `++make` is
|
|
||||||
more general, but `!=` is convenient for learning and debugging.
|
|
||||||
|
|
||||||
At a high level, the compilation process is as follows:
|
2. it ascribes semantics to all nouns in this set. For example, a Hoon type
|
||||||
|
exports a semantic namespace.
|
||||||
|
|
||||||
First, a runic expression is parsed into an abstact syntax tree, called a
|
|
||||||
`twig`:
|
|
||||||
|
|
||||||
text => twig
|
These are defined in Hoon as one of the ten kinds of type found in `++type`.
|
||||||
|
|
||||||
This can be tested at the command line with `++ream`:
|
At its most general, the `noun` type contains all Hoon nouns. Everything in
|
||||||
|
Hoon is a noun. The `%void` type contains no values at all. The `%atom` type
|
||||||
|
contains only atoms. The `%cube` type is parameterized by a value, and it
|
||||||
|
contains only that single value. The other types are defined in the lexicon.
|
||||||
|
|
||||||
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<')
|
Type inference in Hoon works well enough that there is no direct
|
||||||
[ %brts
|
|
||||||
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
|
|
||||||
q=[%cnts p=~[[%.y p=6]] q=~]
|
|
||||||
]
|
|
||||||
|
|
||||||
This is simply a parser -- no compilation happens here. The details of the
|
|
||||||
parser can be found in the syntax section of this doc.
|
|
||||||
|
|
||||||
Next, we get into the real meat of the compiler. In the compiler proper, which
|
|
||||||
is `++mint` in `++ut`, we take a twig and a subject type and turn it into a
|
|
||||||
product type and a blob of nock.
|
|
||||||
|
|
||||||
[subject-type twig] => [product-type nock-formula]
|
|
||||||
|
|
||||||
For example, we can call `++mint` on the twig we produced earlier with a
|
|
||||||
subject type of any noun (pardon the necessity for splitting the input line).
|
|
||||||
|
|
||||||
~hoclur-bicrel/try=> (~(mint ut %noun) %noun
|
|
||||||
[%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
|
|
||||||
|
|
||||||
[ p
|
|
||||||
[ %core
|
|
||||||
p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
|
||||||
q
|
|
||||||
[ p=%gold
|
|
||||||
q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
|
||||||
r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
|
|
||||||
]
|
|
||||||
]
|
|
||||||
q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
|
|
||||||
]
|
|
||||||
|
|
||||||
Note that in the result we get first a type, which is a gold core with an ash
|
|
||||||
gate. The second part, though, is exactly the nock that was produced by
|
|
||||||
`++make`. It looks a little different because it has some faces on it, but it
|
|
||||||
is in fact the same.
|
|
||||||
|
|
||||||
The astute reader will notice that we casually starting referring to types
|
|
||||||
without defining them. We must rectify this heinous atrocity.
|
|
||||||
|
|
||||||
A "type" is simply a set of possible values, combined with a set of semantics
|
|
||||||
for operating on these values. These are defined in Hoon as one of the ten
|
|
||||||
kinds of type found in `++type`. At its most general, the `noun` type contains
|
|
||||||
all Hoon nouns. Everything in Hoon is a noun. The `%void` type contains no
|
|
||||||
values at all. The `%atom` type contains only atoms. The `%cube` type is
|
|
||||||
parameterized by a value, and it contains only that single value. The other
|
|
||||||
types are defined in the lexicon.
|
|
||||||
|
|
||||||
Our type inference algorithms work well enough that in Hoon there is no direct
|
|
||||||
syntax for defining or declaring a type. There is only a syntax for
|
syntax for defining or declaring a type. There is only a syntax for
|
||||||
constructing twigs. Types are always produced by inference. Of course, since
|
constructing twigs. Types are always produced by inference.
|
||||||
we often need to refer to types, there are a number of runes that are specfically
|
|
||||||
used for referring to types. These runes act on `tile`s.
|
|
||||||
|
|
||||||
What are the usual things we want to do with types? We sometimes want to test
|
|
||||||
if some value is of a particular type. We sometimes want to declare that a
|
|
||||||
gate takes a particular type of argument, or that it produces a particular type
|
|
||||||
of result. We sometimes want to coerce an untyped value into a type so that we
|
|
||||||
always know what type we are operating on. We could write twigs to do each of
|
|
||||||
these things for every type we wish to use, and our type inference algorithms
|
|
||||||
will figure out what type we're talking about each time.
|
|
||||||
|
|
||||||
It would work, but it would be miserable. Thankfully, there's an easier way.
|
When resolving a face, for example, the axis that
|
||||||
|
ends up in the nock formula depends on the where the face is in the subject.
|
||||||
|
We only know this because faces are in the subject type. Thus, in `=> [p=42
|
||||||
|
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
|
||||||
|
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
|
||||||
|
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
|
||||||
|
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
|
||||||
|
this reason that we say that a type defines not only a set of values, but also
|
||||||
|
the semantics for operating on those values.
|
||||||
|
|
||||||
|
|
||||||
|
Tiles
|
||||||
|
-----
|
||||||
|
|
||||||
|
What are the usual things we want to do with types?
|
||||||
|
|
||||||
|
- Test if a noun is of a particular type.
|
||||||
|
|
||||||
|
- Create a blank default example of a type
|
||||||
|
|
||||||
|
- Coerce a noun into a type
|
||||||
|
|
||||||
It is possible to generate twigs for each of the above use cases from two
|
It is possible to generate twigs for each of the above use cases from two
|
||||||
pieces of information: (1) a tile describing a type and (2) a reference to
|
pieces of information: (1) a tile describing a type and (2) a reference to
|
||||||
which case we want to generate. This eliminates much of the tedious
|
which case we want to generate.
|
||||||
boilerplate.
|
|
||||||
|
|
||||||
The first and most important thing to remember about tiles is that tiles are
|
Tiles are not types.
|
||||||
not types. The second and nearly as important thing to remember about tiles
|
|
||||||
is that every tile has a unique "icon", which is an associated type. A tile
|
|
||||||
is not a type, but every tile corresponds to some type. But what is a tile?
|
|
||||||
A tile is a nice little representation of a type that may be acted upon and
|
|
||||||
reduced in several ways. Remember, types never show up in our code -- they are
|
|
||||||
always produced by inference. But sometimes it's convenient to have little
|
|
||||||
macros to do all the little things we usually want to do with types without
|
|
||||||
having to rewrite them for every type.
|
|
||||||
|
|
||||||
Formalizing the operations on a tile, there are exactly four. We will briefy
|
Every tile has a unique associated type, or "icon".
|
||||||
describe them here, but they are documented more thoroughly elsewhere.
|
|
||||||
|
|
||||||
|
A tile is not a type, but every tile corresponds to some type.
|
||||||
|
|
||||||
|
Formalizing the operations on a tile, there are exactly four.
|
||||||
|
asdf
|
||||||
|
|
||||||
|
adfa
|
||||||
Bunting a tile is simply creating a blank default example of the tile's icon.
|
Bunting a tile is simply creating a blank default example of the tile's icon.
|
||||||
This may seem to have limited usefulness, but this is actually the most common
|
This may seem to have limited usefulness, but this is actually the most common
|
||||||
use of tiles. This is due to the the way in which we create, for example,
|
use of tiles. This is due to the the way in which we create, for example,
|
||||||
@ -139,88 +92,139 @@ never directly used in any Hoon source code. Whipping is used internally by
|
|||||||
clamming.
|
clamming.
|
||||||
|
|
||||||
In summary, a tile is simply a convenient syntax for creating well-typed nouns.
|
In summary, a tile is simply a convenient syntax for creating well-typed nouns.
|
||||||
Say that again, *a tile is simply a convenient syntax for creating well-typed
|
|
||||||
nouns.* A tile is not a tiwg, but tiles always are reduced statically in one
|
A tile is not a twig, but tiles always are reduced statically in one
|
||||||
of four ways to a twig. `++tile` is a sort of intermediate representation
|
of four ways to a twig.
|
||||||
between text and twigs that is often used when we're referring to types.
|
|
||||||
|
|
||||||
Returning from our digression about types and tiles to our discussion of the
|
|
||||||
compilation process, recall that `++mint` takes a subject-type and a twig and
|
Type Inference
|
||||||
compiles it into a product-type and a blob of nock.
|
--------------
|
||||||
|
|
||||||
|
Hoon is a higher-order typed functional language. Most languages in this class,
|
||||||
|
Haskell and ML being prominent examples, use something called the
|
||||||
|
Hindley-Milner unification algorithm. Hoon uses its own special sauce instead.
|
||||||
|
|
||||||
|
Hoon's philosophy is that a language is a UI for programmers, and the basic
|
||||||
|
test of a UI is its predictability. It is impossible (for most programmers)
|
||||||
|
to learn a language properly unless they know what the compiler is doing, which
|
||||||
|
in practice means mentally stepping through the algorithms it uses (with the
|
||||||
|
exception of semantically neutral optimizations). Haskell is a hard language to
|
||||||
|
learn (for most programmers) because it's hard (for most programmers) to follow
|
||||||
|
what the Haskell compiler is thinking.
|
||||||
|
|
||||||
|
|
||||||
|
Broadly speaking, type inference in Hoon has three general limitations as
|
||||||
|
compared to Hindley-Milner inference.
|
||||||
|
|
||||||
|
1. Hoon does not think backwards. For instance, it cannot infer a function's
|
||||||
|
argument type (or rather, a gate's sample type) from its body.
|
||||||
|
|
||||||
|
2. Hoon can infer through tail recursion, but not head recursion. It can check
|
||||||
|
head recursion, however, given an annotation.
|
||||||
|
|
||||||
|
3. The compiler catches most but not all divergent inference problems - i.e.
|
||||||
|
you can put the compiler into an infinite loop or exponential equivalent.
|
||||||
|
An interrupt will still show you your error location.
|
||||||
|
|
||||||
|
Although an inference algorithm that reasons only forward must and does require
|
||||||
|
a few more annotations from the programmer, the small extra burden on her
|
||||||
|
fingers is more than offset by the lighter load on her hippocampus.
|
||||||
|
Furthermore, programs also exist to be read. Some of these annotations (which a
|
||||||
|
smarter algorithm might infer by steam) may annoy the actual author of the code
|
||||||
|
but be a lifesaver for her replacement.
|
||||||
|
|
||||||
|
Our experience is that these limitations are minor annoyances at worst and
|
||||||
|
prudent restrictions at best. Your mileage may vary.
|
||||||
|
|
||||||
|
Type inference is a frightening problem, especially if you've been exposed to
|
||||||
|
a wall of math. Your only real problem in learning Hoon is to learn not to
|
||||||
|
fear it. Once you work past this reasonable but entirely unfounded fear of
|
||||||
|
inference, your Hoon experience will be simple, refreshing and delightful. So
|
||||||
|
first, let's talk through a few reassuring facts:
|
||||||
|
|
||||||
|
1. Type inference in Hoon never sees a tile. It operates exclusively on twigs.
|
||||||
|
All tiles and synthetic twigs are reduced to natural twigs for the inference
|
||||||
|
engine's benefit.
|
||||||
|
|
||||||
|
2. The semantics of Hoon are in ++ut in hoon.hoon, and nowhere else.
|
||||||
|
|
||||||
|
3. Within ++ut, all the semantics of Hoon are in the call graph of one arm: ++mint.
|
||||||
|
++mint has a case for every natural hoon. So do ++play and ++mull,
|
||||||
|
but their semantics are redundant with ++mint.
|
||||||
|
|
||||||
|
4. One leg in the sample of ++mint - gol - which looks for all the world like a
|
||||||
|
mechanism for backward inference, is not. It is semantically irrelevant and
|
||||||
|
only exists to get better localization on error reports.
|
||||||
|
|
||||||
|
5. ++mint is the gate that maps [type twig] to [type nock]:
|
||||||
|
|
||||||
|
[subject-type twig] => [product-type nock-formula]
|
||||||
|
|
||||||
|
When we have a type that describes the subject for the formula we're trying to
|
||||||
|
generate, as we generate that formula we want to also generate a type for the
|
||||||
|
product of that formula on that subject. As long as subject-type is a
|
||||||
|
correct description of some subject, you can take any twig and compile it
|
||||||
|
against subject-type, producing a formula such that *(subject formula) is a
|
||||||
|
product correctly described by product-type.
|
||||||
|
|
||||||
|
|
||||||
|
Compilation
|
||||||
|
------------
|
||||||
|
|
||||||
|
`++make` is a top-level function that turns text into nock.
|
||||||
|
|
||||||
|
~hoclur-bicrel/try=> `*`(make '|= [@ @ud] +<')
|
||||||
|
[8 [1 0 0] [1 0 6] 0 1]
|
||||||
|
|
||||||
|
Another way to do this is with `!=`.
|
||||||
|
|
||||||
|
~hoclur-bicrel/try=> !=(|=([@ @ud] +<))
|
||||||
|
[8 [1 0 0] [1 0 6] 0 1]
|
||||||
|
|
||||||
|
`++make` is more general, in that it can be called on programmatically
|
||||||
|
generated text, but `!=` is convenient for learning and debugging.
|
||||||
|
|
||||||
|
The compilation process is as follows:
|
||||||
|
|
||||||
|
First, a runic expression is parsed into an abstact syntax tree, called a
|
||||||
|
`twig`:
|
||||||
|
|
||||||
|
text => twig
|
||||||
|
|
||||||
|
Parsing code into a `twig` can be done with `++ream`:
|
||||||
|
|
||||||
|
~hoclur-bicrel/try=> (ream '|= [@ @ud] +<')
|
||||||
|
[ %brts
|
||||||
|
p=[p=[%axil p=[%atom p=~.]] q=[%axil p=[%atom p=~.ud]]]
|
||||||
|
q=[%cnts p=~[[%.y p=6]] q=~]
|
||||||
|
]
|
||||||
|
|
||||||
|
Refer to the Syntax section for more detail on parsing.
|
||||||
|
|
||||||
|
The compiler proper, is `++mint` in `++ut`.
|
||||||
|
|
||||||
|
++mint takes a twig and a subject type and produces a product type and a nock formula.
|
||||||
|
|
||||||
[subject-type twig] => [product-type nock-formula]
|
[subject-type twig] => [product-type nock-formula]
|
||||||
|
|
||||||
As we compile each twig, we compile it against a subject type. During the
|
For example, we can call `++mint` on the twig we produced earlier with a
|
||||||
compilation of a twig, we obviously don't have access to the value of the
|
subject type of any noun:
|
||||||
subject (else compilation would include running the code). We do, however,
|
|
||||||
have some guarantees about its value. We have its type.
|
|
||||||
|
|
||||||
Most runes don't change the subject type, but some do. In the most simple
|
~hoclur-bicrel/try=>
|
||||||
example, consider `=> p q`, which means to evaluate `q` with a subject of
|
(~(mint ut %noun) %noun [%brts [[%axil [%atom ~.]] [%axil [%atom ~.ud]]] [%cnts ~[[%.y 6]] ~]])
|
||||||
`p`. Here, the subject type of `q` is simply the type of `p`.
|
|
||||||
|
|
||||||
In a slightly more complicated example, consider `?: p q r`, which means
|
[ p
|
||||||
simply to evaluate `q` if `p` is true, else evaluate `r`. When compiling `q`,
|
[ %core
|
||||||
we get to assume in the subject type that `p` is true while when compiling `r`,
|
p=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||||
we get to assume in the subject type that `p` is false. This is used to great
|
q
|
||||||
practical purpose, for example, when handling lists. If `p` is a test whether
|
[ p=%gold
|
||||||
the list is empty (which would be `?=(~ l)`), then in `q` we can assume that
|
q=[%cell p=[%cell p=[%atom p=%$] q=[%atom p=%ud]] q=%noun]
|
||||||
the list is empty while in `r` we know that the list has a head and a tail.
|
r=[p=[0 6] q={[p=%$ q=[%ash p=[%cnts p=~[[%.y p=6]] q=~]]]}]
|
||||||
Without that test, if you attempt to refer to the head of the list, the
|
]
|
||||||
compiler will complain that it cannot verify there is even a head to which to
|
]
|
||||||
refer. Thus, the compiler will generate `find-limb` and `find-fork` errors.
|
q=[%8 p=[%1 p=[0 0]] q=[p=[%1 p=[0 6]] q=[%0 p=1]]]
|
||||||
The `find-limb` refers to its inability to find the head of the list while the
|
]
|
||||||
`find-fork` refers to the fact that it's looking in a fork type -- that is, it
|
|
||||||
believes that the values is of one of multiple types, and in at least one of
|
|
||||||
the constituent types there is no head. The error messages are described in
|
|
||||||
detail in the lexicon.
|
|
||||||
|
|
||||||
Recapping, when compiling Hoon code, we compile each individual twig against
|
|
||||||
the subject with which it will eventually be called. The product is a nock
|
|
||||||
formula and the type of value that it may produce. Thus, both the nock formula
|
|
||||||
and the product type may depend on the both the subject type and the twig in
|
|
||||||
question.
|
|
||||||
|
|
||||||
It's obvious that the product type will usually depend on the subject type, but
|
|
||||||
it's less obvious when the nock formula will depend on the type. It does,
|
|
||||||
however, happen at times. When resolving a face, for example, the axis that
|
|
||||||
ends up in the nock formula depends on the where the face is in the subject.
|
|
||||||
We only know this because faces are in the subject type. Thus, in `=> [p=42
|
|
||||||
q=1.337] p`, the `p` twig compiles to nock `[0 2]` while in `=> [q=42 p=1.337]
|
|
||||||
p`, the `p` twig compiles to nock `[0 3]`. This is true even though the actual
|
|
||||||
nock produced by `[p=42 q=1.337]` is the same as that produced by `[q=42
|
|
||||||
p=1.337]`. Thus, the nock formula may depend on the subject type. It is for
|
|
||||||
this reason that we say that a type defines not only a set of values, but also
|
|
||||||
the semantics for operating on those values.
|
|
||||||
|
|
||||||
As long as some value is in the subject type, you can run it against the
|
|
||||||
produced nock formula as `*[subject formula]` and get a value in the product
|
|
||||||
type.
|
|
||||||
|
|
||||||
We've ignored one question thus far: it's all well and good once we've started
|
|
||||||
compiling, for we know the subject type. But what subject type do we start
|
|
||||||
with? We could, of course, put some restrictions on the subject type of
|
|
||||||
compiled Hoon code, but (1) there's no reason to do that, and (2) since this
|
|
||||||
will be returned as nock, and nock is untyped, the compiler cannot actually
|
|
||||||
make any guarantees about what subject it will be called with. Thus, we start
|
|
||||||
the compilation with a subject type of all nouns.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Hoon has 120 [XX count] digraph runes. The choice of glyph is not random. The
|
|
||||||
first defines a semantic category (with some exceptions). These categories are:
|
|
||||||
|
|
||||||
| bar core construction
|
|
||||||
$ buc tiles and tiling
|
|
||||||
% cen invocations
|
|
||||||
: col tuples
|
|
||||||
. dot nock operators
|
|
||||||
^ ket type conversions
|
|
||||||
; sem miscellaneous macros
|
|
||||||
~ sig hints
|
|
||||||
= tis compositions
|
|
||||||
? wut conditionals, booleans, tests
|
|
||||||
! zap special operations
|
|
||||||
|
|
||||||
|
Note that the head of the above is a type, which in this case is a gold core
|
||||||
|
with an ash gate. The second part, though, is (with a few labels, or faces added) the nock that was
|
||||||
|
produced by `++make`.
|
||||||
|
Loading…
Reference in New Issue
Block a user