urbit/pub/docs/dev/hoon/reference/odors.md
Galen Wolfe-Pauly bb495844be doc -> docs
2015-10-20 10:51:45 -07:00


Odors
=====
Overview
--------
Since every atom in Hoon is a natural number, the interpreter needs to
know both how to render an atom and how to subject it to type
enforcement. Odors, which are just ASCII spans beginning with `@`, carry
all of the information the interpreter needs to do this. For instance,
the interpreter knows to render an atom of odor `@t` as UTF-8 text.

An odor's span consists of two parts: a lowercase prefix carrying type
information, and an optional uppercase suffix carrying information about
the atom's size. The prefix is a taxonomy that grows more specific to the
right. For example, atoms of the odor `@ta` are URL-safe ASCII text, and
atoms of the odor `@tas` are the more specific subset of ASCII text that
is acceptable as a Hoon symbol.

The general principle of type enforcement is that atoms change freely
either up or down the taxonomy, but not across. You can treat a `@tas`
as a `@t`, as in a strong type system; but you can also treat a `@t` as
a `@tas`, or an `@` as anything. However, passing a `@t` to a function
that expects an `@ux` is a type error.
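
The nesting rule above can be sketched as a prefix check. This is a Python model of the rule as stated in this paragraph, not Hoon's actual type checker; the name `odor_nests` is hypothetical, and odors are written without the leading `@` or any size suffix (the empty string stands for the bare odor `@`).

```python
def odor_nests(given: str, expected: str) -> bool:
    """Model of the stated rule: an atom may move up or down the odor
    taxonomy (one prefix extends the other) but never across it."""
    return given.startswith(expected) or expected.startswith(given)


# Up, down, from bare '@', and across the taxonomy.
for given, expected in [("tas", "t"), ("t", "tas"), ("", "ux"), ("t", "ux")]:
    print(f"@{given} -> @{expected}: {odor_nests(given, expected)}")
```

Under this model, passing a `@ux` where a `@t` is expected fails, which matches the `type-fail` in the transcript that follows.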
For example, you can cast a `@t` to a `@tas`, or vice-versa:
```
~zod/try=> =a `@t`'permitted'
~zod/try=> a
'permitted'
~zod/try=> `@tas`a
%permitted
```
However, you cannot pass a `@ux` to a function that expects a `@t`
without casting it to a `@` first:
```
~zod/try=> (|=(a=@t [%foo a]) 0x20)
! type-fail
! exit
~zod/try=> =a 0x21
~zod/try=> (|=(a=@t [%foo a]) `@t`a)
[%foo '!']
```
Note that when explicitly casting a `@ux` to a `@t`, the interpreter
automatically casts the `@ux` to a `@` first.

Comprehensive list of the Hoon Odors
------------------------------------
| Odor   | Meaning                          |
|--------|----------------------------------|
| `@c`   | UTF-32 codepoint                 |
| `@d`   | date                             |
| `@da`  | absolute date                    |
| `@dr`  | relative date (i.e., timespan)   |
| `@f`   | yes or no (inverse boolean)      |
| `@n`   | nil                              |
| `@p`   | phonemic base                    |
| `@r`   | IEEE floating-point              |
| `@rd`  | double precision (64 bits)       |
| `@rh`  | half precision (16 bits)         |
| `@rq`  | quad precision (128 bits)        |
| `@rs`  | single precision (32 bits)       |
| `@s`   | signed integer, sign bit low     |
| `@sb`  | signed binary                    |
| `@sd`  | signed decimal                   |
| `@sv`  | signed base32                    |
| `@sw`  | signed base64                    |
| `@sx`  | signed hexadecimal               |
| `@t`   | UTF-8 text (cord)                |
| `@ta`  | ASCII text (span)                |
| `@tas` | ASCII symbol (term)              |
| `@u`   | unsigned integer                 |
| `@ub`  | unsigned binary                  |
| `@ud`  | unsigned decimal                 |
| `@uv`  | unsigned base32                  |
| `@uw`  | unsigned base64                  |
| `@ux`  | unsigned hexadecimal             |

Odor Size Suffixes
------------------
The suffix of an odor, if present, is a single uppercase character `c`
in the range `A` to `Z`, which indicates the size of an atom. This works
because in Hoon a character literal such as `'N'` is just an atom
holding its ASCII code, so letters can be subtracted like any other
numbers.

The size is specified in bits as `1 << (c - 'A')`, that is,
`2^(c - 'A')`, since most data aligns to a power of 2 or can be composed
of such blocks.

The size of a block with suffix `N`, for example, can be calculated as
`(bex (sub 'N' 'A'))` in bits or `(div (bex (sub 'N' 'A')) 8)` in bytes.
Thus `@tD` is one byte of UTF-8 (which need not be a complete character)
and `@tN` is 1 kilobyte or less of UTF-8.
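
The arithmetic above is easy to check outside Hoon. This is a minimal Python sketch; `odor_size_bits` and `fits` are hypothetical names, and `fits` assumes that a size suffix means "at most this many bits", following the "or less" wording above.

```python
def odor_size_bits(suffix: str) -> int:
    """Bits denoted by an odor size suffix: 2 ** (suffix - 'A')."""
    if len(suffix) != 1 or not "A" <= suffix <= "Z":
        raise ValueError("size suffix must be one uppercase letter A-Z")
    return 1 << (ord(suffix) - ord("A"))


def fits(atom: int, suffix: str) -> bool:
    """Assumed reading of a size suffix: the atom needs at most that many bits."""
    return atom.bit_length() <= odor_size_bits(suffix)


for s in "ADNZ":
    bits = odor_size_bits(s)
    print(s, bits, "bits =", bits / 8, "bytes")
```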
For reference:
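
| Suffix | Size      |
|--------|-----------|
| `A`    | 1 bit     |
| `B`    | 2 bits    |
| `C`    | 4 bits    |
| `D`    | 1 byte    |
| `E`    | 2 bytes   |
| `F`    | 4 bytes   |
| `G`    | 8 bytes   |
| `H`    | 16 bytes  |
| `I`    | 32 bytes  |
| `J`    | 64 bytes  |
| `K`    | 128 bytes |
| `L`    | 256 bytes |
| `M`    | 512 bytes |
| `N`    | 1 KB      |
| `O`    | 2 KB      |
| `P`    | 4 KB      |
| `Q`    | 8 KB      |
| `R`    | 16 KB     |
| `S`    | 32 KB     |
| `T`    | 64 KB     |
| `U`    | 128 KB    |
| `V`    | 256 KB    |
| `W`    | 512 KB    |
| `X`    | 1 MB      |
| `Y`    | 2 MB      |
| `Z`    | 4 MB      |
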
It is possible to construct an atom bigger than 4 MB, but the type
system cannot express an odor size for it.

There is also the type `++bloq`, which holds a block size expressed as a
power of two (a `++bloq` of `b` means blocks of 2^b bits, so `3` means
bytes); it is just an alias for `@`.

------------------------------------------------------------------------
### @c
UTF-32 codepoint

Atoms of the odor `@c` represent Unicode text as a sequence of UTF-32
codepoints, one 32-bit word per character, with the first character in
the least significant word. Although we usually use a UTF-8 bytestream,
sometimes it's useful to build atoms of one or more UTF-32 words.
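
To see how the `@c` and `@t` layouts differ, here is a small Python model that reproduces the `@ux` values in the examples of this section. The helpers `at_c`, `at_t`, and `dotted_hex` are hypothetical names for illustration; they model the bit layout only, not Hoon's parser or printer.

```python
def at_c(text: str) -> int:
    """Model of @c: one 32-bit word per codepoint, first character lowest."""
    atom = 0
    for i, ch in enumerate(text):
        atom |= ord(ch) << (32 * i)
    return atom


def at_t(text: str) -> int:
    """Model of @t (cord): UTF-8 bytes, first byte lowest."""
    return int.from_bytes(text.encode("utf-8"), "little")


def dotted_hex(atom: int) -> str:
    """Render like `@ux`: '0x', then groups of four hex digits joined by '.'."""
    digits = format(atom, "x")
    groups = []
    while len(digits) > 4:
        groups.insert(0, digits[-4:])
        digits = digits[:-4]
    groups.insert(0, digits)
    return "0x" + ".".join(groups)


print(dotted_hex(at_c("i\u2764u")))  # UTF-32 words for 'i', U+2764, 'u'
print(dotted_hex(at_t("i\u2764u")))  # the same text as UTF-8 bytes
```

The UTF-32 form spends a full word on each ASCII character, which is why the UTF-8 cord is usually preferred.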
##### Forms
`~-[text]`
##### Examples
```
~zod/try=> :type; ~-foo
~-foo
@c
~zod/try=> ~-i~2764.u
~-i~2764.u
~zod/try=> (tuft ~-i~2764.u)
'i❤u'
~zod/try=> `@ux`~-foo
0x6f.0000.006f.0000.0066
~zod/try=> `@ux`~-i~2764.u
0x75.0000.2764.0000.0069
~zod/try=> `@ux`(tuft ~-i~2764.u)
0x75.a49d.e269
```
------------------------------------------------------------------------
### @d
Date
#### @da
Absolute date

Atoms of the odor `@da` represent absolute Urbit dates. An Urbit date is
a 128-bit atom: the high 64 bits count whole seconds, and the low 64
bits count fractions of a second in units of 2\^-64 seconds. For
example, 2\^127 is 3:30:08 PM on December 5, AD 226. Dates before AD 1
(BC) can also be represented.
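
The layout can be modeled in Python with the standard `datetime` module. This is a sketch under stated assumptions: `UNIX_EPOCH_SECONDS` is not an official constant here but is derived from the `@ux` of `~2013.12.7` in the examples of this section, minus the Unix time of 2013-12-07T00:00:00Z; the function names are hypothetical; and Python's proleptic Gregorian calendar is assumed to agree with Urbit's for the dates involved.

```python
from datetime import datetime, timedelta

# High 64 bits of ~1970.1.1, derived from the `@ux` of ~2013.12.7
# (0x8000.000d.2140.7280...) minus the Unix time of that date.
UNIX_EPOCH_SECONDS = 0x8000000D21407280 - 1386374400
UNIX_EPOCH = datetime(1970, 1, 1)


def da_from_datetime(dt: datetime, fraction: int = 0) -> int:
    """Model of @da: whole seconds in the high 64 bits and `fraction`
    (units of 2**-64 s) in the low 64 bits. `dt` is a naive UTC datetime."""
    seconds = (dt - UNIX_EPOCH) // timedelta(seconds=1)
    return ((UNIX_EPOCH_SECONDS + seconds) << 64) | fraction


def da_to_datetime(atom: int) -> datetime:
    """Inverse of da_from_datetime; the fractional low 64 bits are dropped."""
    return UNIX_EPOCH + timedelta(seconds=(atom >> 64) - UNIX_EPOCH_SECONDS)


print(hex(da_from_datetime(datetime(2013, 12, 7, 15, 30, 7), 0x1234 << 48)))
print(da_to_datetime(2 ** 127))  # compare with `@da`(bex 127)
```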
##### Forms
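`~[year].[month].[date]..[hour].[minute].[second]..[fraction]`
Note: the time of day and/or the fraction fragment is optional. The
fraction is written in hexadecimal as one or more `.`-separated groups
of four digits, each group filling the next 16 bits below the second.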
##### Examples
```
~zod/try=> ~2014.1.1
~2014.1.1
~zod/try=> :type; ~2014.1.1
~2014.1.1
@da
~zod/try=> ~2014.1.1..01.01.01
~2014.1.1..01.01.01
~zod/try=> :type; ~2014.1.1..01.01.01
~2014.1.1..01.01.01
@da
~zod/try=> ~2014.1.1..01.01.01..1234
~2014.1.1..01.01.01..1234
~zod/try=> `@da`(bex 127)
~226.12.5..15.30.08
~zod/try=> `@da`(dec (bex 127))
~226.12.5..15.30.07..ffff.ffff.ffff.ffff
~zod/try=> `@ux`~2013.12.7
0x8000.000d.2140.7280.0000.0000.0000.0000
~zod/try=> `@ux`~2013.12.7..15.30.07
0x8000.000d.2141.4c7f.0000.0000.0000.0000
~zod/try=> `@ux`~2013.12.7..15.30.07..1234
0x8000.000d.2141.4c7f.1234.0000.0000.0000
```
------------------------------------------------------------------------
#### @dr
Relative date (ie, timespan)
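
Atoms of the odor `@dr` represent time intervals in the same units as
`@da`: the high 64 bits count seconds and the low 64 bits count
fractions of a second in units of 2\^-64 seconds, so `~s1` is 2\^64.
Intervals may be shorter than a second or longer than a day.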
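
Because a `@dr` is just a count of 2\^-64-second units, interval arithmetic is ordinary integer arithmetic. A minimal Python model (the names `SECOND` and `dr` are hypothetical) that mirrors several of the examples in this section:

```python
SECOND = 1 << 64  # ~s1: one second in units of 2**-64 s


def dr(days=0, hours=0, minutes=0, seconds=0, fraction=0):
    """Model of a @dr atom: whole units converted to seconds, plus a raw
    fraction already expressed in units of 2**-64 s."""
    whole = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return whole * SECOND + fraction


print(hex(dr(minutes=1)))               # like `@ux`~m1
print(dr(hours=1) // dr(seconds=1))     # like (div ~h1 ~s1)
print(dr(days=1) // 5 == dr(hours=4, minutes=48))
```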
##### Forms
`~d[day].h[hour].m[minute].s[second]..[fraction]`
Note: Every measurement is optional, so long as those that are present
are in order. The largest measurement is preceded by a `~`.
##### Examples
```
~zod/try=> ~d1.h19.m5.s29
~d1.h19.m5.s29
~zod/try=> :type; ~d1.h19.m5.s29
~d1.h19.m5.s29
@dr
~zod/try=> `@dr`(div ~s1 1.000)
~.s0..0041.8937.4bc6.a7ef
~zod/try=> ~d1.h19.m5.s29..0041
~d1.h19.m5.s29..0041
~zod/try=> `@ux`~s1
0x1.0000.0000.0000.0000
~zod/try=> `@ux`~m1
0x3c.0000.0000.0000.0000
~zod/try=> `@dr`(div ~d1 5)
~h4.m48
~zod/try=> (div ~m1 ~s1)
60
~zod/try=> (div ~h1 ~m1)
60
~zod/try=> (div ~h1 ~s1)
3.600
~zod/try=> (div ~d1 ~h1)
24
~zod/try=> `@da`(add ~2013.11.30 ~d1)
~2013.12.1
```
------------------------------------------------------------------------
#### @f
Loobean (inverse boolean)

Atoms of the odor `@f` represent loobeans, where `0` is yes and `1` is
no. Loobeans are often written in their cubic and runic forms, shown
below.
##### Forms
`0`, `1` as numbers. `%.y`, `%.n` as `%cube`s. `&`, `|` as short
forms.
##### Examples
```
~zod/try=> `@ud`.y
0
~zod/try=> :type; .y
%.y
{%.y %.n}
~zod/try=> `@ud`.n
1
~zod/try=> .y
%.y
~zod/try=> &
%.y
~zod/try=> |
%.n
```
------------------------------------------------------------------------
### @n
Nil

Atoms of the odor `@n` indicate an absence of information, as in a list
terminator. The only value is `~`, which is just `0`.
##### Examples
```
~zod/try=> :type; ~
~
%~
~zod/try=> `@ud`%~
0
~zod/try=> `@ud`~
0
```
------------------------------------------------------------------------
### @p
Phonemic base

Atoms of `@p` are primarily used to represent ship names, but they can
be used to make any short number more memorable. For example, they work
well for checksums.
##### Forms
`~[phonemic]`
Every syllable is a byte. Pairs of two bytes are separated by `-`, and
phrases of four pairs are separated by `--`.
##### Examples
```
~zod/try=> ~pasnut
~pasnut
~zod/try=> :type; ~pasnut
~pasnut
@p
~zod/try=> `@p`0x4321
~pasnut
~zod/try=> `@p`0x8765.4321
~famsyr-dirwes
~zod/try=> `@p`0x8766.4321
~lidlug-maprec
~zod/try=> `@p`(shaf %foo %bar)
~ralnyl-panned-tinmul-winpex--togtux-ralsem-lanrus-pagrup
```
------------------------------------------------------------------------
### @r
IEEE floating point

Hoon does not yet support floating point, so these syntaxes don't work
yet. But the syntax for a single-precision float is the normal English
syntax, with a `.` prefix.

------------------------------------------------------------------------
### @s
Signed integer, sign bit low

Without finite-sized integers, the two's-complement sign-extension trick
does not work. A signed integer in Hoon is a different way to use atoms
than an unsigned integer; even for positive numbers, the signed atom is
not equal to the unsigned one.

The prefix for a negative signed integer is a single `-` before the
unsigned syntax. The prefix for a positive signed integer is `--`. The
least significant bit represents the sign: `--n` is stored as `2n` and
`-n` as `2n - 1`. The representation is similar to a number line folded
at zero.
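
The folded number line can be sketched in Python. This models the encoding described above (zero, then `-1`, `--1`, `-2`, `--2`, ... map to atoms 0, 1, 2, 3, 4, ...); the function names are hypothetical, and `s_sum` only mirrors the results that `sum:si` produces in the examples below, not its implementation.

```python
def s_encode(n: int) -> int:
    """Model of @s: --n is stored as 2n and -n as 2n - 1 (sign bit low)."""
    return 2 * n if n >= 0 else -2 * n - 1


def s_decode(atom: int) -> int:
    """Inverse of s_encode: odd atoms are negative, even atoms are not."""
    return -((atom + 1) // 2) if atom & 1 else atom // 2


def s_sum(a: int, b: int) -> int:
    """Add two @s atoms by decoding, adding, and re-encoding."""
    return s_encode(s_decode(a) + s_decode(b))


print([s_encode(n) for n in (0, -1, 1, -2, 2)])  # the fold: 0, 1, 2, 3, 4
print(s_decode(s_sum(s_encode(-31), s_encode(11))))  # like (sum:si -0vv --0vb)
```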
#### @sb
Signed binary

Atoms of the odor `@sb` represent signed binary numbers.
##### Forms
`-0b[negative_binary]` `--0b[positive_binary]`
##### Examples
```
~zod/try=> :type; -0b1
-0b1
@sb
~zod/try=> `@sd`-0b1
-1
~zod/try=> `@sd`--0b1
--1
~zod/try=> `@sd`--0b11
--3
~zod/try=> `@sb`(sum:si -0b10 -0b10)
-0b100
```
------------------------------------------------------------------------
#### @sd
Signed decimal

Atoms of odor `@sd` represent signed decimal numbers.
##### Forms
`-[negative_decimal]` `--[positive_decimal]`
##### Examples
```
~zod/try=> -234
-234
~zod/try=> :type; -234
-234
@sd
~zod/try=> :type; --234
--234
@sd
~zod/try=> (sum:si -234 --234)
--0
```
------------------------------------------------------------------------
#### @sv
Signed base32

Atoms of odor `@sv` represent signed base32 numbers.
##### Forms
`-0v[negative_base32]` `--0v[positive_base32]` The digits are, in
order, `0-9`, `a-v`.
##### Examples
```
~zod/try=> -0vv
-0vv
~zod/try=> :type; -0vv
-0vv
@sv
~zod/try=> --0vb
--0vb
~zod/try=> `@sd`-0vv
-31
~zod/try=> `@sd`--0vb
--11
~zod/try=> `@sd`(sum:si -0vv --0vb)
-20
```
------------------------------------------------------------------------
#### @sw
Signed base64

Atoms of odor `@sw` represent signed base64 numbers.
##### Forms
`-0w[negative_base64]` `--0w[positive_base64]` The digits are, in
order, `0-9`, `a-z`, `A-Z`, `-`, and `~`.
##### Examples
```
~zod/try=> -0w--
-0w--
~zod/try=> :type; -0w--
-0w--
@sw
~zod/try=> `@sd`(sum:si -0w-A -0w--)
-8.034
```
------------------------------------------------------------------------
#### @sx
Signed hexadecimal

Atoms of odor `@sx` represent signed hexadecimal numbers.
##### Forms
`-0x[negative_hexadecimal]` `--0x[positive_hexadecimal]`
##### Examples
```
~zod/try=> -0x0
--0x0
~zod/try=> `@sd`--0x17
--23
~zod/try=> `@sd`(sum:si --0x17 -0x0)
--23
~zod/try=> `@sd`(sum:si --0x17 -0xa)
--13
```
------------------------------------------------------------------------
### @t
UTF-8 text (cord)

Atoms of the odor `@t` represent a sequence of UTF-8 bytes, least
significant byte first. Such an atom is called a
[cord](http://en.wikipedia.org/wiki/Rope_data_structure).
##### Forms
`~~[text]` `'[text]'`
##### Examples
```
~zod/try=> ~~foo
'foo'
~zod/try=> :type; 'foo'
'foo'
@t
~zod/try=> :type; ~~foo
'foo'
@t
```
------------------------------------------------------------------------
#### @ta
ASCII text (span)

Atoms of the odor `@ta` represent the ASCII text subset used in Hoon
literals: `a-z`, `0-9`, `~`, `-`, `.`, and `_`.
##### Forms
`~.[text]` There are no escape sequences.
##### Examples
```
~zod/try=> ~..asdf
~..asdf
~zod/try=> :type; ~.asdf
~.asdf
@ta
~zod/try=> `@t`~.asdf
'asdf'
```
------------------------------------------------------------------------
#### @tas
ASCII symbol (term)

Atoms of `@tas` represent `++term`s, the most restrictive text odor.
The only characters permitted are lowercase ASCII letters, `-` except
as the first or last character, and `0-9` except as the first character.
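
The character rule above can be sketched as a regular expression, and a term's atom is its bytes read least significant first, as for `@t`. This Python sketch follows the rule as stated in this section; it is not Hoon's own parser, and `is_term` and `term_atom` are hypothetical names.

```python
import re

# Rule as stated above: a lowercase letter first; then letters, digits or '-';
# and no '-' in the last position.
TERM = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")


def is_term(text: str) -> bool:
    return TERM.fullmatch(text) is not None


def term_atom(text: str) -> int:
    """A term's atom: its ASCII bytes with the first character lowest."""
    if not is_term(text):
        raise ValueError(f"not a valid @tas: {text!r}")
    return int.from_bytes(text.encode("ascii"), "little")


print(is_term("dead-fish9"), is_term("9dead"), is_term("foo-"))
print(term_atom("dead-fish9"))  # compare with the %cube value below
```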
##### Forms
`%[text]` This means a term is always cubical.
##### Examples
```
~zod/try=> %dead-fish9
%dead-fish9
~zod/try=> -:!>(%dead-fish9)
[%cube p=271.101.667.197.767.630.546.276 q=[%atom p=%tas]]
```
------------------------------------------------------------------------
### @u
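Unsigned integer

Atoms of the odor `@u` are unsigned integers; the odors below differ in
the base used to write and print them.
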
------------------------------------------------------------------------
#### @ub
Unsigned binary

Atoms of the odor `@ub` represent unsigned binary numbers.
##### Forms
`0b[number]` Digits are written most significant bit first, as usual:
`0b10` is 2.
##### Examples
```
~zod/try=> `@`0b1
1
~zod/try=> :type; 0b1
0b1
@ub
~zod/try=> `@`0b10
2
~zod/try=> `@`0b100
4
```
------------------------------------------------------------------------
#### @ud
Unsigned decimal

Atoms of `@ud` represent unsigned decimal numbers. `@ud` is the default
print format for both `@` and `@u`, namely unsigned numbers with no
printing preference, as well as opaque atoms.
##### Forms
Numbers of more than three digits must be split into groups of three
digits delimited by `.`s. Whitespace and linebreaks can appear between
the dot and the next group.
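
The grouping rule can be modeled in a few lines of Python. `format_ud` and `parse_ud` are hypothetical names; `parse_ud` only strips dots and whitespace and does not check that the groups have the right length.

```python
def format_ud(n: int) -> str:
    """Model of @ud printing: decimal digits in groups of three, joined by '.'."""
    digits = str(n)
    head = len(digits) % 3 or 3
    groups = [digits[:head]] + [digits[i:i + 3] for i in range(head, len(digits), 3)]
    return ".".join(groups)


def parse_ud(text: str) -> int:
    """Read a dotted @ud literal, ignoring whitespace after the dots."""
    return int("".join(text.split()).replace(".", ""))


print(format_ud(2 ** 20))         # compare with (bex 20)
print(parse_ud("1.048.\n576"))  # a literal split across a linebreak
```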
##### Examples
```
~zod/try=> 0
0
~zod/try=> 19
19
~zod/try=> :type; 19
19
@ud
~zod/try=> 1.024
1.024
~zod/try=> 65.536
65.536
~zod/try=> (bex 20)
1.048.576
```
------------------------------------------------------------------------
#### @uv
Unsigned base32

Atoms of the odor `@uv` represent unsigned base32 numbers.
##### Forms
`0v[number]` The digits are, in order, `0-9`, `a-v`.
##### Examples
```
~zod/try=> `@ud`0vv
31
~zod/try=> :type; 0vv
0vv
@uv
~zod/try=> `@ud`(add 0vv 0v9)
40
```
------------------------------------------------------------------------
#### @uw
Unsigned base64

Atoms of the odor `@uw` represent unsigned base64 numbers.
##### Forms
`0w[number]` The digits are, in order, `0-9`, `a-z`, `A-Z`, `-`, and `~`.
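
The base32 (`@uv`, `@sv`) and base64 (`@uw`, `@sw`) digit alphabets listed in this document can be modeled directly. In this Python sketch the names are hypothetical, and treating `.` as a group separator when parsing is an assumption carried over from the other numeric odors.

```python
import string

# Digit alphabets as listed in this document.
BASE32 = string.digits + string.ascii_lowercase[:22]  # 0-9, a-v
BASE64 = string.digits + string.ascii_lowercase + string.ascii_uppercase + "-~"


def parse_digits(digits: str, alphabet: str) -> int:
    """Read digits most significant first; '.' is assumed to be a separator."""
    value = 0
    for ch in digits.replace(".", ""):
        value = value * len(alphabet) + alphabet.index(ch)
    return value


def format_digits(value: int, alphabet: str) -> str:
    """Write a non-negative value in base len(alphabet), without separators."""
    if value == 0:
        return alphabet[0]
    out = []
    while value:
        value, d = divmod(value, len(alphabet))
        out.append(alphabet[d])
    return "".join(reversed(out))


total = parse_digits("~", BASE64) + parse_digits("Z", BASE64)
print(total, format_digits(total, BASE32))  # like (add 0w~ 0wZ) as @ud and @uv
```

The same tables give the magnitudes in the `@sw` example: `-A` reads as 4004 and `--` as 4030, and their sum is the `-8.034` shown there.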
##### Examples
```
~zod/try=> 0w~
0w~
~zod/try=> :type; 0w~
0w~
@uw
~zod/try=> `@uv`(add 0w~ 0wZ)
0v3s
~zod/try=> `@ud``@uv`(add 0w~ 0wZ)
124
```
------------------------------------------------------------------------
#### @ux
Unsigned hexadecimal

Atoms of the odor `@ux` represent unsigned hexadecimal numbers.
##### Forms
`0x[number]` Numbers with more than four digits must be split into
groups of four digits delimited by `.`s. Hex digits are lowercase only.
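
The literal rules above (lowercase digits, groups of four after the leading group, optional whitespace after a dot) can be captured in a regular expression. This Python sketch is a hypothetical model of the grammar as described here, not Hoon's parser; in particular it does not check for leading zeros.

```python
import re

# Assumed grammar: 1-4 leading lowercase hex digits, then '.'-separated
# groups of exactly four; whitespace may follow a dot.
UX = re.compile(r"0x([0-9a-f]{1,4}(?:\.\s*[0-9a-f]{4})*)")


def parse_ux(literal: str) -> int:
    m = UX.fullmatch(literal)
    if m is None:
        raise ValueError(f"not a valid @ux literal: {literal!r}")
    return int(re.sub(r"[.\s]", "", m.group(1)), 16)


print(parse_ux("0x10. 0000") == 2 ** 20)  # whitespace after the dot is allowed
```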
##### Examples
```
~zod/try=> 0x0
0x0
~zod/try=> :type; 0x0
0x0
@ux
~zod/try=> `@ud`0x17
23
~zod/try=> `@ux`(bex 20)
0x10.0000
~zod/try=> 0x10. 0000
0x10.0000
```
------------------------------------------------------------------------