10 KiB
HVM-Lang
HVM-Lang is a lambda-calculus based language and serves as an Intermediate Representation for HVM-Core, offering a higher level syntax for writing programs based on the Interaction-Calculus.
Installation
With the nightly version of rust installed, clone the repository:
git clone https://github.com/HigherOrderCO/hvm-lang.git
cd hvm-lang
Install using cargo:
cargo install --path .
Hello World!
First things first, let's write a basic program that adds the numbers 3 and 2.
main = (+ 3 2)
HVM-Lang searches for the main | Main
definitions as entrypoint of the program.
To run a program, use the run
argument:
hvml run <file>
It will show the number 5.
Adding the --stats
option displays some runtime stats like time and rewrites.
To limit the runtime memory, use the --mem <size> option.
The default is 1GB:
hvml --mem 65536 run <file>
You can specify the memory size in bytes (default), kilobytes (k), megabytes (m), or gigabytes (g), e.g., --mem 200m.
To compile a program use the compile
argument:
hvml compile <file>
This will output the compiled file to stdout.
Language Syntax
HVM-Lang syntax consists in Terms and Definitions. A Term represents a value, such as a Number, an Application, Function, etc. A Definition points to a Term.
Here we are defining 'two' as the number 2:
two = 2
Numbers can also be written as binary or hexadecimal by using the prefix 0b
and 0x
respectively.
You can use _
as a digit separator, it can be used on large numbers to make them more readable:
decimal = 1194684
binary = 0b100_100_011_101_010_111_100
hexadecimal = 0x123_abc
Currently, the only supported type of machine numbers are unsigned 60-bit integers.
Strings are delimited by "
"
and support Unicode characters.
main = "Hello, 🌎"
A string is desugared to a tuple containing its length and the list of chars. The chars have a tagged lambda with label 'str' for fast concatenation.
(5, λ#str x ('H', ('e', ('l', ('l', ('o', x))))))
Characters are delimited by '
'
and support Unicode escape sequences. They have a numeric value associated with them.
main = '\u4242'
A lambda where the body is the variable x
:
id = λx x
Lambdas can also be defined using @
.
To not bind the value of the lambda to any name, use *
:
True = @t @* t
False = λ* λf f
Applications are enclosed by (
)
.
(λx x λx x λx x)
This term is the same as:
(((λx x) (λx x)) (λx x))
Parentheses around lambdas are optional.
*
can also be used to define an eraser term.
It compiles to an inet
node with only one port that deletes anything that’s plugged into it.
era = *
Operations can handle just 2 terms at time:
some_val = (+ (+ 7 4) (* 2 3))
The current operations include +, -, *, /, %, ==, !=, <, >, <=, >=, &, |, ^, ~, <<, >>
.
A let term binds some value to the next term, in this case (* result 2)
:
let result = (+ 1 2); (* result 2)
It is possible to define tuples:
tup = (2, 2)
And destructuring tuples with let
:
let (x, y) = tup; (+ x y)
Term duplication is done automatically when a variable is used more than once. But it's possible to manually duplicate a term using dup
:
// the number 2 in church encoding using dup.
ch2 = λf λx dup f1 f2 = f; (f1 (f2 x))
// the number 3 in church encoding using dup.
ch3 = λf λx dup f0 f1 = f; dup f2 f3 = f0; (f1 (f2 (f3 x)))
A sup
is a superposition of two values, it is defined using curly brackets with two terms inside
sup = {3 7}
Sups can be used anywhere a value is expected, if anything interacts with the superposition, the result is the superposition of that interaction on both the possible values:
mul = λa λb (* a b)
result = (mul 2 5) // returns 10
result_sup = (mul 2 {5 7}) // returns {10 14}
multi_sup = (mul {2 3} {5 7}) // returns {{10 14} {15 21}}
To access both values of a superposition, dups
with labels are needed.
A dup
generally just duplicates the term it points to:
// each dup variable now has a copy of the {1 2} superposition
dup x1 x2 = {1 2}
Both dups
and sups
support labels, that is, a field starting with #
to identify their counterpart:
// here, x1 now contains the value of 1, and x2 the value of 2
dup #i x1 x2 = {#i 1 2}
Due to how dups are compiled, dup tags between two interacting terms should not contain the same label. For example, an application of the church numeral 2 with itself:
c2 = λf λx dup f1 f2 = f; (f1 (f2 x))
main = (c2 c2)
To avoid label collision, HVM-Lang automatically generates new dup labels for each dup in the code. But with cases like the example above, when the interacting dups comes from the same place, the result is an invalid reduction.
To fix the problem, its necessary to re-create the term so that a new label is assigned, or manually assign one:
c2 = λf λx dup f1 f2 = f; (f1 (f2 x))
c2_ = λf λx dup #label f1 f2 = f; (f1 (f2 x))
main = (c2 c2_)
A match syntax for machine numbers.
We match the case 0 and the case where the number is greater
than 0, if n
is the matched variable, n-1
binds the value of the number - 1:
Number.to_church = λn λf λx
match n {
0: x
+: (f (Number.to_church n-1 f x))
}
It is possible to define Data types using data
.
If a constructor has any arguments, parenthesis are necessary around it:
data Option = (Some val) | None
If the data type has a single constructor, it can be destructured using let
:
data Boxed = (Box val)
let (Box value) = boxed; value
Otherwise, there are two pattern syntaxes for matching on data types.
One which binds implicitly the matched variable name plus .
and the fields names on each constructor:
Option.map = λoption λf
match option {
Some: (Some (f option.val))
None: None
}
And another one which deconstructs the matched variable with explicit bindings:
Option.map = λoption λf
match option {
(Some value): (Some (f value))
(None): None
}
Rules can also have patterns. It functions like match expressions with explicit bindings:
(Option.map (Some value) f) = (Some (f value))
(Option.map None f) = None
But with the extra ability to match on multiple values at once:
data Boolean = True | False
(Option.is_both_some (Some lft_val) (Some rgt_val)) = True
(Option.is_both_some lft rgt) = False
Advanced
Scopeless Lambdas
It is possible have lambdas whose variables can occour outside of their scopes. To use it, add $
on the lambda declaration and on the variable using it:
($a (λ$a 1 λb b))
This term will reduce with the following steps:
(λb b 1)
1
Tagged Lambdas
Similarly to dups and sups, lambdas and applications can have labels too.
For example, data types can be encoded as tagged lambdas:
// data Bool = T | F
T = λ#Bool t λ#Bool f t
F = λ#Bool t λ#Bool f f
// data List = (Cons x xs) | Nil
Cons = λx λxs λ#List c λ#List n (#List.Cons.xs (#List.Cons.x c x) xs)
Nil = λ#List c λ#List n n
When encoding the pattern matching, the application can then use the same label:
// not = λbool match bool { T: (F) F: (T) }
not = λbool (#Bool bool F T)
This allows, in some limited* scenarios, automatic vectorization:
// vectorizes to: (Cons F (Cons T (Cons F Nil)))
main = (not (Cons T (Cons F (Cons T Nil))))
The tagged lambda and applications are compiled to dup
inet
nodes with different tag values for each label. This allows for different commutation rules, check HVM-Core to learn more about it.
*limitations:
- The function must not be recursive
- There must not be labels in common between the function and what you want to vectorize over
Compilation of Terms to HVM-core
How terms are compiled into interaction net nodes?
HVM-Core has a bunch of useful nodes to write IC programs.
Every node contains one main
port 0
and two auxiliary
ports, 1
and 2
.
There are 6 kinds of nodes, Erased, Constructor, Reference, Number, Operation and Match.
A lambda λx x
compiles into a Constructor node.
An application ((λx x) (λx x))
also compiles into a Constructor node.
We differentiate then by using the ports.
0 - Points to the lambda occurrence 0 - Points to the function
| |
λ @
/ \ / \
1 2 - Points to the lambda body 1 2 - Points to the application occurrence
| |
Points to the lambda variable Points to the argument
So, if we visit a Constructor at port 0 it's a Lambda, if we visit at port 2 it's an Application.
Also, nodes have labels, we use the label to store data in the node's memory and also differentiate them.
The Number
node uses the label to store it's number.
An Op2
node uses the label to store it's operation.
And a Constructor
node can have a label too! The label is used for Dup
and Tuple
nodes.
Check HVM-Core to know more about.
Runtime and Compiler optimizations
The HVM-Core is an eager runtime, for both CPU and parallel GPU implementations. Because of that, is recommended to use supercombinator formulation to make terms be unrolled lazily, preventing infinite expansion in recursive function bodies.
Consider the code below:
Zero = λf λx x
Succ = λn λf λx (f n)
ToMachine = λn (n λp (+ 1 (ToMachine p)) 0)
The lambda terms without free variables are extracted to new definitions.
ToMachine0 = λp (+ 1 (ToMachine p))
ToMachine = λn (n ToMachine0 0)
Definitions are lazy in the runtime. Lifting lambda terms to new definitions will prevent infinite expansion.
Consider this code:
Ch_2 = λf λx (f (f x))
As you can see the variable f
is used more than once, so HVM-Lang optimizes this and generates a duplication tree.
Ch_2 = λf λx dup f0 f0_ = f; dup f1 f1_ = f0_ = (f0 (f1 x))