# Bend
Bend is a massively parallel, high-level programming language.

Unlike low-level alternatives like CUDA and Metal, Bend has the feeling and
features of expressive languages like Python and Haskell, including fast object
allocations, higher-order functions with full closure support, unrestricted
recursion, and even continuations. Yet, it runs on massively parallel hardware
like GPUs, with near-linear speedup based on core count, and zero explicit
parallel annotations: no thread spawning, no locks, mutexes, or atomics. Bend is
powered by the [HVM2](https://github.com/HigherOrderCO/hvm) runtime.
## Using Bend
First, install [Rust nightly](https://rust-lang.github.io/rustup/concepts/channels.html). Then, install both HVM2 and Bend with:
```sh
cargo +nightly install hvm
cargo +nightly install bend-lang
```
Finally, write a Bend file and run it with one of these commands:
```sh
bend run    <file.bend> # uses the Rust interpreter (sequential)
bend run-c  <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)
```
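
If you just want something to try with the commands above, here is a minimal
program (the name `hello.bend` is only an illustration; this sketch assumes
Bend's builtin string type):

```python
# hello.bend - a hypothetical minimal program;
# `bend run hello.bend` should print the resulting string
def main:
  return "Hello, world!"
```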
You can also compile `Bend` to standalone C/CUDA files with `gen-c` and
`gen-cu`, for maximum performance. But keep in mind our code gen is still in
its infancy, and is nowhere near as mature as state-of-the-art compilers like
GCC and GHC.
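
As a rough sketch of that workflow (file names are illustrative, and this
assumes `gen-c` writes the generated C source to stdout; adjust if your version
emits a file instead):

```sh
# emit C code for a hypothetical main.bend, then build it with a C compiler
bend gen-c main.bend > main.c
gcc main.c -o main -O2   # extra flags such as -lpthread or -lm may be needed
./main
```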
## Parallel Programming in Bend
To write parallel programs in Bend, all you have to do is... **nothing**. Other
than not making it *inherently sequential*! For example, the expression:
```python
(((1 + 2) + 3) + 4)
```
Can **not** run in parallel, because `+4` depends on `+3`, which depends on
`(1+2)`. But the following expression:
```python
((1 + 2) + (3 + 4))
```
Can run in parallel, because `(1+2)` and `(3+4)` are independent; and it *will*,
per Bend's fundamental pledge:
> Everything that **can** run in parallel, **will** run in parallel.
For a more complete example, consider:
```python
def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      fst = sum(depth-1, x*2+0) # adds the fst half
      snd = sum(depth-1, x*2+1) # adds the snd half
      return fst + snd

def main:
  return sum(30, 0)
```
This code adds all numbers from 0 up to (but not including) 2^30. But, instead
of a loop, we use a recursive divide-and-conquer approach. Since this approach
is *inherently parallel*, Bend will run it multi-threaded. Some benchmarks:
- CPU, Apple M3 Max, 1 thread: **3.5 minutes**
- CPU, Apple M3 Max, 16 threads: **10.26 seconds**
- GPU, NVIDIA RTX 4090, 32k threads: **1.88 seconds**

That's a **111x speedup** by doing nothing. No thread spawning, no explicit
management of locks or mutexes. We just asked Bend to run our program on the
RTX 4090, and it did. Simple as that. Note that, for now, Bend only supports
24-bit machine ints (`u24`), so results are always `mod 2^24`.
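
As a minimal illustration of that wraparound (assuming plain integer literals
default to `u24`):

```python
# 16777215 is 2^24 - 1, so adding 1 wraps around to 0 under u24 arithmetic
def main:
  return 16777215 + 1
```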

Bend isn't limited to a specific paradigm, like tensors or matrices. Any
concurrent system, from shaders to Erlang-like actor models, can be emulated on
Bend. For example, to render images in real time, we could simply allocate an
immutable tree on each frame:
```python
# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = demo_shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)
```
And it would actually work. Even involved algorithms, such as a [Bitonic Sort
using tree rotations](examples/bitonic_sort.bend), parallelize well on Bend.
Long-distance communication is performed by *global beta-reduction* (as per the
[Interaction Calculus](https://github.com/VictorTaelin/Interaction-Calculus)),
and synchronized correctly and efficiently by
[HVM2](https://github.com/HigherOrderCO/HVM)'s *atomic linker*.

- To jump straight into action, check Bend's [GUIDE.md](https://github.com/HigherOrderCO/bend/blob/main/GUIDE.md).
- For an extensive list of features, check [FEATURES.md](https://github.com/HigherOrderCO/bend/blob/main/FEATURES.md).
- To understand the tech behind Bend, check HVM2's [paper](https://github.com/HigherOrderCO/HVM/raw/main/PAPER.pdf).
- Bend is developed by [HigherOrderCO.com](https://HigherOrderCO.com) - join our [Discord](https://discord.HigherOrderCO.com)!
## A Quick Demo
![bendlivedemo](https://github.com/VictorTaelin/media/blob/main/bend_live_demo.gif?raw=true)