# Bend
Bend is a massively parallel, high-level programming language.

Unlike low-level alternatives like CUDA and Metal, Bend has the feeling and
features of expressive languages like Python and Haskell, including fast object
allocations, higher-order functions with full closure support, unrestricted
recursion, and even continuations. Yet, it runs on massively parallel hardware
like GPUs, with near-linear speedup based on core count, and zero explicit
parallel annotations: no thread spawning, no locks, mutexes, or atomics. Bend is
powered by the [HVM2](https://github.com/HigherOrderCO/hvm) runtime.
## Using Bend
First, install [Rust nightly](https://rust-lang.github.io/rustup/concepts/channels.html). Then, install both HVM2 and Bend with:
```sh
cargo +nightly install hvm
cargo +nightly install bend-lang
```
Finally, write a Bend file and run it with one of these commands:
```sh
bend run    <file.bend> # uses the Rust interpreter (sequential)
bend run-c  <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)
```
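
If you just want something to try with the commands above, here is a minimal
program (the name `hello.bend` is only an illustration; this sketch assumes
Bend's builtin string type):

```python
# hello.bend - a hypothetical minimal program;
# `bend run hello.bend` should print the resulting string
def main:
  return "Hello, world!"
```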
You can also compile `Bend` to standalone C/CUDA files with `gen-c` and
`gen-cu`, for maximum performance. But keep in mind our code gen is still in
its infancy, and is nowhere near as mature as state-of-the-art compilers like
GCC and GHC.
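
As a rough sketch of that workflow (file names are illustrative, and this
assumes `gen-c` writes the generated C source to stdout; adjust if your version
emits a file instead):

```sh
# emit C code for a hypothetical main.bend, then build it with a C compiler
bend gen-c main.bend > main.c
gcc main.c -o main -O2   # extra flags such as -lpthread or -lm may be needed
./main
```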
## Parallel Programming in Bend
To write parallel programs in Bend, all you have to do is... **nothing**. Other
than not making it *inherently sequential*! For example, the expression:
```python
(((1 + 2) + 3) + 4)
```
Can **not** run in parallel, because `+4` depends on `+3`, which depends on
`(1+2)`. But the following expression:
```python
((1 + 2) + (3 + 4))
```
Can run in parallel, because `(1+2)` and `(3+4)` are independent; and it *will*,
per Bend's fundamental pledge:
> Everything that **can** run in parallel, **will** run in parallel.
For a more complete example, consider:
```python
def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      fst = sum(depth-1, x*2+0) # adds the fst half
      snd = sum(depth-1, x*2+1) # adds the snd half
      return fst + snd

def main:
  return sum(30, 0)
```
This code adds all numbers from 0 up to (but not including) 2^30. But, instead
of a loop, we use a recursive divide-and-conquer approach. Since this approach
is *inherently parallel*, Bend will run it multi-threaded. Some benchmarks:
- CPU, Apple M3 Max, 1 thread: **3.5 minutes**
- CPU, Apple M3 Max, 16 threads: **10.26 seconds**
- GPU, NVIDIA RTX 4090, 32k threads: **1.88 seconds**

That's a **111x speedup** by doing nothing. No thread spawning, no explicit
management of locks or mutexes. We just asked Bend to run our program on the
RTX 4090, and it did. Simple as that. Note that, for now, Bend only supports
24-bit machine ints (`u24`), so results are always `mod 2^24`.
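
As a minimal illustration of that wraparound (assuming plain integer literals
default to `u24`):

```python
# 16777215 is 2^24 - 1, so adding 1 wraps around to 0 under u24 arithmetic
def main:
  return 16777215 + 1
```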

Bend isn't limited to a specific paradigm, like tensors or matrices. Any
concurrent system, from shaders to Erlang-like actor models, can be emulated on
Bend. For example, to render images in real time, we could simply allocate an
immutable tree on each frame:
```python
# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = demo_shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)
```
And it would actually work. Even involved algorithms, such as a [Bitonic Sort
using tree rotations](examples/bitonic_sort.bend), parallelize well on Bend.
Long-distance communication is performed by *global beta-reduction* (as per the
[Interaction Calculus](https://github.com/VictorTaelin/Interaction-Calculus)),
and synchronized correctly and efficiently by
[HVM2](https://github.com/HigherOrderCO/HVM)'s *atomic linker*.

- To jump straight into action, check Bend's [GUIDE.md](https://github.com/HigherOrderCO/bend/blob/main/GUIDE.md).
- For an extensive list of features, check [FEATURES.md](https://github.com/HigherOrderCO/bend/blob/main/FEATURES.md).
- To understand the tech behind Bend, check HVM2's [paper](https://github.com/HigherOrderCO/HVM/raw/main/PAPER.pdf).
- Bend is developed by [HigherOrderCO.com](https://HigherOrderCO.com) - join our [Discord](https://discord.HigherOrderCO.com)!
## A Quick Demo
![bendlivedemo](https://github.com/VictorTaelin/media/blob/main/bend_live_demo.gif?raw=true)