# Bend

Bend is a massively parallel, high-level programming language. Unlike existing
alternatives such as CUDA, OpenCL and Metal, which are low-level and limited,
Bend has the feel and features of a modern language like Python or Haskell. Yet
it runs on thousands of cores, on CPUs and GPUs, powered by
[HVM2](https://github.com/HigherOrderCO/hvm2).

## Using Bend

First, install [Rust nightly](https://www.oreilly.com/library/view/rust-programming-by/9781788390637/e07dc768-de29-482e-804b-0274b4bef418.xhtml). Then, install both HVM2 and Bend with:

```sh
cargo install hvm
cargo install bend-lang
```

Then, just write a Bend file, and run it with:

```sh
bend run    <file.hvm> # uses the Rust interpreter (sequential)
bend run-c  <file.hvm> # uses the C interpreter (parallel)
bend run-cu <file.hvm> # uses the CUDA interpreter (massively parallel)
```

You can also compile `Bend` to standalone C/CUDA files with `gen-c` and
`gen-cu`, for maximum possible performance.

## Parallel Programming in Bend

To write parallel programs in Bend, all you have to do is... **nothing**, other
than not making your program *inherently sequential*! For example, the expression:

```python
(((1 + 2) + 3) + 4)
```

Can **not** run in parallel, because `+4` depends on the result of `+3`, which
in turn depends on `(1+2)`. But the following expression:

```python
((1 + 2) + (3 + 4))
```

Can run in parallel, and will, due to Bend's fundamental pledge:

> Everything that **can** run in parallel, **will** run in parallel.
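
One way to see the difference between the two expressions is to compare their dependency depth, that is, the longest chain of additions that must happen one after another. The sketch below is plain Python, not Bend, and the nested-tuple encoding and the `depth` and `evaluate` helpers are assumptions made purely for illustration; it does not describe how HVM2 actually schedules work.

```python
# Illustrative Python, not Bend. An expression is either an int (a leaf)
# or a pair (left, right) standing for `left + right`.

def depth(expr):
    """Length of the longest chain of dependent additions in expr."""
    if isinstance(expr, int):
        return 0
    left, right = expr
    return 1 + max(depth(left), depth(right))

def evaluate(expr):
    """Compute the value of the expression tree."""
    if isinstance(expr, int):
        return expr
    left, right = expr
    return evaluate(left) + evaluate(right)

sequential = (((1, 2), 3), 4)  # (((1 + 2) + 3) + 4)
balanced = ((1, 2), (3, 4))    # ((1 + 2) + (3 + 4))

print(evaluate(sequential), depth(sequential))  # 10 3
print(evaluate(balanced), depth(balanced))      # 10 2
```

Both trees compute the same value, but the balanced one has a shorter dependency chain: its two inner additions do not depend on each other, so they are free to run at the same time.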

For a more complete example, consider:

```python
def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      fst = sum(depth-1, x*2+0) # adds the fst half
      snd = sum(depth-1, x*2+1) # adds the snd half
      return fst + snd
    
def main:
  return sum(30, 0)
```

This code adds all numbers from 0 up to (but not including) 2^30. Instead of a
loop, it uses a recursive divide-and-conquer approach. Since this approach is
*inherently parallel*, the Bend executable will run on many cores. Here are some
benchmarks:

- CPU, Apple M3 Max, 1 thread: **3.5 minutes**

- CPU, Apple M3 Max, 16 threads: **10.26 seconds**

- GPU, NVIDIA RTX 4090, 32k threads: **1.88 seconds**

That's a **111x speedup** by doing nothing. No thread spawning, no explicit
management of locks or mutexes. From shaders, to transformers, to Erlang-like
actor-based systems, every concurrent setup can be implemented in Bend with no
explicit annotations. Long-distance communication is performed by global
beta-reduction, and handled correctly and efficiently by the
[HVM2](https://github.com/HigherOrderCO/HVM2) runtime.
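
As a sanity check on the `sum` example, here is a plain Python transcription of the recursion (renamed `tree_sum` so it does not shadow Python's built-in `sum`), compared against the closed form for 0 + 1 + ... + (2^depth - 1). It also computes the speedup ratios implied by the timings listed above. This is a sequential sketch for small depths only; the `closed_form` helper and the variable names are assumptions introduced for illustration, and nothing here reflects how Bend or HVM2 executes the program.

```python
# Illustrative Python, not Bend: a direct, sequential transcription of `sum`.

def tree_sum(depth, x):
    if depth == 0:
        return x
    fst = tree_sum(depth - 1, x * 2 + 0)  # leaves whose next bit is 0
    snd = tree_sum(depth - 1, x * 2 + 1)  # leaves whose next bit is 1
    return fst + snd

def closed_form(depth):
    """Sum of the integers 0 .. 2**depth - 1."""
    n = 2 ** depth
    return n * (n - 1) // 2

# The recursion visits every depth-bit number exactly once.
for d in range(11):
    assert tree_sum(d, 0) == closed_form(d)

print(tree_sum(10, 0))  # 523776

# Speedups implied by the benchmark timings, in seconds.
single_thread = 3.5 * 60  # 3.5 minutes
cpu_16_threads = 10.26
gpu = 1.88

print(round(single_thread / cpu_16_threads, 1))  # 20.5
print(round(single_thread / gpu, 1))             # 111.7
```

The last ratio is where the 111x figure comes from: 210 seconds on one CPU thread against 1.88 seconds on the GPU.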

- For more in-depth information, check HVM's [paper](https://github.com/HigherOrderCO/HVM/raw/main/PAPER.pdf).

- To jump straight into action, check Bend's [GUIDE.md](https://github.com/HigherOrderCO/bend/blob/main/GUIDE.md).

- For an extensive list of features, check [FEATURES.md](https://github.com/HigherOrderCO/bend/blob/main/FEATURES.md).