Bend

Bend is a massively parallel, high-level programming language.

Unlike low-level alternatives like CUDA and Metal, Bend has the feeling and features of expressive languages like Python and Haskell, including fast object allocations, higher-order functions with full closure support, unrestricted recursion, and even continuations. Yet, it runs on massively parallel hardware like GPUs, with near-linear speedup based on core count, and zero explicit parallel annotations: no thread spawning, no locks, mutexes, or atomics. Bend is powered by the HVM2 runtime.

A Quick Demo

Bend live demo

Using Bend

First, install Rust nightly. Then, install both HVM2 and Bend with:

cargo +nightly install hvm
cargo +nightly install bend-lang

Finally, write a Bend file and run it with one of these commands:

bend run    <file.bend> # uses the Rust interpreter (sequential)
bend run-c  <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)

You can also compile Bend to standalone C/CUDA files with gen-c and gen-cu, for maximum performance. But keep in mind that our code generation is still in its infancy and is nowhere near as mature as state-of-the-art compilers like GCC and GHC.
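
For reference, here is a sketch of the corresponding commands, mirroring the run variants above (where the generated source ends up, stdout or a file on disk, may vary by version):

bend gen-c  <file.bend> # generates a standalone C    program
bend gen-cu <file.bend> # generates a standalone CUDA program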

Parallel Programming in Bend

To write parallel programs in Bend, all you have to do is... nothing. Other than not making it inherently sequential! For example, the expression:

(((1 + 2) + 3) + 4)

Cannot run in parallel, because the + 4 depends on the + 3, which depends on (1 + 2). But the following expression:

((1 + 2) + (3 + 4))

Can run in parallel, because (1 + 2) and (3 + 4) are independent; and it will, as per Bend's fundamental pledge:

Everything that can run in parallel, will run in parallel.
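
As a minimal, runnable sketch of the parallel expression above (in the same imp syntax used throughout this README):

def main:
  return (1 + 2) + (3 + 4) # the two inner additions are independent and reduce in parallel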

For a more complete example, consider:

def sum(depth, x):
  switch depth:
    case 0:
      return x
    case _:
      fst = sum(depth-1, x*2+0) # adds the fst half
      snd = sum(depth-1, x*2+1) # adds the snd half
      return fst + snd
    
def main:
  return sum(30, 0)

This code adds all numbers from 0 up to (but not including) 2^30. But, instead of a loop, we use a recursive divide-and-conquer approach. Since this approach is inherently parallel, Bend will run it multi-threaded. Some benchmarks:

  • CPU, Apple M3 Max, 1 thread: 3.5 minutes

  • CPU, Apple M3 Max, 16 threads: 10.26 seconds

  • GPU, NVIDIA RTX 4090, 32k threads: 1.88 seconds

That's a 111x speedup by doing nothing. No thread spawning, no explicit management of locks or mutexes. We just asked Bend to run our program on an RTX 4090, and it did. Simple as that. Note that, for now, Bend only supports 24-bit machine ints (u24), so results are always mod 2^24.
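
For instance, a minimal sketch of that u24 wrap-around (the literal below is ours, for illustration only):

def main:
  return 0xFFFFFF + 1 # u24 arithmetic wraps modulo 2^24, so this evaluates to 0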

Bend isn't limited to a specific paradigm, like tensors or matrices. Any concurrent system, from shaders to Erlang-like actor models, can be emulated on Bend. For example, to render images in real time, we could simply allocate an immutable tree on each frame:

# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)

And it would actually work. Even involved algorithms, such as a Bitonic Sort using tree rotations, parallelize well on Bend. Long-distance communication is performed by global beta-reduction (as per the Interaction Calculus), and synchronized correctly and efficiently by HVM2's atomic linker.