From b68ecee29b70f7df98752c5346e8e9e3880b7aaa Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:16:02 -0300
Subject: [PATCH 01/10] Update README.md

---
 README.md | 322 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 214 insertions(+), 108 deletions(-)

diff --git a/README.md b/README.md
index de9c7698..25e62a01 100644
--- a/README.md
+++ b/README.md
@@ -1,81 +1,192 @@
-# Bend
+Bend
+
+A high-level, massively parallel programming language
+
-Bend is a massively parallel, high-level programming language.
+## Index
+1. [Introduction](#introduction)
+2. [Important Notes](#important-notes)
+3. [Install](#install)
+4. [Getting Started](#getting-started)
+5. [Speedup Examples](#speedup-examples)
+6. [Additional Resources](#additional-resources)
-Unlike low-level alternatives like CUDA and Metal, Bend has the feeling and
-features of expressive languages like Python and Haskell, including fast object
-allocations, higher-order functions with full closure support, unrestricted
-recursion, even continuations. Yet, it runs on massively parallel hardware like
-GPUs, with near-linear speedup based on core count, and zero explicit parallel
-annotations: no thread spawning, no locks, mutexes, atomics. Bend is powered by
-the [HVM2](https://github.com/HigherOrderCO/hvm) runtime.
+## Introduction
-A Quick Demo
-------------
+Bend offers the feel and features of expressive languages like Python and Haskell. This includes fast object allocations, full support for higher-order functions with closures, unrestricted recursion, and even continuations.
+Bend scales like CUDA: it runs on massively parallel hardware like GPUs, with nearly linear acceleration based on core count, and without explicit parallelism annotations: no thread creation, locks, mutexes, or atomics.
+Bend is powered by the [HVM2](https://github.com/higherorderco/hvm) runtime.
-[![Bend live demo](https://github.com/VictorTaelin/media/blob/main/bend_live_demo.gif?raw=true)](https://x.com/i/status/1791213162525524076)
-- For a more in-depth explanation on how to setup and use Bend, check [GUIDE.md](https://github.com/HigherOrderCO/bend/blob/main/GUIDE.md).
+## Important Notes
-- For an extensive list of features, check [FEATURES.md](https://github.com/HigherOrderCO/bend/blob/main/FEATURES.md).
+* Bend is designed to excel in scaling performance with cores, supporting over 10000 concurrent threads.
+* The current version may have lower single-core performance.
+* You can expect substantial improvements in performance as we advance our code generation and optimization techniques.
+* We are still working to support Windows. Use [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) as an alternative solution.
+* [We only support NVIDIA GPUs currently](https://github.com/HigherOrderCO/Bend/issues/341).
-## Using Bend
-> Currently not working on Windows, please use [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) as a workaround.
-> If you're having issues or have a question about Bend, please first read the [FAQ](https://github.com/HigherOrderCO/Bend/blob/main/FAQ.md) page and check if your question has already been addressed.
+## Install
-First, install [Rust](https://www.rust-lang.org/tools/install).
+### Install depedencies
-If you want to use the C runtime, install a C compiler (like GCC or Clang).
-If you want to use the CUDA runtime, install the CUDA toolkit (CUDA and `nvcc`) version 12.x.
+#### On Linux
+```py
+# Install Rust if you haven't already.
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-> **_Note_: [Only Nvidia GPUs are supported at the moment](https://github.com/HigherOrderCO/Bend/issues/341).**
+# For the C version of Bend, use GCC. We recommend a version up to 12.x.
+sudo apt install gcc
+```
+For the CUDA runtime [install the CUDA toolkit for Linux](https://developer.nvidia.com/cuda-downloads?target_os=Linux) version 12.x.
-Then, install both HVM2 and Bend with:
-```sh
-cargo install hvm
-cargo install bend-lang
+#### On Mac
+```py
+# Install Rust if you haven't already.
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+
+# For the C version of Bend, use GCC. We recommend a version up to 12.x.
+brew install gcc
 ```
-Finally, write some Bend file, and run it with one of these commands:
+### Install Bend
+
+1. Install HVM2 by running:
+```sh
+# HVM2 is HOC's massively parallel Interaction Combinator evaluator.
+cargo install hvm
+
+# This ensures HVM is correctly installed and accessible.
+hvm --version
+```
+2. Install Bend by running:
+```sh
+# This command will install Bend
+cargo install bend-lang
+
+# This ensures Bend is correctly installed and accessible.
+bend --version
+```
+
+### Getting Started
+#### Running Bend Programs
 ```sh
 bend run    <file.bend> # uses the Rust interpreter (sequential)
 bend run-c  <file.bend> # uses the C interpreter (parallel)
 bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)
+
+# Notes
+# You can also compile Bend to standalone C/CUDA files using gen-c and gen-cu for maximum performance.
+# The code generator is still in its early stages and not as mature as compilers like GCC and GHC.
+# You can use the -s flag to get more information on:
+  # Reductions
+  # Time the code took to run
+  # Interactions per second (in millions)
 ```
-You can also compile `Bend` to standalone C/CUDA files with `gen-c` and
-`gen-cu`, for maximum performance. But keep in mind our code gen is still in its
-infancy, and is nowhere as mature as SOTA compilers like GCC and GHC.
+#### Testing Bend Programs
+The example below sums all the numbers in the range from `start` to `target`. It can be written in two different ways: one that is inherently sequential (and thus cannot be parallelized), and another that is easily parallelizable. (We will be using the `-s` flag in most examples, for the sake of visibility)
-## Parallel Programming in Bend
+#### Sequential version:
+First, create a file named `ssum.bend`
+```sh
+# Write this command on your terminal
+touch ssum.bend
+```
+Then with your text editor, open the file `ssum.bend`, copy the code below and paste in the file.
-To write parallel programs in Bend, all you have to do is... **nothing**. Other
-than not making it *inherently sequential*! For example, the expression:
+```py
+# Defines the function Sum with two parameters: start and target
+def Sum(start, target):
+  # If the value of start is the same as target, returns start
+  if start == target:
+    return start
+  # If start is not equal to target, recursively call Sum with start incremented by 1, and add the result to start
+  else:
+    return start + Sum(start + 1, target)
-```python
-(((1 + 2) + 3) + 4)
+def main():
+# This translates to (1 + (2 + (3 + (...... + (79999999 + 80000000)))))
+  return Sum(1, 80000000)
 ```
-Can **not** run in parallel, because `+4` depends on `+3` which
-depends on `(1+2)`. But the following expression:
-
-```python
-((1 + 2) + (3 + 4))
+##### Running the file
+You can run it using Rust interpreter (Sequential)
+```sh
+bend run ssum.bend -s
 ```
-Can run in parallel, because `(1+2)` and `(3+4)` are independent; and it *will*,
-per Bend's fundamental pledge:
+Or you can run it using C interpreter (Sequential)
+```sh
+bend run-c ssum.bend -s
 ```
-> Everything that **can** run in parallel, **will** run in parallel.
+If you have a NVIDIA GPU, you can also run in CUDA (Sequential)
+```sh
+bend run-cu ssum.bend -s
+```
-For a more complete example, consider:
+In this version, the next value to be calculated depends on the previous sum, meaning that it cannot proceed until the current computation is complete. Now, let's look at the easily parallelizable version.
-```python
-# Sorting Network = just rotate trees!
+
+#### Parallelizable version:
+First close the old file and then proceed to your terminal to create `psum.bend`
+```sh
+# Write this command on your terminal
+touch psum.bend
+```
+Then with your text editor, open the file `psum.bend`, copy the code below and paste in the file.
+
+```py
+# Defines the function Sum with two parameters: start and target
+def Sum(start, target):
+  # If the value of start is the same as target, returns start
+  if start == target:
+    return start
+  # If start is not equal to target, calculate the midpoint (half), then recursively call Sum on both halves
+  else:
+    half = (start + target) / 2
+    left = Sum(start, half)  # (Start -> Half)
+    right = Sum(half + 1, target)
+    return left + right
+
+# Main function to demonstrate the parallelizable sum from 1 to 80000000
+def main():
+# This translates to ((1 + 2) + (3 + 4)+ ... (79999999 + 80000000)...)
+  return Sum(1, 80000000)
+```
+
+In this example, the (3 + 4) sum does not depend on the (1 + 2), meaning that it can run in parallel because both computations can happen at the same time.
+
+##### Running the file
+You can run it using Rust interpreter (Sequential)
+```sh
+bend run psum.bend -s
+```
+
+Or you can run it using C interpreter (Parallel)
+```sh
+bend run-c ssum.bend -s
+```
+
+If you have a NVIDIA GPU, you can also run in CUDA (Massivelly parallel)
+```sh
+bend run-cu ssum.bend -s
+```
+
+In Bend, it can be parallelized by just changing the run command. If your code **can** run in parallel, it **will** run in parallel.
+
+
+### Speedup Examples
+The code snippet below implements a [bitonic sorter](https://en.wikipedia.org/wiki/Bitonic_sorter) with *immutable tree rotations*. It's not the type of algorithm you would expect to run fast on GPUs. However, since it uses a divide and conquer approach, which is inherently parallel, Bend will execute it on multiple threads, with no thread creation and no explicit lock management.
+
+<details>
+<summary>Click here for the Bitonic Sorter code</summary>
+
+```py
+# Sorting Network = just rotate trees!
 def sort(d, s, tree):
   switch d:
     case 0:
@@ -84,7 +195,7 @@ def sort(d, s, tree):
       (x,y) = tree
       lft = sort(d-1, 0, x)
       rgt = sort(d-1, 1, y)
-      return rots(d, s, lft, rgt)
+      return rots(d, s, (lft, rgt))
 
 # Rotates sub-trees (Blue/Green Box)
 def rots(d, s, tree):
@@ -95,76 +206,71 @@ def rots(d, s, tree):
     (x,y) = tree
     return down(d, s, warp(d-1, s, x, y))
 
-(...)
-```
+# Swaps distant values (Red Box)
+def warp(d, s, a, b):
+  switch d:
+    case 0:
+      return swap(s ^ (a > b), a, b)
+    case _:
+      (a.a, a.b) = a
+      (b.a, b.b) = b
+      (A.a, A.b) = warp(d-1, s, a.a, b.a)
+      (B.a, B.b) = warp(d-1, s, a.b, b.b)
+      return ((A.a,B.a),(A.b,B.b))
-This
-[file](https://gist.github.com/VictorTaelin/face210ca4bc30d96b2d5980278d3921)
-implements a [bitonic sorter](https://en.wikipedia.org/wiki/Bitonic_sorter) with
-*immutable tree rotations*. It is not the kind of algorithm you'd expect to
-run fast on GPUs. Yet, since it uses a divide-and-conquer approach, which is
-*inherently parallel*, Bend will run it multi-threaded. Some benchmarks:
+# Propagates downwards
+def down(d,s,t):
+  switch d:
+    case 0:
+      return t
+    case _:
+      (t.a, t.b) = t
+      return (rots(d-1, s, t.a), rots(d-1, s, t.b))
-- CPU, Apple M3 Max, 1 thread: **12.15 seconds**
+# Swaps a single pair
+def swap(s, a, b):
+  switch s:
+    case 0:
+      return (a,b)
+    case _:
+      return (b,a)
-- CPU, Apple M3 Max, 16 threads: **0.96 seconds**
+# Testing
+# -------
-- GPU, NVIDIA RTX 4090, 16k threads: **0.21 seconds**
+# Generates a big tree
+def gen(d, x):
+  switch d:
+    case 0:
+      return x
+    case _:
+      return (gen(d-1, x * 2 + 1), gen(d-1, x * 2))
-That's a **57x speedup** by doing nothing. No thread spawning, no explicit
-management of locks, mutexes. We just asked Bend to run our program on RTX, and
-it did. Simple as that.
+# Sums a big tree
+def sum(d, t):
+  switch d:
+    case 0:
+      return t
+    case _:
+      (t.a, t.b) = t
+      return sum(d-1, t.a) + sum(d-1, t.b)
-Bend isn't limited to a specific paradigm, like tensors or matrices. Any
-concurrent system, from shaders to Erlang-like actor models can be emulated on
-Bend. For example, to render images in real time, we could simply allocate an
-immutable tree on each frame:
-
-```python
-# given a shader, returns a square image
-def render(depth, shader):
-  bend d = 0, i = 0:
-    when d < depth:
-      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
-    else:
-      width = depth / 2
-      color = shader(i % width, i / width)
-  return color
-
-# given a position, returns a color
-# for this demo, it just busy loops
-def demo_shader(x, y):
-  bend i = 0:
-    when i < 5000:
-      color = fork(i + 1)
-    else:
-      color = 0x000001
-  return color
-
-# renders a 256x256 image using demo_shader
+# Sorts a big tree
 def main:
-  return render(16, demo_shader)
+  return sum(20, sort(20, 0, gen(20, 0)))
 ```
-And it would actually work. Even involved algorithms parallelize well on Bend.
-Long-distance communication is performed by *global beta-reduction* (as per the
-[Interaction Calculus](https://github.com/VictorTaelin/Interaction-Calculus)),
-and synchronized correctly and efficiently by
-[HVM2](https://github.com/HigherOrderCO/HVM)'s *atomic linker*.
+#### Benchmark
+
+- `bend run`: CPU, Apple M3 Max: 12.15 seconds
+- `bend run-c`: CPU, Apple M3 Max: 0.96 seconds
+- `bend run-cu`: GPU, NVIDIA RTX 4090: 0.21 seconds
+
+</details>
+
+If you are interested in some other algorithms, you can check our [examples folder](https://github.com/HigherOrderCO/Bend/tree/main/examples)
-- To understand the tech behind Bend, check HVM2's [paper](https://paper.higherorderco.com).
-
-- Bend is developed by [HigherOrderCO](https://HigherOrderCO.com) - join our [Discord](https://discord.HigherOrderCO.com)!
-
-## Note
-
-It is very important to reinforce that, while Bend does what it was built to
-(i.e., scale in performance with cores, up to 10000+ concurrent threads), its
-single-core performance is still extremely sub-par. This is the first version of
-the system, and we haven't put much effort into a proper compiler yet. You can
-expect the raw performance to substantially improve on every release, as we work
-towards a proper codegen (including a constellation of missing optimizations).
-Meanwhile, you can use the interpreters today, to have a glimpse of what
-massively parallel programming looks like, from the lens of a Pythonish,
-high-level language!
+### Additional Resources
+ - To understand the technology behind Bend, check out the HVM2 [paper](https://docs.google.com/viewer?url=https://raw.githubusercontent.com/HigherOrderCO/HVM/main/paper/PAPER.pdf). Bend is developed by [HigherOrderCO](https://higherorderco.com/) - join our [Discord](https://discord.gg/kindelia)!
+ - Watch the [live demo video](https://x.com/i/status/1791213162525524076).

From 64df76db0048a23531eb4268c9bc5e06771bd4e4 Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 12:20:19 -0300
Subject: [PATCH 02/10] Update README.md

Fix Typos
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 25e62a01..5e4b1f53 100644
--- a/README.md
+++ b/README.md
@@ -28,7 +28,7 @@ Bend is powered by the [HVM2](https://github.com/higherorderco/hvm) runtime.
 
 ## Install
 
-### Install depedencies
+### Install dependencies
 
 #### On Linux
 ```py
@@ -171,7 +171,7 @@ Or you can run it using C interpreter (Parallel)
 bend run-c ssum.bend -s
 ```
 
-If you have a NVIDIA GPU, you can also run in CUDA (Massivelly parallel)
+If you have a NVIDIA GPU, you can also run in CUDA (Massively parallel)
 ```sh
 bend run-cu ssum.bend -s
 ```

From 3aa450d1244497bd712dd9eceed3dd241155690d Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:07:59 -0300
Subject: [PATCH 03/10] Update cspell.json

---
 cspell.json | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/cspell.json b/cspell.json
index 05e82cca..b35cc051 100644
--- a/cspell.json
+++ b/cspell.json
@@ -111,6 +111,10 @@
     "undefer",
     "vectorize",
     "vectorizes",
+    "parallel_sum",
+    "sequential_sum",
+    "tlsv",
+    "proto",
     "walkdir",
   ],
   "files": [

From 2ca35165738747c659377757fae7a3aaba619f02 Mon Sep 17 00:00:00 2001
From: Vitor
Date: Wed, 5 Jun 2024 13:31:30 -0300
Subject: [PATCH 04/10] edited the requirements

---
 GUIDE.md    | 43 ++++++++++++++++++++++++++++++++++---------
 README.md   | 34 ++++++++++++++++++----------------
 cspell.json |  4 +---
 3 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/GUIDE.md b/GUIDE.md
index 4eac58ae..53ebaf0a 100644
--- a/GUIDE.md
+++ b/GUIDE.md
@@ -22,22 +22,47 @@ you just want to dive straight into action - this guide is for you. Let's go!
 
 Installation
 ------------
+### Install dependencies
 
-To use Bend, first, install [Rust](https://www.rust-lang.org/tools/install).
-Then, install HVM2 and Bend itself with:
+#### On Linux
+```py
+# Install Rust if you haven't already.
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+
+# For the C version of Bend, use GCC. We recommend a version up to 12.x.
+sudo apt install gcc
+```
+For the CUDA runtime [install the CUDA toolkit for Linux](https://developer.nvidia.com/cuda-downloads?target_os=Linux) version 12.x.
+
+
+#### On Mac
+```py
+# Install Rust if you haven't already.
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+
+# For the C version of Bend, use GCC. We recommend a version up to 12.x.
+brew install gcc
+```
+
+
+### Install Bend
+
+1. Install HVM2 by running:
+```sh
+# HVM2 is HOC's massively parallel Interaction Combinator evaluator.
 cargo install hvm
+
+# This ensures HVM is correctly installed and accessible.
+hvm --version
+```
+2. Install Bend by running:
+```sh
+# This command will install Bend
 cargo install bend-lang
-```
-
-To test if it worked, type:
+# This ensures Bend is correctly installed and accessible.
+bend --version
 ```
-bend --help
-```
-
-For GPU support, you also need the CUDA toolkit (CUDA and `nvcc`) version `12.X`. **It needs to be installed _before_ you install HVM.**
-At the moment, **only Nvidia GPUs** are supported.
 
 Hello, World!
 -------------

diff --git a/README.md b/README.md
index 5e4b1f53..5c280ca4 100644
--- a/README.md
+++ b/README.md
@@ -26,12 +26,13 @@ Bend is powered by the [HVM2](https://github.com/higherorderco/hvm) runtime.
 
+
 ## Install
 
 ### Install dependencies
 
 #### On Linux
-```py
+```sh
 # Install Rust if you haven't already.
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
@@ -42,7 +43,7 @@ For the CUDA runtime [install the CUDA toolkit for Linux](https://developer.nvid
 
 #### On Mac
-```py
+```sh
 # Install Rust if you haven't already.
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
@@ -90,12 +91,12 @@ bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)
 The example below sums all the numbers in the range from `start` to `target`. It can be written in two different ways: one that is inherently sequential (and thus cannot be parallelized), and another that is easily parallelizable. (We will be using the `-s` flag in most examples, for the sake of visibility)
 
 #### Sequential version:
-First, create a file named `ssum.bend`
+First, create a file named `sequential_sum.bend`
 ```sh
 # Write this command on your terminal
-touch ssum.bend
+touch sequential_sum.bend
 ```
-Then with your text editor, open the file `ssum.bend`, copy the code below and paste in the file.
+Then with your text editor, open the file `sequential_sum.bend`, copy the code below and paste in the file.
 
 ```py
 # Defines the function Sum with two parameters: start and target
@@ -115,29 +116,29 @@ def main():
 
 ##### Running the file
 You can run it using Rust interpreter (Sequential)
 ```sh
-bend run ssum.bend -s
+bend run sequential_sum.bend -s
 ```
 
 Or you can run it using C interpreter (Sequential)
 ```sh
-bend run-c ssum.bend -s
+bend run-c sequential_sum.bend -s
 ```
 
 If you have a NVIDIA GPU, you can also run in CUDA (Sequential)
 ```sh
-bend run-cu ssum.bend -s
+bend run-cu sequential_sum.bend -s
 ```
 
 In this version, the next value to be calculated depends on the previous sum, meaning that it cannot proceed until the current computation is complete. Now, let's look at the easily parallelizable version.
 #### Parallelizable version:
-First close the old file and then proceed to your terminal to create `psum.bend`
+First close the old file and then proceed to your terminal to create `parallel_sum.bend`
 ```sh
 # Write this command on your terminal
-touch psum.bend
+touch parallel_sum.bend
 ```
-Then with your text editor, open the file `psum.bend`, copy the code below and paste in the file.
+Then with your text editor, open the file `parallel_sum.bend`, copy the code below and paste in the file.
 
 ```py
 # Defines the function Sum with two parameters: start and target
@@ -163,17 +164,17 @@ In this example, the (3 + 4) sum does not depend on the (1 + 2), meaning that it
 
 ##### Running the file
 You can run it using Rust interpreter (Sequential)
 ```sh
-bend run psum.bend -s
+bend run parallel_sum.bend -s
 ```
 
 Or you can run it using C interpreter (Parallel)
 ```sh
-bend run-c ssum.bend -s
+bend run-c sequential_sum.bend -s
 ```
 
 If you have a NVIDIA GPU, you can also run in CUDA (Massively parallel)
 ```sh
-bend run-cu ssum.bend -s
+bend run-cu sequential_sum.bend -s
 ```
 
 In Bend, it can be parallelized by just changing the run command. If your code **can** run in parallel, it **will** run in parallel.
@@ -272,5 +273,6 @@ If you are interested in some other algorithms, you can check our [examples fold
 
 ### Additional Resources
- - To understand the technology behind Bend, check out the HVM2 [paper](https://docs.google.com/viewer?url=https://raw.githubusercontent.com/HigherOrderCO/HVM/main/paper/PAPER.pdf). Bend is developed by [HigherOrderCO](https://higherorderco.com/) - join our [Discord](https://discord.gg/kindelia)!
- - Watch the [live demo video](https://x.com/i/status/1791213162525524076).
+ - To understand the technology behind Bend, check out the HVM2 [paper](https://docs.google.com/viewer?url=https://raw.githubusercontent.com/HigherOrderCO/HVM/main/paper/PAPER.pdf).
+ - We are working https://github.com/HigherOrderCO/Bend/blob/main/GUIDE.md
+ - Bend is developed by [HigherOrderCO](https://higherorderco.com/) - join our [Discord](https://discord.higherorderco.com)!

diff --git a/cspell.json b/cspell.json
index b35cc051..bb4d4f74 100644
--- a/cspell.json
+++ b/cspell.json
@@ -111,11 +111,9 @@
     "undefer",
     "vectorize",
     "vectorizes",
-    "parallel_sum",
-    "sequential_sum",
     "tlsv",
     "proto",
-    "walkdir",
+    "walkdir"
   ],
   "files": [
     "**/*.rs",

From 3ca84bc653976da8be3371b2d74397d0850a8f85 Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:32:13 -0300
Subject: [PATCH 05/10] Update GUIDE.md

change from py to sh
---
 GUIDE.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/GUIDE.md b/GUIDE.md
index 53ebaf0a..2118d813 100644
--- a/GUIDE.md
+++ b/GUIDE.md
@@ -25,7 +25,7 @@ Installation
 ### Install dependencies
 
 #### On Linux
-```py
+```sh
 # Install Rust if you haven't already.
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
@@ -36,7 +36,7 @@ For the CUDA runtime [install the CUDA toolkit for Linux](https://developer.nvid
 
 #### On Mac
-```py
+```sh
 # Install Rust if you haven't already.
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

From 1d66b89db2e61239c4372c417f351e0364ab3dbe Mon Sep 17 00:00:00 2001
From: Vitor
Date: Wed, 5 Jun 2024 13:34:15 -0300
Subject: [PATCH 06/10] added guide

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5c280ca4..021f8523 100644
--- a/README.md
+++ b/README.md
@@ -274,5 +274,6 @@ If you are interested in some other algorithms, you can check our [examples fold
 
 ### Additional Resources
  - To understand the technology behind Bend, check out the HVM2 [paper](https://docs.google.com/viewer?url=https://raw.githubusercontent.com/HigherOrderCO/HVM/main/paper/PAPER.pdf).
- - We are working https://github.com/HigherOrderCO/Bend/blob/main/GUIDE.md
+ - We are working on official documentation; meanwhile, for a more in-depth
+   explanation check [GUIDE.md](https://github.com/HigherOrderCO/Bend/blob/main/GUIDE.md)
  - Bend is developed by [HigherOrderCO](https://higherorderco.com/) - join our [Discord](https://discord.higherorderco.com)!

From 51259e084ba2accc40311328fd0db9ccddb41fd3 Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:41:22 -0300
Subject: [PATCH 07/10] Update README.md

Update to add features.md
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 021f8523..22731336 100644
--- a/README.md
+++ b/README.md
@@ -276,4 +276,5 @@ If you are interested in some other algorithms, you can check our [examples fold
  - To understand the technology behind Bend, check out the HVM2 [paper](https://docs.google.com/viewer?url=https://raw.githubusercontent.com/HigherOrderCO/HVM/main/paper/PAPER.pdf).
  - We are working on official documentation; meanwhile, for a more in-depth
    explanation check [GUIDE.md](https://github.com/HigherOrderCO/Bend/blob/main/GUIDE.md)
+ - Read about our features at [FEATURES.md](https://github.com/HigherOrderCO/Bend/blob/main/FEATURES.md)
  - Bend is developed by [HigherOrderCO](https://higherorderco.com/) - join our [Discord](https://discord.higherorderco.com)!

From b9b29058ab80df79c03089be99e8ca5b53020190 Mon Sep 17 00:00:00 2001
From: Sipher <77928770+Sipher@users.noreply.github.com>
Date: Wed, 5 Jun 2024 13:45:25 -0300
Subject: [PATCH 08/10] Update README.md

Change the benchmark location
---
 README.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 22731336..5721a215 100644
--- a/README.md
+++ b/README.md
@@ -183,6 +183,12 @@ In Bend, it can be parallelized by just changing the run command. If your code *
 
 ### Speedup Examples
 The code snippet below implements a [bitonic sorter](https://en.wikipedia.org/wiki/Bitonic_sorter) with *immutable tree rotations*. It's not the type of algorithm you would expect to run fast on GPUs. However, since it uses a divide and conquer approach, which is inherently parallel, Bend will execute it on multiple threads, with no thread creation and no explicit lock management.
 
+#### Bitonic Sorter Benchmark
+
+- `bend run`: CPU, Apple M3 Max: 12.15 seconds
+- `bend run-c`: CPU, Apple M3 Max: 0.96 seconds
+- `bend run-cu`: GPU, NVIDIA RTX 4090: 0.21 seconds
+
 <details>
 <summary>Click here for the Bitonic Sorter code</summary>
 
 ```py
 # Sorting Network = just rotate trees!
 def sort(d, s, tree):
   switch d:
     case 0:
@@ -261,12 +267,6 @@ def main:
   return sum(20, sort(20, 0, gen(20, 0)))
 ```
 
-#### Benchmark
-
-- `bend run`: CPU, Apple M3 Max: 12.15 seconds
-- `bend run-c`: CPU, Apple M3 Max: 0.96 seconds
-- `bend run-cu`: GPU, NVIDIA RTX 4090: 0.21 seconds
-
 </details>
if you are interested in some other algorithms, you can check our [examples folder](https://github.com/HigherOrderCO/Bend/tree/main/examples) From 1410e8f0be1215d99f7faa142515ea22a2dd2b85 Mon Sep 17 00:00:00 2001 From: Sipher <77928770+Sipher@users.noreply.github.com> Date: Wed, 5 Jun 2024 15:00:25 -0300 Subject: [PATCH 09/10] Update cspell.json Removed not necessary stuff from cspell --- cspell.json | 2 -- 1 file changed, 2 deletions(-) diff --git a/cspell.json b/cspell.json index bb4d4f74..acaef3a6 100644 --- a/cspell.json +++ b/cspell.json @@ -111,8 +111,6 @@ "undefer", "vectorize", "vectorizes", - "tlsv", - "proto", "walkdir" ], "files": [ From fb2b9b45f85a248a83beb47f51019d767810a9c1 Mon Sep 17 00:00:00 2001 From: Sipher <77928770+Sipher@users.noreply.github.com> Date: Wed, 5 Jun 2024 15:07:17 -0300 Subject: [PATCH 10/10] Update cspell.json Re add "tlsv", "proto", To remove cspell error --- cspell.json | 2 ++ 1 file changed, 2 insertions(+) diff --git a/cspell.json b/cspell.json index acaef3a6..c6d8ea83 100644 --- a/cspell.json +++ b/cspell.json @@ -111,6 +111,8 @@ "undefer", "vectorize", "vectorizes", + "tlsv", + "proto", "walkdir" ], "files": [