Mirror of https://github.com/HigherOrderCO/Bend.git (synced 2024-10-26 05:50:18 +03:00)
Update README.md
Change the benchmark location
parent 51259e084b
commit b9b29058ab
README.md: 12 changed lines
@@ -183,6 +183,12 @@ In Bend, it can be parallelized by just changing the run command. If your code *
 ### Speedup Examples
 The code snippet below implements a [bitonic sorter](https://en.wikipedia.org/wiki/Bitonic_sorter) with *immutable tree rotations*. It's not the type of algorithm you would expect to run fast on GPUs. However, since it uses a divide and conquer approach, which is inherently parallel, Bend will execute it on multiple threads, no thread creation, no explicit lock management.
 
+#### Bitonic Sorter Benchmark
+
+- `bend run`: CPU, Apple M3 Max: 12.15 seconds
+- `bend run-c`: CPU, Apple M3 Max: 0.96 seconds
+- `bend run-cu`: GPU, NVIDIA RTX 4090: 0.21 seconds
+
 <details>
 <summary>Click here for the Bitonic Sorter code </summary>
 
@@ -261,12 +267,6 @@ def main:
   return sum(20, sort(20, 0, gen(20, 0)))
 ```
 
-#### Benchmark
-
-- `bend run`: CPU, Apple M3 Max: 12.15 seconds
-- `bend run-c`: CPU, Apple M3 Max: 0.96 seconds
-- `bend run-cu`: GPU, NVIDIA RTX 4090: 0.21 seconds
-
 </details>
 
 if you are interested in some other algorithms, you can check our [examples folder](https://github.com/HigherOrderCO/Bend/tree/main/examples)
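
Note for readers of this diff: the README paragraph above describes a bitonic sorter built on immutable trees, but the diff elides the program between its two hunks. The following is a minimal sequential Python sketch of that idea, not the Bend source. Only the names `gen` and `sort` and the call shape `sum(d, sort(d, 0, gen(d, 0)))` come from the diff; `sum` is renamed `tree_sum` here to avoid shadowing Python's builtin, and the helpers `swap`, `warp`, `flow`, `down`, and `leaves` are assumptions introduced for illustration.

```python
# Sketch only: a tree-based bitonic sort on nested 2-tuples.
# A tree of depth d is either a leaf (an int, d == 0) or a pair of depth d-1 trees.

def gen(d, x):
    """Build a depth-d tree whose leaves are 2**d distinct integers."""
    if d == 0:
        return x
    return (gen(d - 1, x * 2 + 1), gen(d - 1, x * 2))

def tree_sum(d, t):
    """Add up all leaves of a depth-d tree."""
    if d == 0:
        return t
    a, b = t
    return tree_sum(d - 1, a) + tree_sum(d - 1, b)

def leaves(d, t):
    """Flatten a depth-d tree into a left-to-right list (hypothetical helper)."""
    if d == 0:
        return [t]
    a, b = t
    return leaves(d - 1, a) + leaves(d - 1, b)

def swap(s, a, b):
    """Return the pair unchanged when s == 0, exchanged otherwise."""
    return (a, b) if s == 0 else (b, a)

def warp(d, s, a, b):
    """Compare-exchange matching leaves of two depth-d trees.

    s == 0 sends the smaller leaf to the first result tree, s == 1 the larger.
    """
    if d == 0:
        return swap(s ^ int(a > b), a, b)
    aa, ab = a
    ba, bb = b
    lo_a, hi_a = warp(d - 1, s, aa, ba)
    lo_b, hi_b = warp(d - 1, s, ab, bb)
    return ((lo_a, lo_b), (hi_a, hi_b))

def flow(d, s, t):
    """Bitonic merge: one compare-exchange layer, then merge each half."""
    if d == 0:
        return t
    a, b = t
    return down(d, s, warp(d - 1, s, a, b))

def down(d, s, t):
    """Merge both halves; the two calls do not depend on each other."""
    if d == 0:
        return t
    a, b = t
    return (flow(d - 1, s, a), flow(d - 1, s, b))

def sort(d, s, t):
    """Sort a depth-d tree: ascending when s == 0, descending when s == 1."""
    if d == 0:
        return t
    a, b = t
    # Sort the halves in opposite directions to form a bitonic sequence, then merge.
    return flow(d, s, (sort(d - 1, 0, a), sort(d - 1, 1, b)))

if __name__ == "__main__":
    depth = 4  # kept small; the diff's main uses depth 20
    print(leaves(depth, sort(depth, 0, gen(depth, 0))))
    print(tree_sum(depth, sort(depth, 0, gen(depth, 0))))
```

In `sort`, `down`, and `warp`, the two recursive calls work on disjoint subtrees and share no mutable state, which is the property the README paragraph refers to as inherently parallel. Python evaluates them one after the other, so this sketch shows only the structure and says nothing about the timings listed in the benchmark.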