Update ReadMe
adityauj committed Sep 17, 2024
1 parent fd014fe commit 286516a
Showing 5 changed files with 133 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1 +1,3 @@
/target
*.data
*.svg
5 changes: 1 addition & 4 deletions Makefile
@@ -13,10 +13,6 @@ $(TARGET):
@cargo b --release
@cp ./target/release/$(TARGET) $(TARGET)

# swagger:
# $(info ===> GENERATE swagger)
# @go run github.com/swaggo/swag/cmd/swag init -d ./internal/api,./internal/util -g api.go -o ./api
# @mv ./api/docs.go ./internal/api/docs.go
install:
$(info ===> INSTALL)
@cargo install cargo-asm
@@ -29,6 +25,7 @@ clean:
$(info ===> CLEAN)
@cargo clean
@rm -f $(TARGET)
@rm -f *.data *svg *.data.old

test:
$(info ===> TESTING)
130 changes: 130 additions & 0 deletions README.md
@@ -0,0 +1,130 @@
# BandwidthBenchmark in Rust

Implementation of [TheBandwidthBenchmark](https://github.com/RRZE-HPC/TheBandwidthBenchmark) in Rust with multi-threading support.

This is a collection of simple streaming kernels.

Apart from the micro-benchmark functionality, this is also a blueprint for other micro-benchmark applications.

The output format follows the C version for compatibility.

# Target-triple and Target-cpu

You can set your own target-triple and target-cpu in **.cargo/config.toml**.
Specifying them lets the compiler generate optimal assembly using all the instructions supported by your CPU architecture.

The target-features enabled for a given target-cpu can usually be listed with the following command:

```
rustc --print cfg -C target-cpu=native -C opt-level=3
```

By default, the configuration for the target-triple is:
```
[target.x86_64-unknown-linux-gnu]
rustflags = [
    "-C",
    "target-cpu=native",
    "-C",
    "opt-level=3",
]
```
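
To check from inside a program which features the chosen target-cpu actually enabled, a minimal sketch could look like the one below. The function name `enabled_features` and the list of candidate features are illustrative choices, not part of this repository.

```rust
// Sketch: report which SIMD features the compiler was allowed to use.
// `cfg!(target_feature = "...")` is resolved at compile time, so the result
// reflects the rustflags in effect for the build (e.g. -C target-cpu=native).

/// Returns the compile-time-enabled features out of a fixed candidate list.
/// The candidate list is an assumption made for illustration.
fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        features.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        features.push("fma");
    }
    features
}

fn main() {
    println!("compile-time target features: {:?}", enabled_features());
}
```

The printed list depends on the machine and on the flags used for the build, so comparing a build with and without `target-cpu=native` is one way to see the effect of the configuration.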

# Building and running the program
It is fairly simple to run the program.

A binary named **bench** can be built using:
```
cargo b --release
```
This command places the **bench** binary in ./target/release.
You can then run it with:
```
cargo r --release
```
The second option is to build the binary with the Makefile:
```
make
```
This command places the **bench** binary in the current directory (./).
You can then run it with:
```
./bench
```

The binary takes 3 parameters, **--n, --size, --ntimes**, which are explained below:

```
Usage: ./bench [OPTIONS]
or
Usage: cargo r --release -- [OPTIONS]
Options:
-n, --n <N>
Number of threads
[default: max #threads available on your machine]
-s, --size <SIZE>
Size of the total dataset in bytes
[default: 120000000]
-n, --ntimes <NTIMES>
Number of time to run all the benchmarks
[default: 10]
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
```
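
The `--ntimes` option means each kernel is timed repeatedly, and the benchmark reports average, minimum and maximum times per kernel. A minimal sketch of such a timing loop is shown below. The names `Timing` and `time_kernel` are hypothetical, and whether the actual program discards a warm-up repetition is not stated in this README.

```rust
use std::time::Instant;

/// Summary of repeated wall-clock timings, in seconds.
struct Timing {
    avg: f64,
    min: f64,
    max: f64,
}

/// Runs `kernel` exactly `ntimes` times and summarises the elapsed times.
/// Hypothetical helper: every repetition is counted, including the first.
fn time_kernel<F: FnMut()>(ntimes: usize, mut kernel: F) -> Timing {
    assert!(ntimes > 0, "ntimes must be at least 1");
    let mut total = 0.0_f64;
    let mut min = f64::INFINITY;
    let mut max = 0.0_f64;
    for _ in 0..ntimes {
        let start = Instant::now();
        kernel();
        let secs = start.elapsed().as_secs_f64();
        total += secs;
        min = min.min(secs);
        max = max.max(secs);
    }
    Timing {
        avg: total / ntimes as f64,
        min,
        max,
    }
}

fn main() {
    let mut a = vec![0.0_f64; 1_000_000];
    // A stand-in kernel: update every element in place.
    let t = time_kernel(10, || {
        for x in a.iter_mut() {
            *x += 1.0;
        }
    });
    println!("avg {:.6}s  min {:.6}s  max {:.6}s", t.avg, t.min, t.max);
}
```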

If you just use
```
./bench
```
the program runs multi-threaded on all available cores of your CPU, with the default `--size` of 120000000.

To run the program serially, use the command below:
```
./bench -n 1
```
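
To illustrate what the thread count controls, the sketch below splits one array into contiguous chunks and lets each thread work on its own chunk. It uses only the standard library (`std::thread::scope` and `std::thread::available_parallelism`). This is an assumption for illustration, since the threading approach used by the benchmark is not described in this README, and `parallel_init` and `default_threads` are hypothetical names.

```rust
use std::thread;

/// Fills `data` with `value`, dividing the work among `n_threads` threads.
/// With `n_threads == 1` this degenerates to a serial loop in one thread.
fn parallel_init(data: &mut [f64], value: f64, n_threads: usize) {
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in exactly one chunk;
    // `.max(1)` avoids a zero chunk length for empty input.
    let chunk_len = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x = value;
                }
            });
        }
    }); // all spawned threads are joined when the scope ends
}

/// Number of threads to use when none is given on the command line.
fn default_threads() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn main() {
    let mut a = vec![0.0_f64; 1_000];
    parallel_init(&mut a, 2.0, default_threads());
    println!("threads = {}, a[0] = {}, a[999] = {}", default_threads(), a[0], a[999]);
}
```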

# Assembly Output
You can generate assembly either for the whole program or just for a specific kernel.

1. To generate assembly for the whole program, use the command below:
```
make asm
```
2. To generate assembly specific to a kernel, please make sure that [cargo-show-asm](https://crates.io/crates/cargo-show-asm) is installed. Then use the following command:
```
cargo asm bench::copy --rust
```

**Note:** To keep the assembly of each kernel available as a separate function, **#[inline(never)]** is specified above the kernel.
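
As an illustration of the note above, a kernel marked this way might look like the following sketch. The name `copy` mirrors the `bench::copy` path used in the command, but the signature and loop body are assumptions, not the repository's actual code.

```rust
/// Copy kernel sketch: a[i] = b[i].
/// `#[inline(never)]` keeps this function as its own symbol in the binary,
/// so tools that look up assembly by function path can still find it.
#[inline(never)]
pub fn copy(a: &mut [f64], b: &[f64]) {
    for (dst, src) in a.iter_mut().zip(b.iter()) {
        *dst = *src;
    }
}

fn main() {
    let b = vec![1.5_f64; 8];
    let mut a = vec![0.0_f64; 8];
    copy(&mut a, &b);
    println!("{:?}", a);
}
```

Without the attribute, the optimizer is free to inline a small kernel into its caller, in which case no standalone function remains to inspect.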

# Output
A sample output from the benchmark is shown below:

```
Benchmarking with 8 threads.
Total allocated datasize: 3840.00 MB.
Initialization of arrays took : 506.008814ms.
----------------------------------------------------------------------------------------------------------
Function | Rate(MB/s) | Rate(MFlop/s) | Avg time | Min time | Max time |
----------------------------------------------------------------------------------------------------------
Init: | 8923.15 | - | 0.1120 | 0.1076 | 0.1372 |
Sum: | 19562.93 | 2445.37 | 0.0549 | 0.0491 | 0.0883 |
Copy: | 11859.23 | - | 0.1655 | 0.1619 | 0.1868 |
Update: | 17723.62 | 1107.73 | 0.1100 | 0.1083 | 0.1143 |
Triad: | 13162.47 | 1096.87 | 0.2207 | 0.2188 | 0.2255 |
Daxpy: | 18254.28 | 1521.19 | 0.1604 | 0.1578 | 0.1643 |
STriad: | 14149.89 | 884.37 | 0.2732 | 0.2714 | 0.2819 |
SDaxpy: | 17545.77 | 1096.61 | 0.2219 | 0.2189 | 0.2240 |
----------------------------------------------------------------------------------------------------------
Solution Validates
```
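
The rate columns can be related to the time columns with a short calculation. The sketch below assumes that each array holds 120000000 `f64` elements, that 1 MB is 10^6 bytes, and that rates are derived from the minimum time. These assumptions are inferred from the sample figures (four such arrays give 3840 MB) and are not confirmed by the source. The function names are hypothetical.

```rust
/// Bandwidth in MB/s (1 MB = 10^6 bytes) for a kernel that streams
/// `arrays_touched` arrays of `n` f64 elements in `min_time_s` seconds.
fn rate_mb_per_s(n: u64, arrays_touched: u64, min_time_s: f64) -> f64 {
    let bytes = (n * arrays_touched * std::mem::size_of::<f64>() as u64) as f64;
    bytes / min_time_s / 1.0e6
}

/// Floating-point rate in MFlop/s for `flops_per_elem` operations per element.
fn rate_mflops(n: u64, flops_per_elem: u64, min_time_s: f64) -> f64 {
    (n * flops_per_elem) as f64 / min_time_s / 1.0e6
}

fn main() {
    let n = 120_000_000; // assumed number of elements per array

    // Copy: one array read and one written, min time taken from the table.
    println!("Copy : {:.2} MB/s", rate_mb_per_s(n, 2, 0.1619));

    // Triad: two arrays read and one written, one multiply and one add per element.
    println!(
        "Triad: {:.2} MB/s, {:.2} MFlop/s",
        rate_mb_per_s(n, 3, 0.2188),
        rate_mflops(n, 2, 0.2188)
    );
}
```

Under these assumptions the computed values land close to the Copy and Triad rows of the table. They will not match to the last digit, because the min times in the table are rounded to four decimals.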
Binary file removed bench
Binary file not shown.
Binary file removed perf.data
Binary file not shown.
