Torch MNIST example scripts with instructions #63
Merged
9 commits
07cbead docs: torch mnist example code with readme (JuanPedroGHM)
952afc1 updated to match perun 0.4 (JuanPedroGHM)
079a212 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
ba35219 example output from torch_minst example (JuanPedroGHM)
54f236f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
2723995 Expanded on decorators for example README.md (JuanPedroGHM)
8684f36 docs: added links to example (JuanPedroGHM)
7429e0f Merge branch 'docs/torch_mnist_example' of github.com:Helmholtz-AI-En… (JuanPedroGHM)
474b81e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -133,3 +133,5 @@ dmypy.json | |
*.out | ||
poetry.lock | ||
.perun.ini | ||
examples/data | ||
examples/**/perun_results |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
# Torch MNIST Example | ||
|
||
This directory contains everything you need to start using **perun** in your workflows. As an example, we use the [torch](https://pytorch.org/) package to train a neural network to recognize handwritten digits from the MNIST dataset. | ||
|
||
## Setup | ||
|
||
We recommend creating a new environment for each new project, for example with the *venv* package: | ||
|
||
```console | ||
python -m venv venv/perun-example | ||
source venv/perun-example/bin/activate | ||
``` | ||
|
||
or with *conda*: | ||
|
||
```console | ||
conda create --name perun-example | ||
conda activate perun-example | ||
``` | ||
|
||
Once your new environment is ready, you can install the dependencies for the example. | ||
|
||
```console | ||
pip install -r requirements.txt | ||
``` | ||
|
||
This includes **perun** and the script's dependencies. The root of the project includes a minimal configuration file, *example.perun.ini*, with some basic options. More details on the configuration options can be found [in the docs](https://perun.readthedocs.io/en/latest/configuration.html). | ||
|
||
To make sure **perun** was installed properly and that it has access to some hardware sensors, run the command: | ||
|
||
```console | ||
perun sensors | ||
``` | ||
|
||
## Monitoring | ||
|
||
Now everything is ready to start collecting data. To monitor your script a single time, simply run: | ||
|
||
```console | ||
perun monitor torch_mnist.py | ||
``` | ||
|
||
After the script finishes running, a folder *perun_results* will be created containing the consumption report of your application as a text file, along with all the raw data saved in an HDF5 file. | ||
|
||
To explore the contents of the HDF5 file, we recommend the **h5py** library or the [myHDF5](https://myhdf5.hdfgroup.org) website. | ||
|
||
The text report from running the MNIST example should look like this: | ||
|
||
```text | ||
PERUN REPORT | ||
|
||
App name: torch_mnist | ||
First run: 2023-08-22T17:44:34.927402 | ||
Last run: 2023-08-22T17:44:34.927402 | ||
|
||
|
||
RUN ID: 2023-08-22T17:44:34.927402 | ||
|
||
| Round # | Host | RUNTIME | ENERGY | CPU_POWER | CPU_UTIL | GPU_POWER | GPU_MEM | DRAM_POWER | MEM_UTIL | | ||
|----------:|:--------------------|:----------|:----------|:------------|:-----------|:------------|:----------|:-------------|:-----------| | ||
| 0 | hkn0402.localdomain | 61.954 s | 28.440 kJ | 203.619 W | 0.867 % | 232.448 W | 4.037 GB | 22.923 W | 0.033 % | | ||
| 0 | All | 61.954 s | 28.440 kJ | 203.619 W | 0.867 % | 232.448 W | 4.037 GB | 22.923 W | 0.033 % | | ||
|
||
Monitored Functions | ||
|
||
| Round # | Function | Avg Calls / Rank | Avg Runtime | Avg Power | Avg CPU Util | Avg GPU Mem Util | | ||
|----------:|:------------|-------------------:|:---------------|:-----------------|:---------------|:-------------------| | ||
| 0 | train | 1 | 50.390±0.000 s | 456.993±0.000 W | 0.869±0.000 % | 2.731±0.000 % | | ||
| 0 | train_epoch | 5 | 8.980±1.055 s | 433.082±11.012 W | 0.874±0.007 % | 2.746±0.148 % | | ||
| 0 | test | 5 | 1.098±0.003 s | 274.947±83.746 W | 0.804±0.030 % | 2.808±0.025 % | | ||
|
||
The application has run been run 1 times. Throught its runtime, it has used 0.012 kWh, released a total of 0.005 kgCO2e into the atmosphere, and you paid 0.00 € in electricity for it. | ||
``` | ||
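
As a rough sanity check on the report above, the ENERGY column can be compared with the average power columns multiplied by the runtime. The sketch below assumes that total energy is approximately the sum of the CPU, GPU and DRAM average power times the runtime. That relationship is an assumption made for illustration, not a description of how perun computes its totals, and the function name `estimate_energy_kj` is hypothetical.

```python
def estimate_energy_kj(runtime_s: float, *avg_power_w: float) -> float:
    """Approximate energy in kJ as (sum of average powers in W) * runtime in s."""
    joules = sum(avg_power_w) * runtime_s
    return joules / 1000.0


# Values copied from round 0 of the single-run report above.
runtime_s = 61.954
cpu_w, gpu_w, dram_w = 203.619, 232.448, 22.923

energy_kj = estimate_energy_kj(runtime_s, cpu_w, gpu_w, dram_w)
energy_kwh = energy_kj / 3600.0  # 1 kWh = 3600 kJ

print(f"estimated energy: {energy_kj:.3f} kJ ({energy_kwh:.4f} kWh)")
```

Under that assumption the estimate lands close to the 28.440 kJ reported for the round, which suggests the columns are mutually consistent for this run.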
|
||
The results display data about the functions *train*, *train_epoch* and *test*. Those functions were specially marked using the ```@monitor()``` decorator. | ||
|
||
```python | ||
@monitor() | ||
def train(args, model, device, train_loader, test_loader, optimizer, scheduler): | ||
for epoch in range(1, args.epochs + 1): | ||
train_epoch(args, model, device, train_loader, optimizer, epoch) | ||
test(model, device, test_loader) | ||
scheduler.step() | ||
``` | ||
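
To give an intuition for what a monitoring decorator does, here is a minimal, standard-library-only sketch that records how often a function is called and the mean and standard deviation of its runtime, similar in spirit to the *Avg Calls* and *Avg Runtime* columns. It is not perun's implementation: `timed`, `CALL_TIMES` and `summary` are hypothetical names, and the sketch measures only wall-clock time, not power or hardware utilization.

```python
import functools
import statistics
import time
from collections import defaultdict

# Maps function name -> list of wall-clock runtimes in seconds (hypothetical registry).
CALL_TIMES = defaultdict(list)


def timed():
    """Decorator factory mirroring the `@monitor()` call style (illustration only)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the runtime even if the function raises.
                CALL_TIMES[func.__name__].append(time.perf_counter() - start)

        return wrapper

    return decorator


def summary(name):
    """Return (number of calls, mean runtime, population std dev) for one function."""
    times = CALL_TIMES[name]
    return len(times), statistics.mean(times), statistics.pstdev(times)


@timed()
def train_epoch(epoch):
    time.sleep(0.01)  # stand-in for real work
    return epoch


@timed()
def train(epochs):
    for epoch in range(1, epochs + 1):
        train_epoch(epoch)


train(epochs=5)
calls, mean_s, std_s = summary("train_epoch")
print(f"train_epoch: {calls} calls, {mean_s:.3f}±{std_s:.3f} s")
```

A function that is called only once, such as `train` here, has a deviation of zero in this sketch, which is consistent with the `±0.000` entries in the report.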
|
||
## Benchmarking | ||
|
||
If you need to run your code multiple times to gather statistics, perun includes an option called ```--rounds```. The application will be run multiple times, with each round added to tables similar to those generated for a single run. | ||
|
||
|
||
```console | ||
perun monitor --rounds 5 torch_mnist.py | ||
``` | ||
|
||
```text | ||
PERUN REPORT | ||
|
||
App name: torch_mnist | ||
First run: 2023-08-22T17:44:34.927402 | ||
Last run: 2023-08-22T17:45:46.992693 | ||
|
||
|
||
RUN ID: 2023-08-22T17:45:46.992693 | ||
|
||
| Round # | Host | RUNTIME | ENERGY | CPU_POWER | CPU_UTIL | GPU_POWER | GPU_MEM | DRAM_POWER | MEM_UTIL | | ||
|----------:|:--------------------|:----------|:----------|:------------|:-----------|:------------|:----------|:-------------|:-----------| | ||
| 0 | hkn0402.localdomain | 52.988 s | 24.379 kJ | 202.854 W | 0.865 % | 234.184 W | 4.281 GB | 22.858 W | 0.034 % | | ||
| 0 | All | 52.988 s | 24.379 kJ | 202.854 W | 0.865 % | 234.184 W | 4.281 GB | 22.858 W | 0.034 % | | ||
| 1 | hkn0402.localdomain | 48.401 s | 22.319 kJ | 203.366 W | 0.886 % | 234.821 W | 4.513 GB | 22.798 W | 0.034 % | | ||
| 1 | All | 48.401 s | 22.319 kJ | 203.366 W | 0.886 % | 234.821 W | 4.513 GB | 22.798 W | 0.034 % | | ||
| 2 | hkn0402.localdomain | 48.258 s | 22.248 kJ | 203.339 W | 0.884 % | 234.720 W | 4.513 GB | 22.850 W | 0.034 % | | ||
| 2 | All | 48.258 s | 22.248 kJ | 203.339 W | 0.884 % | 234.720 W | 4.513 GB | 22.850 W | 0.034 % | | ||
| 3 | hkn0402.localdomain | 48.537 s | 22.393 kJ | 203.269 W | 0.884 % | 234.984 W | 4.513 GB | 22.968 W | 0.034 % | | ||
| 3 | All | 48.537 s | 22.393 kJ | 203.269 W | 0.884 % | 234.984 W | 4.513 GB | 22.968 W | 0.034 % | | ||
| 4 | hkn0402.localdomain | 48.416 s | 22.323 kJ | 203.408 W | 0.888 % | 234.626 W | 4.513 GB | 22.928 W | 0.034 % | | ||
| 4 | All | 48.416 s | 22.323 kJ | 203.408 W | 0.888 % | 234.626 W | 4.513 GB | 22.928 W | 0.034 % | | ||
|
||
Monitored Functions | ||
|
||
| Round # | Function | Avg Calls / Rank | Avg Runtime | Avg Power | Avg CPU Util | Avg GPU Mem Util | | ||
|----------:|:------------|-------------------:|:---------------|:-----------------|:---------------|:-------------------| | ||
| 0 | train | 1 | 50.169±0.000 s | 458.380±0.000 W | 0.875±0.000 % | 2.727±0.000 % | | ||
| 0 | train_epoch | 5 | 8.930±0.903 s | 439.707±12.743 W | 0.875±0.008 % | 2.743±0.154 % | | ||
| 0 | test | 5 | 1.103±0.004 s | 232.750±1.219 W | 0.805±0.030 % | 2.809±0.023 % | | ||
| 1 | train | 1 | 48.354±0.000 s | 453.376±0.000 W | 0.886±0.000 % | 2.820±0.000 % | | ||
| 1 | train_epoch | 5 | 8.556±0.008 s | 428.418±11.199 W | 0.890±0.018 % | 2.820±0.000 % | | ||
| 1 | test | 5 | 1.115±0.002 s | 272.918±80.330 W | 0.798±0.018 % | 2.820±0.000 % | | ||
| 2 | train | 1 | 48.210±0.000 s | 453.867±0.000 W | 0.884±0.000 % | 2.820±0.000 % | | ||
| 2 | train_epoch | 5 | 8.525±0.022 s | 423.647±1.049 W | 0.888±0.013 % | 2.820±0.000 % | | ||
| 2 | test | 5 | 1.117±0.005 s | 312.983±97.688 W | 0.806±0.012 % | 2.820±0.000 % | | ||
| 3 | train | 1 | 48.486±0.000 s | 452.940±0.000 W | 0.884±0.000 % | 2.820±0.000 % | | ||
| 3 | train_epoch | 5 | 8.577±0.012 s | 433.627±13.812 W | 0.888±0.017 % | 2.820±0.000 % | | ||
| 3 | test | 5 | 1.120±0.003 s | 233.973±3.516 W | 0.789±0.022 % | 2.820±0.000 % | | ||
| 4 | train | 1 | 48.367±0.000 s | 453.256±0.000 W | 0.888±0.000 % | 2.820±0.000 % | | ||
| 4 | train_epoch | 5 | 8.555±0.011 s | 433.582±12.606 W | 0.899±0.029 % | 2.820±0.000 % | | ||
| 4 | test | 5 | 1.118±0.002 s | 233.367±2.238 W | 0.818±0.045 % | 2.820±0.000 % | | ||
|
||
The application has run been run 2 times. Throught its runtime, it has used 0.062 kWh, released a total of 0.026 kgCO2e into the atmosphere, and you paid 0.02 € in electricity for it. | ||
``` |
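
With several rounds available, the per-round values can be summarized with the standard library. The sketch below copies the RUNTIME and ENERGY values of the five rounds from the report above and computes their mean and sample standard deviation. The variable names are illustrative, and summarizing the rounds without round 0 (which looks like a warm-up) is our own choice for this example, not a perun feature.

```python
import statistics

# (runtime in s, energy in kJ) for rounds 0-4, copied from the "All" rows above.
rounds = [
    (52.988, 24.379),
    (48.401, 22.319),
    (48.258, 22.248),
    (48.537, 22.393),
    (48.416, 22.323),
]

runtimes = [runtime for runtime, _ in rounds]
energies = [energy for _, energy in rounds]

mean_runtime = statistics.mean(runtimes)
std_runtime = statistics.stdev(runtimes)  # sample standard deviation
total_energy_kj = sum(energies)

# Round 0 is noticeably slower; summarize the remaining rounds separately.
steady_mean = statistics.mean(runtimes[1:])

print(f"runtime: {mean_runtime:.3f}±{std_runtime:.3f} s over {len(rounds)} rounds")
print(f"runtime without round 0: {steady_mean:.3f} s")
print(f"total energy: {total_energy_kj:.3f} kJ")
```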
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
certifi==2023.5.7 | ||
charset-normalizer==3.1.0 | ||
click==8.1.3 | ||
cmake==3.26.4 | ||
filelock==3.12.2 | ||
h5py==3.9.0 | ||
idna==3.4 | ||
Jinja2==3.1.2 | ||
lit==16.0.6 | ||
MarkupSafe==2.1.3 | ||
mpmath==1.3.0 | ||
networkx==3.1 | ||
numpy==1.24.4 | ||
nvidia-cublas-cu11==11.10.3.66 | ||
nvidia-cuda-cupti-cu11==11.7.101 | ||
nvidia-cuda-nvrtc-cu11==11.7.99 | ||
nvidia-cuda-runtime-cu11==11.7.99 | ||
nvidia-cudnn-cu11==8.5.0.96 | ||
nvidia-cufft-cu11==10.9.0.58 | ||
nvidia-curand-cu11==10.2.10.91 | ||
nvidia-cusolver-cu11==11.4.0.1 | ||
nvidia-cusparse-cu11==11.7.4.91 | ||
nvidia-nccl-cu11==2.14.3 | ||
nvidia-nvtx-cu11==11.7.91 | ||
pandas==2.0.3 | ||
perun==0.4.0 | ||
Pillow==10.0.0 | ||
psutil==5.9.5 | ||
py-cpuinfo==5.0.0 | ||
pynvml==11.5.0 | ||
python-dateutil==2.8.2 | ||
pytz==2023.3 | ||
requests==2.31.0 | ||
six==1.16.0 | ||
sympy==1.12 | ||
torch==2.0.1 | ||
torchvision==0.15.2 | ||
triton==2.0.0 | ||
typing_extensions==4.7.1 | ||
tzdata==2023.3 | ||
urllib3==2.0.3 |
There is a typo here:
The application has run been run 2 times.
should perhaps be
The application has now run 2 times.
Thanks. I've corrected it in the main branch.