updated to match perun 0.4
JuanPedroGHM committed Aug 22, 2023
1 parent 07cbead commit 952afc1
Showing 3 changed files with 39 additions and 6 deletions.
38 changes: 33 additions & 5 deletions examples/torch_mnist/README.md
@@ -24,7 +24,7 @@ Once your new environment is ready, you can install the dependencies for the example
pip install -r requirements.txt
```

This includes **perun** and the scripts dependencies. The example includes a minimal configuration file *.perun.ini*, with some basic options. More details on the configuration options can be found [in the docs](https://perun.readthedocs.io/en/latest/configuration.html).
This includes **perun** and the script's dependencies. The root of the project includes a minimal configuration file, *example.perun.ini*, with some basic options. More details on the configuration options can be found [in the docs](https://perun.readthedocs.io/en/latest/configuration.html).

To make sure **perun** was installed properly and that it has access to some hardware sensors, run the command

@@ -40,15 +40,43 @@ Now everything is ready to start getting data. To monitor your script a single time, run the command
perun monitor torch_mnist.py
```

After the script finishes running, a folder *perun_results* will be created containing the consumption report of your application as a text file.
After the script finishes running, a folder *perun_results* will be created containing the consumption report of your application as a text file, as well as all the raw data saved in an HDF5 file.

To explore the contents of the HDF5 file, we recommend the **h5py** library or the [myHDF5](https://myhdf5.hdfgroup.org) website.
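As a rough sketch of the **h5py** route, the helper below walks every group and dataset in an HDF5 file and collects their paths. The file name passed to it is an assumption for illustration; check your own *perun_results* folder for the actual file perun wrote.

```python
import h5py


def list_contents(path):
    """Return the path of every group and dataset inside an HDF5 file."""
    names = []
    with h5py.File(path, "r") as f:
        # visititems calls the function once per group/dataset with
        # (path, object); collecting the paths gives a flat tree listing.
        f.visititems(lambda name, obj: names.append(name))
    return names
```

Calling, e.g., `list_contents("perun_results/torch_mnist.hdf5")` (hypothetical file name) prints one line per node, which is usually enough to find the raw measurement datasets before loading them with `f[name][...]`.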

The text report from running the MNIST example should look like this:

```text
PERUN REPORT
App name: torch_mnist
First run: 2023-08-22T17:10:52.864155
Last run: 2023-08-22T17:10:52.864155
RUN ID: 2023-08-22T17:10:52.864155
| Round # | Host | RUNTIME | CPU_UTIL | MEM_UTIL |
|----------:|:----------------|:----------|:-----------|:-----------|
| 0 | juan-20w000p2ge | 330.841 s | 63.367 % | 0.559 % |
| 0 | All | 330.841 s | 63.367 % | 0.559 % |
Monitored Functions
| Round # | Function | Avg Calls / Rank | Avg Runtime | Avg Power | Avg CPU Util | Avg GPU Mem Util |
|----------:|:------------|-------------------:|:----------------|:--------------|:---------------|:-------------------|
| 0 | train | 1 | 329.300±0.000 s | 0.000±0.000 W | 63.594±0.000 % | 0.000±0.000 % |
| 0 | train_epoch | 5 | 61.563±2.827 s | 0.000±0.000 W | 64.669±2.130 % | 0.000±0.000 % |
| 0 | test | 5 | 4.297±0.069 s | 0.000±0.000 W | 46.278±1.119 % | 0.000±0.000 % |
The application has been run 1 times.
```
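The functions in the "Monitored Functions" table (`train`, `train_epoch`, `test`) are the ones decorated with `@monitor()` from the `perun` package in `torch_mnist.py`. The sketch below is *not* perun's implementation; it is a hypothetical, plain-Python illustration of the decorator pattern involved: wrap each call, time it, and group the samples by function name so per-function statistics can be reported afterwards.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical sample store: function name -> list of runtimes in seconds.
RUNTIMES = defaultdict(list)


def monitor():
    """Illustrative stand-in for perun's @monitor() decorator factory."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the runtime even if the call raises.
                RUNTIMES[func.__name__].append(time.perf_counter() - start)
        return wrapper
    return decorator


@monitor()
def train_epoch():
    time.sleep(0.01)  # stand-in for real work


train_epoch()
train_epoch()
print(len(RUNTIMES["train_epoch"]))  # 2 recorded calls
```

perun additionally samples power and utilization sensors while the decorated function runs, which is where the power and CPU/GPU columns in the table come from.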

## Benchmarking

Benchmarking mode can help users obtain more in-depth data about the runtime and energy consumption of their scripts by running the application multiple times and providing statistics like the mean, standard deviation, min, and max of the usual measurements. To use benchmarking mode, add the `--bench` option to the perun command:


```console
perun --bench monitor torch_mnist.py
perun monitor --rounds 5 torch_mnist.py
```

> When benchmarking is active, the application will be run 10 times. If you want to reduce the runtime of the example, either reduce the number of training epochs (```... torch_mnist.py --epochs 1```) or reduce the number of rounds **perun** runs the application (```perun --bench --bench_rounds 5 monitor ...```).
2 changes: 1 addition & 1 deletion examples/torch_mnist/requirements.txt
@@ -23,7 +23,7 @@ nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
pandas==2.0.3
perun==0.3.2
perun==0.4.0
Pillow==10.0.0
psutil==5.9.5
py-cpuinfo==5.0.0
5 changes: 5 additions & 0 deletions examples/torch_mnist/torch_mnist.py
@@ -8,6 +8,8 @@
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from perun import monitor


class Net(nn.Module):
def __init__(self):
@@ -35,13 +37,15 @@ def forward(self, x):
return output


@monitor()
def train(args, model, device, train_loader, test_loader, optimizer, scheduler):
for epoch in range(1, args.epochs + 1):
train_epoch(args, model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
scheduler.step()


@monitor()
def train_epoch(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
@@ -65,6 +69,7 @@ def train_epoch(args, model, device, train_loader, optimizer, epoch):
break


@monitor()
def test(model, device, test_loader):
model.eval()
test_loss = 0
