updated to match perun 0.4
JuanPedroGHM committed Aug 22, 2023
1 parent 07cbead commit 952afc1
Showing 3 changed files with 39 additions and 6 deletions.
38 changes: 33 additions & 5 deletions examples/torch_mnist/README.md
@@ -24,7 +24,7 @@ Once your new environment is ready, you can install the dependencies for the example
pip install -r requirements.txt
```

This includes **perun** and the scripts dependencies. The example includes a minimal configuration file *.perun.ini*, with some basic options. More details on the configuration options can be found [in the docs](https://perun.readthedocs.io/en/latest/configuration.html).
This includes **perun** and the script's dependencies. The root of the project includes a minimal configuration file, *example.perun.ini*, with some basic options. More details on the configuration options can be found [in the docs](https://perun.readthedocs.io/en/latest/configuration.html).

To make sure **perun** was installed properly and that it has access to some hardware sensors, run the command

@@ -40,15 +40,43 @@ Now everything is ready to start getting data. To monitor your script a single time, run the command
perun monitor torch_mnist.py
```

After the script finishes running, a folder *perun_results* will be created containing the consumption report of your application as a text file.
After the script finishes running, a folder *perun_results* will be created containing the consumption report of your application as a text file, as well as all the raw data saved in an HDF5 file.

To explore the contents of the HDF5 file, we recommend the **h5py** library or the [myHDF5](https://myhdf5.hdfgroup.org) website.
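As a rough sketch of the **h5py** route, the helper below walks every group and dataset in an HDF5 file and collects their paths. The file name passed to it is an assumption for illustration; check your own *perun_results* folder for the actual file perun wrote.

```python
import h5py


def list_contents(path):
    """Return the path of every group and dataset inside an HDF5 file."""
    names = []
    with h5py.File(path, "r") as f:
        # visititems calls the function once per group/dataset with
        # (path, object); collecting the paths gives a flat tree listing.
        f.visititems(lambda name, obj: names.append(name))
    return names
```

Calling, e.g., `list_contents("perun_results/torch_mnist.hdf5")` (hypothetical file name) prints one line per node, which is usually enough to find the raw measurement datasets before loading them with `f[name][...]`.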

The text report from running the MNIST example should look like this:

```text
PERUN REPORT
App name: torch_mnist
First run: 2023-08-22T17:10:52.864155
Last run: 2023-08-22T17:10:52.864155
RUN ID: 2023-08-22T17:10:52.864155
| Round # | Host | RUNTIME | CPU_UTIL | MEM_UTIL |
|----------:|:----------------|:----------|:-----------|:-----------|
| 0 | juan-20w000p2ge | 330.841 s | 63.367 % | 0.559 % |
| 0 | All | 330.841 s | 63.367 % | 0.559 % |
Monitored Functions
| Round # | Function | Avg Calls / Rank | Avg Runtime | Avg Power | Avg CPU Util | Avg GPU Mem Util |
|----------:|:------------|-------------------:|:----------------|:--------------|:---------------|:-------------------|
| 0 | train | 1 | 329.300±0.000 s | 0.000±0.000 W | 63.594±0.000 % | 0.000±0.000 % |
| 0 | train_epoch | 5 | 61.563±2.827 s | 0.000±0.000 W | 64.669±2.130 % | 0.000±0.000 % |
| 0 | test | 5 | 4.297±0.069 s | 0.000±0.000 W | 46.278±1.119 % | 0.000±0.000 % |
The application has been run 1 times.
```
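The functions in the "Monitored Functions" table (`train`, `train_epoch`, `test`) are the ones decorated with `@monitor()` from the `perun` package in `torch_mnist.py`. The sketch below is *not* perun's implementation; it is a hypothetical, plain-Python illustration of the decorator pattern involved: wrap each call, time it, and group the samples by function name so per-function statistics can be reported afterwards.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical sample store: function name -> list of runtimes in seconds.
RUNTIMES = defaultdict(list)


def monitor():
    """Illustrative stand-in for perun's @monitor() decorator factory."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the runtime even if the call raises.
                RUNTIMES[func.__name__].append(time.perf_counter() - start)
        return wrapper
    return decorator


@monitor()
def train_epoch():
    time.sleep(0.01)  # stand-in for real work


train_epoch()
train_epoch()
print(len(RUNTIMES["train_epoch"]))  # 2 recorded calls
```

perun additionally samples power and utilization sensors while the decorated function runs, which is where the power and CPU/GPU columns in the table come from.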

## Benchmarking

Benchmarking mode can help users obtain more in-depth data about the runtime and energy consumption of their scripts by running the application multiple times and providing statistics like the mean, standard deviation, min, and max of the usual measurements. To use benchmarking mode, add the `--bench` option to the perun command:


```console
perun --bench monitor torch_mnist.py
perun monitor --rounds 5 torch_mnist.py
```

> When benchmarking is active, the application will be run 10 times. If you want to reduce the runtime of the example, either reduce the number of training epochs (```... torch_mnist.py --epochs 1```) or reduce the number of rounds **perun** runs the application (```perun --bench --bench_rounds 5 monitor ...```).
2 changes: 1 addition & 1 deletion examples/torch_mnist/requirements.txt
@@ -23,7 +23,7 @@ nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
pandas==2.0.3
perun==0.3.2
perun==0.4.0
Pillow==10.0.0
psutil==5.9.5
py-cpuinfo==5.0.0
5 changes: 5 additions & 0 deletions examples/torch_mnist/torch_mnist.py
@@ -8,6 +8,8 @@
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

from perun import monitor


class Net(nn.Module):
def __init__(self):
@@ -35,13 +37,15 @@ def forward(self, x):
return output


@monitor()
def train(args, model, device, train_loader, test_loader, optimizer, scheduler):
for epoch in range(1, args.epochs + 1):
train_epoch(args, model, device, train_loader, optimizer, epoch)
test(model, device, test_loader)
scheduler.step()


@monitor()
def train_epoch(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
@@ -65,6 +69,7 @@ def train_epoch(args, model, device, train_loader, optimizer, epoch):
break


@monitor()
def test(model, device, test_loader):
model.eval()
test_loss = 0
