docs: mlflow example
JuanPedroGHM committed Jan 9, 2024
1 parent 5b41ece commit bd48e02
Showing 6 changed files with 24 additions and 4 deletions.
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -10,7 +10,7 @@ authors:
family-names: Gutiérrez Hermosillo Muriedas
email: [email protected]
affiliation: >-
Scientific Computing Centre, Karlsruhe Institute für
Scientific Computing Center, Karlsruhe Institute für
Technologie
orcid: 'https://orcid.org/0000-0001-8439-7145'
repository-code: 'https://github.com/Helmholtz-AI-Energy/perun'
2 changes: 1 addition & 1 deletion README.md
@@ -116,7 +116,7 @@ mpirun -n 8 perun monitor path/to/your/script.py

## Docs

To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/).
To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/) or check the [examples](https://github.com/Helmholtz-AI-Energy/perun/tree/main/examples).

## Citing perun

6 changes: 5 additions & 1 deletion docs/data.rst
@@ -3,8 +3,12 @@
Data
====

perun structures the data collected as a tree, with the root node containing the aggregated data of an individual run of the application, and the nodes further down the tree containing the information of the individual compute nodes, devices and *sensors*. *Sensors* are meant as the individual API that perun uses to gather measurements, and a single API can provide information about multiple devices, and multiple devices can be monitored from multiple APIs.
perun structures the data collected as a tree, with the root node containing the aggregated data of an individual run of the application, and the nodes further down the tree containing the information of the individual compute nodes, devices and *sensors*. *Sensors* are meant as the individual values that can be collected from the distinct monitoring backends.

.. image:: images/data_structure.png

Each node in the data structure, once the raw data at the bottom has been processed, contains a set of summarized metrics based on the data collected by its sub-nodes, and a metadata dictionary with any information that could be obtained from the application, node, device or API.

Each node contains a list of metrics or stats, which represent the accumulated data, as well as metadata.

The nodeType attribute indicates the type of object in the hierarchy that the node represents. At the lowest level, the leaves of the tree, are the individual "sensors": values collected from a single device or interface. Higher up the tree, the data nodes represent groups of devices and computational nodes. The three bottom levels of the tree represent the hardware. Further up the tree, data is accumulated by individual runs of the application: a "run" is a single execution of the application, a "multi_run" holds the data from multiple runs when perun is run with the ``--rounds N`` option, and at the highest level, the root of the tree is the application itself.
Binary file modified docs/images/data_structure.png
16 changes: 16 additions & 0 deletions examples/mlflow/README.md
@@ -0,0 +1,16 @@
# perun + MLflow

If you already use a monitoring tool like MLflow, you might want to add the data collected by perun to the data you already track. This can be done with the ```@register_callback``` decorator. An example is shown in the train.py file:

```python
@register_callback
def perun2mlflow(node):
    mlflow.start_run(active_run.info.run_id)
    for metricType, metric in node.metrics.items():
        name = f"{metricType.value}"
        mlflow.log_metric(name, metric.value)
```

Functions decorated with ```@register_callback``` take a single argument, ```node```. The node object is an instance of ```perun.data_model.data.DataNode```, a tree structure that contains all the data collected while monitoring the current script. Each node stores the accumulated data of its sub-nodes in the ```metrics``` dictionary. Each metric object holds the value itself along with the metadata relevant to it. In the example above, the summarized values for power, energy and hardware utilization are submitted as metrics to the MLflow tracking system.

For more information on the data node object, [check our docs](https://perun.readthedocs.io/en/latest/data.html).
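
The callback above only reads the metrics of the node it receives. If you also want the values stored further down the tree, one option is to walk the sub-nodes recursively and build one flat dictionary. The sketch below is not part of perun: ```Node```, ```Metric``` and ```MetricType``` are simplified stand-ins for the real classes, and the ```id``` and ```nodes``` attribute names are assumptions made for illustration, so check the data docs for the actual attributes before adapting it.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetricType(Enum):
    # Hypothetical stand-in for perun's metric type enum.
    ENERGY = "energy"
    POWER = "power"


@dataclass
class Metric:
    # Hypothetical stand-in: only the value is modelled here.
    value: float


@dataclass
class Node:
    # Hypothetical stand-in for perun's DataNode.
    id: str
    metrics: dict = field(default_factory=dict)
    nodes: dict = field(default_factory=dict)  # assumed name for the sub-nodes


def flatten_metrics(node, prefix=""):
    """Return {"<path>/<metric name>": value} for a node and all its sub-nodes."""
    path = f"{prefix}/{node.id}" if prefix else node.id
    flat = {f"{path}/{mtype.value}": m.value for mtype, m in node.metrics.items()}
    for child in node.nodes.values():
        flat.update(flatten_metrics(child, path))
    return flat


# Small hand-built tree: run -> compute node -> device.
gpu = Node("gpu0", {MetricType.ENERGY: Metric(120.0)})
host = Node("host0", {MetricType.ENERGY: Metric(300.0)}, {"gpu0": gpu})
run = Node(
    "run0",
    {MetricType.ENERGY: Metric(300.0), MetricType.POWER: Metric(75.0)},
    {"host0": host},
)

flat = flatten_metrics(run)
for name, value in flat.items():
    print(name, value)
```

Inside a real callback, each entry of the flat dictionary could be passed to ```mlflow.log_metric```, since MLflow metric names may contain slashes.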
2 changes: 1 addition & 1 deletion perun/processing.py
@@ -399,7 +399,7 @@ def processDataNode(
pue = perunConfig.getfloat("post-processing", "pue")
emissions_factor = perunConfig.getfloat("post-processing", "emissions_factor")
price_factor = perunConfig.getfloat("post-processing", "price_factor")
total_energy = dataNode.metrics[MetricType.ENERGY].value * pue
total_energy = dataNode.metrics[MetricType.ENERGY].value * pue # type: ignore
dataNode.metrics[MetricType.ENERGY].value = total_energy # type: ignore
e_kWh = total_energy / (3600 * 1e3)
