docs: mlflow example

Helmholtz-AI-Energy · Jan 9, 2024 · bd48e02
1 parent 5b41ece

Showing 6 changed files with 24 additions and 4 deletions.
diff --git a/CITATION.cff b/CITATION.cff
@@ -10,7 +10,7 @@ authors:
     family-names: Gutiérrez Hermosillo Muriedas
     email: [email protected]
     affiliation: >-
-      Scientific Computing Centre, Karlsruhe Institute für
+      Scientific Computing Center, Karlsruhe Institute für
       Technologie
     orcid: 'https://orcid.org/0000-0001-8439-7145'
 repository-code: 'https://github.com/Helmholtz-AI-Energy/perun'

diff --git a/README.md b/README.md
@@ -116,7 +116,7 @@ mpirun -n 8 perun monitor path/to/your/script.py
 
 ## Docs
 
-To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/).
+To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/) or check the [examples](https://github.com/Helmholtz-AI-Energy/perun/tree/main/examples).
 
 ## Citing perun
 

diff --git a/docs/data.rst b/docs/data.rst
@@ -3,8 +3,12 @@
 Data
 ====
 
-perun structures the data collected as a tree, with the root node containing the aggregated data of an indiviudal run of the application, and the nodes further down the tree contain the information of the indivdual compute nodes, devices and *sensors*. *Sensors* are meant as the individual API that perun uses to gather measurements, and a single API can provide information about multiple devices, and multiple devices can be monitored from multiple APIs.
+perun structures the data collected as a tree, with the root node containing the aggregated data of an individual run of the application, and the nodes further down the tree containing the information of the individual compute nodes, devices and *sensors*. *Sensors* are the individual values that can be collected from the distinct monitoring backends.
 
 .. image:: images/data_structure.png
 
 Each node in the data structure, once the raw data at the bottom has been processed, contain a set of summarized metrics based on the data that was collected by its sub-nodes, and a metadata dictionary with any information that could be obtained by the application, node, device or API.
+
+Each node contains a list of metrics or stats, which represent the accumulated data, as well as metadata.
+
+The nodeType attribute indicates the type of object in the hierarchy that this node represents. At the lowest level, the leaves of the tree, you have individual "sensors", values collected by a single device or interface. Higher up the tree, the data nodes represent groups of devices and computational nodes. The three bottom levels of the tree represent the hardware. Further up the tree, data is accumulated by individual runs of the application: a "run" is a single execution of the application, a "multi_run" is the data from multiple runs when perun is run with the ``--rounds N`` option, and at the highest level, the root of the tree is the application itself.
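
Note on the paragraph added above: the hierarchy it describes can be pictured with a small stand-in tree. This is only a sketch under stated assumptions. The `Node` class, its fields, and every level name other than `run` and `multi_run` are hypothetical and do not reproduce perun's actual data node class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a perun data node (not the real class)."""
    id: str
    node_type: str  # only "run" and "multi_run" are names taken from the docs
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Return the leaf nodes (the "sensors") below a node, depth first."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def levels(node, depth=0):
    """Yield (depth, node_type, id) for every node, root first."""
    yield depth, node.node_type, node.id
    for child in node.children:
        yield from levels(child, depth + 1)


# Assumed layout: app -> multi_run -> run -> compute node -> device group -> sensor
sensors = [Node("cpu_0_power", "sensor"), Node("cpu_1_power", "sensor")]
tree = Node("app", "app", [
    Node("multi_run_0", "multi_run", [
        Node("run_0", "run", [
            Node("host_a", "node", [Node("cpu", "device_group", sensors)]),
        ]),
    ]),
])

# Print the tree with one indentation step per level.
for depth, node_type, node_id in levels(tree):
    print("  " * depth + f"{node_type}: {node_id}")
```

In this layout the three deepest levels (compute node, device group, sensor) stand for the hardware, and everything above them groups executions of the application.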
diff --git a/docs/images/data_structure.png b/docs/images/data_structure.png
diff --git a/examples/mlflow/README.md b/examples/mlflow/README.md
@@ -0,0 +1,16 @@
+# perun + MLflow
+
+If you are already using monitoring tools like MLflow, you might want to add the data collected by perun to the data you already track. This can be done with the ```@register_callback``` decorator. An example is shown in the train.py file:
+
+```python
+    @register_callback
+    def perun2mlflow(node):
+        mlflow.start_run(active_run.info.run_id)
+        for metricType, metric in node.metrics.items():
+            name = f"{metricType.value}"
+            mlflow.log_metric(name, metric.value)
+```
+
+Functions decorated with ```@register_callback``` take only one argument, ```node```. The node object is an instance of ```perun.data_model.data.DataNode```, which is a tree structure that contains all the data collected while monitoring the current script. Each node contains the accumulated data of its sub-nodes in the ```metrics``` dictionary. Each metric object contains the value itself and all the metadata relevant to it. In the example above, the summarized values for power, energy and hardware utilization are submitted as metrics to the MLflow tracking system.
+
+For more information on the data node object, [check our docs](https://perun.readthedocs.io/en/latest/data.html).
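
Note on the README added above: the loop inside the callback can be tried without perun or MLflow installed. The following is a minimal sketch that assumes a node exposes a `metrics` dictionary keyed by an enum, as the README describes. `MetricType`, `Metric`, `FakeNode`, `collect_metrics` and `log_all` are hypothetical stand-ins, not perun's classes or functions.

```python
from dataclasses import dataclass
from enum import Enum


class MetricType(Enum):
    """Hypothetical subset of metric types; the values are illustrative."""
    ENERGY = "energy"
    POWER = "power"


@dataclass
class Metric:
    """Stand-in metric holding only the value (the real one also has metadata)."""
    value: float


@dataclass
class FakeNode:
    """Stand-in for the node passed to a registered callback."""
    metrics: dict


def collect_metrics(node):
    """Map each metric type's string value to the summarized metric value."""
    return {
        metric_type.value: metric.value
        for metric_type, metric in node.metrics.items()
    }


def log_all(node, log_metric):
    """Send every summarized metric to a logging function taking (name, value)."""
    for name, value in collect_metrics(node).items():
        log_metric(name, value)


node = FakeNode({MetricType.ENERGY: Metric(1200.0), MetricType.POWER: Metric(85.5)})

# Record the calls in a list instead of contacting a tracking server.
logged = []
log_all(node, lambda name, value: logged.append((name, value)))
print(logged)
```

Taking the logging function as a parameter keeps the sketch testable. In a real callback, `mlflow.log_metric` could be passed in its place, since it accepts a key and a value.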
diff --git a/perun/processing.py b/perun/processing.py
@@ -399,7 +399,7 @@ def processDataNode(
         pue = perunConfig.getfloat("post-processing", "pue")
         emissions_factor = perunConfig.getfloat("post-processing", "emissions_factor")
         price_factor = perunConfig.getfloat("post-processing", "price_factor")
-        total_energy = dataNode.metrics[MetricType.ENERGY].value * pue
+        total_energy = dataNode.metrics[MetricType.ENERGY].value * pue  # type: ignore
         dataNode.metrics[MetricType.ENERGY].value = total_energy  # type: ignore
         e_kWh = total_energy / (3600 * 1e3)
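
Note on the hunk above: the two arithmetic steps it shows (scaling the measured energy by the PUE, then dividing by `3600 * 1e3`) can be checked in isolation. This is a sketch, not the project's function. It assumes the energy metric is expressed in joules, which the divisor suggests but the hunk does not state, and the name `energy_to_kwh` is hypothetical.

```python
JOULES_PER_KWH = 3600 * 1e3  # 1 kWh = 3600 s * 1000 W


def energy_to_kwh(energy_joules, pue):
    """Scale measured energy by the PUE and convert joules to kWh."""
    total_energy = energy_joules * pue  # accounts for data center overhead
    return total_energy / JOULES_PER_KWH


# 7.2 MJ measured with a PUE of 1.5 gives 10.8 MJ in total.
print(energy_to_kwh(7.2e6, 1.5))  # 3.0
```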