diff --git a/CITATION.cff b/CITATION.cff
index 13427dd..1cb0f07 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -10,7 +10,7 @@ authors:
     family-names: Gutiérrez Hermosillo Muriedas
     email: juan.muriedas@kit.edu
     affiliation: >-
-      Scientific Computing Centre, Karlsruhe Institute für
+      Scientific Computing Center, Karlsruhe Institute für
       Technologie
     orcid: 'https://orcid.org/0000-0001-8439-7145'
 repository-code: 'https://github.com/Helmholtz-AI-Energy/perun'
diff --git a/README.md b/README.md
index fed2e57..a1f0e27 100644
--- a/README.md
+++ b/README.md
@@ -116,7 +116,7 @@ mpirun -n 8 perun monitor path/to/your/script.py
 
 ## Docs
 
-To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/).
+To get more information, check out our [docs page](https://perun.readthedocs.io/en/latest/) or check the [examples](https://github.com/Helmholtz-AI-Energy/perun/tree/main/examples).
 
 ## Citing perun
diff --git a/docs/data.rst b/docs/data.rst
index fc5bbb8..84200fe 100644
--- a/docs/data.rst
+++ b/docs/data.rst
@@ -3,8 +3,12 @@
 Data
 ====
 
-perun structures the data collected as a tree, with the root node containing the aggregated data of an indiviudal run of the application, and the nodes further down the tree contain the information of the indivdual compute nodes, devices and *sensors*. *Sensors* are meant as the individual API that perun uses to gather measurements, and a single API can provide information about multiple devices, and multiple devices can be monitored from multiple APIs.
+perun structures the collected data as a tree: the root node contains the aggregated data of an individual run of the application, and the nodes further down the tree contain the information of the individual compute nodes, devices and *sensors*. *Sensors* are the individual values that can be collected from the distinct monitoring backends.
 
 .. image:: images/data_structure.png
 
 Each node in the data structure, once the raw data at the bottom has been processed, contain a set of summarized metrics based on the data that was collected by its sub-nodes, and a metadata dictionary with any information that could be obtained by the application, node, device or API.
+
+Each node contains a list of metrics or stats representing the accumulated data, as well as a metadata dictionary.
+
+The nodeType attribute indicates the type of object in the hierarchy this node represents. At the lowest level, the leaves of the tree are individual "sensors": values collected from a single device or interface. Higher up the tree, the data nodes represent groups of devices and compute nodes. The three bottom levels of the tree represent the hardware. Further up the tree, data is accumulated over individual executions of the application: a "run" is a single execution of the application, a "multi_run" aggregates the data of multiple runs when perun is started with the ``--rounds N`` option, and at the highest level, the root of the tree is the application itself.
diff --git a/docs/images/data_structure.png b/docs/images/data_structure.png
index caf0494..a3f58f2 100644
Binary files a/docs/images/data_structure.png and b/docs/images/data_structure.png differ
diff --git a/examples/mlflow/README.md b/examples/mlflow/README.md
new file mode 100644
index 0000000..4ccf977
--- /dev/null
+++ b/examples/mlflow/README.md
@@ -0,0 +1,21 @@
+# perun + MLFlow
+
+If you are already using monitoring tools like MLFlow, you might want to add the data collected by perun to enhance the already existing data. This can be done easily with the ```@register_callback``` decorator. An example is shown in the train.py file:
+
+```python
+import mlflow
+
+from perun import register_callback
+
+@register_callback
+def perun2mlflow(node):
+    # `active_run` is the MLFlow run created earlier in train.py
+    mlflow.start_run(active_run.info.run_id)
+    for metricType, metric in node.metrics.items():
+        mlflow.log_metric(metricType.value, metric.value)
+    mlflow.end_run()
+```
+
+Functions decorated with ```@register_callback``` take a single argument, ```node```. The node object is an instance of ```perun.data_model.data.DataNode```, a tree structure that contains all the data collected while monitoring the current script. Each node stores the accumulated data of its sub-nodes in the ```metrics``` dictionary, and each metric object contains the value itself along with all the metadata relevant to it. In the example above, the summarized values for power, energy and hardware utilization are submitted as metrics to the MLFlow tracking server.
+
+For more information on the data node object, [check our docs](https://perun.readthedocs.io/en/latest/data.html).
diff --git a/perun/processing.py b/perun/processing.py
index ba866c0..1580fb0 100644
--- a/perun/processing.py
+++ b/perun/processing.py
@@ -399,7 +399,7 @@ def processDataNode(
     pue = perunConfig.getfloat("post-processing", "pue")
     emissions_factor = perunConfig.getfloat("post-processing", "emissions_factor")
     price_factor = perunConfig.getfloat("post-processing", "price_factor")
-    total_energy = dataNode.metrics[MetricType.ENERGY].value * pue
+    total_energy = dataNode.metrics[MetricType.ENERGY].value * pue  # type: ignore
     dataNode.metrics[MetricType.ENERGY].value = total_energy  # type: ignore
     e_kWh = total_energy / (3600 * 1e3)
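
Note for reviewers: the tree described in the docs/data.rst change above (sensor leaves rolling up into devices, compute nodes and runs) can be pictured with a toy sketch. This is not perun's actual ``DataNode`` class; the ``Node`` type, field names and values here are hypothetical and only illustrate how leaf values accumulate up the hierarchy.

```python
from dataclasses import dataclass, field

# Toy model of the hierarchy from docs/data.rst -- NOT perun's real API.
@dataclass
class Node:
    node_type: str            # e.g. "sensor", "device", "node", "run", "app"
    value: float = 0.0        # raw value, only meaningful for leaf "sensors"
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def total(self) -> float:
        """Accumulate leaf sensor values up the tree."""
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)

# Two CPU package sensors under one device, under one compute node, in one run
cpu = Node("device", children=[Node("sensor", 35.0), Node("sensor", 40.0)])
host = Node("node", children=[cpu])
run = Node("run", children=[host])
print(run.total())  # 75.0
```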
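
The perun/processing.py hunk only adds a ``# type: ignore`` comment, but the surrounding arithmetic is worth spelling out: measured energy in joules is scaled by the data-center PUE, then converted to kWh before the emissions and price factors apply. A standalone sketch follows; the function name and the factor values passed in are made up for illustration, and the assumed units for the two factors are a guess. Only the PUE multiplication and the ``/(3600 * 1e3)`` conversion appear verbatim in the diff.

```python
def postprocess_energy(energy_j: float, pue: float,
                       emissions_factor: float, price_factor: float):
    """Toy version of the energy post-processing step (hypothetical helper)."""
    total_energy = energy_j * pue        # scale measured joules by data-center PUE
    e_kWh = total_energy / (3600 * 1e3)  # joules -> kWh (1 kWh = 3.6e6 J)
    # assumed units: emissions_factor in gCO2e/kWh, price_factor in currency/kWh
    return total_energy, e_kWh, e_kWh * emissions_factor, e_kWh * price_factor

# 3.6 MJ at PUE 1.0 is exactly 1 kWh
print(postprocess_energy(3.6e6, 1.0, 275.0, 0.25))
```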