- git > 2.22
- kubectl
- yq, jq
- power meter is available
- Fork and clone this repository and move to profile folder
git clone <URL of your fork>
cd model_training
chmod +x script.sh
- ports 9090 and 5101 must be free (they are used for port-forwarding to Prometheus and the kind registry, respectively)
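The prerequisites above can be checked before running the script. The following is a minimal sketch, not part of `script.sh`: the function names `version_gt`, `missing_tools`, `port_in_use`, and `preflight` are hypothetical, and it assumes bash (for `/dev/tcp`) and a `sort` that supports `-V`.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; not part of script.sh.

# version_gt A B: succeeds when version A is strictly greater than version B.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# missing_tools TOOL...: print each listed command that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# port_in_use PORT: succeeds when something accepts TCP connections
# on 127.0.0.1:PORT (relies on bash's /dev/tcp pseudo-device).
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# preflight: report every unmet prerequisite; returns non-zero if any.
preflight() {
    local status=0 missing git_version port
    missing="$(missing_tools git kubectl yq jq)"
    if [ -n "$missing" ]; then
        echo "missing tools:" $missing
        status=1
    fi
    # `git --version` prints "git version X.Y.Z"; the third field is the version.
    git_version="$(git --version 2>/dev/null | awk '{print $3}')"
    if [ -n "$git_version" ] && ! version_gt "$git_version" 2.22; then
        echo "git $git_version is not newer than 2.22"
        status=1
    fi
    for port in 9090 5101; do
        if port_in_use "$port"; then
            echo "port $port is already in use"
            status=1
        fi
    done
    return "$status"
}

# Usage: preflight && ./script.sh prepare_cluster
```

The check for the power meter is left out because it depends on the hardware in use.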
Run
./script.sh prepare_cluster
The script will
- create a kind cluster `kind-for-training` with a registry at port `5101`.
- deploy Prometheus.
- deploy Prometheus RBAC and a node port at port `30090` on the kind node, which is forwarded to port `9090` on the host.
- deploy a service monitor for Kepler and reload the Prometheus server.
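Once the script finishes, the result can be sanity-checked from the host. This is a rough sketch under a few assumptions: the helper names `cluster_listed` and `verify_cluster` are hypothetical, `kind` and `curl` are on the PATH, and the registry answers plain HTTP on the Docker Registry v2 endpoint `/v2/`.

```shell
#!/usr/bin/env bash
# Hypothetical post-setup checks; not part of script.sh.

CLUSTER_NAME="kind-for-training"   # cluster created by prepare_cluster
REGISTRY_PORT=5101                 # kind registry port on the host
PROM_PORT=9090                     # Prometheus port forwarded to the host

# cluster_listed NAME: read cluster names on stdin (one per line, as printed
# by `kind get clusters`) and succeed when NAME is among them.
cluster_listed() {
    grep -Fxq -- "$1"
}

# verify_cluster: run each check, report the result, return non-zero on failure.
verify_cluster() {
    local status=0
    if kind get clusters 2>/dev/null | cluster_listed "$CLUSTER_NAME"; then
        echo "ok: kind cluster $CLUSTER_NAME exists"
    else
        echo "fail: kind cluster $CLUSTER_NAME not found"
        status=1
    fi
    # Prometheus serves a readiness endpoint at /-/ready.
    if curl -fsS --max-time 5 "http://localhost:${PROM_PORT}/-/ready" >/dev/null 2>&1; then
        echo "ok: Prometheus is ready on port $PROM_PORT"
    else
        echo "fail: Prometheus is not reachable on port $PROM_PORT"
        status=1
    fi
    # Assumption: the registry speaks the Docker Registry HTTP API v2.
    if curl -fsS --max-time 5 "http://localhost:${REGISTRY_PORT}/v2/" >/dev/null 2>&1; then
        echo "ok: registry answers on port $REGISTRY_PORT"
    else
        echo "fail: registry is not reachable on port $REGISTRY_PORT"
        status=1
    fi
    return "$status"
}

# Usage: verify_cluster
```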
Please confirm the following requirements:
- Kepler installation
- Prometheus installation
- Kepler metrics are exported to Prometheus server
- Prometheus server is available at `http://localhost:9090`. Otherwise, set the environment variable `PROM_SERVER`.
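One way to confirm these requirements on an existing cluster is to ask Prometheus whether it holds any Kepler series. The sketch below is illustrative only: the helper names are hypothetical, and the metric name in the usage line is an assumption that depends on the Kepler version in use. It relies on the Prometheus HTTP API instant-query endpoint `/api/v1/query` and on `jq`, which is already a prerequisite.

```shell
#!/usr/bin/env bash
# Hypothetical check that Kepler metrics reach Prometheus; not part of script.sh.

# Fall back to the default address when PROM_SERVER is unset.
PROM_SERVER="${PROM_SERVER:-http://localhost:9090}"

# query_endpoint BASE: print the instant-query endpoint of the Prometheus
# HTTP API, tolerating a trailing slash on BASE.
query_endpoint() {
    printf '%s/api/v1/query\n' "${1%/}"
}

# count_series: read a Prometheus query response on stdin and print the
# number of series it contains.
count_series() {
    jq '.data.result | length'
}

# kepler_series_count METRIC: print how many series Prometheus returns for
# METRIC. curl -G with --data-urlencode sends the expression as an encoded
# query-string parameter.
kepler_series_count() {
    curl -fsS --max-time 5 -G \
        --data-urlencode "query=$1" \
        "$(query_endpoint "$PROM_SERVER")" | count_series
}

# Usage (metric name is an assumption; adjust to your Kepler version):
#   kepler_series_count kepler_container_joules_total
```

A result of `0` suggests that Prometheus is reachable but is not scraping Kepler, while a curl error suggests that `PROM_SERVER` points to the wrong address.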
There are two options to run the benchmark and collect the metrics: the CPE operator with a manual script, and the Tekton pipeline.
The CPE operator is slated for deprecation. We are transitioning to automating the collection and training processes through the Tekton pipeline. Nevertheless, the CPE operator may still be useful for customized benchmarks that require performance values per sub-workload within the benchmark suite.
In addition to the two automated approaches above, you can manually run your own benchmarks, then collect, train, and export the models using the entrypoint cmd/main.py.
Run
./script.sh cleanup