From f99742476b7bd80baedaa5d3502a3e7cd85f5f02 Mon Sep 17 00:00:00 2001
From: Maryam Tahhan
Date: Tue, 2 Apr 2024 16:57:09 +0100
Subject: [PATCH] docs: update troubelshooting for super linter

Signed-off-by: Maryam Tahhan
---
 docs/usage/trouble_shooting.md | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/docs/usage/trouble_shooting.md b/docs/usage/trouble_shooting.md
index a938fe10..bc6d727d 100644
--- a/docs/usage/trouble_shooting.md
+++ b/docs/usage/trouble_shooting.md
@@ -1,11 +1,16 @@
 # Trouble Shooting

 ## Kepler Pod failed to start
+
 ### Background
-Kepler uses eBPF to obtain performance counter readings and processes stats. Since eBPF requires kernel headers, Kepler will fail to start up when the kernel headers are missing.
+
+Kepler uses eBPF to obtain performance counter readings and processes stats. Since eBPF requires kernel
+headers, Kepler will fail to start up when the kernel headers are missing.

 ### Diagnose
-To confirm, check the Kepler Pod logs with the following command and look for message `not able to load eBPF modules`.
+
+To confirm, check the Kepler Pod logs with the following command and look for message
+`not able to load eBPF modules`.

 ```bash
 kubectl logs -n kepler daemonset/kepler-exporter
@@ -26,10 +31,15 @@ On OpenShift, install the MachineConfiguration [here](https://github.com/sustain


 ## Kepler energy metrics are zeroes
+

 ### Background
-Kepler uses RAPL counters on x86 platforms to read energy consumption.
-VMs do not have RAPL counters and thus Kepler estimates energy consumption based on the pre-trained ML models. The models use either hardware performance counters or cGroup stats to estimate energy consumed by processes. Currently the cGroup based models use cGroup v2 features such as `cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`, `cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.
+Kepler uses RAPL counters on x86 platforms to read energy consumption.
+VMs do not have RAPL counters and thus Kepler estimates energy consumption based on the pre-trained
+ML models. The models use either hardware performance counters or cGroup stats to estimate energy
+consumed by processes. Currently the cGroup based models use cGroup v2 features such as
+`cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`,
+`cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.

 ### Diagnose

@@ -40,7 +50,8 @@ ls /sys/fs/cgroup/cgroup.controllers
 ```

 ### Solution
+
 Enable cGroup v2 on the node by following [these Kubernetes instruction](https://kubernetes.io/docs/concepts/architecture/cgroups/).


-On OpenShift, apply [these cGroup v2 MachineConfiguration](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)
\ No newline at end of file
+On OpenShift, apply [these cGroup v2 MachineConfiguration](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs)