docs: update troubelshooting for super linter
Signed-off-by: Maryam Tahhan <[email protected]>
maryamtahhan committed Apr 5, 2024
1 parent 8f24c75 commit e9ed4a3
Showing 1 changed file with 16 additions and 5 deletions.
21 changes: 16 additions & 5 deletions docs/usage/trouble_shooting.md
@@ -1,11 +1,16 @@
# Trouble Shooting

## Kepler Pod failed to start

### Background

Kepler uses eBPF to obtain performance counter readings and process stats. Since eBPF requires kernel
headers, Kepler fails to start when the kernel headers are missing.

### Diagnose

To confirm, check the Kepler Pod logs with the following command and look for the message
`not able to load eBPF modules`.
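
Once the logs are retrieved, the search for that message can be scripted. The following is a minimal sketch, not part of Kepler: `has_ebpf_load_error` is a hypothetical helper name, and the sample log line is made up for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Kepler): reads log text on stdin and
# succeeds if it contains the eBPF load failure message.
has_ebpf_load_error() {
  grep -q "not able to load eBPF modules"
}

# Illustration with a made-up log line (not real Kepler output).
if printf 'example: not able to load eBPF modules\n' | has_ebpf_load_error; then
  echo "eBPF load failure found: check that kernel headers are installed"
fi
```

In practice, the output of the `kubectl logs` command in this section would be piped into `has_ebpf_load_error` instead of the sample line.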

```bash
kubectl logs -n kepler daemonset/kepler-exporter
@@ -26,10 +31,15 @@ On OpenShift, install the MachineConfiguration [here](https://github.com/sustain

## Kepler energy metrics are zeroes

<!-- markdownlint-disable MD024 -->
### Background

Kepler uses RAPL counters on x86 platforms to read energy consumption.
VMs do not have RAPL counters, so Kepler estimates energy consumption using pre-trained
ML models. The models use either hardware performance counters or cGroup stats to estimate the energy
consumed by processes. Currently, the cGroup-based models use cGroup v2 features such as
`cgroupfs_cpu_usage_us`, `cgroupfs_memory_usage_bytes`, `cgroupfs_system_cpu_usage_us`,
`cgroupfs_user_cpu_usage_us`, `bytes_read`, and `bytes_writes`.

### Diagnose

@@ -40,7 +50,8 @@ ls /sys/fs/cgroup/cgroup.controllers
```
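
The `cgroup.controllers` file exists at the root of a cGroup v2 (unified) hierarchy, so its presence can be turned into a scripted check. The following is a minimal sketch under that assumption. The function name `cgroup_version` and the `CGROUP_ROOT` variable are illustrative and not part of Kepler.

```bash
#!/usr/bin/env bash
# Illustrative helper (not part of Kepler): reports whether a cgroup mount
# point looks like cGroup v2 by testing for the cgroup.controllers file.
cgroup_version() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -f "${root}/cgroup.controllers" ]; then
    echo "v2"
  else
    echo "v1-or-unknown"
  fi
}

# CGROUP_ROOT is a hypothetical override, useful when the node's cgroup
# filesystem is mounted elsewhere (for example, inside a debug container).
cgroup_version "${CGROUP_ROOT:-/sys/fs/cgroup}"
```

If the script prints `v1-or-unknown` when run on the node, the node is likely not using cGroup v2 and the solution below applies. The Kubernetes documentation describes an equivalent check, `stat -fc %T /sys/fs/cgroup/`, which prints `cgroup2fs` on cGroup v2.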

### Solution
<!-- markdownlint-enable MD024 -->

Enable cGroup v2 on the node by following [these Kubernetes instructions](https://kubernetes.io/docs/concepts/architecture/cgroups/).

On OpenShift, apply [this cGroup v2 MachineConfiguration](https://github.com/sustainable-computing-io/kepler/tree/main/manifests/config/cluster-prereqs).
