Merge pull request #142 from maryamtahhan/feat-super-linter

Feat super linter

rootfs authored Apr 6, 2024
2 parents dada268 + 9dc9cbe commit abc7edb
Showing 29 changed files with 594 additions and 431 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/super-linter.yaml
@@ -16,13 +16,13 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          # Full git history is needed to get a proper list of changed files within `super-linter`
          fetch-depth: 0

      - name: Lint Code Base
        uses: super-linter/super-linter@v6.3.0
        env:
          VALIDATE_ALL_CODEBASE: false
          DEFAULT_BRANCH: "main"
40 changes: 23 additions & 17 deletions README.md
@@ -1,32 +1,37 @@
# Documentation for Kepler-Doc

Follow [sustainable-computing.io](https://sustainable-computing.io/) to see documentation

## Install MkDocs

**Requirements:**

- Python 3.8

```bash
pip install -r requirements.txt
```

## Rendering adopters

- Uses gomplate 3.11.4. Either install it or use tea.xyz:

```sh
sh <(curl https://tea.xyz) +gomplate.ca^v3.11.4 sh
```

- Render the adopters template via:

```sh
gomplate -d adopters=./data/adopters.yaml -f templates/adopters.md -o docs/project/adopters.md
```

## Commands

- `mkdocs new [dir-name]` - Create a new project.
- `mkdocs serve` - Start the live-reloading docs server.
- `mkdocs build` - Build the documentation site.
- `mkdocs -h` - Print help message and exit.

## Layout

@@ -43,11 +48,12 @@ GitHub codespaces [provides a generous free tier](https://github.com/features/co
1. Click "Create codespace on main"
1. A new tab will open and your environment will be built
1. Create a `virtualenv` to install `mkdocs`

```bash
virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

1. Once built, type `mkdocs serve`.
1. A box will appear informing you that the site is available on port `8000`. Click the link to view the site.
10 changes: 6 additions & 4 deletions docs/design/architecture.md
@@ -1,19 +1,21 @@
# Components

## Kepler Exporter

Kepler Exporter exposes a variety of metrics about the energy consumption of Kubernetes components such as Pods and Nodes.

Monitor container power consumption with the [metrics](metrics.md) made available by the Kepler Exporter.

![Kepler Architecture](https://raw.githubusercontent.com/sustainable-computing-io/kepler/main/doc/kepler-arch.png)

## Kepler Model Server

The main feature of the `Kepler Model Server` is to return a [power estimation model](../kepler_model_server/power_estimation.md) matching a request that specifies the target granularity (node in total, node per processor component, pod in total, pod per processor component), the available input metrics, and model filters such as accuracy.

In addition, the online-trainer can be deployed as a sidecar container to the server (main container) to execute training pipelines and update the model on the fly when power metrics are available.

`Kepler Estimator` is a client module of the Kepler Model Server that runs as a sidecar of the Kepler Exporter (main container).

This Python module serves a `PowerRequest` from the model package in the Kepler Exporter, as defined in `estimator.go`, over the Unix domain socket `/tmp/estimator.sock`.
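
A minimal sketch of the client side of that socket, written in C for illustration only (the Exporter's real client lives in `estimator.go`). It only opens a connection to `/tmp/estimator.sock`; the `SOCK_STREAM` socket type and the helper name `connect_estimator` are assumptions, and the `PowerRequest` payload format is not modelled.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Socket path named in the Kepler docs. */
#define ESTIMATOR_SOCK "/tmp/estimator.sock"

/*
 * Hypothetical helper: open a stream connection to the estimator's Unix
 * domain socket. Returns a connected file descriptor, or -1 on error.
 * SOCK_STREAM is an assumption; the request/response payload is defined
 * in estimator.go and is not modelled here.
 */
int connect_estimator(const char *path) {
    struct sockaddr_un addr;
    if (path == NULL || strlen(path) >= sizeof(addr.sun_path))
        return -1; /* path does not fit into sun_path */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd); /* e.g. sidecar not running: ENOENT or ECONNREFUSED */
        return -1;
    }
    return fd;
}
```

A caller would check for `-1` (estimator sidecar not running, or a path too long for `sun_path`) before writing a request to the returned descriptor, and `close()` it when done.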

Check us out on GitHub ➡️ [Kepler Model Server](https://github.com/sustainable-computing-io/kepler-model-server)
90 changes: 48 additions & 42 deletions docs/design/ebpf_in_kepler.md
@@ -1,47 +1,54 @@
# eBPF in Kepler

## Contents

- [Background](#background)
  - [What is eBPF?](#what-is-ebpf)
  - [What is a kprobe?](#what-is-a-kprobe)
    - [How to list all currently registered kprobes?](#list-kprobes)
  - [Hardware CPU Events Monitoring](#hardware-cpu-events-monitoring)
    - [How to check if kernel supports perf_event_open?](#check-support-perf_event_open)
- [Kernel routine probed by kepler](#kernel-routine-probed-by-kepler)
- [Hardware CPU events monitored by Kepler](#hardware-cpu-events-monitored-by-kepler)
- [Calculate process (aka task) total CPU time](#calculate-total-cpu-time)
- [Calculate task CPU cycles](#calculate-total-cpu-cycle)
- [Calculate task Ref CPU cycles](#calculate-total-cpu-ref-cycle)
- [Calculate task CPU instructions](#calculate-total-cpu-instr)
- [Calculate task Cache misses](#calculate-total-cpu-cache-miss)
- [Calculate 'On CPU Average Frequency'](#calculate-on-cpu-avg-freq)
- [Process Table](#process-table)
- [References](#references)

## Background

<!-- markdownlint-disable MD033 -->
### What is eBPF? <a name="what-is-ebpf"></a>

eBPF is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel without changing kernel source code or loading kernel modules. [1]

### What is a kprobe?

KProbes is a debugging mechanism for the Linux kernel which can also be used for monitoring events inside a production system. KProbes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit. [2]

#### How to list all currently registered kprobes? <a name="list-kprobes"></a>

```bash
sudo cat /sys/kernel/debug/kprobes/list
```
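
Each line of that file describes one probe. Per the kernel's kprobes documentation, the columns are the probe address, the probe type (`k` for kprobe, `r` for kretprobe) and `symbol+offset`, optionally followed by a module name and status flags. The C sketch below parses that layout and counts the probes placed on a given function. The helper names are hypothetical, and accepting a `.` suffix (e.g. `finish_task_switch.isra.0`) is an assumption about compiler-renamed symbols on some kernels.

```c
#include <stdio.h>
#include <string.h>

/* One parsed entry of /sys/kernel/debug/kprobes/list (hypothetical type). */
struct kprobe_entry {
    unsigned long long addr; /* kernel address where the probe is inserted */
    char type;               /* 'k' = kprobe, 'r' = kretprobe */
    char symbol[128];        /* "symbol+offset" */
};

/* Parses one line; returns 0 on success, -1 if the line is malformed.
   Trailing columns (module name, status flags) are ignored. */
int parse_kprobe_line(const char *line, struct kprobe_entry *out) {
    if (sscanf(line, "%llx %c %127s", &out->addr, &out->type, out->symbol) != 3)
        return -1;
    return 0;
}

/* Counts entries whose symbol is `fn` followed by '+' (offset) or '.'
   (a compiler-generated suffix such as ".isra.0"). */
int count_probes_on(FILE *list, const char *fn) {
    char line[512];
    struct kprobe_entry e;
    size_t n = strlen(fn);
    int count = 0;
    while (fgets(line, sizeof(line), list) != NULL) {
        if (parse_kprobe_line(line, &e) == 0 &&
            strncmp(e.symbol, fn, n) == 0 &&
            (e.symbol[n] == '+' || e.symbol[n] == '.'))
            count++;
    }
    return count;
}
```

Run as root against the opened list file with `fn = "finish_task_switch"`, a non-zero count would suggest a probe is registered on the routine Kepler traps into.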

### Hardware CPU Events Monitoring

Performance counters are special hardware registers available on most modern CPUs. These registers count certain types of hardware events, such as instructions executed, cache misses suffered, or branches mispredicted, without slowing down the kernel or applications. [4]

Linux provides the `perf_event_open` syscall [5] to set up hardware and software performance monitoring. It returns a file descriptor from which performance information can be read.
The syscall takes `pid` and `cpu` as parameters. Kepler passes `pid == -1` and the actual CPU id as `cpu`.
This combination measures all processes/threads on the specified CPU.
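
A minimal C sketch of that call for a CPU-wide cycle counter (`pid == -1`, `cpu >= 0`). glibc provides no wrapper, so the raw syscall is used. The helper names are hypothetical and this is not Kepler's own code (Kepler sets up its counters through bcc). Opening a CPU-wide counter normally needs `CAP_PERFMON`/`CAP_SYS_ADMIN` or a `perf_event_paranoid` level of 0 or lower, so expect `-1` in unprivileged containers.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fills a perf_event_attr for a generic hardware event such as
   PERF_COUNT_HW_CPU_CYCLES; the counter starts enabled. */
void init_hw_attr(struct perf_event_attr *attr, unsigned long long config) {
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = config;
    attr->disabled = 0;
}

/* Opens a counter covering all processes/threads on `cpu` (pid == -1).
   Returns the file descriptor, or -1 with errno set on failure. */
int open_cpu_counter(int cpu, unsigned long long config) {
    struct perf_event_attr attr;
    init_hw_attr(&attr, config);
    return (int)syscall(SYS_perf_event_open, &attr, (pid_t)-1, cpu,
                        -1 /* group_fd: no group */, 0UL /* flags */);
}
```

A real collector would open one such descriptor per CPU and per hardware event.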

#### How to check if kernel supports `perf_event_open`? <a name="check-support-perf_event_open"></a>

Check for the presence of `/proc/sys/kernel/perf_event_paranoid` to know whether the kernel supports `perf_event_open` and what is allowed to be measured:

```text
The perf_event_paranoid file can be set to restrict
access to the performance counters.

@@ -57,19 +64,22 @@ Check presence of `/proc/sys/kernel/perf_event_paranoid` to know if kernel suppo
**CAP_SYS_ADMIN** is the highest level of capability, so granting it has security implications.
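
The presence check described in this section can be sketched in C: read the paranoid level, and treat a missing or unreadable file as "not supported". The helper name and the `INT_MIN` sentinel are illustrative choices, not part of Kepler.

```c
#include <limits.h>
#include <stdio.h>

#define PERF_EVENT_PARANOID_PATH "/proc/sys/kernel/perf_event_paranoid"

/* Returns the paranoid level stored in `path`, or INT_MIN if the file is
   missing or unreadable (suggesting perf_event_open is not supported). */
int read_perf_event_paranoid(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return INT_MIN;
    int level;
    int ok = fscanf(f, "%d", &level);
    fclose(f);
    return ok == 1 ? level : INT_MIN;
}
```

Calling `read_perf_event_paranoid(PERF_EVENT_PARANOID_PATH)` and comparing the result with the levels listed above tells a tool what it may measure without extra capabilities.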

## Kernel routine probed by kepler

Kepler traps into the `finish_task_switch` kernel function [3], which is responsible for cleaning up after a task switch occurs. Because the probe is a `kprobe`, it runs when `finish_task_switch` is entered (as opposed to a `kretprobe`, which runs after the probed function returns).

When a context switch occurs inside the kernel, `finish_task_switch` is called on the new task that is about to use the CPU. It receives an argument of type `task_struct*` that contains all the information about the task leaving the CPU. [3]

The probe function in Kepler is:

```c
int kprobe__finish_task_switch(struct pt_regs *ctx, struct task_struct *prev)
```
The first argument is a pointer to a `pt_regs` struct, which holds the register state of the CPU at kernel function entry. Its fields correspond to the CPU registers, such as general-purpose registers (e.g., r0, r1), the stack pointer (sp), the program counter (pc), and other architecture-specific registers.
The second argument is a pointer to a `task_struct` containing information about the previous task, i.e., the task leaving the CPU.

## Hardware CPU events monitored by Kepler

Kepler opens monitoring for the following hardware CPU events:

| PERF Type | Perf Count Type | Description | Array name <br>(in bpf program) |
@@ -79,18 +89,17 @@ Kepler opens monitoring for following hardware cpu events
| PERF_TYPE_HARDWARE | PERF_COUNT_HW_INSTRUCTIONS | Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts. | cpu_instr_hc_reader |
| PERF_TYPE_HARDWARE | PERF_COUNT_HW_CACHE_MISSES | Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates. | cache_miss_hc_reader |

Performance counters are accessed via special file descriptors, one per virtual counter used. Each file descriptor is associated with the corresponding array. When the bcc wrapper functions are used, they read the corresponding file descriptor and return its value.
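
From user space, a counter opened with `perf_event_open` and the default `read_format` of 0 is read with a plain `read()` that returns a single 64-bit count, as described in perf_event_open(2). The C sketch below shows that read; inside Kepler the BPF program reads the perf arrays through bcc's helpers instead, so this only illustrates what each file descriptor provides. The helper name `read_counter` is hypothetical.

```c
#include <stdint.h>
#include <unistd.h>

/* Reads the current value of a perf counter fd opened with read_format == 0,
   where read() yields exactly one u64 count.
   Returns 0 on success, -1 on a failed or short read. */
int read_counter(int fd, uint64_t *value) {
    ssize_t n = read(fd, value, sizeof(*value));
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```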

## Calculate process (aka task) total CPU time <a name="calculate-total-cpu-time"></a>
The eBPF program (`bpfassets/bcc/bcc.c`) maintains a map from a `<pid, cpuid>` pair to a timestamp. The timestamp records the moment `kprobe__finish_task_switch` was called for `pid` when that task was scheduled onto CPU `<cpuid>`.
```c
// <Task PID, CPUID> => Context Switch Start time
typedef struct pid_time_t { u32 pid; u32 cpu; } pid_time_t;
BPF_HASH(pid_time, pid_time_t);
// pid_time is the name of the map variable; BPF_HASH values default to u64 (the timestamp)
```

@@ -99,6 +108,7 @@ Within the function `get_on_cpu_time`, the difference between the current timest
This `on_cpu_time_delta` is used to accumulate the `process_run_time` metrics for the previous task.
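
The bookkeeping can be modelled in plain user-space C. This is a hypothetical sketch, not the BPF code: a small linear table stands in for the `pid_time` BPF hash, timestamps are passed in explicitly, and the names `on_task_switch`, `pid_time_lookup` and `pid_time_update` are illustrative. Because the body of `get_on_cpu_time` is elided in this diff, details such as deleting the entry after use are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64

/* <pid, cpu> -> timestamp (ns) when the task was switched in. */
struct pid_time_key { uint32_t pid; uint32_t cpu; };
struct pid_time_entry { struct pid_time_key key; uint64_t start_ns; int used; };

static struct pid_time_entry pid_time[MAX_ENTRIES];

static struct pid_time_entry *pid_time_lookup(uint32_t pid, uint32_t cpu) {
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (pid_time[i].used && pid_time[i].key.pid == pid && pid_time[i].key.cpu == cpu)
            return &pid_time[i];
    return NULL;
}

static int pid_time_update(uint32_t pid, uint32_t cpu, uint64_t ts) {
    struct pid_time_entry *e = pid_time_lookup(pid, cpu);
    for (size_t i = 0; e == NULL && i < MAX_ENTRIES; i++)
        if (!pid_time[i].used)
            e = &pid_time[i]; /* first free slot */
    if (e == NULL)
        return -1; /* table full */
    e->key.pid = pid;
    e->key.cpu = cpu;
    e->start_ns = ts;
    e->used = 1;
    return 0;
}

/* Called on every context switch on `cpu` at `now_ns`. Returns the on-CPU
   time of `prev_pid` (0 if its switch-in was never recorded) and records
   the switch-in time of `next_pid`. */
uint64_t on_task_switch(uint32_t cpu, uint32_t prev_pid, uint32_t next_pid, uint64_t now_ns) {
    uint64_t delta = 0;
    struct pid_time_entry *e = pid_time_lookup(prev_pid, cpu);
    if (e != NULL) {
        delta = now_ns - e->start_ns; /* on_cpu_time_delta */
        e->used = 0;                  /* the task has left this CPU */
    }
    (void)pid_time_update(next_pid, cpu, now_ns);
    return delta;
}
```

The returned delta is what would be added to the leaving task's `process_run_time`.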

## Calculate task CPU cycles <a name="calculate-total-cpu-cycle"></a>

For task CPU cycles, the BPF program maintains an array named `cpu_cycles`, indexed by `cpuid`. It holds values read from `cpu_cycles_hc_reader`, which is a perf event array.

On each task switch:
@@ -111,20 +121,24 @@ On each task switch:
The delta calculated this way is the number of CPU cycles used by the process leaving the CPU.
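
The per-CPU delta step can be sketched in user-space C. The individual steps of the list are elided in this diff, so the helper `counter_delta`, the `NR_CPUS` bound, and the choice to report 0 for the first sample (or for a counter that went backwards) are assumptions. The same helper would apply to the ref-cycle, instruction and cache-miss arrays described in the following sections.

```c
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical model of one per-CPU array such as cpu_cycles:
   prev[cpu] holds the counter value seen at the last switch on that CPU. */
struct counter_array { uint64_t prev[NR_CPUS]; };

/* On a task switch on `cpu`, returns the counter increase since the
   previous switch (attributed to the task leaving the CPU) and stores
   `current` for the next switch. */
uint64_t counter_delta(struct counter_array *arr, uint32_t cpu, uint64_t current) {
    if (cpu >= NR_CPUS)
        return 0;
    uint64_t prev = arr->prev[cpu];
    arr->prev[cpu] = current;
    if (prev == 0 || current < prev)
        return 0; /* first sample, or counter reset */
    return current - prev;
}
```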

## Calculate task Ref CPU cycles <a name="calculate-total-cpu-ref-cycle"></a>

Same process as calculating CPU cycles, except that the perf array used is `cpu_ref_cycles_hc_reader` and the previous value is stored in `cpu_ref_cycles`.

## Calculate task CPU instructions <a name="calculate-total-cpu-instr"></a>

Same process as calculating CPU cycles, except that the perf array used is `cpu_instr_hc_reader` and the previous value is stored in `cpu_instr`.

## Calculate task Cache misses <a name="calculate-total-cpu-cache-miss"></a>

Same process as calculating CPU cycles, except that the perf array used is `cache_miss_hc_reader` and the previous value is stored in `cache_miss`.

## Calculate 'On CPU Average Frequency' <a name="calculate-on-cpu-avg-freq"></a>
<!-- markdownlint-enable MD033 -->

```c
avg_freq = ((on_cpu_cycles_delta * CPU_REF_FREQ) / on_cpu_ref_cycles_delta) * HZ;

CPU_REF_FREQ = 2500
HZ = 1000
```
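
The formula can be written as a small C function. The division-by-zero guard (when no reference cycles were counted) and the function name are additions of this sketch. As in the formula, the integer division happens before the multiplication by `HZ`, so fractional precision is truncated; very large deltas (above roughly 7e15 cycles) would overflow the intermediate product.

```c
#include <stdint.h>

#define CPU_REF_FREQ 2500ULL
#define HZ 1000ULL

/* avg_freq = ((on_cpu_cycles_delta * CPU_REF_FREQ) / on_cpu_ref_cycles_delta) * HZ
   Returns 0 when no reference cycles were counted. */
uint64_t on_cpu_avg_freq(uint64_t on_cpu_cycles_delta, uint64_t on_cpu_ref_cycles_delta) {
    if (on_cpu_ref_cycles_delta == 0)
        return 0;
    return ((on_cpu_cycles_delta * CPU_REF_FREQ) / on_cpu_ref_cycles_delta) * HZ;
}
```

For example, a task that counted 3000 cycles over 2500 reference cycles gives `on_cpu_avg_freq(3000, 2500) == 3000000`, i.e. 1.2 times the `CPU_REF_FREQ * HZ` baseline.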

@@ -155,19 +169,11 @@ This hash is read by the kernel collector in `container_hc_collector.go` for met
## References

[1] [https://ebpf.io/what-is-ebpf/](https://ebpf.io/what-is-ebpf/), [https://www.splunk.com/en_us/blog/learn/what-is-ebpf.html](https://www.splunk.com/en_us/blog/learn/what-is-ebpf.html), [https://www.tigera.io/learn/guides/ebpf/](https://www.tigera.io/learn/guides/ebpf/)

[2] [An introduction to KProbes](https://lwn.net/Articles/132196/), [Kernel Probes (Kprobes)](https://docs.kernel.org/trace/kprobes.html)

[3] [finish_task_switch - clean up after a task-switch](https://elixir.bootlin.com/linux/v6.4-rc7/source/kernel/sched/core.c#L5157)

[4] [Performance Counters for Linux](https://elixir.bootlin.com/linux/latest/source/tools/perf/design.txt)

[5] [perf_event_open(2) — Linux manual page](https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html)