Small improvements and fixes #50

Merged: 8 commits, Sep 1, 2024
2 changes: 1 addition & 1 deletion contributions-guide.md
@@ -12,7 +12,7 @@ While not everyone's native language, it is the language of the internet and the

These docs are meant for the community as a whole and we do not want to promote certain projects over others. A fine line has to be drawn between mentioning projects to inform readers of their existence and promoting one over the other.

The primary goal of the project is to document eBPF to such an extent that developers do not have to go the the eBPF kernel sources to find out how to use eBPF. This will inevitably include mentioning the APIs of loader projects and eBPF kernel side libraries, and showing examples. Having that said, we do not want to document these projects.
The primary goal of the project is to document eBPF to such an extent that developers do not have to go the eBPF kernel sources to find out how to use eBPF. This will inevitably include mentioning the APIs of loader projects and eBPF kernel side libraries, and showing examples. Having that said, we do not want to document these projects.

The exception to this rule are tools and libraries which originate in the Linux kernel or are maintained by kernel developers alongside the kernel such as: `libbpf`, `libxdp`, `bpftool` and `iproute2`.

3 changes: 3 additions & 0 deletions docs/ebpf-library/libbpf.md
@@ -1,5 +1,8 @@
# Libbpf kernel side library

!!! example "Docs could be improved"
This part of the docs is incomplete, contributions are very welcome

<!-- TODO abstract -->

## Functions
2 changes: 1 addition & 1 deletion docs/linux/concepts/concurrency.md
@@ -107,7 +107,7 @@ This scheme also increases the complexity on the userspace side since more data

## Map RCU

In niche use-cases it might be possible to get away with the the helper functions built-in RCU logic. This method work by never modifying the map value directly via the pointer you get via the `bpf_map_lookup_elem` helper. But instead copying the map value to the BPF stack, modifying its value there, then calling `bpf_map_update_elem` on the modified copy. The helper functions will guarantee that we transition cleanly from the initial state to the updated state. This property might be important if there exists a relation between fields in the map value. This technique map result in missing updates if multiple updates happen at the same time, but values will never be "mixed".
In niche use-cases it might be possible to get away with the helper functions' built-in RCU logic. This method works by never modifying the map value directly via the pointer you get from the `bpf_map_lookup_elem` helper, but instead copying the map value to the BPF stack, modifying it there, then calling `bpf_map_update_elem` on the modified copy. The helper functions will guarantee that we transition cleanly from the initial state to the updated state. This property might be important if there exists a relation between fields in the map value. This technique may result in missing updates if multiple updates happen at the same time, but values will never be "mixed".

Performance wise there is a trade off. This technique does perform additional memory copies, but is also does not block or synchronize. So this may or may not be faster than spin-locking depending on the size of the values.

2 changes: 1 addition & 1 deletion docs/linux/concepts/loops.md
@@ -38,7 +38,7 @@ for (int i = 0; i < ip->tot_len; i++) {
}
```

The since `ip->tot_len` is a 16 bit integer, the verifier will check the body for every possible value of `i` up to 65535. Depending on the instructions and branches in the body, you will run out of complexity very quickly. Most of the time scanning the first X bytes of a body is enough, so you can limit the loop to that:
Since `ip->tot_len` is a 16 bit integer, the verifier will check the body for every possible value of `i` up to 65535. Depending on the instructions and branches in the body, you will run out of complexity very quickly. Most of the time scanning the first X bytes of a body is enough, so you can limit the loop to that:

```c
void *data = ctx->data;
2 changes: 1 addition & 1 deletion docs/linux/concepts/pinning.md
@@ -14,7 +14,7 @@ If your Linux distribution does not automatically mount the BPF file system you

A process can get a file descriptor to a BPF object by calling the [`BPF_OBJ_GET`](../syscall/BPF_OBJ_GET.md) syscall command, passing it a valid path to a pin.

Pins are usually used as an easy method of sharing or transferring a BPF object between processes or applications. Command line tools which have short running processes before existing can for example use them to perform actions on object over multiple invocation. Long running daemons can use pins to ensure resources do not go away while restarting. And tools like `iproute2`/`tc` can load a program on behalf of a user and then another program can modify the maps afterwards.
Pins are usually used as an easy method of sharing or transferring a BPF object between processes or applications. Command line tools which have short running processes before exiting can for example use them to perform actions on objects over multiple invocations. Long running daemons can use pins to ensure resources do not go away while restarting. And tools like `iproute2`/`tc` can load a program on behalf of a user and then another program can modify the maps afterwards.

Pins can be removed by using the `rm` cli tool or `unlink` syscall. Pins are ephemeral and do not persist over restarts of the system.

4 changes: 2 additions & 2 deletions docs/linux/concepts/resource-limit.md
@@ -4,13 +4,13 @@ description: "This page explains the concept of resource limits in eBPF. It expl
---
# Resource limits

The Linux kernel has protection mechanisms that prevent processes from taking up to much memory. Since BPF maps can take up a lot of space, they are also limited via these mechanisms.
The Linux kernel has protection mechanisms that prevent processes from taking up too much memory. Since BPF maps can take up a lot of space, they are also limited via these mechanisms.

## Rlimit

rlimit or "resource limit" is a system to track and limit the amount of certain resources you are allowed to use. One of the things it limits is the amount of "locked memory" https://man7.org/linux/man-pages/man2/getrlimit.2.html

Until kernel version v5.11 this mechanism was used to track and limit the memory usage of BGP maps which count towards the locked memory limit, so you commonly would have to increase or disable this rlimit which requires an additional capability `CAP_SYS_RESOURCE`.
Until kernel version v5.11 this mechanism was used to track and limit the memory usage of BPF maps which count towards the locked memory limit, so you commonly would have to increase or disable this rlimit which requires an additional capability `CAP_SYS_RESOURCE`.
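
On kernels before v5.11, a loader would typically try to raise the locked-memory limit before creating maps. The following is a minimal sketch of that step using only the standard `getrlimit`/`setrlimit` calls; the function names `raise_memlock_rlimit` and `current_memlock_rlimit` are hypothetical and chosen for illustration.

```c
#include <sys/resource.h>

/*
 * Try to remove the locked-memory limit for the current process.
 * Returns 0 on success or -1 on failure with errno set. Raising the
 * hard limit fails with EPERM when the process lacks CAP_SYS_RESOURCE.
 */
static int raise_memlock_rlimit(void)
{
    struct rlimit unlimited = {
        .rlim_cur = RLIM_INFINITY,
        .rlim_max = RLIM_INFINITY,
    };

    return setrlimit(RLIMIT_MEMLOCK, &unlimited);
}

/*
 * Read the current soft locked-memory limit into *soft.
 * Returns 0 on success or -1 on failure with errno set.
 */
static int current_memlock_rlimit(rlim_t *soft)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;

    *soft = lim.rlim_cur;
    return 0;
}
```

A loader might call `raise_memlock_rlimit()` once at startup and treat a failure as a hint that map creation may later fail on older kernels, rather than as a fatal error.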

## cGroup memory limit

4 changes: 2 additions & 2 deletions docs/linux/concepts/tail-calls.md
@@ -4,7 +4,7 @@ description: "This page explains the concept of tail calls in eBPF. It explains
---
# Tail calls

A tail call is a form mechanism that allows eBPF authors to break up their logic into multiple parts and go from one to the other. Unlike traditional function calls, control flow never returns to the code making a tail call, it works more like a `goto` statement.
A tail call is a mechanism that allows eBPF authors to break up their logic into multiple parts and go from one to the other. Unlike traditional function calls, control flow never returns to the code making a tail call, it works more like a `goto` statement.

To use tail calls, an author would add a [`BPF_MAP_TYPE_PROG_ARRAY`](../map-type/BPF_MAP_TYPE_PROG_ARRAY.md) map to their program. The map can be filled with references to other programs (given a few conditions). And the program can then use the [`bpf_tail_call`](../helper-function/bpf_tail_call.md) helper call with a reference to the map and an index to perform the actual tail call.

@@ -16,7 +16,7 @@ Another use case is for replacing or extending logic. By replacing the contents

To prevent infinite loops or very long running programs, the kernel limits the amount of tail calls per initial invocation to `32` so `33` programs can execute in total before the tail call helper will refuse to jump anymore.

If a program array is associated with a program, any program added to the map should "match" the program. So they have to have the same `type`, `expected_attach_type`, `attached_btf`, etc.
If a program array is associated with a program, any program added to the map should "match" the program. So they have to have the same [`prog_type`](../syscall/BPF_PROG_LOAD.md#prog_type), [`expected_attach_type`](../syscall/BPF_PROG_LOAD.md#expected_attach_type), [`attached_btf_id`](../syscall/BPF_PROG_LOAD.md#attached_btf_id), etc.

While the same stack frame is shared, the verifier will block you from using any existing stack state without re-initializing it, the same goes for the registers. Thus, there is no straightforward way to shared state. Common workarounds for this issue are to use opaque fields in metadata such as [`__sk_buff->cb`](../program-context/__sk_buff.md#cb) or [`xdp_md->data_meta`](../program-type/BPF_PROG_TYPE_XDP.md#data_meta) memory. Alternatively, a per-CPU map with a single entry can be used to share data, which works since eBPF programs never migrate to a different CPU even between tail calls. However on RT (real time) kernels eBPF programs might be interrupted and re-started at a later time, so these maps should only be shared between tail calls on the same task, not globally.

3 changes: 1 addition & 2 deletions docs/linux/concepts/timers.md
@@ -42,8 +42,7 @@ These three helper calls do not necessarily have to happen in the same program a
* `prog2` calls `bpf_timer_set_callback` for some `map1` elements.
* Those that were not `bpf_timer_init`-ed will return `-EINVAL`.
* `prog3` calls `bpf_timer_start` for some `map1` elements.
* Those that were not both `bpf_timer_init`-ed and
* `bpf_timer_set_callback`-ed will return `-EINVAL`.
* Those that were not both `bpf_timer_init`-ed and `bpf_timer_set_callback`-ed will return `-EINVAL`.


[`bpf_timer_init`](../helper-function/bpf_timer_init.md) and [`bpf_timer_set_callback`](../helper-function/bpf_timer_set_callback.md) will return `-EPERM` if map doesn't have user references (is not held by open file descriptor from user space and not pinned in bpffs).
2 changes: 1 addition & 1 deletion docs/linux/concepts/verifier.md
@@ -36,7 +36,7 @@ The verifier also keeps track of data types, before I mentioned the pointer to a

It uses this same type info tracking to assert that the correct parameters are passed to helper functions or function calls. The verifier can also use BTF to enforce that a map value contains a timer field for example or a spinlock. BTF is also used to enforce that the correct parameters are passed to KFuncs, that BTF function definitions match the actual BPF functions and that these BTF function definitions match callbacks.

The verifier will attempt to asses all queued states and branches. But to protect itself it has limits. It tracks the amount of instructions inspected, this is for any permutation, so the complexity of a program not only depends on the amount of instructions, but also on the amount of branches. The verifier only has a limited amount of storage for states, so infinite recursion doesn't consume to much memory.
The verifier will attempt to assess all queued states and branches. But to protect itself it has limits. It tracks the amount of instructions inspected, this is for any permutation, so the complexity of a program not only depends on the amount of instructions, but also on the amount of branches. The verifier only has a limited amount of storage for states, so infinite recursion doesn't consume too much memory.

!!! note
Until [:octicons-tag-24: v5.2](https://github.com/torvalds/linux/commit/c04c0d2b968ac45d6ef020316808ef6c82325a82) there was a hard 4k instruction limit and a 128k complexity limit. Afterwards both are 1 million.
2 changes: 1 addition & 1 deletion docs/linux/index.md
@@ -16,7 +16,7 @@ Program like functions also have return values, the meaning of which is again de

eBPF program are typically written in C and compiled with LLVM, but this isn't necessarily the only way to do it. Any program which can generate byte-code (following the eBPF instruction set) can author eBPF programs. eBPF programs are typically serialized into a relocatable ELF file.

Ultimately eBPF programs are loaded into the kernel using the [BPF syscall](./syscall/index.md), the userspace program that does this is refereed to as a loader. In practice loaders range from applications that just load the eBPF program to complex systems that constantly interacts with multiple programs and maps to provide advanced features. Loaders often use [loader libraries](./../ebpf-library/index.md) to provide higher-level APIs than the syscall to ease development.
Ultimately eBPF programs are loaded into the kernel using the [BPF syscall](./syscall/index.md), the userspace program that does this is referred to as a loader. In practice loaders range from applications that just load the eBPF program to complex systems that constantly interacts with multiple programs and maps to provide advanced features. Loaders often use [loader libraries](./../ebpf-library/index.md) to provide higher-level APIs than the syscall to ease development.

When the loader loads a program the kernel will verify that the program is "safe". This job is done by a component of the kernel called the verifier. "safe" in this context means that programs are not allowed to crash the kernel or break critical components. eBPF programs have to pass quite a number of stringent requirements before being allowed anywhere near kernel memory. For more details checkout the verifier page.

2 changes: 1 addition & 1 deletion docs/linux/program-type/BPF_PROG_TYPE_SK_REUSEPORT.md
@@ -124,7 +124,7 @@ Before [:octicons-tag-24: v5.14](https://github.com/torvalds/linux/commit/d5e4dd

This situation can happen when various server management tools restart server (such as nginx) processes. For instance, when we change nginx configurations and restart it, it spins up new workers that respect the new configuration and closes all listeners on the old workers, resulting in in-flight <nospell>ACK of 3WHS is responded by RST</nospell>.

To fix this defect, the concept of socket migration was added, which will repeat the socket selection logic to pick a new socket. When not using eBPF, the same hash logic is used, but only if the `net.ipv4.tcp_migrate_req` sysctl setting has been enabled. When using eBPF with this program type, loading the program with the `BPF_SK_REUSEPORT_SELECT_OR_MIGRATE` attachment type indicates that this program also overwrites the migration logic. No need to set the sysctl option in this case. This does mean that the the program can be called for initial selection as well as for migration. The `sk` and `sk_migration` context fields indicate for which purpose the program is invoked.
To fix this defect, the concept of socket migration was added, which will repeat the socket selection logic to pick a new socket. When not using eBPF, the same hash logic is used, but only if the `net.ipv4.tcp_migrate_req` sysctl setting has been enabled. When using eBPF with this program type, loading the program with the `BPF_SK_REUSEPORT_SELECT_OR_MIGRATE` attachment type indicates that this program also overwrites the migration logic. No need to set the sysctl option in this case. This does mean that the program can be called for initial selection as well as for migration. The `sk` and `sk_migration` context fields indicate for which purpose the program is invoked.

When invoked for migration, the following actions can be taken:

2 changes: 1 addition & 1 deletion docs/linux/syscall/BPF_OBJ_GET.md
@@ -18,7 +18,7 @@ This command will return a file descriptor to the pinned BTF object on success (

A common use case for opening such a pin is to transfer a reference to a BPF object from one process to another. The [`BPF_OBJ_PIN`](BPF_OBJ_PIN.md) syscall command can be used to pin a BPF object to the BPF file system so another process can get a reference to it with this syscall command.

Please the the [pinning concept page](../concepts/pinning.md) for more details.
Please see the [pinning concept page](../concepts/pinning.md) for more details.
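
As a rough illustration of the `BPF_OBJ_GET` command described here, the sketch below opens a pin through the raw `bpf` syscall. It assumes the kernel UAPI header `linux/bpf.h` is available; the wrapper name `obj_get` is hypothetical, and any path passed to it is only an example.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open the BPF object pinned at `path`.
 * Returns a new file descriptor on success, or -1 with errno set
 * (for example ENOENT when no pin exists at that path).
 */
static int obj_get(const char *path)
{
    union bpf_attr attr;

    /* Unused attribute fields must be zero. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;

    return (int)syscall(SYS_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}
```

The returned file descriptor refers to the same underlying object that was pinned, so closing it does not remove the pin itself.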

## Attributes

2 changes: 1 addition & 1 deletion docs/linux/syscall/BPF_OBJ_PIN.md
@@ -18,7 +18,7 @@ This command will return `0` on success or a error number (negative integer) if

A common use case for creating such a pin is to transfer a reference to a BPF object from one process to another. The [`BPF_OBJ_GET`](BPF_OBJ_GET.md) syscall command can be used to get a file descriptor from pins created with this command.

Please the the [pinning concept page](../concepts/pinning.md) for more details.
Please see the [pinning concept page](../concepts/pinning.md) for more details.
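
A minimal sketch of issuing `BPF_OBJ_PIN` through the raw `bpf` syscall could look like the following. It assumes the kernel UAPI header `linux/bpf.h` is available; the wrapper name `obj_pin` is hypothetical. Note that the libc `syscall` wrapper reports failure as `-1` with the error number in `errno`, rather than returning the negative error number directly.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Pin the BPF object referred to by `bpf_fd` at `path`, which must be
 * located on a mounted BPF file system.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int obj_pin(int bpf_fd, const char *path)
{
    union bpf_attr attr;

    /* Unused attribute fields must be zero. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    attr.bpf_fd = (uint32_t)bpf_fd;

    return (int)syscall(SYS_bpf, BPF_OBJ_PIN, &attr, sizeof(attr));
}
```

After a successful call the process may close `bpf_fd`; the pin keeps the object alive until it is removed.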

## Attributes

8 changes: 4 additions & 4 deletions docs/linux/syscall/BPF_PROG_LOAD.md
@@ -138,7 +138,7 @@ Loading BTF for your program is optional, but highly recommended since a ever gr

[:octicons-tag-24: v5.0](https://github.com/torvalds/linux/commit/838e96904ff3fc6c30e5ebbc611474669856e3c0)

This attribute specifies the size of the records in `func_info`, this allows for compatibility between newer and older loaders and kernel versions if the size of the the function info records ever changes.
This attribute specifies the size of the records in `func_info`, this allows for compatibility between newer and older loaders and kernel versions if the size of the function info records ever changes.

### `func_info`

@@ -160,7 +160,7 @@ This attribute specifies the amount of function records that are present in `fun

[:octicons-tag-24: v5.0](https://github.com/torvalds/linux/commit/c454a46b5efd8eff8880e88ece2976e60a26bf35)

This attribute specifies the size of the records in `line_info`, this allows for compatibility between newer and older loaders and kernel versions if the size of the the line info records ever changes.
This attribute specifies the size of the records in `line_info`, this allows for compatibility between newer and older loaders and kernel versions if the size of the line info records ever changes.

### `line_info`

@@ -185,7 +185,7 @@ This attribute specifies the amount of function records that are present in `lin

[:octicons-tag-24: v5.5](https://github.com/torvalds/linux/commit/ccfe29eb29c2edcea6552072ef00ff4117f53e83)

This attribute specifies the [BTF](../../concepts/btf.md) type ID of kernel types the current program wishes to attach to. This ID refers the the ID within the `vmlinux` object, not the BTF object specified by `prog_btf_fd`. This attribute can have different meaning depending on the program type.
This attribute specifies the [BTF](../../concepts/btf.md) type ID of kernel types the current program wishes to attach to. This ID refers to the ID within the `vmlinux` object, not the BTF object specified by `prog_btf_fd`. This attribute can have different meaning depending on the program type.

* For `BPF_PROG_TYPE_STRUCT_OPS` this attribute is the ID of the ops struct of which the user wants to replace a function pointer with an eBPF program.
* For `BPF_PROG_TYPE_LSM` this attribute specifies the LSM hook point where we intend to attach it to.
@@ -208,7 +208,7 @@ This attribute specifies the file descriptor of a BTF object which the kernel sh

[:octicons-tag-24: v5.17](https://github.com/torvalds/linux/commit/fbd94c7afcf99c9f3b1ba1168657ecc428eb2c8d)

This attribute specifies the size of the records in `core_relos`, this allows for compatibility between newer and older loaders and kernel versions if the size of the the CO-RE relocation records ever changes.
This attribute specifies the size of the records in `core_relos`, this allows for compatibility between newer and older loaders and kernel versions if the size of the CO-RE relocation records ever changes.

### `fd_array`

2 changes: 1 addition & 1 deletion tools/helper-def-scraper/main.go
@@ -18,7 +18,7 @@ const libBpfGhHelperDefsURL = "https://raw.githubusercontent.com/libbpf/libbpf/m

var (
filePath = flag.String("file-path", "", "If set, use a file path instead of fetching from the interwebs")
helperFuncPath = flag.String("helper-path", "", "The path the the helper function pages")
helperFuncPath = flag.String("helper-path", "", "The path to the helper function pages")

helperRegex = regexp.MustCompile(`static [^\(]+ \*?\(\* const ([^\)]+)\)[^\n]+;`)
)