Skip to content

Commit

Permalink
docs: Tracing doc
Browse files Browse the repository at this point in the history
Add `tracing.md` to document how to use the instrumentation to trace the
Firecracker process.

Signed-off-by: Jonathan Woollett-Light <[email protected]>
  • Loading branch information
Jonathan Woollett-Light committed Nov 1, 2023
1 parent bf9a414 commit 78d5655
Showing 1 changed file with 264 additions and 0 deletions.
264 changes: 264 additions & 0 deletions docs/tracing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,264 @@
# Tracing

## Introduction

Firecracker implements a framework for instrumentation based tracing.

Instrumentation based tracing as defined by
[Sheng Liang on usenix.org](https://www.usenix.org/legacy/publications/library/proceedings/coots99/full_papers/liang/liang_html/node9.html):

> There are two ways to obtain profiling information: either statistical
> sampling or code instrumentation. Statistical sampling is less
> disruptive to program execution, but cannot provide completely
> accurate information. Code instrumentation, on the other hand, may be
> more disruptive, but allows the profiler to record all the events it
> is interested in. Specifically in CPU time profiling, statistical
> sampling may reveal, for example, the relative percentage of time
> spent in frequently-called methods, whereas code instrumentation can
> report the exact number of time each method is invoked.
The focus with tracing in Firecracker is to improve debug-ability.

Enabling tracing adds logs output on each functions entry and exit.
This assists debugging problems that relate to deadlocks or high
latencies by quickly identifying elongated function calls.

## Implementation

Firecracker implements instrumentation based tracing via
[`log`](https://github.com/rust-lang/log) and
[`log_instrument`](https://github.com/JonathanWoollett-Light/log-instrument),
outputting a `Trace` level log when entering and exiting every function.

It is disabled by default at compile-time. Tracing functionality has no
impact on the release binary.

To enable tracing in Firecracker, add instrumentation with:

```
clippy-tracing \
--action fix \
--path ./src \
--exclude vmm_config/logger.rs,virtio/gen,bindings.rs,net/gen,benches,logger/,signal_handler.rs,time.rs
```

and re-compile with `--features tracing`:

```
cargo build --features tracing
```

This will result in an increase in the binary size (~100kb) and a
significant regression in performance (>10x). To mitigate the
performance impact you can filter the tracing output as described in the
next section.

## Filtering

You can filter tracing output both at run-time and compile-time.
This can be used to mitigate the performance impact of logging many
traces.

Run-time filtering is implemented with the `/logger` API call, this can
significantly mitigate the impact on execution time but cannot mitigate
the impact on memory usage. Execution time impact is mitigated by
avoiding constructing and writing the trace log, it still needs to check
the condition in every place it would output a log. Memory usage impact
is not mitigated as the instrumentation remains in the binary unchanged.

Compile-time filtering is a manual process using the
[`clippy-tracing`](https://github.com/JonathanWoollett-Light/clippy-tracing)
tool. This can almost entirely mitigate the impact on execution time and
the impact on memory usage.

### Run-time

You can filter by module path and/or file path at runtime, e.g.:

```bash
curl -X PUT --unix-socket "${API_SOCKET}" \
--data "{
\"level\": \"Trace\",
\"module\": \"api_server::request\",
}" \
"http://localhost/logger"
```

Instrumentation logs are `Trace` level logs, at runtime the level must
be set to `Trace` to see them. The module filter applied here ensures
only logs from the `request` modules within the `api_server` crate will
be output.

This will mitigate most of the performance regression.

### Compile-time

Specific environments can restrict run-time configuration. In these
environments it becomes necessary to support targeted tracing without
run-time re-configuration, for this compile-time filtering must be used.

To reproduce the same filtering as run-time at compile-time, you can use
[`clippy-tracing`](../src/clippy-tracing)
at compile-time like:

```bash
# Remove all instrumentation.
clippy-tracing --action strip --path ./src
# Adds instrumentation to the specific file/s.
clippy-tracing --action fix --path ./src/api_server/src/request
# Build Firecracker.
cargo build --features tracing
```

Then at run-time:

```bash
curl -X PUT --unix-socket "${API_SOCKET}" \
--data "{
\"level\": \"Trace\",
}" \
"http://localhost/logger"
```

The instrumentation has been stripped from all files other than those at
`./src/api_server/src/request` so we do not need to apply a run-time
filter. Runtime filtering could be applied but in this case it yields no
additional benefit.

## Example

In this example we start Firecracker with tracing then make a simple API
call.

### API call

```
~/Projects/firecracker$ sudo curl -X GET --unix-socket "/run/firecracker.socket" "http://localhost/"
{"id":"anonymous-instance","state":"Not started","vmm_version":"1.6.0-dev","app_name":"Firecracker"}
```

### Firecracker

```
~/Projects/firecracker$ sudo ./firecracker/build/cargo_target/release/firecracker --level Trace
2023-10-13T14:15:38.851263983 [anonymous-instance:main] Running Firecracker v1.6.0-dev
2023-10-13T14:15:38.851316122 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851322264 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851325119 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851328776 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851331351 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present
2023-10-13T14:15:38.851335809 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of
2023-10-13T14:15:38.851338254 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of
2023-10-13T14:15:38.851342091 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present
2023-10-13T14:15:38.851345638 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851349245 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851352721 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851355827 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851359444 [anonymous-instance:main] ThreadId(1)::main::main_exec>>from_args
2023-10-13T14:15:38.851362931 [anonymous-instance:main] ThreadId(1)::main::main_exec<<from_args
2023-10-13T14:15:38.851366207 [anonymous-instance:main] ThreadId(1)::main::main_exec>>get_filters
2023-10-13T14:15:38.851368401 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters>>get_default_filters
2023-10-13T14:15:38.851372068 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters>>deserialize_binary
2023-10-13T14:15:38.851380033 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters<<deserialize_binary
2023-10-13T14:15:38.851383990 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters>>filter_thread_categories
2023-10-13T14:15:38.851388098 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters::get_default_filters<<filter_thread_categories
2023-10-13T14:15:38.851391845 [anonymous-instance:main] ThreadId(1)::main::main_exec::get_filters<<get_default_filters
2023-10-13T14:15:38.851394360 [anonymous-instance:main] ThreadId(1)::main::main_exec<<get_filters
2023-10-13T14:15:38.851398077 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851400462 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851403507 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851410961 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851414107 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851417955 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851420650 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851426130 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851428434 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present
2023-10-13T14:15:38.851430949 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of
2023-10-13T14:15:38.851434766 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of
2023-10-13T14:15:38.851438133 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present
2023-10-13T14:15:38.851440577 [anonymous-instance:main] ThreadId(1)::main::main_exec>>flag_present
2023-10-13T14:15:38.851444575 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present>>value_of
2023-10-13T14:15:38.851447941 [anonymous-instance:main] ThreadId(1)::main::main_exec::flag_present<<value_of
2023-10-13T14:15:38.851450005 [anonymous-instance:main] ThreadId(1)::main::main_exec<<flag_present
2023-10-13T14:15:38.851453772 [anonymous-instance:main] ThreadId(1)::main::main_exec>>arguments
2023-10-13T14:15:38.851456488 [anonymous-instance:main] ThreadId(1)::main::main_exec<<arguments
2023-10-13T14:15:38.851458902 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851462679 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851466587 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851470324 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>as_single_value
2023-10-13T14:15:38.851473239 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<as_single_value
2023-10-13T14:15:38.851476896 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851479521 [anonymous-instance:main] ThreadId(1)::main::main_exec>>arguments
2023-10-13T14:15:38.851485062 [anonymous-instance:main] ThreadId(1)::main::main_exec<<arguments
2023-10-13T14:15:38.851488398 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851491925 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851494900 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851496934 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851499689 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851502374 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851504629 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851507234 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>as_single_value
2023-10-13T14:15:38.851508897 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<as_single_value
2023-10-13T14:15:38.851510630 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851513576 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851515559 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851517503 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851520068 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851521731 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851525628 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851529045 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851533153 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851536339 [anonymous-instance:main] ThreadId(1)::main::main_exec>>single_value
2023-10-13T14:15:38.851538883 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value>>value_of
2023-10-13T14:15:38.851542330 [anonymous-instance:main] ThreadId(1)::main::main_exec::single_value<<value_of
2023-10-13T14:15:38.851544704 [anonymous-instance:main] ThreadId(1)::main::main_exec<<single_value
2023-10-13T14:15:38.851548572 [anonymous-instance:main] ThreadId(1)::main::main_exec>>run_with_api
2023-10-13T14:15:38.851664621 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>new
2023-10-13T14:15:38.851672586 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api<<new
2023-10-13T14:15:38.851677876 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>init
2023-10-13T14:15:38.851684739 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api<<init
2023-10-13T14:15:38.851724064 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api>>build_microvm_from_requests
2023-10-13T14:15:38.851728171 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>default
2023-10-13T14:15:38.851731888 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<default
2023-10-13T14:15:38.851734634 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>new
2023-10-13T14:15:38.851737830 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<new
2023-10-13T14:15:38.851748550 [anonymous-instance:fc_api] ThreadId(2)>>new
2023-10-13T14:15:38.851761404 [anonymous-instance:fc_api] ThreadId(2)<<new
2023-10-13T14:15:38.851764861 [anonymous-instance:fc_api] ThreadId(2)>>run
2023-10-13T14:15:38.851775200 [anonymous-instance:fc_api] ThreadId(2)::run>>apply_filter
2023-10-13T14:15:38.851823462 [anonymous-instance:fc_api] ThreadId(2)::run<<apply_filter
2023-10-13T14:15:55.422397039 [anonymous-instance:fc_api] ThreadId(2)::run>>handle_request
2023-10-13T14:15:55.422417909 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>try_from
2023-10-13T14:15:55.422420554 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>describe
2023-10-13T14:15:55.422424551 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<describe
2023-10-13T14:15:55.422426395 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>log_received_api_request
2023-10-13T14:15:55.422429270 [anonymous-instance:fc_api] The API server received a Get request on "/".
2023-10-13T14:15:55.422431354 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<log_received_api_request
2023-10-13T14:15:55.422433298 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from>>parse_get_instance_info
2023-10-13T14:15:55.422435211 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info>>new_sync
2023-10-13T14:15:55.422437165 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info::new_sync>>new
2023-10-13T14:15:55.422439289 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info::new_sync<<new
2023-10-13T14:15:55.422441123 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from::parse_get_instance_info<<new_sync
2023-10-13T14:15:55.422444459 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::try_from<<parse_get_instance_info
2023-10-13T14:15:55.422446833 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<try_from
2023-10-13T14:15:55.422448837 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>into_parts
2023-10-13T14:15:55.422450921 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<into_parts
2023-10-13T14:15:55.422453967 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>serve_vmm_action_request
2023-10-13T14:15:55.422472552 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests>>handle_preboot_request
2023-10-13T14:15:55.422480477 [anonymous-instance:main] ThreadId(1)::main::main_exec::run_with_api::build_microvm_from_requests<<handle_preboot_request
2023-10-13T14:15:55.422488963 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request>>convert_to_response
2023-10-13T14:15:55.422492289 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response>>success_response_with_data
2023-10-13T14:15:55.422493983 [anonymous-instance:fc_api] The request was executed successfully. Status code: 200 OK.
2023-10-13T14:15:55.422498331 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data>>serialize
2023-10-13T14:15:55.422501387 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data::serialize>>fmt
2023-10-13T14:15:55.422506086 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data::serialize<<fmt
2023-10-13T14:15:55.422509171 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response::success_response_with_data<<serialize
2023-10-13T14:15:55.422511776 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request::convert_to_response<<success_response_with_data
2023-10-13T14:15:55.422514371 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request::serve_vmm_action_request<<convert_to_response
2023-10-13T14:15:55.422516385 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<serve_vmm_action_request
2023-10-13T14:15:55.422518719 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request>>take_deprecation_message
2023-10-13T14:15:55.422520533 [anonymous-instance:fc_api] ThreadId(2)::run::handle_request<<take_deprecation_message
2023-10-13T14:15:55.422522847 [anonymous-instance:fc_api] ThreadId(2)::run<<handle_request
2023-10-13T14:15:55.422525422 [anonymous-instance:fc_api] Total previous API call duration: 132 us.
```

0 comments on commit 78d5655

Please sign in to comment.