Spans and flame graphs #4631

Open
emilk opened this issue Jan 2, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@emilk
Member

emilk commented Jan 2, 2024

Currently all our data is associated with a single instant in time, i.e. they are events.

There are, however, many things that require data to span a time range, such as audio and video.

Another useful application of spans is flame graphs, which are a way to visualize a call graph:

image

Such a flame graph is useful for profiling, but also for observability, i.e. understanding how the pieces of a system are connected.

Implementation

An easy way to implement this is to use a special enum Span { Begin, End } component.

We then have a special time range query that is aware of spans (i.e. querying for data in the time range [#10, #20] must also return spans that cover it, such as [#0, #30]).

Together with a special Flame Graph Space View we have a pretty good start.
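
A minimal Python sketch of the idea above, to make the query requirement concrete. Everything here is an assumption for illustration: the `LogEvent` shape, the `pair_spans` and `query_range` helpers, and the use of a single integer timeline are hypothetical and not part of any existing Rerun API. It also assumes at most one open span per entity at a time.

```python
from dataclasses import dataclass
from enum import Enum


class Span(Enum):
    """Hypothetical marker component, mirroring `enum Span { Begin, End }`."""
    BEGIN = "begin"
    END = "end"


@dataclass
class LogEvent:
    entity: str   # hypothetical: which span this marker belongs to
    time: int     # position on a single timeline, e.g. frame #10
    marker: Span


def pair_spans(events):
    """Pair Begin/End markers per entity into (entity, start, end) tuples.

    A missing Begin or End leaves that side as None (a half-open span).
    """
    open_starts = {}
    spans = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.marker is Span.BEGIN:
            open_starts[ev.entity] = ev.time
        else:
            spans.append((ev.entity, open_starts.pop(ev.entity, None), ev.time))
    # Anything still open has begun but not ended yet.
    spans.extend((entity, start, None) for entity, start in open_starts.items())
    return spans


def query_range(spans, lo, hi):
    """Return every span that overlaps the closed range [lo, hi]."""
    def overlaps(start, end):
        starts_before_hi = start is None or start <= hi
        ends_after_lo = end is None or end >= lo
        return starts_before_hi and ends_after_lo
    return [s for s in spans if overlaps(s[1], s[2])]


events = [
    LogEvent("load", 0, Span.BEGIN),
    LogEvent("load", 30, Span.END),
    LogEvent("decode", 40, Span.BEGIN),
    LogEvent("decode", 50, Span.END),
]
# The span [#0, #30] is returned even though neither marker lies in [#10, #20].
hits = query_range(pair_spans(events), 10, 20)
```

The point of the sketch is that a naive range query over the marker events alone would return nothing for [#10, #20]; the query has to reason about the paired interval instead.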

Threads and processes

For multi-threaded or multi-process data we must have one flame graph per thread:

puffin_egui

We should also be able to record relationships between threads. For instance, we want to be able to see that thread A is blocked waiting on thread B and C (see also EmbarkStudios/puffin#174).

This also means log events should come with a ProcessId and ThreadId component.
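
As a rough sketch of what tagging events with process and thread ids could enable, the following groups span records into one lane per (process, thread). `SpanRecord` and `group_by_thread` are hypothetical names invented for this example; only `os.getpid` and `threading.get_ident` are real standard-library calls.

```python
import os
import threading
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    name: str
    start: float
    end: float
    # Hypothetical stand-ins for the ProcessId and ThreadId components.
    # The defaults are evaluated on whichever thread creates the record.
    process_id: int = field(default_factory=os.getpid)
    thread_id: int = field(default_factory=threading.get_ident)


def group_by_thread(records):
    """Split records into one lane per (process, thread), sorted by start time."""
    lanes = defaultdict(list)
    for rec in records:
        lanes[(rec.process_id, rec.thread_id)].append(rec)
    for lane in lanes.values():
        lane.sort(key=lambda r: r.start)
    return dict(lanes)


# Explicit ids keep the example deterministic.
records = [
    SpanRecord("decode", 2.0, 3.0, process_id=1, thread_id=7),
    SpanRecord("load", 0.0, 1.0, process_id=1, thread_id=7),
    SpanRecord("upload", 0.5, 2.5, process_id=1, thread_id=8),
]
lanes = group_by_thread(records)

# Without explicit ids, the record is tagged with the current process and thread.
here = SpanRecord("main", 0.0, 4.0)
```

Each lane could then be drawn as its own flame graph. Note that Python may recycle thread identifiers once a thread exits, so a real implementation would likely need a more stable id.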

API

For Python, a with scope makes sense, as does a function decorator:

@rr.span
def my_function(images):
    for image in images:
        with rr.span(f"image {image.name}"):
            process(image)

…with optional recording argument
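
One way a single `span` name could serve both as a bare decorator and as a `with` scope is sketched below. This is only an illustration under assumptions: `span`, `_span_scope`, and the in-memory `RECORDED` list are hypothetical, the optional recording argument is left out, and nothing here reflects an existing `rr.span`.

```python
import functools
import time
from contextlib import contextmanager

# Hypothetical in-memory sink; a real implementation would log to a recording.
RECORDED = []


@contextmanager
def _span_scope(name):
    """Record (name, start, end) when the scope exits, even on exceptions."""
    start = time.perf_counter()
    try:
        yield
    finally:
        RECORDED.append((name, start, time.perf_counter()))


def span(name_or_func):
    """Use as `@span` on a function, or as `with span("label"):`."""
    if callable(name_or_func):
        func = name_or_func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A decorated function is named after the function itself.
            with _span_scope(func.__name__):
                return func(*args, **kwargs)

        return wrapper
    return _span_scope(name_or_func)


@span
def my_function(images):
    for image in images:
        with span(f"image {image}"):
            pass  # stand-in for process(image)


my_function(["a.png", "b.png"])
# Inner spans close first, so they are recorded before the enclosing one.
names = [name for name, _, _ in RECORDED]
```

Dispatching on `callable(...)` is what lets `@span` work without parentheses while `span("label")` still returns a context manager.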

In Rust and C++ we would need to use macros, similar to e.g. puffin and loguru.

See also

@emilk emilk added enhancement New feature or request 👀 needs triage This issue needs to be triaged by the Rerun team labels Jan 2, 2024
@nikolausWest
Member

It probably makes a lot of sense to both take inspiration from OpenTelemetry and make sure we're ultimately compatible with it. In this case it's worth looking at the tracing package: https://opentelemetry-python.readthedocs.io/en/latest/api/trace.html

@teh-cmc
Member

teh-cmc commented Jan 3, 2024

> Implementation
>
> An easy way to implement this is to use a special enum Span { Begin, End } component.
>
> We then have a special time range query that is aware of spans (i.e. querying for data in the time range [#10, #20] must also return spans that cover it, such as [#0, #30]).
>
> Together with a special Flame Graph Space View we have a pretty good start.

I'm not sure I understand why we need to introduce a new/special component for this?

Since the timepoint we have today is effectively a start timepoint, an alternative implementation I had in mind was to introduce a second, optional timepoint for every log event, which specifies the end timepoint of the event (which therefore becomes a span rather than an event at this point).
If the end timepoint isn't specified, then we only look at the start timepoint and consider the event to be instantaneous, as we do today. Otherwise it's a span.

This allows spans to cover different time units quite naturally (e.g. "this event spanned 278ms wall-clock time (log_time), 90 simulation ticks (sim_tick) and was instantaneous on the frame timeline (frame_nr)").
Then I don't think we need to change anything query-wise? Haven't thought about it enough to be sure though.
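
To make the alternative concrete, here is a small sketch of an event carrying a start and an optional end on each timeline. `TimeRange` and `LogEvent` are hypothetical types made up for this example, and the concrete numbers and units are placeholders chosen to match the durations in the quoted example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TimeRange:
    start: int
    end: Optional[int] = None  # None means instantaneous on this timeline

    @property
    def is_span(self) -> bool:
        return self.end is not None

    @property
    def duration(self) -> int:
        return 0 if self.end is None else self.end - self.start


@dataclass
class LogEvent:
    payload: str
    # Timeline name -> time range on that timeline.
    times: Dict[str, TimeRange] = field(default_factory=dict)


event = LogEvent(
    payload="physics step",
    times={
        "log_time": TimeRange(start=1_000, end=1_278),  # assumed milliseconds
        "sim_tick": TimeRange(start=500, end=590),
        "frame_nr": TimeRange(start=42),  # no end: instantaneous
    },
)
durations = {name: t.duration for name, t in event.times.items()}
```

In this shape an event with no end timepoints at all behaves exactly like today's instantaneous events, which is what makes the change additive.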

@emilk
Member Author

emilk commented Jan 3, 2024

That is another way of implementing it for sure, but it is quite useful to be able to distinguish an event from a span, and it is also useful to be able to express half-open spans (spans with just a start or just an end).

I envision a flame-graph-like view where log events (e.g. text and images) are shown as single points inside the span that contains them.

@nikolausWest
Member

Maybe we should separate these concepts as:

  • Event: data + a time point
  • Duration event: data + start and end time
  • Span: an operation (unit of work) + a start and end time.
    • A hierarchical set of spans makes up a trace
      • A trace can be visualized as a flame graph
    • Multiple events can be produced within a span
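
The distinctions in the list above could be sketched roughly as follows: a trace is a set of nested spans, a flame graph is that trace laid out by nesting depth, and an event is attributed to the innermost span that contains it. The `Event` and `Span` classes and both helper functions are hypothetical, and the layout assumes properly nested spans from a single thread.

```python
from dataclasses import dataclass


@dataclass
class Event:
    data: str
    time: int


@dataclass
class Span:
    name: str
    start: int
    end: int


def flame_rows(spans):
    """Assign each span a depth: the number of spans that enclose it."""
    rows = []
    stack = []  # currently open ancestors
    # Sort by start time; on ties, the longer (outer) span comes first.
    for span in sorted(spans, key=lambda s: (s.start, -s.end)):
        while stack and stack[-1].end <= span.start:
            stack.pop()
        rows.append((len(stack), span.name))
        stack.append(span)
    return rows


def containing_span(event, spans):
    """Return the innermost (shortest) span containing the event, or None."""
    candidates = [s for s in spans if s.start <= event.time <= s.end]
    return min(candidates, key=lambda s: s.end - s.start, default=None)


trace = [
    Span("main", 0, 100),
    Span("load", 5, 40),
    Span("decode", 10, 30),
    Span("render", 50, 90),
]
rows = flame_rows(trace)
owner = containing_span(Event("loaded image", 20), trace)
```

Here `rows` gives the row each span would occupy in a flame graph, and `owner` is the span under which the event would be drawn as a single point.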

@nikolausWest nikolausWest removed the 👀 needs triage This issue needs to be triaged by the Rerun team label Jan 8, 2024