How to achieve consistent sampling across linked traces? #2918

kalyanaj · 2022-11-03T22:36:24Z

Filing this issue per our discussion in the Sampling SIG today.

What are you trying to achieve?
OpenTelemetry supports Span Links that can be used to model asynchronous scenarios or batched operations (fan-out/fan-in). I am looking to achieve some level of consistent (head-based) sampling of all the linked traces. If the sampling decision happens at an individual trace level, customers cannot understand the whole story of what happened to a request.

Example of links usage: One use-case is in a producer - consumer scenario where a producer span (say Trace T1 / Span S1) enqueues a job to a queue; let's say such jobs are processed by a consuming service asynchronously. Since the lifetimes of the producer and consumer are different, the consuming operation is modelled as a separate trace (T2 / S2) that links to T1 / S1 using span-links. If there's a way to do consistent sampling across links, then if T1 was sampled then T2 also should be sampled.

What did you expect to see?
Guidance / samples / out-of-the-box sampler to help achieve the above. For example, something like:

if you are using parent-based sampling & want to get consistent sampling across links, this is what you need to do.
if you are using consistent-probability sampling & want to get consistent sampling across links, this is what you need to do.

Additional context.

One way the above scenario could be achieved is with a custom sampler that checks if any of the linked spans (of the span for which the sampling decision is being made) is sampled, and if so decide to sample this as well. This can work when the source of this link is the root span of a new trace.
On the other hand, if the source of the link is not the root span, it may need to consider its parent's decision or its links' decision to arrive at its decision. Yes, it will be a partial trace, but having a partial trace here might be better than no trace.
Need to understand the implications for the adjusted count etc.
There would be other trade-offs to consider as well: e.g., if a span is sampled because one of its 20 links is sampled, this span has a much higher probability of being sampled than any single link. Assuming the links are sampled independently, its probability of being sampled is 1 - (1 - P(link1 being sampled)) × (1 - P(link2 being sampled)) × ... × (1 - P(link20 being sampled)), which is bounded above by the sum P(link1 being sampled) + P(link2 being sampled) + ... + P(link20 being sampled). So we need to consider whether additional probabilistic measures are needed for the link sampling (credit: @pyohannes).
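
A minimal sketch of the custom sampler described above, written in plain Python rather than against any OpenTelemetry SDK. The `Ctx` type, the `link_aware_should_sample` function, and the `root_sampler` callback are hypothetical names introduced only for illustration; a real sampler would receive the parent context and links through the SDK's `ShouldSample` arguments.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for a span context; only the sampled flag matters here."""
    trace_id: str
    sampled: bool


def link_aware_should_sample(
    parent: Optional[Ctx],
    links: Sequence[Ctx],
    root_sampler: Callable[[], bool],
) -> bool:
    """Decide whether to sample a new span, giving sampled links priority."""
    # If any linked context was sampled, sample this span too, even when it
    # is not a root span (this may produce a partial trace).
    if any(link.sampled for link in links):
        return True
    # Otherwise fall back to ordinary parent-based behaviour.
    if parent is not None:
        return parent.sampled
    # Root span without sampled links: defer to the configured root sampler.
    return root_sampler()


# T1/S1 was sampled by the producer; the consumer starts root span T2/S2.
t1 = Ctx(trace_id="T1", sampled=True)
consumer_sampled = link_aware_should_sample(
    parent=None, links=[t1], root_sampler=lambda: False
)
```

In this sketch the link check runs before the parent check, which is the "partial trace is better than no trace" choice mentioned above; swapping the order would make the parent's decision win instead.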

carlosalberto · 2022-11-04T12:33:11Z

cc @jmacd

cijothomas · 2022-11-09T03:26:08Z

> One way the above scenario could be achieved is with a custom sampler that checks if any of the linked spans (of the span for which the sampling decision is being made) is sampled, and if so decide to sample this as well

See https://github.com/open-telemetry/opentelemetry-dotnet/pull/1851/files. This was in .NET originally, but was removed because the spec did not cover it at that time.

jmacd · 2022-11-14T20:22:31Z

@kalyanaj Thank you for posing these questions.

I would like to separate questions about span Links being created after the start of a span into a separate topic which may interest @pyohannes, below.

About `ShouldSample()` and Span Links

Note that the present OTel-specified mechanism for probability sampling uses tracestate and that OTLP Span Links include the tracestate of the linked context. This means each context independently encodes its adjusted count.

We can describe a non-probability Sampler that decides to sample if any of its links are sampled. The new span's tracestate will have no r-value or p-value, but each of the linked-to contexts may define an r-value/p-value. Taken from the perspective of any of those contexts, the new span may be considered representative to a varying degree, depending on the p-value of the linked-to context.

We can also give the new span an independent probability of being sampled on its own (root or non-root) using the consistent scheme already specified.

We can combine the non-probability and the probability samplers as specified in the composition rules, which state, in particular, that if the Span would not be sampled probabilistically but is recorded for any other reason, it should use p-value 63, which signifies zero adjusted count.
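
The composition rule above can be sketched as follows. This assumes the power-of-two encoding from the probability sampling specification, where a p-value `p` in 0..62 means a sampling probability of 2^-p (adjusted count 2^p) and 63 means zero adjusted count. The function names `adjusted_count` and `combine` are hypothetical and not part of any SDK.

```python
from typing import Optional, Tuple

ZERO_ADJUSTED_COUNT_P = 63  # p-value that signifies zero adjusted count


def adjusted_count(p_value: Optional[int]) -> Optional[int]:
    """Translate a p-value into an adjusted count (None means unknown)."""
    if p_value is None:
        return None
    if p_value == ZERO_ADJUSTED_COUNT_P:
        return 0
    if not 0 <= p_value < ZERO_ADJUSTED_COUNT_P:
        raise ValueError(f"p-value out of range: {p_value}")
    return 2 ** p_value


def combine(
    prob_sampled: bool, prob_p_value: int, other_sampled: bool
) -> Tuple[bool, Optional[int]]:
    """Combine a probability sampler with a non-probability sampler.

    Returns (sampled, p_value_to_record).
    """
    if prob_sampled:
        # The probability sampler selected the span: keep its p-value.
        return True, prob_p_value
    if other_sampled:
        # Recorded for another reason (e.g. a sampled link): the span must
        # not be counted, so it carries p-value 63.
        return True, ZERO_ADJUSTED_COUNT_P
    return False, None


# A span kept only because one of its links was sampled counts as zero.
sampled, p = combine(prob_sampled=False, prob_p_value=2, other_sampled=True)
link_only_count = adjusted_count(p)
```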

I understand these statements do not quite answer your question, but that is by design. If you want to inherit a probability sampling decision from a parent context, then you may continue as its child; otherwise, new contexts require new probability sampling decisions. In a scenario where you sample each link at 1/2 and you have 10 links when starting a new span, the probability of all of the links being sampled is 2^-10, for example.

About Span Links outside of `Start()`

The current OTel trace API allows span links to be given only when a span starts. This, we believe, is a rule so that sampling decisions can be made based on all the links and we suspect this has to do with establishing trace "completeness".

The probability sampling specification makes a recommendation to use non-descending probabilities from root to leaf in a trace, to ensure completeness, because of asymmetry. We know when there is a missing parent but not when there is a missing child, so we recommend children not to use a lower sampling rate than their parents, so that (because of sampling "consistency") traces are either complete or recognizably incomplete.

We have a similar situation with span links -- we know when the span link for a sampled span was unsampled ("missing"), but if the linked-to context is sampled and the new span is not sampled, we have an analogous problem -- there is nothing to inform the sampled span that it is missing a link from the unsampled new span. In this scenario, where span links are used, we have no way to ensure completeness.

The fact that we cannot ensure completeness is by design, but the fact that we cannot recognize incomplete traces is a defect. Moreover, we have this defect with or without the ability to add links after a span starts, because we have no way to inform a linked-to context that a link was established.

The problem, I believe, is that we are treating span links as having a single direction. If we had a field to represent the direction of the link, then when a new span starts it records links directed TO a number of other spans while each context linked-to would have a new link directed FROM the new span. In this case, if either side is sampled we will be able to detect a link to the other possibly-sampled context. Having a span link direction field would also allow us to support span link creation after span start, because when a linkage is potentially recorded due to sampling on either side, we will be able to at least establish that an unrecorded connection exists.
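
A rough sketch of the direction idea, under the assumption that both spans can be updated when the link is established (which may not hold across process boundaries). `Direction`, `Link`, `Span`, `connect`, and `missing_counterparts` are hypothetical names for illustration and do not exist in the OTel API or OTLP.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Direction(Enum):
    TO = "to"      # stored on the new span, pointing at the earlier context
    FROM = "from"  # stored on the earlier span, pointing back at the new span


@dataclass
class Link:
    other_span_id: str
    direction: Direction


@dataclass
class Span:
    span_id: str
    links: List[Link] = field(default_factory=list)


def connect(new_span: Span, earlier: Span) -> None:
    """Record the link on both sides so that either side can reveal it."""
    new_span.links.append(Link(earlier.span_id, Direction.TO))
    earlier.links.append(Link(new_span.span_id, Direction.FROM))


def missing_counterparts(exported: List[Span]) -> Dict[str, Set[str]]:
    """Map each exported span to the linked span ids that were not exported."""
    present = {span.span_id for span in exported}
    missing: Dict[str, Set[str]] = {}
    for span in exported:
        gone = {l.other_span_id for l in span.links if l.other_span_id not in present}
        if gone:
            missing[span.span_id] = gone
    return missing


s1 = Span("S1")       # linked-to span, sampled
s2 = Span("S2")       # new span, not sampled
connect(s2, s1)
gaps = missing_counterparts([s1])  # only S1 was exported
```

With only `S1` exported, `gaps` reports that `S1` has an unrecorded counterpart `S2`, which makes the trace recognizably incomplete rather than silently incomplete.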

pyohannes · 2022-11-15T17:46:36Z

> If we had a field to represent the direction of the link, then when a new span starts it records links directed TO a number of other spans while each context linked-to would have a new link directed FROM the new span. In this case, if either side is sampled we will be able to detect a link to the other possibly-sampled context.

If there is a producer publishing a message to a topic, it cannot know how many consumers are subscribed to the topic and are processing the message. In case some of the consumer traces aren't sampled, I don't see how directions on links would help.

lmolkova · 2022-11-16T20:57:59Z

I agree with @jmacd here - assuming we deal with a relatively high number of links, the question is what approach would maximize the number of complete groups of traces, but it'd be impossible to achieve full consistency.

This perspective also helps with the discussion about adding links after start. It's up to the sampler to maximize consistency, but since full consistency is impossible to achieve anyway, we should allow adding links after start (with or without a direction).

jmacd · 2022-11-17T20:01:14Z

@pyohannes I apologize for the confusion. The idea didn't fully address the problem, as I realized from a discussion we had about this issue in today's Sampling SIG.

I was trying to establish that Sampling as we know it, where new spans make a sampling decision somehow dependent on their parent context and the span contexts they are linked with at creation, is the reason why we do not support creating Span links after span start. The idea is that because a Sampler has access to the sampled flag of its parent context and other preceding (linked-to) contexts, then we have these capabilities:

1. We can ensure completeness by making the right Sampler decision.
2. We can verify completeness when reviewing Span data.

The reason we prohibit creating span links after span creation is that doing so breaks one or both of these. What we have is a situation where a link between spans must be recorded by the later-in-time span; the only way we have to control recording a span is in the sampling decision, therefore span links must be present at the time of sampling.

The creation of a span link after span start breaks the two requirements as follows. If the linked-to context is sampled, then the only way to make it complete is to record the linked-from span. If the linked-from span is already not being recorded because the sampling decision has passed, it becomes impossible to record the link. We have unverifiable incompleteness because the linked-to span has no awareness of the linked-from span, which was not recorded. The problem scenario, to be concrete, is a call to add a span link when the linked-to context is sampled and the linked-from span is a no-op span. We have nowhere to record the link.

The OTel Sampler API currently returns one of four states, described here: https://opentelemetry.io/docs/reference/specification/trace/sdk/#recording-sampled-reaction-table. To address both @kalyanaj's original question and support span links after creation, we need a new Span reaction: a "conditionally recorded" span. A conditionally recorded span is one that is not itself sampled and is being held in memory, recording events and potential after-creation span links. When an after-creation span link occurs linking to a sampled span context, the conditionally recorded span would change states, entering a new state "exported-unsampled" where the span is passed to the exporter despite being unsampled. (If the span was also being probability sampled, the exported-unsampled spans MUST be assigned zero adjusted count.)

Then, to configure a Sampler that would ensure consistent, complete spans including their span links:

1. If the prevailing Sampler (root or parent-based) decides to sample, sample as usual.
2. Otherwise, ShouldSample would return either conditionally-recorded or exported-unsampled to allow for recording the span to complete other contexts. Conditionally-recorded means that none of the at-creation-time span links were sampled, but future span links could still trigger export. Exported-unsampled means that at least one of the at-creation-time span links was already sampled.
3. The probability sampling composition rules explain how to combine 1 and 2.
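
The steps above can be sketched as a small decision function. This is only an illustration of the proposal: the `Decision` enum, `should_sample`, and `on_late_link` are hypothetical names, and the two new decision codes do not exist in the current Sampler API.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    RECORD_AND_SAMPLE = "record_and_sample"            # existing decision
    EXPORTED_UNSAMPLED = "exported_unsampled"          # proposed
    CONDITIONALLY_RECORDED = "conditionally_recorded"  # proposed


def should_sample(prevailing_sampled: bool, link_sampled_flags: Sequence[bool]) -> Decision:
    """Apply steps 1 and 2 at span start."""
    if prevailing_sampled:
        # Step 1: the root or parent-based sampler said yes.
        return Decision.RECORD_AND_SAMPLE
    if any(link_sampled_flags):
        # Step 2a: an at-creation-time link is already sampled, so export
        # the span (with zero adjusted count) to complete that context.
        return Decision.EXPORTED_UNSAMPLED
    # Step 2b: hold the span in memory in case a later link needs it.
    return Decision.CONDITIONALLY_RECORDED


def on_late_link(current: Decision, linked_context_sampled: bool) -> Decision:
    """State change when a link is added after span start."""
    if current is Decision.CONDITIONALLY_RECORDED and linked_context_sampled:
        return Decision.EXPORTED_UNSAMPLED
    return current


at_start = should_sample(prevailing_sampled=False, link_sampled_flags=[False, False])
after_link = on_late_link(at_start, linked_context_sampled=True)
```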

I hope this sketch is more complete! I didn't actually add a direction attribute to Links; I just require them to be recorded when either side is sampled, for completeness. A new "exported-unsampled" Sampler decision is required even without support for adding span links after creation (to @kalyanaj's point). A new "conditionally-recorded" Sampler decision would be required to support span links after creation (to @pyohannes's feature request).

yurishkuro · 2022-11-18T05:17:23Z

@jmacd btw, some Jaeger SDKs utilized a state similar to "conditionally-recorded" (we called it deferred sampling), to support sampling based on span attributes that become available only after span start. It's a bit of a kludge, because the state only makes sense until a child span is created, at which point the sampling decision needs to be finalized.

I am, however, not convinced that sampling considerations are the deciding factor for allowing adding links post creation. The exact same arguments could be made for disallowing span attributes after span creation, yet we allow that. Just because sampling questions become more difficult with post-creation links, it does not negate the fact that there are use cases that can benefit from late links, especially in scenarios that sample everything (e.g. CI or other devexp workflows).

pyohannes · 2022-11-22T00:31:15Z

> The idea is that because a Sampler has access to the sampled flag of its parent context and other preceding (linked-to) contexts, then we have these capabilities:
>
> We can ensure completeness by making the right Sampler decision
>
> We can verify completeness when reviewing Span data.

This is true. However, I think ensuring completeness across linked traces makes you lose another crucial capability: effectively enforcing a fixed sampling rate.

If you make a sampling decision based on links to two upstream spans, both upstream spans sampled with a probability of 10%, you're sampling the span with the probability that at least one of the two upstream spans was sampled. This probability is higher than 10%, and, the more links you have, it approaches 100%.

In cases where there is heavy batching and where there are several layers of links, the actual sampling volume could end up being much higher than what one might expect based on the probability decision at the root.
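
To make the growth concrete, here is a small calculation, assuming that linked spans are sampled independently of each other (real workloads may be correlated). The helper names `p_any_link_sampled` and `p_after_layers` are made up for this example.

```python
from typing import Sequence


def p_any_link_sampled(link_probs: Sequence[float]) -> float:
    """Probability that at least one link is sampled, assuming independence."""
    p_none = 1.0
    for p in link_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


def p_after_layers(root_p: float, fan_in: int, layers: int) -> float:
    """Effective sampling probability after several layers of batching.

    Each layer creates a span that links to `fan_in` spans of the previous
    layer and is sampled when any of them is sampled.
    """
    p = root_p
    for _ in range(layers):
        p = p_any_link_sampled([p] * fan_in)
    return p


two_links = p_any_link_sampled([0.10, 0.10])   # roughly 0.19
twenty_links = p_any_link_sampled([0.10] * 20)  # roughly 0.88
two_layers = p_after_layers(0.10, fan_in=2, layers=2)  # roughly 0.34
```

With a 10% root probability, two links already give about 19%, twenty links give about 88%, and two layers of two-way batching give about 34%.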

While this is not to be seen as an argument for adding span links after span creation, I think it illustrates that probably not all capabilities we intend to provide can be fully utilized at the same time, but there might be trade-offs based on usage scenarios.

jmacd · 2022-11-28T16:48:01Z

> I think it illustrates that probably not all capabilities we intend to provide can be fully utilized at the same time, but there might be trade-offs based on usage scenarios.

I agree. We can't avoid the fundamentals of sampling.

What we can do is provide new Sampler implementations that give users a choice. If users would like to record a span that is linked-to by others, they should be able to do so without causing entire other traces to be collected. If that capability will co-exist with what we have today, it means two new Sampler decision codes as I outlined above, one to say "maybe record this span, depending" and one to say "record an untraced span".

jmacd · 2022-11-28T17:27:00Z

@yurishkuro About "deferred sampling" thanks for explaining. Comparing the two span states that I described with the one from Jaeger, the "deferred sampling" state of Jaeger is similar to but different than the one I called "conditionally recorded", because you could remain in a conditionally recorded state after the first child up until span end because, at any moment, a new span link could appear and cause the span to become "unsampled exported".

Using the Jaeger term "deferred" instead of "conditionally" would give us a complete list of span states:

1. sampled (implies recorded and exported, an existing spec)
2. recorded, deferred sampling (implies no children yet, can still enter state 1, 3)
3. recorded, deferred exporting (implies not sampled, can still enter state 4, 5, or 6)
4. recorded, exported (implies the span will export when it ends, a new state to support late span links)
5. recorded, not exported (implies no desire to export the unsampled span, an existing feature to support e.g., z-pages, an existing spec)
6. not recorded (an existing spec)

So, it looks like three new states if you combine Jaeger's deferred sampling decision with the deferred exporting decision requested to support span links after start.
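
The six states and the transitions mentioned in the list can be written down as a small table. This sketch assumes that any state without a "can still enter" note is final, which the list does not state explicitly; `SpanState`, `ALLOWED`, and `transition` are hypothetical names.

```python
from enum import IntEnum


class SpanState(IntEnum):
    SAMPLED = 1
    DEFERRED_SAMPLING = 2      # recorded, sampling decision not yet final
    DEFERRED_EXPORTING = 3     # recorded, not sampled, export not yet decided
    RECORDED_EXPORTED = 4      # unsampled but exported when the span ends
    RECORDED_NOT_EXPORTED = 5  # e.g. for z-pages
    NOT_RECORDED = 6


# Transitions taken from the "can still enter" notes in the list.
ALLOWED = {
    SpanState.DEFERRED_SAMPLING: {SpanState.SAMPLED, SpanState.DEFERRED_EXPORTING},
    SpanState.DEFERRED_EXPORTING: {
        SpanState.RECORDED_EXPORTED,
        SpanState.RECORDED_NOT_EXPORTED,
        SpanState.NOT_RECORDED,
    },
}


def transition(current: SpanState, target: SpanState) -> SpanState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


# A late link to a sampled context arrives while export is still deferred.
state = transition(SpanState.DEFERRED_SAMPLING, SpanState.DEFERRED_EXPORTING)
state = transition(state, SpanState.RECORDED_EXPORTED)
```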

In case anyone is wondering what the next steps are: adopting this kind of support in OpenTelemetry will require prototypes. Interested parties should look at #2179, too.

kalyanaj added the spec:trace (Related to the specification/trace directory) label Nov 3, 2022

github-actions bot assigned yurishkuro Nov 3, 2022

pyohannes mentioned this issue Nov 15, 2022: Semantic convention for span link names open-telemetry/semantic-conventions#1057 (Open)

jmacd mentioned this issue Nov 17, 2022: Attribute to indicate whether a Span is used as an Exemplar #2922 (Closed)

lmolkova mentioned this issue Nov 23, 2022: Please (re)-allow recording links after Span creation time #454 (Closed)

jmacd mentioned this issue Nov 28, 2022: Let SDKs export all the spans regardless of their sampled flag #2986 (Open)

kentquirk mentioned this issue Jan 23, 2023: feat: OTLP bidirectional span links honeycombio/husky#167 (Closed)

jmacd mentioned this issue Jan 31, 2023: Introduce means of producing metrics from all Spans regardless of sampling decision. #3145 (Open)

pyohannes mentioned this issue Feb 7, 2023: Allow adding links after span creation #3186 (Closed)

jmacd mentioned this issue Mar 2, 2023: Example / proof of concept to achieve a combination of head-based sampling + a basic form of tail-based sampling at a span level. open-telemetry/opentelemetry-dotnet#4206 (Merged)

kalyanaj mentioned this issue Mar 29, 2023: Provide an example for how sampling based on the context of the activity links could work in OTel.NET open-telemetry/opentelemetry-dotnet#4345 (Closed)

jmacd mentioned this issue Aug 10, 2023: Add W3C-specified trace flags to v1 Span proto open-telemetry/opentelemetry-proto#503 (Merged)

jmacd mentioned this issue Sep 12, 2023: Add a new AddLink() operation to Span. #3678 (Merged)

carlosalberto mentioned this issue Sep 20, 2023: Identify Links added after Span creation #3698 (Open)

jmacd mentioned this issue Sep 20, 2023: Unclear whether Span.flags represents span's parent or own flags open-telemetry/opentelemetry-proto#507 (Open)

This was referenced Dec 6, 2023: Add possibility to set parent span context after span creation #3781 (Closed); Clarification on how processors handle sampling decisions. #3779 (Open)

jmacd mentioned this issue Mar 11, 2024: Add OpenTelemetry sampling conventions open-telemetry/semantic-conventions#793 (Closed)

jmacd mentioned this issue Mar 28, 2024: Context propagation to client instrumentation #3825 (Closed)

jmacd mentioned this issue Apr 19, 2024: Give samplers the span ID #3991 (Open)

jmacd self-assigned this Apr 24, 2024

isobelormiston mentioned this issue Jun 14, 2024: Tail sampling processor: add a way to sample all spans that have a span link to a sampled span. open-telemetry/opentelemetry-collector-contrib#33568 (Open)
