-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow adding links after span creation #3186
Allow adding links after span creation #3186
Conversation
Adding this method to a span is going to be a breaking change to the Go API for SDK developers. It is not something I support. I have looked through the use cases, and they all seem to be the same situation: a user wants to "link" message IDs to an already existing span. I don't see this as a good enough reason to break the API, especially since a user can already wrap this batch receive operation with a child span and add the appropriate links there. Additionally, there were many other proposals to solve this use case in #454. Those alternate approach would not break the Go API. Can we consider those approaches instead? cc @open-telemetry/go-maintainers |
Is this due to a restriction in the Go programming language itself (adding any new API will become a breaking change instead of an additive change), or it is due to some design/implementation decision that OpenTelemetry Go has made (and it is hard to revert that decision)? |
I know what you mean, but that's not a good argument as it means there can be no evolution, period. There are ways to add methods to Go API without breaking the existing interfaces, e.g. by defining new ones. It's not pretty, but what can you do - the new functionality is relatively niche, so I think it's an acceptable trade-off. |
it exists in any language that supports pure interfaces. E.g. in Java <=1.6, adding a method to an interface meant existing classes were incompatible, but I think Java deferred that to runtime, whereas Go fails at compile time. After Java introduced default implementations, this became non-issue (well, less of). |
In C++, this is what folks normally do 😆 https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo-interface |
Adding new method in API will break the ABI compatibility for OpenTelemetry C++ too. But we can increment API version through inline namespaces in case of ABI breaking change, so should be good. And will vote for this feature as there have been multiple asks for it from C++ community too. |
I didn't say the API cannot be evolved, those are your words. I said this change is not motivated enough to warrant the breakage it will cause. There are alternatives that were proposed and there is an existing solution using a new span. Why do we need to add this change to accomplish what can already be achieved or can be achieved in an alternate, less destructive, manner? |
I'm not sure what exactly qualifies a change as "motivated enough". The proposed change is probably not crucial to a majority of OpenTelemetry users, but users stumble over the limitation again and again. The description above mentions comments and PRs of at least seven different users, and those are only the ones who somehow made it to #454 (via comments or links). Creating a separate (semantically meaningless) span for adding links is not an acceptable workaround in my opinion. What operation is such a span supposed to cover? What's the duration of such a span? What's the semantic meaning of this duration? Regarding "breakage": I'm not a Go expert, but as far as I understand this doesn't refer to users (no existing Go OTel users will be broken), but it means that supporting this in a way that doesn't break existing users results in challenges and headaches for OTel Go maintainers? |
Creating a span to track the partial read result of a consumer request tracks the operation of reading a partial result. The duration of the span would represent the time it took to read part of the returned request. Calling it meaningless is subjective and not universal. For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?
Would these users get what they want if documentation was updated to show how to achieve what they want by delaying a span creation? Or by creating child spans? How many users will complain about the breaking changes to Go and C++? Will they be even more frustrated when they realize there were design choice that could have been implemented that wouldn't have disrupted things? As one of those linked use cases mentions, the idea of adding a link via an event has still not been addressed.
No, currently how these changes are written the Span interface will need to be extended. This is a breaking change to any SDK author that implements the current interface definition. Adding an additional Go interface that an SDK span could implement is the proposed work around for Go (see https://github.com/open-telemetry/opentelemetry-go/blob/main/CONTRIBUTING.md#interfaces). The current changes in this PR would not be satisfied if that approach was taken in Go, our implementation would not be compliant with the |
This was considered, but it breaks context propagation. |
Gotcha 👍 |
@@ -177,14 +177,10 @@ using the OpenCensus <-> OpenTelemetry bridge. | |||
that [parent spans must be specified during span creation](../trace/api.md#span-creation). | |||
This leads to some issues with OpenCensus APIs that allowed flexible | |||
specification of parent spans post-initialization. | |||
2. Links added to spans after the spans are created. This is [not supported in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have OC compliance matrix lines, but I think this would deserve one.
cc @dashpole - For tracking that this work will need to be done in the OC shims.
How? I might be missing something here. Can you spell this out for me? |
I think the idea is the request to a messaging system is mean to be made with a context. The context should contain an active trace/span so the receiver can properly link to the trace. If you start the trace after the request returns there is no way to include it in the transmitted context. |
Talking with @pyohannes in slack it was brought up that another option that could work here, in addition to adding a link to an event, is to add a span end option that accept links. |
`Span`s can be from the same or a different trace (see | ||
[Links between spans](../overview.md#links-between-spans)). | ||
|
||
A `Link` is structurally defined by the following properties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also realized this original section repeats below the same information regarding SpanContext and attributes :/ )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).
Do you think the point below (which is already part of the PR) suffices:
- An API to record a single
Link
where theLink
properties are passed as
arguments.
There's an alternative PR up, exploring adding links as events: #3240 |
I don't understand why sampling consideration are trampling the basic data modeling considerations? |
Sampling isn't the issue here, but avoiding churn for SDK developers:
The aim is to evaluate alternative approaches that were brought up, and then to decide which way to go. Or, as it came up as an argument too, to decide that the proposed capability isn't worth the cost of any of the possible approaches. |
Alternative approaches involve just as much "churn", since the ask is to add new functionality. |
Regarding non-extensible Go interfaces: Effectively we use a struct of functions instead of an interface. It's a bit more work on the implementation side but should not affect user experience if we do it right. Here's an example: type SpanImpl struct {
End func()
SetAttributes func(...Attribute)
AddEvent func(string, ...Attribute)
AddError func(err error)
SetStatus func(SpanStatus, string)
}
type Span struct {
impl SpanImpl
} full example: https://github.com/Azure/azure-sdk-for-go/blob/96d7f70e55a1f334ff6c95b5acf458ce355f45a1/sdk/azcore/tracing/tracing.go#L87-L118 // cc @carlosalberto |
Thanks for the suggestion. It was one considered but not implemented. It is a moot point at this time as the interface in question has already been release as a stable v1. |
@MrAlias my point there was to
I hope we can consider designing future Go APIs in an extendable way. |
We did consider designing the existing Go APIs in an extensible way and will continue to do so. I resent the implication that we did not. As @MrAlias stated, we considered embedding the an interface in a struct and rejected it as it is dangerous. We settled on an approach that we felt provided the most flexibility and safety. That included keeping open the option to add methods to interfaces expected to be implemented by an SDK, such as the |
from @Aneurysm9's "being questioned" link above:
I'm wondering if this is the best solution for everyone? It allows the OpenTelemetry specification to evolve, and keeps the Go community happy(?) |
We are implementing the OpenTelemetry API v1. What happens when there is an OpenTelemetry API v2 but we have already made half a dozen major version increments because the v1 API has been expanded? Are we to expect users will know that Beyond that, we're required to support all major API versions for three years from the release of the next version. If we have to bump the major version every time a method is added to the API then we're quickly going to wind up supporting many distinct API versions. And SDK versions to match. And all of the modules in -contrib. A v2 API is not a small undertaking and it should be the last resort. We should not break compatibility when there are other options available. We've spent a lot of time here asking why Go doesn't do something different and not a lot explaining why other, less invasive, options aren't viable. |
I think there are two different discussions here:
Then the question is if this change deserves a major version update for OTel Go. We have a couple of alternatives:
regardless of events discussion and over-the-wire representation, we should be able to add
|
I think that allowing links to be recorded at the end of the span is the best path forward. It does not require any compatibility breaking changes and is no worse with regard to sampling than adding a |
From the API standpoint, we don't pass attributes or events to end methods. I.e. it won't be consistent. Also, I think the time at which link was captured could be significant and (assuming we use events underneath or stream data as it arrives) we can send links over the wire without keeping them in memory as they arrive. Blocking all these possibilities and requiring callers to collect all their links and add them at the end in all languages would not be great. |
So this could be left to the SIGs so they add this operation either as a) a method of |
I think see placing links in the end of a span as a problem, but my reasoning is based on my beliefs about the under-specified trace data model. I believe we should treat span links as bi-directional information, which means that we ought to be able to configure a sampler that records links on both sides of the link. If we force users to record span links at the end, it looks like a one-directional piece of information--the information could've been used to update the linked-to span in real time, but it was deferred so the tracer doesn't know about it. The reason I prefer to see links presented to the tracer in the moment is that there is better a chance to record the To link when it happens. If an unsampled context links to a live, sampled span, it would be best to record the link to the live sampled span because it is going to export. Likewise, a "z-pages" style of reader will not be able to see links for in-progress spans. I appreciate the problem that OTel-Go presents, and I see a workaround. I still prefer the option where we model late-arriving links as span events. We can formalize how to encode span links as event attributes--for demonstration purposes let's assume that #2888 is accepted and we can encode nested dictionaries and lists of values. A span link event would be encoded using semantic conventions that include all its details, using Now, users need not call |
I assume the same approach can be used to record links (that appear as links on OTLP and on SpanData). Using two different data structures for links (links and events) is very controversial as outlined in this discussion: #3240 (comment) I wonder if we really have to decide on links/events discussion now. It seems we need a way to record links whatever they are and data structure and OTLP representation can change later independently if proven needed. |
I had been thinking about this recently, and even though feels odd, I have to say this could also satisfy the requirements overall (remember that, even if we send Let's discuss this on the next Spec call. |
I'd even propose |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Fixes #454
The limitation caused by preventing adding links after span creation limits OpenTelemetry users in the ability to efficiently and intuitively create spans and traces in certain scenarios (see examples in the linked issues and PRs below).
Adding links after span creation allows to create more meaningful traces in those scenarios. Best-effort sampling based on links is still possible, but only based on links added at creation time (similar to attributes). This was discussed with members of the sampling SIG, and from their side there are no concerns regarding removing this restriction (also see this PR comment).
Some use cases from the related issue:
#454 (comment)
#454 (comment)
#454 (comment)
#454 (comment)
#2278 (comment)
Users who are forced to come up with workarounds:
Link
to an already-createdSpan
opentelemetry-js-api#124Changes
This moves the section about adding span links from "Span creation" to "Span operations", removing mentions of the limitation that links can only be added at creation time. The name AddLink is proposed, wording is kept consistent with the corresponding section for adding attributes.
Discussions about the need for indicating causality in link-relationships (e. g. via attributes) will be held in the messaging workgroup. If necessary, this will result in a follow-up PR.