Fix Telemetry Spans
Motivation
We found that our Batcher sometimes tries to cancel batches that were actually included in the network, calling the `batcherTaskCreationFailed` endpoint, which finalizes the trace and prevents the Aggregator from registering its spans in the trace.
Description
`batcherTaskCreationFailed` occurs.
Observations
On a real `batcherTaskCreationFailed`, the Aggregator won't receive the new task, and the trace will remain unfinished. Furthermore, the trace metadata won't be removed from the Telemetry server store. Despite that, we will still be able to visualize the orphan spans, with a warning that their parent ID is invalid. #1477 was created to address this issue.
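The behavior described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `TraceStore`, its methods, and keying traces by a batch root string are all hypothetical and not taken from the Telemetry server code. The sketch assumes the failure endpoint records the failure as a span and leaves the trace open, so a later Aggregator span can still attach to it.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the Telemetry server's trace store.
/// Keys are batch roots (an assumption); values are the span names
/// recorded so far under the still-open trace.
#[derive(Default)]
struct TraceStore {
    open: HashMap<String, Vec<String>>,
}

impl TraceStore {
    /// Opens a trace for a batch and records the initial Batcher span.
    fn init_trace(&mut self, root: &str) {
        self.open
            .insert(root.to_string(), vec!["Batcher".to_string()]);
    }

    /// Adds a span to an open trace. Returns false if the trace is
    /// unknown or already finished.
    fn add_span(&mut self, root: &str, name: &str) -> bool {
        match self.open.get_mut(root) {
            Some(spans) => {
                spans.push(name.to_string());
                true
            }
            None => false,
        }
    }

    /// Assumed behavior: the failure is recorded as a span, but the
    /// trace is NOT finalized, so later spans are still accepted.
    fn batcher_task_creation_failed(&mut self, root: &str) -> bool {
        self.add_span(root, "Batcher - Task Creation Failed")
    }

    /// Finalizes the trace and removes its metadata from the store.
    fn finish_trace(&mut self, root: &str) -> Option<Vec<String>> {
        self.open.remove(root)
    }
}
```

In this model, a real failure never reaches `finish_trace`, so the entry stays in `open` indefinitely, which mirrors the leftover trace metadata mentioned above.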
How To Test
Run anvil, all Aligned components with one or more operators and start telemetry:
Go to Jaeger and explore the generated traces:
Change the Batcher `create_new_task_retryable` function in `batcher/aligned-batcher/src/retry/batcher_retryables.rs:165` to return an error after receiving the receipt:
Then, start all components again and you should be able to see the Aggregator spans even when the Batcher sends `Batcher - Task Creation Failed`.
Remove the whole content of the Batcher `create_new_task_retryable` function in `batcher/aligned-batcher/src/retry/batcher_retryables.rs:105` and return an error without creating any task:
You should only be able to see the Batcher spans.
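The two fault injections used in these test steps differ only in whether the task actually reaches the chain before the error is returned. A minimal sketch of that difference follows. `MockChain`, `Receipt`, `TaskError`, and both functions are hypothetical stand-ins and do not reproduce the real signature of `create_new_task_retryable`.

```rust
/// Hypothetical stand-in for the chain: holds the batch roots whose
/// tasks were actually created.
#[derive(Default)]
struct MockChain {
    tasks: Vec<String>,
}

/// Hypothetical receipt returned once the task transaction is included.
#[derive(Debug, PartialEq)]
struct Receipt {
    batch_root: String,
}

/// Hypothetical error type for the injected failure.
#[derive(Debug, PartialEq)]
enum TaskError {
    Injected,
}

impl MockChain {
    /// Simulates sending the transaction and waiting for its receipt.
    fn create_task(&mut self, batch_root: &str) -> Receipt {
        self.tasks.push(batch_root.to_string());
        Receipt {
            batch_root: batch_root.to_string(),
        }
    }
}

/// First scenario: the task lands on chain, but the caller is told it
/// failed. The Aggregator still sees the task and reports its spans.
fn fail_after_receipt(chain: &mut MockChain, batch_root: &str) -> Result<Receipt, TaskError> {
    let _receipt = chain.create_task(batch_root);
    Err(TaskError::Injected)
}

/// Second scenario: nothing is sent at all, so the Aggregator never
/// receives a task and only the Batcher spans exist.
fn fail_without_task(_chain: &mut MockChain, _batch_root: &str) -> Result<Receipt, TaskError> {
    Err(TaskError::Injected)
}
```

Both functions return the same error to the caller. Only the chain state tells the two cases apart, which is why the Batcher alone cannot tell whether it is safe to finalize the trace.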
Type of change
Checklist
`testnet`, everything else to `staging`