-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SDTEST-409] Send telemetry events in background thread to remove blocking HTTP calls from critical path #3718
[SDTEST-409] Send telemetry events in background thread to remove blocking HTTP calls from critical path #3718
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3718 +/- ##
========================================
Coverage 98.10% 98.10%
========================================
Files 1225 1227 +2
Lines 72866 73014 +148
Branches 3493 3508 +15
========================================
+ Hits 71485 71632 +147
- Misses 1381 1382 +1 ☔ View full report in Codecov by Sentry. |
end | ||
) | ||
end | ||
|
||
# Should only be called the first time components are built |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to remove this logic because Telemetry::Worker
now handles that app-started
is sent once. Note that before it was not guaranteed that app-started event was sent: if agent URL was not configured via environment variable, telemetry would silently fail to start. This is handled correctly now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love removing "special behavior" like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a really solid re-work of telemetry! Nice work!
Only have a minor nit (non-blocking), and a question about the behavior of "only once successful" (regarding retries). Otherwise I'm good with this.
end | ||
) | ||
end | ||
|
||
# Should only be called the first time components are built |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love removing "special behavior" like this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this. Nice work!
Nice! Here's my original reproducer for the blocking calls issue: require 'socket'
require 'datadog'
server = TCPServer.new(8126) # fake agent which never replies
puts Datadog::Tracing.enabled? Here's what I get in master:
and here's what I get with this PR:
The printing of |
@ivoanjo thanks for the reproducer! I have a suspicion that workers shutdown might not work properly if worker is blocked by IO. I will play around with shorter HTTP timeout values and reproducer you provided (I think 30 seconds timeout for telemetry is too long anyway). |
@ivoanjo this was a good catch, the timeout was indeed too long for telemetry use case, I lowered HTTP timeout to 2 seconds, the result:
|
…re stopping worker
…pendency collection event logic, move it to the Worker
24a0409
to
5729921
Compare
What does this PR do?
Moves all telemetry events sending to the background thread. Event "app-started" is now being sent when the
Telemetry::Worker
(only one successful event dispatch per application run), no additional logic to determine when to calltelemetry.started!
is required.Notable changes include:
Telemetry::Client
is renamed toTelemetry::Component
to better reflect its purpose: it is no longer dispatches HTTP requests, but provides a public API to enqueue them.Telemetry::Heartbeat
is renamed toTelemetry::Worker
. Now it includesCore::Workers::Queue
as well as buffer with limited capacity.
OnlyOnce
util:OnlyOnceSuccessful
. It shares most of its functionality withOnlyOnce
with important difference: it resetsran_once
var if the block returns falsey value such asnil
orfalse
. This ensures thatapp-started
event is sent once, successfully (before that, if agent was misconfigured on application start, telemetry would fail to start correctly)Core::Configuration
no longer includes custom logic to determine whether to calltelemetry.started!
: now it is handled internally byTelemetry::Worker
, there is no need for Core to handle specific component's lifecycle.Motivation:
Currently
Telemetry::Client
class performs HTTP requests to the agent synchronously which can cause delays to the instrumented application.How to test the change?
Tested using empty rails app, here are the debug logs: