-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the metric "NTP server reachable" #85
Conversation
Hi @majewsky @SuperSandro2000, would that be possible to review the produced code and give me some feedbacks if necessary. Thanks and sorry for the disturbance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you cleanup the command line history? Otherwise I would fallback to squash merging
7a47bb4
to
a836c14
Compare
collector.go
Outdated
for time.Since(begin) < c.NtpMeasurementDuration { | ||
nextMeasurement, err := c.getClockOffsetAndStratum() | ||
if err != nil { | ||
c.serverReachable.WithLabelValues(c.NtpServer).Set(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.serverReachable.WithLabelValues(c.NtpServer).Set(1) | |
c.serverReachable.WithLabelValues(c.NtpServer).Set(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put 1
there since it was able to reach the NTP server at least one time to enter inside of this loop.
During my test, the NTP server was closing our connection since I was making too much request while looping.
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The finding is actually a pretty good reason to have 0 here which means you either need to make the NTP server more forgiving or reduce the measurement duration.
Also we really can't have this in this branch because we could end up here when the ntp server is not reachable at all
Line 235 in bc06f85
return measurement, fmt.Errorf("couldn't get NTP measurement: %w", err) |
We could move this after the err != nil check but then the metric could be flappy and we definitely would want to expose this in the metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took your advice into consideration
It would really be nice if you could do a second review gentlemen 🙌 Thanks! |
Sorry, we are currently a bit time constraint, so it answers can take a few days. |
Transform the high drift from a constant to a parameter Complete the README add metric Add metric add metric to the collector. Send it everytime the server was reache successfully at least once correct matric name correct metric name make the highDrift constant a parameter add the possibility to configure the highDrift threshold in http mode Add doc delete log statement + correction after linting Change from 1 to 0 for the metric value After discussing with the SAPCC team, it seems better to have a 0 there. We should either make our NTP server more forgiving or reduce the measurement duration if the NTP server close our connection. Run go-makefile-maker Renovate: Update github.com/sapcc/go-bits digest to f061229 Run go-makefile-maker Renovate: Update github.com/sapcc/go-bits digest to 364c083
No worries! Thanks for your time :) |
… from a constant to a parameter Closes #85
I've cleaned up the history and pushed that for review into https://github.com/sapcc/ntp_exporter/tree/ntp-reachable |
Do not hesitate to ping me if you need additional help! |
begin := time.Now() | ||
measurement, err := c.getClockOffsetAndStratum() | ||
|
||
if err != nil { | ||
c.serverReachable.WithLabelValues(c.NtpServer).Set(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we Set(0)
at the top of measure()
once?
for time.Since(begin) < c.NtpMeasurementDuration { | ||
nextMeasurement, err := c.getClockOffsetAndStratum() | ||
if err != nil { | ||
c.serverReachable.WithLabelValues(c.NtpServer).Set(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we Set(0) at the top of measure() once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure since there could be a disconnection during the time period depicted by NtpMeasurementDuration
. However, we could perhaps set the value in the error condition of the function getClockOffsetAndStratum
.
Add metric which indicates whether the NTP server has been successfully targeted by the NTP exporter.
Add the command-line option high-drift.
closes #82