Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the metric "NTP server reachable" #85

Merged
merged 3 commits into from
May 17, 2024
Merged

Conversation

Rafouf69
Copy link
Contributor

Add metric which indicates whether the NTP server has been successfully targeted by the NTP exporter.

Add the command-line option high-drift.

closes #82

@Rafouf69 Rafouf69 changed the title Add the mtric NTP server reachable Add the metric "NTP server reachable" Feb 23, 2024
@Rafouf69
Copy link
Contributor Author

Hi @majewsky @SuperSandro2000, would that be possible to review the produced code and give me some feedbacks if necessary.

Thanks and sorry for the disturbance!

Copy link
Member

@SuperSandro2000 SuperSandro2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you cleanup the command line history? Otherwise I would fallback to squash merging

main.go Outdated Show resolved Hide resolved
collector.go Outdated Show resolved Hide resolved
@Rafouf69 Rafouf69 force-pushed the master branch 2 times, most recently from 7a47bb4 to a836c14 Compare February 27, 2024 13:27
collector.go Outdated
for time.Since(begin) < c.NtpMeasurementDuration {
nextMeasurement, err := c.getClockOffsetAndStratum()
if err != nil {
c.serverReachable.WithLabelValues(c.NtpServer).Set(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c.serverReachable.WithLabelValues(c.NtpServer).Set(1)
c.serverReachable.WithLabelValues(c.NtpServer).Set(0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put 1 there since it was able to reach the NTP server at least one time to enter inside of this loop.
During my test, the NTP server was closing our connection since I was making too much request while looping.

What do you think ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The finding is actually a pretty good reason to have 0 here which means you either need to make the NTP server more forgiving or reduce the measurement duration.

Also we really can't have this in this branch because we could end up here when the ntp server is not reachable at all

return measurement, fmt.Errorf("couldn't get NTP measurement: %w", err)

We could move this after the err != nil check but then the metric could be flappy and we definitely would want to expose this in the metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your advice into consideration

@Rafouf69
Copy link
Contributor Author

Rafouf69 commented Mar 8, 2024

It would really be nice if you could do a second review gentlemen 🙌 Thanks!

@SuperSandro2000
Copy link
Member

Sorry, we are currently a bit time constraint, so it answers can take a few days.

Transform the high drift from a constant to a parameter
Complete the README

add metric

Add metric

add metric to the collector.
Send it everytime the server was reache successfully at least once

correct matric name

correct metric name

make the highDrift constant a parameter

add the possibility to configure the highDrift threshold in http mode
Add doc

delete log statement + correction after linting

Change from 1 to 0 for the metric value

After discussing with the SAPCC team, it seems better to have a 0 there. We should either make our NTP server more forgiving or reduce the measurement duration if the NTP server close our connection.

Run go-makefile-maker

Renovate: Update github.com/sapcc/go-bits digest to f061229

Run go-makefile-maker

Renovate: Update github.com/sapcc/go-bits digest to 364c083
@Rafouf69
Copy link
Contributor Author

No worries! Thanks for your time :)

SuperSandro2000 pushed a commit that referenced this pull request Apr 10, 2024
@SuperSandro2000
Copy link
Member

I've cleaned up the history and pushed that for review into https://github.com/sapcc/ntp_exporter/tree/ntp-reachable

@Rafouf69
Copy link
Contributor Author

I've cleaned up the history and pushed that for review into https://github.com/sapcc/ntp_exporter/tree/ntp-reachable

Do not hesitate to ping me if you need additional help!

begin := time.Now()
measurement, err := c.getClockOffsetAndStratum()

if err != nil {
c.serverReachable.WithLabelValues(c.NtpServer).Set(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we Set(0) at the top of measure() once?

for time.Since(begin) < c.NtpMeasurementDuration {
nextMeasurement, err := c.getClockOffsetAndStratum()
if err != nil {
c.serverReachable.WithLabelValues(c.NtpServer).Set(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we Set(0) at the top of measure() once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure since there could be a disconnection during the time period depicted by NtpMeasurementDuration . However, we could perhaps set the value in the error condition of the function getClockOffsetAndStratum.

@Rafouf69 Rafouf69 requested a review from Nuckal777 May 15, 2024 09:46
@Nuckal777 Nuckal777 merged commit 40d8992 into sapcc:master May 17, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metric when a NTP server is unreachable
4 participants