-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix zeroed EndsAt timestamp #3805
Fix zeroed EndsAt timestamp #3805
Conversation
var ( | ||
alerts = ag.alerts.List() | ||
alertsSlice = make(types.AlertSlice, 0, len(alerts)) | ||
now = time.Now() | ||
) | ||
for _, alert := range alerts { | ||
a := *alert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we can remove this, but will do so in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the intent correctly this is passing the reference time through the context and every notifier should use the .*At(now time.Time)
variant. As we don't (plan) to modify the alert we don't need to take a deep copy (some notifiers call types.Alerts).
Dropping this copy might be hard. On line 528 (529) the UpdatedAt time is compared between the copy and the storage of the aggregation group. Without a copy these numbers will always be the same.
I am wondering if there are ways to make sure that integrations never call the Resolved()
, Status()
? Given that alertmanager is likely to see an increase in number of integrations that appears desirable to me to ease writing and reviewing these.
Given that the model.Alert is copied and one needs the reference time. What is your thought on creating something like a model.AlertsSnapshot that holds an immutable snapshot of the alert and the status/reference time? This would ease writing and reviewing the notifiers, protect the core from accidental modificiations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping this copy might be hard. On line 528 (529) the UpdatedAt time is compared between the copy and the storage of the aggregation group. Without a copy these numbers will always be the same.
Oh well spotted!
What is your thought on creating something like a model.AlertsSnapshot that holds an immutable snapshot of the alert and the status/reference time?
This is what I tried at first, but I found it to be quite an intrusive change (#3351 (comment)). I haven't ruled it out though - do you think that would be a better way to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think longterm we are better off with a genuine snapshot. It eases writing integrations, it eases code review of them. I think some of the type changes we can probably automate with gofmt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
type AlertSnapshot struct {
Alert
// status of the alert at the time it was snapshot.
status model.AlertStatus
}
func (a *AlertSnapshot) Resolved() bool {
return a.status == model.AlertResolved
}
func (a *AlertSnapshot) Status() model.AlertStatus {
return a.status
}
// Need to override String() as otherwise it will call Resolved()
// of model.Alert not AlertSnapshot.
func (a *AlertSnapshot) String() string {
s := fmt.Sprintf("%s[%s]", a.Name(), a.Fingerprint().String()[:7])
if a.Resolved() {
return s + "[resolved]"
}
return s + "[active]"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but from my point of view the trade-off is whether we want all of these fields to be mutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh – I think I misunderstood. By mutable/immutable are you talking about the fields being package-public (capitalized) or package-private (starting with lowercase)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Starting with lower case. I wanted to see how an implementation of this looks like and got the below to compile (and pass most tests but it requires some clean ups).
For me the biggest benefit is that it creates an immutable snapshot and simplifies API. The notification handlers are unable to modify the underlying Alert and the API removes the *At(time.Time)
functions and we don't have to remember using notify.Now
.
Please see:
main...zecke:alertmanager:zecke-alternative-ends-at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think you have time to open a PR for this? @gotjosh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Let me fix the formatting and remove some of the repeated code in the tests.
This commit bumps prometheus/common to v0.52.3. It has a breaking change where the metric alertmanager_build_info has been renamed to go_build_info as the metric has been moved from prometheus/common to prometheus/client_golang and the namspace argument has been removed. Signed-off-by: George Robinson <[email protected]>
Build fails as we need #3806. |
This commit fixes a bug where firing alerts had zeroed EndsAt timestamps in notifications. This bug was introduced in another PR (prometheus#1686) to fix issues where alerts are resolved during a flush. This commit takes a different approach where instead of removing the EndsAt timestamp from the alert, it uses the ResolvedAt and StatusAt methods with the timestamp in the context, not the current time. Signed-off-by: George Robinson <[email protected]>
Signed-off-by: George Robinson <[email protected]>
Signed-off-by: George Robinson <[email protected]>
var ( | ||
alerts = ag.alerts.List() | ||
alertsSlice = make(types.AlertSlice, 0, len(alerts)) | ||
now = time.Now() | ||
) | ||
for _, alert := range alerts { | ||
a := *alert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the intent correctly this is passing the reference time through the context and every notifier should use the .*At(now time.Time)
variant. As we don't (plan) to modify the alert we don't need to take a deep copy (some notifiers call types.Alerts).
Dropping this copy might be hard. On line 528 (529) the UpdatedAt time is compared between the copy and the storage of the aggregation group. Without a copy these numbers will always be the same.
I am wondering if there are ways to make sure that integrations never call the Resolved()
, Status()
? Given that alertmanager is likely to see an increase in number of integrations that appears desirable to me to ease writing and reviewing these.
Given that the model.Alert is copied and one needs the reference time. What is your thought on creating something like a model.AlertsSnapshot that holds an immutable snapshot of the alert and the status/reference time? This would ease writing and reviewing the notifiers, protect the core from accidental modificiations?
@@ -526,7 +525,7 @@ func (ag *aggrGroup) flush(notify func(...*types.Alert) bool) { | |||
level.Error(ag.logger).Log("msg", "failed to get alert", "err", err, "alert", a.String()) | |||
continue | |||
} | |||
if a.Resolved() && got.UpdatedAt == a.UpdatedAt { | |||
if a.ResolvedAt(now) && got.UpdatedAt == a.UpdatedAt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fine. If got
has any modification reflected in UpdatedAt then we do not remove the alert.
Signed-off-by: George Robinson <[email protected]>
8b530b8
to
7d19a60
Compare
I'm closing this PR in favor of #3827. |
This commit fixes a bug where firing alerts had zeroed
EndsAt
timestamps in notifications. This bug was introduced in another PR (#1686) to fix issues where alerts are resolved during a flush.This commit takes a different approach where instead of removing the
EndsAt
timestamp from the alert, it uses theResolvedAt
andStatusAt
methods with the timestamp in the context, not the current time.