Alertmanager: Update to latest main #7103

Merged
merged 13 commits into from
Jan 24, 2024

grobinson-grafana
Contributor

@grobinson-grafana grobinson-grafana commented Jan 11, 2024

What this PR does

This pull request updates Alertmanager in Mimir. It deprecates Alertmanager APIv1. The version of Alertmanager vendored in this commit contains support for UTF-8; however, it is disabled until we initialize the compat package from within Mimir using compat.InitFromFlags. UTF-8 will be enabled at a later time.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@grobinson-grafana force-pushed the grobinson/update-am branch 2 times, most recently from cb7101c to 2a9e1d7, on January 11, 2024 16:07
@grobinson-grafana self-assigned this on Jan 12, 2024
@grobinson-grafana changed the title from "Update Alertmanager to 9ed52df" to "Update Alertmanager vendor" on Jan 15, 2024
@grobinson-grafana force-pushed the grobinson/update-am branch 5 times, most recently from 04a8da2 to 25a22d1, on January 18, 2024 16:06
@grobinson-grafana
Contributor Author

grobinson-grafana commented Jan 18, 2024

I tested this commit in dev-us-central-0 using commit 3537d35. You can see from the link that the commit from Mimir is 5a22d1c.

Here is a screenshot showing the version on startup:

Screenshot 2024-01-18 at 4 27 13 PM

Test 1: mimirtool works as expected with non UTF-8 configurations

mimirtool can load the following configuration:

route:
    receiver: default
    routes:
        - continue: false
          matchers:
            - foo=bar
            - bar=baz
          mute_time_intervals: []
          receiver: ""
          routes: []

mimirtool alertmanager load alertmanager.yml

and the configuration can be seen in Grafana:

Screenshot 2024-01-18 at 4 30 23 PM

Test 2: UTF-8 configurations are rejected

A modified version of mimirtool that supports UTF-8 cannot load UTF-8 configurations, as the request is rejected in the API:

route:
    receiver: default
    routes:
        - continue: false
          matchers:
            - foo=bar
            - bar=baz
            - baz="\xf0\x9f\x99\x82"
            - bar🙂=baz
          mute_time_intervals: []
          receiver: ""
          routes: []

mimirtool alertmanager load alertmanager.yml
ERRO[0000] response                                      body="error validating Alertmanager config: bad matcher format: bar🙂=baz\n" status="400 Bad Request"
mimirtool: error: POST request to https://alertmanager-dev-us-central1.grafana-dev.net/api/v1/alerts failed: server returned HTTP status: 400 Bad Request, body: "error validating Alertmanager config: bad matcher format: bar🙂=baz\n", try --help

The input baz="\xf0\x9f\x99\x82" is accepted, but bar🙂=baz is not. This is because current versions of Alertmanager treat "\xf0\x9f\x99\x82" as a string literal and do not interpret the escape sequences. Once UTF-8 is enabled, "\xf0\x9f\x99\x82" will instead be interpreted as 🙂. Such cases are known and are referred to as disagreement. We plan to log all occurrences of disagreement in each cell, and fix them if required, before enabling UTF-8.
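
To make the disagreement concrete, here is a minimal Go sketch that uses only the standard library. It does not call Alertmanager's parsers: classicValue and utf8Value are hypothetical helpers that imitate the two interpretations described above, and strconv.Unquote is only a stand-in for how escape sequences might be decoded once UTF-8 is enabled. The classic side is simplified and keeps every backslash sequence as-is.

```go
package main

import (
	"fmt"
	"strconv"
)

// classicValue imitates the current behaviour described above: the
// surrounding quotes are dropped and \x escapes stay literal text.
// Hypothetical helper, not Alertmanager code.
func classicValue(quoted string) string {
	if len(quoted) < 2 {
		return quoted
	}
	return quoted[1 : len(quoted)-1]
}

// utf8Value imitates the behaviour once UTF-8 is enabled: escape
// sequences are decoded. strconv.Unquote is an assumption standing in
// for the real parser.
func utf8Value(quoted string) (string, error) {
	return strconv.Unquote(quoted)
}

// disagrees reports whether the two interpretations produce different values.
func disagrees(quoted string) bool {
	decoded, err := utf8Value(quoted)
	if err != nil {
		return true
	}
	return classicValue(quoted) != decoded
}

func main() {
	// Raw string literal: the backslashes are ordinary characters here,
	// exactly as they appear in the YAML matcher.
	input := `"\xf0\x9f\x99\x82"`

	decoded, err := utf8Value(input)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("classic: %s\n", classicValue(input))
	fmt.Printf("utf-8:   %s\n", decoded)
	fmt.Printf("disagreement: %v\n", disagrees(input))
}
```

A value without escape sequences, such as "bar", yields the same result from both helpers, so it would not be counted as disagreement in this sketch.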

Screenshot 2024-01-18 at 4 37 18 PM Screenshot 2024-01-18 at 4 37 30 PM

Test 3: amtool works as expected with non UTF-8 alerts and silences

amtool can create alerts and silences without UTF-8 as normal:

amtool alert add foo=bar
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
Screenshot 2024-01-18 at 4 42 33 PM
amtool silence add foo=bar -c "Silence foo=bar"
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
6c386bdf-d1ad-4c83-a23d-5e12d1bce0bb
Screenshot 2024-01-18 at 4 43 05 PM

Test 4: UTF-8 alerts and silences are rejected

A modified version of amtool that supports UTF-8 cannot create alerts or silences containing UTF-8 as these are rejected in the API:

amtool alert add foo🙂=bar
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
amtool: error: [POST /alerts][400] postAlertsBadRequest  invalid label set: invalid name "foo🙂"
amtool silence add foo🙂=bar -c "Silence foo🙂=bar"
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
amtool: error: [POST /silences][400] postSilencesBadRequest  silence invalid: invalid label matcher 0: invalid label name "foo🙂"
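
The rejections above are consistent with the traditional Prometheus rule that label names match [a-zA-Z_][a-zA-Z0-9_]*. The following Go sketch checks names against that pattern with the standard library; isClassicLabelName is a hypothetical helper for illustration and is not the validation code that Alertmanager or Mimir actually run.

```go
package main

import (
	"fmt"
	"regexp"
)

// classicLabelName is the traditional Prometheus label name pattern.
var classicLabelName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// isClassicLabelName reports whether name is valid under the
// traditional (non UTF-8) rule. Hypothetical helper for illustration.
func isClassicLabelName(name string) bool {
	return classicLabelName.MatchString(name)
}

func main() {
	for _, name := range []string{"foo", "foo🙂"} {
		fmt.Printf("%q valid: %v\n", name, isClassicLabelName(name))
	}
}
```

Only the name is checked here. Label values and the right-hand side of matchers are not restricted by this pattern.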

Test 5: UTF-8 is allowed on the right hand side

UTF-8 is already allowed on the right-hand side (either as the value of a label or as the right-hand side of a matcher). There is no change here; this is how Alertmanager already works (the UTF-8 work adds support on both sides). For example:

route:
    receiver: default
    routes:
        - continue: false
          matchers:
            - foo=🙂bar
            - bar=🙂baz
          mute_time_intervals: []
          receiver: ""
          routes: []
Screenshot 2024-01-18 at 4 47 30 PM
amtool alert add foo=🙂bar
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
Screenshot 2024-01-18 at 4 48 08 PM
amtool silence add foo=🙂bar -c "Silence foo=🙂bar"
Warning: amtool version (0.26.0) and alertmanager version (grobinson-update-am-3537d353) are different.
bc5483b2-69a6-430e-8506-b52f3fa95104
Screenshot 2024-01-18 at 4 48 41 PM

@grobinson-grafana marked this pull request as ready for review on January 18, 2024 16:48
@grobinson-grafana requested review from grafanabot and a team as code owners on January 18, 2024 16:48
@grobinson-grafana changed the title from "Update Alertmanager vendor" to "Update Alertmanager in Mimir" on Jan 18, 2024
@grobinson-grafana
Contributor Author

Test 6: APIv1 has been deprecated

Here is a test showing that APIv1 has been deprecated in grobinson-update-am-3537d353:

curl https://alertmanager-dev-us-central1.grafana-dev.net/alertmanager/api/v1/alerts
{"status":"deprecated","error":"The Alertmanager v1 API was deprecated in version 0.16.0 and is removed as of version 0.28.0 - please use the equivalent route in the v2 API"}

and here is grafana/metrics-enterprise:r273-0f9172ac:

curl https://alertmanager-dev-us-central1.grafana-dev.net/alertmanager/api/v1/alerts
{"status":"success","data":[]}
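
A client that still calls APIv1 could detect the new response by inspecting the status field. The Go sketch below parses the two response bodies shown above; v1Response and isDeprecated are hypothetical names introduced for illustration, and the sketch makes no HTTP request.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// v1Response models only the fields visible in the responses above.
// Hypothetical type for illustration.
type v1Response struct {
	Status string `json:"status"`
	Error  string `json:"error,omitempty"`
}

// isDeprecated reports whether a v1 response body carries the
// "deprecated" status.
func isDeprecated(body []byte) (bool, error) {
	var resp v1Response
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	return resp.Status == "deprecated", nil
}

func main() {
	// Bodies copied from the two curl outputs above.
	updated := []byte(`{"status":"deprecated","error":"The Alertmanager v1 API was deprecated in version 0.16.0 and is removed as of version 0.28.0 - please use the equivalent route in the v2 API"}`)
	previous := []byte(`{"status":"success","data":[]}`)

	for _, body := range [][]byte{updated, previous} {
		deprecated, err := isDeprecated(body)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println("deprecated:", deprecated)
	}
}
```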

Collaborator

@pracucci pracucci left a comment

Could you add a CHANGELOG entry to describe the changes (including the removal of API v1), please?

CHANGELOG.md (outdated, resolved)
Collaborator

@pracucci pracucci left a comment

I went through the changes very quickly and I don't see anything concerning from the Mimir side. I'll let @gotjosh do the proper review. Thanks!

CHANGELOG.md (outdated, resolved)
@grobinson-grafana
Contributor Author

Updated Alertmanager to commit f92a08d to include prometheus/alertmanager#3676, which removes some unused code.

@grobinson-grafana
Contributor Author

@gotjosh something I thought about is whether we should also add this change in this PR instead of #6898:

diff --git a/vendor/github.com/grafana/mimir/pkg/mimir/modules.go b/vendor/github.com/grafana/mimir/pkg/mimir/modules.go
index 58a03273e..9cc23ae3c 100644
--- a/vendor/github.com/grafana/mimir/pkg/mimir/modules.go
+++ b/vendor/github.com/grafana/mimir/pkg/mimir/modules.go
@@ -26,6 +26,8 @@ import (
        "github.com/opentracing-contrib/go-stdlib/nethttp"
        "github.com/opentracing/opentracing-go"
        "github.com/pkg/errors"
+       "github.com/prometheus/alertmanager/featurecontrol"
+       "github.com/prometheus/alertmanager/matchers/compat"
        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/common/config"
        "github.com/prometheus/prometheus/model/labels"
@@ -881,6 +883,10 @@ func (t *Mimir) initRuler() (serv services.Service, err error) {
 }

 func (t *Mimir) initAlertManager() (serv services.Service, err error) {
+       f, err := featurecontrol.NewFlags(util_log.Logger, featurecontrol.FeatureClassicMode)
+       util_log.CheckFatal("initializing Alertmanager feature flags", err)
+       compat.InitFromFlags(util_log.Logger, compat.RegisteredMetrics, f)
+
        t.Cfg.Alertmanager.ShardingRing.Common.ListenPort = t.Cfg.Server.GRPCListenPort
        t.Cfg.Alertmanager.CheckExternalURL(t.Cfg.API.AlertmanagerHTTPPrefix, util_log.Logger)

There will be no change in behavior. The difference is that we initialize the compat package with a logger, so we will get debug level logs. The default variables use log.NewNopLogger().

Contributor

@gotjosh gotjosh left a comment

Overall LGTM, but please see my comments.

integration/alertmanager_test.go (outdated, resolved)
Title: `{{ template "msteams.default.title" . }}`,
Text: `{{ template "msteams.default.text" . }}`,
Title: `{{ template "msteams.default.title" . }}`,
Summary: `{{ template "msteams.default.summary" . }}`,
Contributor

@gillesdemey one for you: the msteams integration for the cloud Alertmanager now has a new configuration option, so we need to support summary.

InvalidTotal *prometheus.GaugeVec
}

func NewMetrics(r prometheus.Registerer) *Metrics {
Contributor

I don't see these metrics being exposed as part of the Alertmanager metrics either here or in #6898. Just to confirm, you don't want to expose these yet, correct?

Contributor Author

Those were available in dev when I tested, just from the default registerer. Should we use another registerer? If so, we can initialize the compat package as described in #7103 (comment).

Contributor Author

It seems Mimir uses the default registerer. If I attempt to re-register the metrics using the registerer in initAlertManager then I get a panic: duplicate metrics collector registration attempted.

pkg/alertmanager/distributor_test.go (resolved)
CHANGELOG.md (outdated, resolved)
This commit updates Alertmanager in Mimir. It deprecates APIv1,
replacing all APIv1 endpoints with a deprecation notice. The
version of Alertmanager vendored in this commit contains support
for UTF-8; however, it is disabled until we initialize the compat
package from within Mimir using compat.InitFromFlags.
UTF-8 will be enabled at a later time.
Contributor

@gotjosh gotjosh left a comment

LGTM

@gotjosh merged commit 4d69925 into main on Jan 24, 2024
28 checks passed
@gotjosh deleted the grobinson/update-am branch on January 24, 2024 10:16
@grobinson-grafana changed the title from "Update Alertmanager in Mimir" to "Alertmanager: Update to latest main" on Jan 29, 2024