Adds (Prometheus) ServiceMonitor integration #16

mshivanna · 2024-02-20T18:40:07Z

This adds (Prometheus) ServiceMonitor integration via values, notably serviceMonitor.enabled.

Here are example values, integration tested via CI:

serviceMonitor:
  enabled: true
  interval: 1s
  scrapeTimeout: 1s
  namespace: ci-monitoring

This was very tricky to test due to:

using resources created out-of-band and in a different namespace (kube-prometheus-stack)
indirection between the service monitor and values in prometheus, such as the target scrapePool
an unpredictable amount of time between creating k8s configuration and it converging
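As a rough illustration of the convergence problem, here is a minimal polling sketch in Python. It is not what the chart test does (that uses a fixed sleep in a shell one-liner); `wait_until` and its parameters are hypothetical names for illustration only.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it returns True or timeout_s elapses.

    Returns True as soon as check() succeeds, False once the deadline
    has passed. check() is always called at least once.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a test, `check` would be whatever proves convergence, for example "the Prometheus query returns at least one zipkin series". Polling with a deadline avoids guessing a single sleep duration that is either too short or wastefully long.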

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| serviceMonitor.enabled | bool | `false` | Creates a ServiceMonitor to scrape /prometheus. Requires prometheus-operator |
| serviceMonitor.namespace | string | override or release namespace | Namespace to create the service monitor in |
| serviceMonitor.labels | object | `{}` | Additional metadata labels |
| serviceMonitor.interval | string | Prometheus global scrape interval | How often to scrape /prometheus. e.g. '5s' |
| serviceMonitor.scrapeTimeout | string | Prometheus global scrape timeout | Timeout for scraping metrics. e.g. '10s' |

codefromthecrypt · 2024-02-20T23:50:52Z

thanks for the start!

CI says

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: resource mapping not found for name: "zipkin-4rrm0lckxg" namespace: "zipkin-4rrm0lckxg" from "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

along with the change here, we need corresponding bits in the schema file and README
to make sure it works, we should also have ci/serviceMonitor-values.yaml

mshivanna · 2024-02-21T16:21:16Z

ok will fix it

Signed-off-by: mshivanna_tdx <[email protected]>

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2024-03-11T06:15:19Z

ok I fixed the things I mentioned and pushed

codefromthecrypt · 2024-03-11T06:21:52Z

tests pass, but I want to see if there's any way to actually test it (vs normal helm chart tests which just make sure it doesn't crash)

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2024-03-11T09:38:31Z

OK, so the current status is that ct install passes unless you actually try to use this. This is one reason why I wanted to make sure there is an integration test. @mshivanna can you take a look and see what might be the issue? Basically the test can reach the prometheus query endpoint, but there is no data in it from zipkin even after you wait.

codefromthecrypt · 2024-03-11T09:39:53Z

charts/zipkin/templates/tests/test-connection.yaml

+      # This uses prometheus-operated in the ci-monitoring namespace, from helmfile.yaml.
+      # Note: The query API returns HTTP 200 on empty, so we grep to ensure something returned.
+      # See https://prometheus.io/docs/prometheus/latest/querying/api/
+      args: [ 'sleep 5 && wget -q -O - http://prometheus-operated.ci-monitoring.svc.cluster.local:9090/api/v1/query?query=http_server_requests_seconds_max | grep zipkin' ]


you can take out the '|grep zipkin' temporarily here to see that the query works, but no data is returned. You might also want to check '/api/v1/targets?scrapePool=zipkin'
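To make the "HTTP 200 on empty" point concrete, here is a small Python sketch of the same check the `grep` performs, done on the parsed JSON instead of raw text. The function name is hypothetical, and it assumes the word to look for appears in one of the series' label values (which labels Prometheus attaches depends on the ServiceMonitor, so treat that as an assumption).

```python
import json


def query_has_series(body: str, needle: str) -> bool:
    """Return True if an instant-query response has a series mentioning needle.

    body is the raw JSON text returned by /api/v1/query. An empty result
    vector still comes back as status "success", so we inspect the
    "result" list instead of trusting the HTTP status.
    """
    doc = json.loads(body)
    if doc.get("status") != "success":
        return False
    results = doc.get("data", {}).get("result", [])
    return any(
        needle in str(value)
        for series in results
        for value in series.get("metric", {}).values()
    )
```

An empty vector yields False even though the request itself succeeded, which is the situation the test has to detect.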

codefromthecrypt · 2024-03-11T09:49:49Z

so this is the part that fails because no data is returned. It didn't fail due to an incorrect endpoint, as that would show something in the console. What failed was the 'grep'

   get-prometheus-query:
    Container ID:  containerd://5fa1db9ae46769445d59949f3c925a424e9f4f35fc8195ebd8fa2120f7140486
    Image:         ghcr.io/openzipkin/alpine:3.19.1
    Image ID:      ghcr.io/openzipkin/alpine@sha256:0269536c808330211eeb9d952ecfc262699038e90162fcb412d7c9ae102061a9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      sleep 5 && wget -q -O - http://prometheus-operated.ci-monitoring.svc.cluster.local:9090/api/v1/query?query=http_server_requests_seconds_max | grep zipkin
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 11 Mar 2024 09:41:04 +0000
      Finished:     Mon, 11 Mar 2024 09:41:09 +0000

codefromthecrypt · 2024-03-11T09:51:42Z

pushed a commit that will pass only because it no longer validates.. to help someone with fresh eyes have a look at what might be up.

codefromthecrypt · 2024-03-11T10:52:14Z

so you can see here that there is data in the prom endpoint on zipkin, but it isn't being scraped for some reason.. or made available to prom. That's the problem to solve! Details in the last workflow run

 ==> Logs of container zipkin-ljnno241yb-test-connection
------------------------------------------------------------------------------------------------------------------------
--snip--
http_server_requests_seconds_max{method="GET",status="200",uri="/api/v2/services",} 0.022805119
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads 13.0
--snip--
------------------------------------------------------------------------------------------------------------------------
<== Logs of container zipkin-ljnno241yb-test-connection
------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
==> Logs of container zipkin-ljnno241yb-test-connection
------------------------------------------------------------------------------------------------------------------------
{"status":"success","data":{"resultType":"vector","result":[]}}
------------------------------------------------------------------------------------------------------------------------
<== Logs of container zipkin-ljnno241yb-test-connection
------------------------------------------------------------------------------------------------------------------------
========================================================================================================================

codefromthecrypt · 2024-03-11T11:01:26Z

and in case it helps, here's the yaml produced by helm install zipkin charts/zipkin --values charts/zipkin/ci/serviceMonitor-values.yaml

# Source: zipkin/templates/servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: zipkin
  namespace: default
  labels:
    helm.sh/chart: zipkin-0.2.2
    app.kubernetes.io/name: zipkin
    app.kubernetes.io/instance: zipkin
    app.kubernetes.io/version: "3.1.1"
    app.kubernetes.io/managed-by: Helm
spec:
  endpoints:
  - port: http-query
    path: '/prometheus'
    interval: 1s
    scrapeTimeout: 2s
  selector:
    matchLabels:
        app.kubernetes.io/name: zipkin
        app.kubernetes.io/instance: zipkin
  namespaceSelector:
    matchNames:
      - default

and after port forwarding like kubectl port-forward service/prometheus-operated 9090:9090 -n ci-monitoring

zipkin doesn't show up in the scrape pool

curl -s localhost:9090/api/v1/targets?scrapePool=zipkin|jq .
{
  "status": "success",
  "data": {
    "activeTargets": [],
    "droppedTargets": [],
    "droppedTargetCounts": {
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-apiserver/0": 0,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-coredns/0": 9,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-controller-manager/0": 10,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-etcd/0": 10,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-proxy/0": 10,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-scheduler/0": 10,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/0": 9,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/1": 9,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/2": 9,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-operator/0": 5,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-prometheus/0": 5,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-prometheus/1": 5,
      "serviceMonitor/ci-monitoring/prometheus-stack-kube-state-metrics/0": 5
    }
  }
}
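For reading a targets response like the one above without eyeballing it, a short Python sketch follows. It assumes the response shape shown here (`activeTargets` entries carrying a `scrapePool` field, plus the `droppedTargetCounts` map); the function name is made up for illustration.

```python
import json
from typing import List


def known_scrape_pools(body: str) -> List[str]:
    """List every scrape pool named in a /api/v1/targets response.

    Pools are collected from active targets and from the keys of
    droppedTargetCounts, then de-duplicated and sorted.
    """
    data = json.loads(body).get("data", {})
    pools = {t["scrapePool"] for t in data.get("activeTargets", []) if "scrapePool" in t}
    pools.update(data.get("droppedTargetCounts", {}).keys())
    return sorted(pools)


def mentions(pools: List[str], needle: str) -> bool:
    """True if any pool name contains needle."""
    return any(needle in pool for pool in pools)
```

Run against the output above, no pool name contains "zipkin", which suggests Prometheus never generated a scrape config for the ServiceMonitor at all, rather than generating one and dropping its targets.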

codefromthecrypt · 2024-03-11T11:44:53Z

thought I got it, but I didn't. I noticed prometheus .spec.serviceMonitorSelector setup in helmfile.yaml is 'release: prometheus-stack' and added that label, but yeah didn't work anyway.

help wanted!
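For reference, the label check itself is simple to reason about. This Python sketch mirrors Kubernetes `matchLabels` semantics (every selector pair must be present on the object); it only covers this one condition and says nothing about other reasons a ServiceMonitor might be skipped.

```python
from typing import Dict


def matches_selector(labels: Dict[str, str], match_labels: Dict[str, str]) -> bool:
    """Return True if labels satisfy every key/value pair in match_labels.

    An empty match_labels matches everything, as with Kubernetes
    label selectors.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


# Labels from the rendered ServiceMonitor, before adding the release label.
rendered = {
    "helm.sh/chart": "zipkin-0.2.2",
    "app.kubernetes.io/name": "zipkin",
    "app.kubernetes.io/instance": "zipkin",
    "app.kubernetes.io/version": "3.1.1",
    "app.kubernetes.io/managed-by": "Helm",
}
selector = {"release": "prometheus-stack"}
```

With `rendered` as-is the selector does not match; adding `release: prometheus-stack` makes it match. As noted above, that was necessary here but not sufficient.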

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt

This is ready to go. PTAL @anuraaga @reta to see if you understand my notes

codefromthecrypt · 2024-03-12T08:14:13Z

charts/zipkin/templates/servicemonitor.yaml

@@ -0,0 +1,47 @@
+{{- /*
+Copyright 2024 The OpenZipkin Authors


note: later we can switch everything to SPDX, so I didn't do it in this PR

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2024-03-12T12:25:32Z

charts/zipkin/ci/serviceMonitor-values.yaml

+serviceMonitor:
+  enabled: true
+  interval: 1s
+  scrapeTimeout: 1s


ps I don't know if this is normal in k8s, but if you make an invalid config, like scrapeTimeout > interval, the service monitor will be created, but just won't ever be processed. You end up having to look at prometheus-operator pod logs to figure it out. I don't know if this is a bug or the norm. If someone thinks this is a bug, it probably needs to be raised upstream, as hours were lost over this.
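A pre-flight check for this pitfall could look like the following Python sketch. It parses Prometheus-style duration strings such as `1s` or `1h30m` and verifies scrapeTimeout does not exceed interval. The helper names are hypothetical and this is not part of the chart or of prometheus-operator.

```python
import re

# Milliseconds per unit, following Prometheus duration units.
_UNIT_MS = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,
}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def duration_ms(text: str) -> int:
    """Convert a duration like '1s' or '1h30m' to milliseconds."""
    pos = 0
    total = 0
    for match in _PART.finditer(text):
        if match.start() != pos:  # reject gaps such as '1s 2m' or '1x'
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def scrape_timeout_ok(interval: str, scrape_timeout: str) -> bool:
    """True when scrape_timeout is no longer than interval."""
    return duration_ms(scrape_timeout) <= duration_ms(interval)
```

With the values in this file (`interval: 1s`, `scrapeTimeout: 1s`) the check passes; with `scrapeTimeout: 2s` against `interval: 1s` it fails, which is the kind of config that ends up silently ignored.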

anuraaga · 2024-03-12T12:32:11Z

charts/zipkin/ci/helmfile.yaml

+  - name: prometheus-community
+    url: https://prometheus-community.github.io/helm-charts
+
+# Prometheus is too much to configure manually in a test yaml. We need the CRD


in a test yaml so we use helm.

Maybe, I wasn't quite sure what the intention is of this comment, made a guess

thanks, rewrote!

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt · 2024-03-13T01:28:36Z

thanks for the idea and initial commit @mshivanna! thanks for the review help here and behind the curtain @anuraaga!

codefromthecrypt changed the title ~~[zipkin-helm] option to enable serviceMonitor for zipkin~~ option to enable serviceMonitor Feb 20, 2024

[zipkin-helm] option to enable serviceMonitor for zipkin

85d5b23

Signed-off-by: mshivanna_tdx <[email protected]>

codefromthecrypt force-pushed the enable-serviceMonitor branch from 53022a5 to 85d5b23 Compare March 11, 2024 04:05

readme, schema and test

c40453d

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt force-pushed the enable-serviceMonitor branch from d083677 to 042acd8 Compare March 11, 2024 09:35

use helmfile for continuous integration

76c78a6

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt force-pushed the enable-serviceMonitor branch from 042acd8 to 76c78a6 Compare March 11, 2024 09:37

codefromthecrypt reviewed Mar 11, 2024

codefromthecrypt requested a review from anuraaga March 11, 2024 09:48

codefromthecrypt force-pushed the enable-serviceMonitor branch from b018f3b to 57ec2d8 Compare March 11, 2024 11:42

codefromthecrypt added the help wanted label Mar 11, 2024

Ensure everything works in real life

89677c0

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt force-pushed the enable-serviceMonitor branch from 57ec2d8 to 89677c0 Compare March 12, 2024 07:56

codefromthecrypt removed the help wanted label Mar 12, 2024

codefromthecrypt self-assigned this Mar 12, 2024

notes

adc4565

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt changed the title ~~option to enable serviceMonitor~~ Adds service Mar 12, 2024

codefromthecrypt changed the title ~~Adds service~~ Adds (Prometheus) ServiceMonitor integration Mar 12, 2024

codefromthecrypt approved these changes Mar 12, 2024

codefromthecrypt requested a review from reta March 12, 2024 08:12

codefromthecrypt reviewed Mar 12, 2024

you saw nothing.. move along

ec7be1e

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt reviewed Mar 12, 2024

anuraaga approved these changes Mar 12, 2024

feedback

46e3154

Signed-off-by: Adrian Cole <[email protected]>

codefromthecrypt merged commit 67d383f into openzipkin:master Mar 13, 2024
1 check passed
