-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds (Prometheus) ServiceMonitor integration #16
Adds (Prometheus) ServiceMonitor integration #16
Conversation
thanks for the start! CI says
along with the change here, we need corresponding bits in the schema file and README |
ok will fix it |
Signed-off-by: mshivanna_tdx <[email protected]>
53022a5
to
85d5b23
Compare
Signed-off-by: Adrian Cole <[email protected]>
ok I fixed the things I mentioned and pushed |
tests pass, but I want to see if there's any way to actually test it (vs normal helm chart tests which just make sure it doesn't crash) |
d083677
to
042acd8
Compare
Signed-off-by: Adrian Cole <[email protected]>
042acd8
to
76c78a6
Compare
OK so current status that ct install passes unless you actually try to use this. This is one reason why I wanted to make sure there is an integration test. @mshivanna can you take a look and see what might be the issue? Basically the test can get the prometheus query endpoint, but there is no data in it from zipkin even after you wait. |
# This uses prometheus-operated in the ci-monitoring namespace, from helmfile.yaml. | ||
# Note: The query API returns HTTP 200 on empty, so we grep to ensure something returned. | ||
# See https://prometheus.io/docs/prometheus/latest/querying/api/ | ||
args: [ 'sleep 5 && wget -q -O - http://prometheus-operated.ci-monitoring.svc.cluster.local:9090/api/v1/query?query=http_server_requests_seconds_max | grep zipkin' ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can take out the '|grep zipkin' temporarily here to see that the query works, but no data is returned. You might also want to check '/api/v1/targets?scrapePool=zipkin'
so this is the part that fails because no data is returned. it didn't fail due to incorrect endpoint, as that would show something in the console. What failed was the 'grep'
|
pushed a commit that will pass only because it no longer validates.. to help someone with fresh eyes have a look at what might be up. |
so you can see here that there is data in the prom endpoint on zipkin, but it isn't being scraped for some reason.. or made available to prom. That's the problem to solve! Details in the last workflow run
|
and in case it helps, here's the yaml produced by
and after port forwarding like zipkin doesn't show up in the scrape pool curl -s localhost:9090/api/v1/targets?scrapePool=zipkin|jq .
{
"status": "success",
"data": {
"activeTargets": [],
"droppedTargets": [],
"droppedTargetCounts": {
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-apiserver/0": 0,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-coredns/0": 9,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-controller-manager/0": 10,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-etcd/0": 10,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-proxy/0": 10,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kube-scheduler/0": 10,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/0": 9,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/1": 9,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-kubelet/2": 9,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-operator/0": 5,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-prometheus/0": 5,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-prom-prometheus/1": 5,
"serviceMonitor/ci-monitoring/prometheus-stack-kube-state-metrics/0": 5
}
}
} |
b018f3b
to
57ec2d8
Compare
thought I got it, but I didn't. I noticed prometheus .spec.serviceMonitorSelector setup in helmfile.yaml is 'release: prometheus-stack' and added that label, but yeah didn't work anyway. help wanted! |
Signed-off-by: Adrian Cole <[email protected]>
57ec2d8
to
89677c0
Compare
Signed-off-by: Adrian Cole <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,47 @@ | |||
{{- /* | |||
Copyright 2024 The OpenZipkin Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: later we can switch everything to SPDX, so I didn't do it in this PR
Signed-off-by: Adrian Cole <[email protected]>
serviceMonitor: | ||
enabled: true | ||
interval: 1s | ||
scrapeTimeout: 1s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ps I don't know if this is normal in k8s, but if you make an invalid config, like scrapeTimeout > interval, the service monitor will be created, but just won't ever be processed. You end up having to look at prometheus-operator pod logs to figure it out. I don't know if this is a bug or a norm.. if someone thinks this is a bug, probably needs to be raised upstream as hours lost over this.
charts/zipkin/ci/helmfile.yaml
Outdated
- name: prometheus-community | ||
url: https://prometheus-community.github.io/helm-charts | ||
|
||
# Prometheus is too much to configure manually in a test yaml. We need the CRD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a test yaml so we use helm.
Maybe, I wasn't quite sure what the intention is of this comment, made a guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, rewrote!
Signed-off-by: Adrian Cole <[email protected]>
thanks for the idea and initial commit @mshivanna! thanks for the review help here and behind the curtain @anuraaga! |
This adds (Prometheus) ServiceMonitor integration via values, notably
serviceMonitor.enabled
.Here are example values, integration tested via CI
This was very tricky due to test due to..
false
{}