Adds (Prometheus) ServiceMonitor integration #16

Merged
merged 7 commits into from
Mar 13, 2024
6 changes: 6 additions & 0 deletions .github/workflows/test.yml
@@ -58,5 +58,11 @@ jobs:
        uses: helm/kind-action@v1
        if: steps.list-changed.outputs.changed == 'true'

      - name: Setup helmfile
        uses: mamezou-tech/[email protected]

      - name: Install prometheus
        run: helmfile -f charts/zipkin/ci/helmfile.yaml sync

      - name: Run chart-testing (install)
        run: ct install
6 changes: 6 additions & 0 deletions README.md
@@ -38,6 +38,7 @@ You can then run `helm search repo zipkin` to see the charts.
| ingress.path | string | `"/"` | |
| ingress.tls | list | `[]` | |
| nameOverride | string | `""` | |
| namespaceOverride | string | release namespace | Namespace to create the zipkin resources in |
| nodeSelector | object | `{}` | |
| podAnnotations."sidecar.istio.io/inject" | string | `"false"` | |
| podSecurityContext | object | `{}` | |
@@ -55,6 +56,11 @@ You can then run `helm search repo zipkin` to see the charts.
| serviceAccount.create | bool | `true` | |
| serviceAccount.name | string | `""` | If not set and create is true, a name is generated using the fullname template |
| serviceAccount.psp | bool | `false` | |
| serviceMonitor.enabled | bool | `false` | Creates a ServiceMonitor to scrape /prometheus. Requires prometheus-operator |
| serviceMonitor.namespace | string | override or release namespace | Namespace to create the service monitor in |
| serviceMonitor.labels | object | `{}` | Additional metadata labels |
| serviceMonitor.interval | string | Prometheus global scrape interval | How often to scrape /prometheus. e.g. '5s' |
| serviceMonitor.scrapeTimeout | string | Prometheus global scrape timeout | Timeout for scraping metrics. e.g. '10s' |
| tolerations | list | `[]` | |
| zipkin.discovery.eureka.serviceUrl | string | no default | v2 endpoint of Eureka, e.g. `https://eureka-prod/eureka/v2` |
| zipkin.discovery.eureka.app | string | `"zipkin"` | The application this instance registers to |
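
To make the new `serviceMonitor.*` rows concrete, here is a minimal sketch of an override file that enables the ServiceMonitor. The release name `zipkin`, the chart reference `zipkin/zipkin`, the interval and timeout values, and the `release: prometheus` label are all assumptions for illustration; the label in particular depends on the `serviceMonitorSelector` of your Prometheus. The script only writes the file and prints a plausible install command rather than running it.

```shell
#!/bin/sh
# Write a values override that turns on the ServiceMonitor.
# All concrete values below are illustrative assumptions, not chart defaults.
values_file="${TMPDIR:-/tmp}/zipkin-servicemonitor-values.yaml"

cat > "$values_file" <<'EOF'
serviceMonitor:
  enabled: true
  # Keep scrapeTimeout at or below interval.
  interval: 30s
  scrapeTimeout: 10s
  labels:
    # Assumed label; match whatever your Prometheus selects on.
    release: prometheus
EOF

# Print the command instead of executing it, so the sketch runs without helm.
# "zipkin" (release) and "zipkin/zipkin" (chart) are assumed names.
printf 'helm upgrade --install zipkin zipkin/zipkin -f %s\n' "$values_file"
```

Leaving `serviceMonitor.namespace` unset keeps the ServiceMonitor in the override or release namespace, as the table describes.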
56 changes: 56 additions & 0 deletions charts/zipkin/ci/helmfile.yaml
@@ -0,0 +1,56 @@
---
# install via `helmfile -f charts/zipkin/ci/helmfile.yaml sync`
repositories:
  - name: prometheus-community
    url: https://prometheus-community.github.io/helm-charts

# Prometheus requires the CRD servicemonitors.monitoring.coreos.com as well as
# Prometheus, deployed as the service named "prometheus-operated" in our test
# namespace "ci-monitoring". We set this up via helm prior to running tests, as
# adding CRDs and multiple resources during a test is far more complicated.
releases:
  - name: prometheus-stack
    namespace: ci-monitoring  # arbitrary non-default name
    createNamespace: true
    chart: prometheus-community/kube-prometheus-stack
    values:
      - prometheusOperator:
          enabled: true
        prometheus:
          enabled: true
          # By default, the service monitor has namespace restrictions and must
          # match a label "release: kube-prometheus-stack". Relax for testing.
          # See https://prometheus-operator.dev/docs/operator/troubleshooting/#it-is-in-the-configuration-but-not-on-the-service-discovery-page
          prometheusSpec:
            serviceMonitorNamespaceSelector:
              any: true
            serviceMonitorSelector:
              any: true
            serviceMonitorSelectorNilUsesHelmValues: false
        # Disable anything else, like multi-container grafana pods.
        defaultRules:
          enabled: false
        alertmanager:
          enabled: false
        kubeApiServer:
          enabled: false
        kubelet:
          enabled: false
        kubeControllerManager:
          enabled: false
        coreDns:
          enabled: false
        kubeDns:
          enabled: false
        kubeEtcd:
          enabled: false
        kubeScheduler:
          enabled: false
        kubeProxy:
          enabled: false
        kubeStateMetrics:
          enabled: false
        nodeExporter:
          enabled: false
        grafana:
          enabled: false
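
The helmfile comment names two prerequisites: the `servicemonitors.monitoring.coreos.com` CRD and a `prometheus-operated` service in `ci-monitoring`. A rough sketch of checking both after `helmfile sync` might look like this. The function name `check_prometheus_prereqs` and the `KUBECTL` override variable are hypothetical helpers, not part of this PR.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with a stub); hypothetical convenience.
KUBECTL=${KUBECTL:-kubectl}

# Succeeds only if the ServiceMonitor CRD and the prometheus-operated
# service both exist. The namespace defaults to the CI namespace above.
check_prometheus_prereqs() {
  ns=${1:-ci-monitoring}
  if ! $KUBECTL get crd servicemonitors.monitoring.coreos.com >/dev/null 2>&1; then
    echo "missing CRD servicemonitors.monitoring.coreos.com" >&2
    return 1
  fi
  if ! $KUBECTL -n "$ns" get service prometheus-operated >/dev/null 2>&1; then
    echo "missing service prometheus-operated in namespace $ns" >&2
    return 1
  fi
  echo "prerequisites present in $ns"
}

# Example (requires cluster access):
#   check_prometheus_prereqs ci-monitoring
```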
6 changes: 6 additions & 0 deletions charts/zipkin/ci/serviceMonitor-values.yaml
@@ -0,0 +1,6 @@
---
serviceMonitor:
  enabled: true
  interval: 1s
  scrapeTimeout: 1s

Review comment (Member): PS: I don't know if this is normal in k8s, but if you write an invalid config, such as scrapeTimeout > interval, the ServiceMonitor is still created but is never processed. You end up having to read the prometheus-operator pod logs to figure out why. I don't know whether this is a bug or the norm. If someone thinks it is a bug, it should probably be raised upstream, as hours were lost over this.

  namespace: ci-monitoring
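
The review comment on this file describes a ServiceMonitor that is silently ignored when `scrapeTimeout` exceeds `interval`. One way to catch that before installing is a small pre-flight check, sketched below. The helper names `to_seconds` and `check_scrape_config` are hypothetical, and the parser is a deliberate simplification: it accepts only single-unit values in `s`, `m` or `h`, and rejects compound or millisecond durations.

```shell
#!/bin/sh
# Convert a simple duration such as 5s, 2m or 1h to seconds.
# Assumption: single-unit values only; anything else is rejected.
to_seconds() {
  value=${1%[smh]}      # numeric part, with one trailing unit letter removed
  unit=${1#"$value"}    # whatever was removed (empty if no unit was given)
  case "$value" in
    ''|*[!0-9]*|0[0-9]*) return 1 ;;   # empty, non-numeric, or leading zero
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    *) return 1 ;;
  esac
}

# Returns 0 if scrapeTimeout <= interval, 1 if it is larger,
# and 2 if either value cannot be parsed by to_seconds.
check_scrape_config() {
  interval=$(to_seconds "$1") || { echo "unsupported interval: $1" >&2; return 2; }
  timeout=$(to_seconds "$2") || { echo "unsupported scrapeTimeout: $2" >&2; return 2; }
  if [ "$timeout" -gt "$interval" ]; then
    echo "scrapeTimeout ($2) exceeds interval ($1)" >&2
    return 1
  fi
}

# Example: the CI values above pass, since 1s <= 1s.
#   check_scrape_config 1s 1s
```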
47 changes: 47 additions & 0 deletions charts/zipkin/templates/servicemonitor.yaml
@@ -0,0 +1,47 @@
{{- /*
Copyright 2024 The OpenZipkin Authors
Review comment (Member): Note: later we can switch everything to SPDX, so I didn't do it in this PR.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except
in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License
is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
or implied. See the License for the specific language governing permissions and limitations under
the License.
*/}}
{{- if .Values.serviceMonitor.enabled -}}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "zipkin.fullname" . }}
  {{- if .Values.serviceMonitor.namespace }}
  namespace: {{ .Values.serviceMonitor.namespace }}
  {{- else }}
  namespace: {{ include "zipkin.namespace" . }}
  {{- end }}
  labels:
    {{- include "zipkin.labels" . | nindent 4 }}
    {{- if .Values.serviceMonitor.labels }}
    {{- (toYaml .Values.serviceMonitor.labels | nindent 4) }}
    {{- end }}
spec:
  jobLabel: {{ include "zipkin.fullname" . }}
  namespaceSelector:
    matchNames:
      - {{ include "zipkin.namespace" . }}
  endpoints:
    - port: http-query
      path: '/prometheus'
      scheme: http
      {{- with .Values.serviceMonitor.interval }}
      interval: {{ . }}
      {{- end }}
      {{- with .Values.serviceMonitor.scrapeTimeout }}
      scrapeTimeout: {{ . }}
      {{- end }}
  selector:
    matchLabels:
      {{- include "zipkin.selectorLabels" . | nindent 8 }}
{{- end }}
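
To inspect what this template renders without touching a cluster, something like the following could be used. It is a sketch: the function name `render_servicemonitor`, the `HELM` override variable, the release name `zipkin` and the example interval values are assumptions, while `helm template` and its `--show-only` and `--set` flags are standard Helm.

```shell
#!/bin/sh
# HELM can be overridden (for example with a stub); hypothetical convenience.
HELM=${HELM:-helm}

# Render only the ServiceMonitor manifest from a local chart checkout.
# The chart directory defaults to the path used in this repository.
render_servicemonitor() {
  chart_dir=${1:-charts/zipkin}
  $HELM template zipkin "$chart_dir" \
    --show-only templates/servicemonitor.yaml \
    --set serviceMonitor.enabled=true \
    --set serviceMonitor.interval=30s \
    --set serviceMonitor.scrapeTimeout=10s
}

# Example (requires helm and a chart checkout):
#   render_servicemonitor charts/zipkin
```

In the rendered output, `interval` and `scrapeTimeout` should appear under the endpoint only when set, since the template wraps each in a `with` block.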
18 changes: 16 additions & 2 deletions charts/zipkin/templates/tests/test-connection.yaml
@@ -3,8 +3,7 @@ apiVersion: v1
 kind: Pod
 metadata:
   name: "{{ include "zipkin.fullname" . }}-test-connection"
-  labels:
-    {{- include "zipkin.labels" . | nindent 4 }}
+  labels: {} # we don't need any labels in the test pod!
   annotations:
     "helm.sh/hook": test
 spec:
@@ -45,5 +44,20 @@ spec:
       command: [ '/bin/sh', '-c' ]
       # If self-tracing, sleep for the trace to process. Then, get it by the constant ID passed above.
       args: [ 'sleep 3 && wget -q --spider http://{{ include "zipkin.fullname" . }}:{{ .Values.service.port }}/api/v2/trace/cafebabecafebabe' ]
     {{- end }}
+    {{- if .Values.serviceMonitor.enabled }}
+    # This verifies prometheus scraped the zipkin service on the correct
+    # endpoint, by reading an actual statistic.
+    # See https://prometheus.io/docs/prometheus/latest/querying/api/
+    - name: get-prometheus-query
+      image: 'ghcr.io/openzipkin/alpine:3.19.1'
+      command: [ '/bin/sh', '-c' ]
+      # Note: The below commands use the Prometheus API, which returns HTTP 200
+      # even on empty. Rather than install jq, we use grep to ensure a result.
+      #
+      # We use a sleep loop despite the scrape delay of only 1s. This is due to
+      # an up to one-minute read-back delay between adding the service monitor
+      # and it becoming visible as a prometheus target in kube-prometheus-stack.
+      args: [ 'until (wget -q -O - http://prometheus-operated.{{ .Values.serviceMonitor.namespace }}.svc.cluster.local:9090/api/v1/query?query=http_server_requests_seconds_max|grep zipkin); do sleep 3; done' ]
+    {{- end }}
   restartPolicy: Never
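
The `until` loop in the test container polls with no upper bound of its own, so it presumably relies on the test runner's timeout when the target never appears. For comparison, a bounded variant could be sketched as below. The `retry` and `query_has_zipkin` helpers are hypothetical and not part of this PR, and the `ci-monitoring` namespace in the URL is taken from the CI values file.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Same query as the chart test: the Prometheus API returns HTTP 200 even
# when the result is empty, so grep for "zipkin" in the response body.
query_has_zipkin() {
  wget -q -O - "http://prometheus-operated.ci-monitoring.svc.cluster.local:9090/api/v1/query?query=http_server_requests_seconds_max" \
    | grep -q zipkin
}

# Example (inside the cluster): about two minutes of polling in total.
#   retry 40 3 query_has_zipkin
```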
20 changes: 20 additions & 0 deletions charts/zipkin/values.schema.json
@@ -197,6 +197,26 @@
                }
            }
        },
        "serviceMonitor": {
            "type": "object",
            "properties": {
                "enabled": {
                    "type": "boolean"
                },
                "interval": {
                    "type": "string"
                },
                "labels": {
                    "type": "object"
                },
                "namespace": {
                    "type": "string"
                },
                "scrapeTimeout": {
                    "type": "string"
                }
            }
        },
        "tolerations": {
            "type": "array"
        },
11 changes: 11 additions & 0 deletions charts/zipkin/values.yaml
@@ -58,6 +58,17 @@ service:
  type: ClusterIP
  port: 9411

serviceMonitor:
  # Creates a ServiceMonitor to scrape /prometheus
  enabled: false
  # Namespace to create the service monitor in
  namespace: ""
  # interval: 10s
  # scrapeTimeout: 10s
  # Add any labels required by your prometheus spec serviceMonitorSelector
  labels: {}
  #  release: prometheus

ingress:
  enabled: false
  annotations: