diff --git a/CHANGES.md b/CHANGES.md index b20880015..4de3b4958 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -15,7 +15,36 @@ [Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest) ------------------------------------------------------------------------------------------------------------------ -[//]: # (list changes here, using '-' for each new entry, remove this when items are added) +## Changes + +- Add the `prometheus-longterm-metrics` and `thanos` optional components + + The `prometheus-longterm-metrics` component collects longterm monitoring metrics from the original prometheus instance + (the one created by the ``components/monitoring`` component). + + Longterm metrics are any prometheus rules that have the label ``group: longterm-metrics``, in other words the rules + selectable using prometheus's ``'{group="longterm-metrics"}'`` query filter. To add a default set of longterm metric + rules, also enable the ``prometheus-longterm-rules`` optional component; the rules it adds are listed in the + ``optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules`` file. + + To configure this component: + + * update the ``PROMETHEUS_LONGTERM_RETENTION_TIME`` variable to set how long the data will be kept by prometheus + + Enabling the `prometheus-longterm-metrics` component creates the additional endpoint ``/prometheus-longterm-metrics``. + + The `thanos` component enables better storage of longterm metrics collected by the + ``optional-components/prometheus-longterm-metrics`` component. Data will be collected from the + ``prometheus-longterm-metrics`` instance and stored in an S3 object store indefinitely. + + When enabling this component, please change the default values for the ``THANOS_MINIO_ROOT_USER`` and ``THANOS_MINIO_ROOT_PASSWORD`` + by updating the ``env.local`` file. These set the login credentials for the root user that runs the + [minio](https://min.io/) object store.
+ + Enabling the `thanos` component creates the additional endpoints: + + * ``/thanos-query``: a prometheus-like query interface to inspect the data stored by thanos + * ``/thanos-minio``: a minio web console to inspect the data stored by minio. [2.6.0](https://github.com/bird-house/birdhouse-deploy/tree/2.6.0) (2024-11-19) ------------------------------------------------------------------------------------------------------------------ diff --git a/birdhouse/components/README.rst b/birdhouse/components/README.rst index c9c1ffebe..cf29abda8 100644 --- a/birdhouse/components/README.rst +++ b/birdhouse/components/README.rst @@ -372,6 +372,7 @@ AlertManager for Alert Dashboard and Silencing .. image:: monitoring/images/alertmanager-dashboard.png .. image:: monitoring/images/alertmanager-silence-alert.png +.. _monitoring-customize-the-component: Customizing the Component ------------------------- @@ -390,6 +391,57 @@ Customizing the Component Slack or other services accepting webhooks), ``ALERTMANAGER_EXTRA_RECEIVERS``. +Longterm Storage of Prometheus Metrics +-------------------------------------- + +Prometheus stores metrics for 90 days by default. This may be sufficient for some use cases, but you may wish to store +some metrics for longer. To store certain metrics for longer than 90 days, you can enable the following +additional components: + +- :ref:`prometheus-longterm-metrics`: a second Prometheus instance used to collect the metrics that you want to store longterm +- :ref:`thanos`: a service that enables more efficient storage of the metrics collected by the :ref:`prometheus-longterm-metrics` + component. +- :ref:`prometheus-longterm-rules`: adds some example rules to the monitoring Prometheus instance (the one deployed by this ``monitoring`` + component) that can be stored longterm by the ``prometheus-longterm-metrics`` component. + +..
note:: + A separate prometheus instance is necessary since the retention time for prometheus metrics is set at the + instance level. This means that increasing the retention time must be done for all metrics at once, which is undesirable + because you probably don't need to store every metric for a long period of time and you'll end up using a lot more + disk space than needed. + +If some or all of these additional components are enabled, they interact in the following way to store certain metrics for +longer than 90 days: + +1. + - `recording rules`_ are added to the monitoring Prometheus instance (the one deployed by this ``monitoring`` component). + Longterm rules are those that have the ``group: longterm-metrics`` label. + - The metrics described by these rules are collected/calculated by the monitoring Prometheus instance. The monitoring Prometheus + instance treats these rules the same as any other (i.e. only stores them for 90 days by default). + - To enable some example longterm `recording rules`_, enable the :ref:`prometheus-longterm-rules` component. You can also choose + to create your own rules (see :ref:`prometheus-longterm-metrics` for details on how to create these longterm metrics rules). +2. + - The :ref:`prometheus-longterm-metrics` Prometheus instance collects/copies only the metrics produced by rules with the + ``group: longterm-metrics`` label from the monitoring Prometheus instance. + - The :ref:`prometheus-longterm-metrics` Prometheus instance stores only these metrics for a custom duration (can be longer than + 90 days). +3. + - The :ref:`thanos` component can be deployed alongside the :ref:`prometheus-longterm-metrics` Prometheus instance in order to store + the metrics that the :ref:`prometheus-longterm-metrics` Prometheus instance has already collected. + - The :ref:`thanos` component reads the metrics collected by the :ref:`prometheus-longterm-metrics` Prometheus instance and + stores them in an S3 object store.
+ - The :ref:`thanos` object store stores the metrics more efficiently, meaning that metrics can be stored for even longer and they'll + take up less disk space than if they were just stored by the :ref:`prometheus-longterm-metrics` Prometheus instance. + +.. note:: + + It is possible to deploy the :ref:`prometheus-longterm-metrics` Prometheus instance and the :ref:`thanos` instance on a different + machine than the monitoring Prometheus instance. However, note that both the :ref:`prometheus-longterm-metrics` and :ref:`thanos` + components *must* be deployed on the same machine (if both are in use). Also note that this is untested and may require serious + troubleshooting to work properly. + +.. _recording rules: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/ + Weaver ====== diff --git a/birdhouse/env.local.example b/birdhouse/env.local.example index c247d506b..24f37af5c 100644 --- a/birdhouse/env.local.example +++ b/birdhouse/env.local.example @@ -632,6 +632,14 @@ export THREDDS_ADDITIONAL_CATALOG='' #export ALERTMANAGER_EXTRA_INHIBITION="" #export ALERTMANAGER_EXTRA_RECEIVERS="" +# Below are for the prometheus-longterm-metrics optional component +#export PROMETHEUS_LONGTERM_RETENTION_TIME=1y + +# Below are for the thanos optional component +# Change these from the default for added security +#export THANOS_MINIO_ROOT_USER="${__DEFAULT__THANOS_MINIO_ROOT_USER}" +#export THANOS_MINIO_ROOT_PASSWORD="${__DEFAULT__THANOS_MINIO_ROOT_PASSWORD}" + # Below are for the prometheus-log-parser optional component #export PROMETHEUS_LOG_PARSER_POLL_DELAY=1 # time in seconds #export PROMETHEUS_LOG_PARSER_TAIL=true diff --git a/birdhouse/optional-components/README.rst b/birdhouse/optional-components/README.rst index 2210bbe1b..a10d75b69 100644 --- a/birdhouse/optional-components/README.rst +++ b/birdhouse/optional-components/README.rst @@ -444,6 +444,80 @@ How to enable X-Robots-Tag Header in ``env.local`` (a copy from `env.local.examp .. 
seealso:: See the `env.local.example`_ file for more details about this ``BIRDHOUSE_PROXY_ROOT_LOCATION`` behaviour. +.. _prometheus-longterm-metrics: + +Prometheus Long-term Metrics +---------------------------- + +This is a second Prometheus instance that collects longterm monitoring metrics from the monitoring Prometheus instance +(the one created by the ``components/monitoring`` component). + +Longterm metrics are the metrics produced by any Prometheus rule that has the label ``group: longterm-metrics``, in other +words the metrics selectable using Prometheus' ``'{group="longterm-metrics"}'`` query filter. To add some default longterm +metrics rules, also enable the ``prometheus-longterm-rules`` component. + +You may also choose to create your own set of rules instead of, or as well as, the default ones. See how to +:ref:`add additional rules here `. + +To configure this component: + + * update the ``PROMETHEUS_LONGTERM_RETENTION_TIME`` variable to set how long the data will be kept by prometheus + +If the monitoring Prometheus instance that this Prometheus instance is tracking is not deployed on the same machine +(or is at a non-default network address on the same machine), you may configure the network location of the monitoring +Prometheus instance by setting the ``PROMETHEUS_LONGTERM_TARGETS`` variable. For example, if the monitoring Prometheus +instance's API is available at ``https://example.com/prometheus:9090``, then you can set the variable: + +.. code:: shell + + export PROMETHEUS_LONGTERM_TARGETS='["https://example.com/prometheus:9090"]' + +.. note:: + + You may list multiple monitoring Prometheus instances to track in this way by adding more URLs to the list. + +.. warning:: + + Deploying the longterm metrics Prometheus instance on a separate machine from the monitoring Prometheus instance + is untested and may require serious troubleshooting to work properly. + +Enabling this component creates the additional endpoint ``/prometheus-longterm-metrics``. + +..
_prometheus-longterm-rules: + +Prometheus Long-term Rules +-------------------------- + +This adds some default longterm metrics rules to the ``prometheus`` service of the ``components/monitoring`` component for use +by the ``prometheus-longterm-metrics`` component. These rules all have the label ``group: longterm-metrics``. + +To see which rules are added, check out the +``optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules`` file. + +.. _thanos: + +Thanos +------ + +This enables better storage of longterm metrics collected by the ``optional-components/prometheus-longterm-metrics`` +component. Data will be collected from the ``prometheus-longterm-metrics`` instance and stored in an S3 object store +indefinitely. + +When enabling this component, please change the default values for the ``THANOS_MINIO_ROOT_USER`` and +``THANOS_MINIO_ROOT_PASSWORD`` by updating the ``env.local`` file. These set the login credentials for the root user +that runs the minio_ object store. + +Enabling this component creates the additional endpoints: + + * ``/thanos-query``: a prometheus-like query interface to inspect the data stored by thanos + * ``/thanos-minio``: a minio_ web console to inspect the data stored by minio_. + +.. note:: + + The ``thanos`` component must be deployed on the same machine as the ``prometheus-longterm-metrics`` component since + ``thanos`` needs access to the data stored by prometheus on disk (in docker this is achieved by sharing a named volume). + +.. _minio: https://min.io/ + ..
_prometheus-log-parser Prometheus Log Parser diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/.gitignore b/birdhouse/optional-components/prometheus-longterm-metrics/.gitignore new file mode 100644 index 000000000..b7813ee7b --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/.gitignore @@ -0,0 +1,3 @@ +prometheus.yml +config/magpie/config.yml +config/proxy/conf.extra-service.d/monitoring.conf diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/config.yml.template b/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/config.yml.template new file mode 100644 index 000000000..420685852 --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/config.yml.template @@ -0,0 +1,28 @@ +providers: + prometheus-longterm-metrics: + # below URL is only used to fill in the required location in Magpie + # actual auth validation is performed with Twitcher 'verify' endpoint without accessing this proxied URL + url: http://proxy:80 + title: PrometheusLongtermMetrics + public: true + c4i: false + type: api + sync_type: api + +permissions: + - service: prometheus-longterm-metrics + permission: read + group: administrators + action: create + - service: prometheus-longterm-metrics + permission: write + group: administrators + action: create + - service: prometheus-longterm-metrics + permission: read + group: monitoring + action: create + - service: prometheus-longterm-metrics + permission: write + group: monitoring + action: create diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/docker-compose-extra.yml b/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/docker-compose-extra.yml new file mode 100644 index 000000000..4278c611e --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/config/magpie/docker-compose-extra.yml @@ -0,0 +1,7 @@ +version: "3.4" + +services: + magpie: + 
volumes: + - ./optional-components/prometheus-longterm-metrics/config/magpie/config.yml:${MAGPIE_PERMISSIONS_CONFIG_PATH}/prometheus-longterm-metrics.yml:ro + - ./optional-components/prometheus-longterm-metrics/config/magpie/config.yml:${MAGPIE_PROVIDERS_CONFIG_PATH}/prometheus-longterm-metrics.yml:ro diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/conf.extra-service.d/monitoring.conf.template b/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/conf.extra-service.d/monitoring.conf.template new file mode 100644 index 000000000..67c25c053 --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/conf.extra-service.d/monitoring.conf.template @@ -0,0 +1,18 @@ + location /prometheus-longterm-metrics { + auth_request /secure-prometheus-longterm-metrics-auth; + auth_request_set $auth_status $upstream_status; + proxy_pass http://prometheus-longterm-metrics:9090; + proxy_set_header Host $host; + } + + location = /secure-prometheus-longterm-metrics-auth { + internal; + proxy_pass https://${BIRDHOUSE_FQDN_PUBLIC}${TWITCHER_VERIFY_PATH}/prometheus-longterm-metrics$request_uri; + proxy_pass_request_body off; + proxy_set_header Host $host; + proxy_set_header Content-Length ""; + proxy_set_header X-Original-URI $request_uri; + proxy_set_header X-Forwarded-Proto $real_scheme; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Host $host:$server_port; + } diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/docker-compose-extra.yml b/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/docker-compose-extra.yml new file mode 100644 index 000000000..b25d3e080 --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/config/proxy/docker-compose-extra.yml @@ -0,0 +1,6 @@ +version: "3.4" + +services: + proxy: + volumes: + - 
./optional-components/prometheus-longterm-metrics/config/proxy/conf.extra-service.d:/etc/nginx/conf.extra-service.d/prometheus-longterm-metrics:ro diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/default.env b/birdhouse/optional-components/prometheus-longterm-metrics/default.env new file mode 100644 index 000000000..8f6d9638a --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/default.env @@ -0,0 +1,29 @@ +export PROMETHEUS_LONGTERM_VERSION='${PROMETHEUS_VERSION:-"v2.52.0"}' +export PROMETHEUS_LONGTERM_DOCKER='${PROMETHEUS_DOCKER:-prom/prometheus}' +export PROMETHEUS_LONGTERM_IMAGE='${PROMETHEUS_LONGTERM_DOCKER}:${PROMETHEUS_LONGTERM_VERSION}' + +export PROMETHEUS_LONGTERM_RETENTION_TIME=1y +export PROMETHEUS_LONGTERM_SCRAPE_INTERVAL=1h + +# These are the prometheus defaults +export PROMETHEUS_LONGTERM_TSDB_MIN_BLOCK_DURATION=2h +export PROMETHEUS_LONGTERM_TSDB_MAX_BLOCK_DURATION=1d12h + +# These are the monitoring Prometheus instances (host:port) that this instance federates longterm metrics from +export PROMETHEUS_LONGTERM_TARGETS='["prometheus:9090"]' # yaml list syntax + +OPTIONAL_VARS=" + $OPTIONAL_VARS + \$PROMETHEUS_LONGTERM_SCRAPE_INTERVAL + \$PROMETHEUS_LONGTERM_TARGETS +" + +export DELAYED_EVAL=" + $DELAYED_EVAL + PROMETHEUS_LONGTERM_VERSION + PROMETHEUS_LONGTERM_DOCKER + PROMETHEUS_LONGTERM_IMAGE +" + +# Note that this component does not depend explicitly on the `components/monitoring` component so that this can +# theoretically be deployed on a different machine than the `prometheus` service. This is currently untested.
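For reference while setting ``PROMETHEUS_LONGTERM_TARGETS``: each ``host:port`` target is combined with the ``metrics_path`` and the ``match[]`` parameter from this component's ``prometheus.yml.template`` into one federation request. The Python sketch below reconstructs an equivalent request URL. The helper name ``federate_url`` is hypothetical (not part of birdhouse-deploy), and the sketch assumes Prometheus' default ``http`` scrape scheme because the template does not set one.

```python
from urllib.parse import urlencode


def federate_url(target, metrics_path="/prometheus/federate",
                 selector='{group="longterm-metrics"}', scheme="http"):
    """Build a federation request URL for one static target (hypothetical helper).

    ``target`` is a "host:port" entry of PROMETHEUS_LONGTERM_TARGETS; the
    defaults mirror metrics_path and match[] in prometheus.yml.template.
    """
    # urlencode percent-encodes both the "match[]" key and the selector value.
    query = urlencode({"match[]": selector})
    return f"{scheme}://{target}{metrics_path}?{query}"


if __name__ == "__main__":
    # "prometheus:9090" is the default PROMETHEUS_LONGTERM_TARGETS entry.
    print(federate_url("prometheus:9090"))
```

The printed URL could, for example, be fetched with ``curl`` from a container on the same Docker network to preview which series the longterm instance would copy.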
diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/docker-compose-extra.yml b/birdhouse/optional-components/prometheus-longterm-metrics/docker-compose-extra.yml new file mode 100644 index 000000000..426d4d0ef --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/docker-compose-extra.yml @@ -0,0 +1,32 @@ +version: "3.4" + +x-logging: + &default-logging + driver: "json-file" + options: + max-size: "50m" + max-file: "10" + +services: + prometheus-longterm-metrics: + image: ${PROMETHEUS_LONGTERM_IMAGE} + container_name: prometheus-longterm-metrics + volumes: + - ./optional-components/prometheus-longterm-metrics/prometheus.yml:/etc/prometheus/prometheus.yml:ro + - prometheus_longterm_persistence:/prometheus:rw + command: + - --config.file=/etc/prometheus/prometheus.yml + - --storage.tsdb.path=/prometheus + - --web.console.libraries=/usr/share/prometheus/console_libraries + - --web.console.templates=/usr/share/prometheus/consoles + - --storage.tsdb.retention.time=${PROMETHEUS_LONGTERM_RETENTION_TIME} + - --web.external-url=https://${BIRDHOUSE_FQDN_PUBLIC}/prometheus-longterm-metrics/ + - --storage.tsdb.min-block-duration=${PROMETHEUS_LONGTERM_TSDB_MIN_BLOCK_DURATION} + - --storage.tsdb.max-block-duration=${PROMETHEUS_LONGTERM_TSDB_MAX_BLOCK_DURATION} + restart: always + logging: *default-logging + +volumes: + prometheus_longterm_persistence: + external: + name: prometheus_longterm_persistence diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/pre-docker-compose-up b/birdhouse/optional-components/prometheus-longterm-metrics/pre-docker-compose-up new file mode 100755 index 000000000..76a44e2e8 --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/pre-docker-compose-up @@ -0,0 +1,3 @@ +#!/bin/sh -x + +docker volume create prometheus_longterm_persistence # metrics db diff --git a/birdhouse/optional-components/prometheus-longterm-metrics/prometheus.yml.template 
b/birdhouse/optional-components/prometheus-longterm-metrics/prometheus.yml.template new file mode 100644 index 000000000..c0ade1ba1 --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-metrics/prometheus.yml.template @@ -0,0 +1,17 @@ +global: + external_labels: + instance_name: prometheus-longterm-metrics + +scrape_configs: + - job_name: 'federate' + scrape_interval: ${PROMETHEUS_LONGTERM_SCRAPE_INTERVAL} + + honor_labels: true + metrics_path: '/prometheus/federate' + + params: + 'match[]': + - '{group="longterm-metrics"}' + + static_configs: + - targets: ${PROMETHEUS_LONGTERM_TARGETS} diff --git a/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/docker-compose-extra.yml b/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/docker-compose-extra.yml new file mode 100644 index 000000000..0f701b30b --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/docker-compose-extra.yml @@ -0,0 +1,6 @@ +version: "3.4" + +services: + prometheus: + volumes: + - ./optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules:/etc/prometheus/prometheus-longterm-metrics.rules:ro diff --git a/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules b/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules new file mode 100644 index 000000000..465a8f20e --- /dev/null +++ b/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/prometheus.rules @@ -0,0 +1,15 @@ +groups: + - name: longterm-metrics-hourly + interval: 1h + rules: + # percentage of the time, over the last hour, that all CPUs were working + # 1 means all CPUs were working all the time, 0 means they were all idle all the time + - record: instance:cpu_load:avg_rate1h + expr: avg by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[1h])) + labels: + group: longterm-metrics + # total number of bytes that were sent or 
received over the network in the last hour + - record: instance:network_bytes_transmitted:sum_rate1h + expr: sum by(instance) (rate(node_network_transmit_bytes_total[1h]) + rate(node_network_receive_bytes_total[1h])) + labels: + group: longterm-metrics diff --git a/birdhouse/optional-components/thanos/.gitignore b/birdhouse/optional-components/thanos/.gitignore new file mode 100644 index 000000000..97ac1a63e --- /dev/null +++ b/birdhouse/optional-components/thanos/.gitignore @@ -0,0 +1,2 @@ +config/magpie/config.yml +config/proxy/conf.extra-service.d/monitoring.conf diff --git a/birdhouse/optional-components/thanos/config/magpie/config.yml.template b/birdhouse/optional-components/thanos/config/magpie/config.yml.template new file mode 100644 index 000000000..05633dff4 --- /dev/null +++ b/birdhouse/optional-components/thanos/config/magpie/config.yml.template @@ -0,0 +1,28 @@ +providers: + thanos: + # below URL is only used to fill in the required location in Magpie + # actual auth validation is performed with Twitcher 'verify' endpoint without accessing this proxied URL + url: http://proxy:80 + title: Thanos + public: true + c4i: false + type: api + sync_type: api + +permissions: + - service: thanos + permission: read + group: administrators + action: create + - service: thanos + permission: write + group: administrators + action: create + - service: thanos + permission: read + group: monitoring + action: create + - service: thanos + permission: write + group: monitoring + action: create diff --git a/birdhouse/optional-components/thanos/config/magpie/docker-compose-extra.yml b/birdhouse/optional-components/thanos/config/magpie/docker-compose-extra.yml new file mode 100644 index 000000000..fd3e207ac --- /dev/null +++ b/birdhouse/optional-components/thanos/config/magpie/docker-compose-extra.yml @@ -0,0 +1,7 @@ +version: "3.4" + +services: + magpie: + volumes: + - ./optional-components/thanos/config/magpie/config.yml:${MAGPIE_PERMISSIONS_CONFIG_PATH}/thanos.yml:ro + - 
./optional-components/thanos/config/magpie/config.yml:${MAGPIE_PROVIDERS_CONFIG_PATH}/thanos.yml:ro diff --git a/birdhouse/optional-components/thanos/config/proxy/conf.extra-service.d/monitoring.conf.template b/birdhouse/optional-components/thanos/config/proxy/conf.extra-service.d/monitoring.conf.template new file mode 100644 index 000000000..e20d2a99b --- /dev/null +++ b/birdhouse/optional-components/thanos/config/proxy/conf.extra-service.d/monitoring.conf.template @@ -0,0 +1,35 @@ + location /thanos-query { + auth_request /secure-thanos-auth; + auth_request_set $auth_status $upstream_status; + proxy_pass http://thanos-query:19192; + proxy_set_header Host $host; + } + + location /thanos-minio/ { + auth_request /secure-thanos-auth; + auth_request_set $auth_status $upstream_status; + + rewrite ^/thanos-minio/(.*) /$1 break; + proxy_pass http://thanos-minio:9001; + + proxy_http_version 1.1; + # This allows WebSocket connections + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location = /secure-thanos-auth { + internal; + proxy_pass https://${BIRDHOUSE_FQDN_PUBLIC}${TWITCHER_VERIFY_PATH}/thanos$request_uri; + proxy_pass_request_body off; + proxy_set_header Host $host; + proxy_set_header Content-Length ""; + proxy_set_header X-Original-URI $request_uri; + proxy_set_header X-Forwarded-Proto $real_scheme; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Host $host:$server_port; + }
b/birdhouse/optional-components/thanos/config/proxy/docker-compose-extra.yml @@ -0,0 +1,6 @@ +version: "3.4" + +services: + proxy: + volumes: + - ./optional-components/thanos/config/proxy/conf.extra-service.d:/etc/nginx/conf.extra-service.d/thanos:ro diff --git a/birdhouse/optional-components/thanos/default.env b/birdhouse/optional-components/thanos/default.env new file mode 100644 index 000000000..5780b5523 --- /dev/null +++ b/birdhouse/optional-components/thanos/default.env @@ -0,0 +1,50 @@ + +export THANOS_VERSION=v0.35.1 +export THANOS_DOCKER="thanosio/thanos" +export THANOS_IMAGE='${THANOS_DOCKER}:${THANOS_VERSION}' + +export THANOS_MINIO_VERSION=RELEASE.2024-05-27T19-17-46Z +export THANOS_MINIO_DOCKER=minio/minio +export THANOS_MINIO_IMAGE='${THANOS_MINIO_DOCKER}:${THANOS_MINIO_VERSION}' + +# Minio uses object storage on disk at this location +export THANOS_MINIO_DATA_STORE='${BIRDHOUSE_DATA_PERSIST_ROOT}/thanos_minio_data/' + +# Note that bucket names must only contain lowercase ascii, digits, - and . +export THANOS_MINIO_BUCKET_NAME=thanos-bucket + +# Minio credentials +export __DEFAULT__THANOS_MINIO_ROOT_USER=minioadmin +export __DEFAULT__THANOS_MINIO_ROOT_PASSWORD=minioadmin +export THANOS_MINIO_ROOT_USER="${__DEFAULT__THANOS_MINIO_ROOT_USER}" +export THANOS_MINIO_ROOT_PASSWORD="${__DEFAULT__THANOS_MINIO_ROOT_PASSWORD}" + +# Interval to wait between compactor runs. This should be at least double the largest longterm-metrics rule interval. +# e.g. if thanos is collecting a metric that is calculated every 24h (daily) then this value should be at least 48h +export THANOS_COMPACTOR_WAIT_INTERVAL=48h + +# The longterm data retention time can be shortened back to the default since Thanos is now responsible for +# storing longterm data, not the prometheus-longterm-metrics component.
+export PROMETHEUS_LONGTERM_RETENTION_TIME=15d + +# The thanos-sidecar component requires that these two values be equal or else it cannot perform its own compaction +# https://thanos.io/tip/components/sidecar.md/#sidecar +export PROMETHEUS_LONGTERM_TSDB_MIN_BLOCK_DURATION=2h +export PROMETHEUS_LONGTERM_TSDB_MAX_BLOCK_DURATION=2h + +VARS=" + $VARS + \$THANOS_MINIO_ROOT_USER + \$THANOS_MINIO_ROOT_PASSWORD +" + +export DELAYED_EVAL=" + $DELAYED_EVAL + THANOS_IMAGE + THANOS_MINIO_IMAGE + THANOS_MINIO_DATA_STORE +" + +COMPONENT_DEPENDENCIES=" + ./optional-components/prometheus-longterm-metrics +" diff --git a/birdhouse/optional-components/thanos/docker-compose-extra.yml b/birdhouse/optional-components/thanos/docker-compose-extra.yml new file mode 100644 index 000000000..424404f2b --- /dev/null +++ b/birdhouse/optional-components/thanos/docker-compose-extra.yml @@ -0,0 +1,85 @@ +version: "3.4" + +x-logging: + &default-logging + driver: "json-file" + options: + max-size: "50m" + max-file: "10" + +x-objstore-config: &objstore-config | + --objstore.config=type: S3 + config: + bucket: ${THANOS_MINIO_BUCKET_NAME} + access_key: ${THANOS_MINIO_ROOT_USER} + secret_key: ${THANOS_MINIO_ROOT_PASSWORD} + endpoint: thanos-minio:9000 + insecure: true # use http instead of https + +services: + thanos-sidecar: + image: ${THANOS_IMAGE} + container_name: thanos-sidecar + volumes: + - prometheus_longterm_persistence:/prometheus + user: nobody # prometheus runs as this user so the sidecar must as well + command: + - 'sidecar' + - '--tsdb.path=/prometheus' + - '--prometheus.url=http://prometheus-longterm-metrics:9090/prometheus-longterm-metrics' + - '--grpc-address=0.0.0.0:19090' + - '--http-address=0.0.0.0:19191' + - *objstore-config + depends_on: + - prometheus-longterm-metrics + - thanos-minio + restart: always + logging: *default-logging + + thanos-query: + image: ${THANOS_IMAGE} + container_name: thanos-query + command: + - 'query' + - '--http-address=0.0.0.0:19192' + - 
'--web.route-prefix=/thanos-query' + - '--web.external-prefix=/thanos-query' + - '--endpoint=thanos-sidecar:19090' + depends_on: + - thanos-sidecar + restart: always + logging: *default-logging + + thanos-compactor: + image: ${THANOS_IMAGE} + container_name: thanos-compactor + command: + - 'compact' + - '--data-dir=/tmp/data' # temporary workspace (doesn't need to be a volume) + - '--wait' + - '--wait-interval=${THANOS_COMPACTOR_WAIT_INTERVAL}' + - *objstore-config + depends_on: + - thanos-minio + restart: always + logging: *default-logging + + thanos-minio: + image: ${THANOS_MINIO_IMAGE} + container_name: thanos-minio + volumes: + - ${THANOS_MINIO_DATA_STORE}:/data + - ./optional-components/thanos/minio-entrypoint:/entrypoint + entrypoint: /entrypoint + command: + - 'minio' + - 'server' + - '--console-address' + - ':9001' + - '/data' + environment: + - MINIO_ROOT_USER=${THANOS_MINIO_ROOT_USER} + - MINIO_ROOT_PASSWORD=${THANOS_MINIO_ROOT_PASSWORD} + - MINIO_PROMETHEUS_AUTH_TYPE=public + - THANOS_MINIO_BUCKET_NAME=${THANOS_MINIO_BUCKET_NAME} + - MINIO_BROWSER_REDIRECT_URL=https://${BIRDHOUSE_FQDN_PUBLIC}/thanos-minio diff --git a/birdhouse/optional-components/thanos/minio-entrypoint b/birdhouse/optional-components/thanos/minio-entrypoint new file mode 100755 index 000000000..a02c3368f --- /dev/null +++ b/birdhouse/optional-components/thanos/minio-entrypoint @@ -0,0 +1,5 @@ +#!/bin/sh + +mkdir -p "/data/${THANOS_MINIO_BUCKET_NAME}" + +exec "$@"
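The comment on ``THANOS_COMPACTOR_WAIT_INTERVAL`` in ``thanos/default.env`` gives a rule of thumb: the wait interval should be at least double the largest longterm-metrics rule interval. The Python sketch below checks that rule for Prometheus-style duration strings. ``parse_duration`` and ``compactor_interval_ok`` are hypothetical helpers (not part of birdhouse-deploy or Thanos), and the unit table assumes Prometheus' duration units, with ``y`` taken as 365 days.

```python
import re

# Seconds per unit for Prometheus-style durations ("y" is 365 days).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
          "d": 86400, "w": 604800, "y": 31536000}
# "ms" is tried before "m" so "5ms" is not read as "5m" plus a stray "s".
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text):
    """Convert a duration such as '1h', '48h' or '1d12h' to seconds.

    Lenient sketch: unlike Prometheus, it does not enforce unit ordering.
    """
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # unparsable text between two valid parts
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total


def compactor_interval_ok(wait_interval, rule_intervals):
    """Rule of thumb from thanos/default.env: wait >= 2 x largest rule interval."""
    largest = max(parse_duration(i) for i in rule_intervals)
    return parse_duration(wait_interval) >= 2 * largest


if __name__ == "__main__":
    # Defaults from this diff: THANOS_COMPACTOR_WAIT_INTERVAL=48h and the
    # 1h interval of the longterm-metrics-hourly rule group.
    print(compactor_interval_ok("48h", ["1h"]))
```

With the defaults shown in this diff the check passes; adding a rule group with a longer ``interval`` (for example ``1d``) would call for re-checking the wait interval.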