Prometheus metrics for ingest wal usage are not working #5547

Open
fredsig opened this issue Nov 14, 2024 · 4 comments
Labels: bug (Something isn't working)

fredsig commented Nov 14, 2024

Describe the bug

Both quickwit_ingest_wal_disk_used_bytes and quickwit_ingest_wal_memory_used_bytes are not working as expected. quickwit_ingest_wal_disk_used_bytes displays a constant value of 134217728 (max queue disk usage is set to 32 GiB and the total disk size is 250 GB), and quickwit_ingest_wal_memory_used_bytes always reports 0.
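
For reference, this is how I'm checking the raw gauge values directly on a node (a minimal sketch, assuming the default REST port 7280; the hostname is a placeholder for your deployment):

  # Scrape the node's Prometheus endpoint and filter the WAL gauges.
  # The hostname is a placeholder; 7280 is Quickwit's default REST port.
  curl -s http://quickwit-indexer.example:7280/metrics | grep ingest_wal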

Steps to reproduce (if applicable)
I'm using the default Prometheus scraping configuration provided by the Helm chart. These are my ingest_api values:

  ingest_api:
    max_queue_memory_usage: 4GiB
    max_queue_disk_usage: 32GiB

Expected behavior
I expect both metrics to report WAL usage for both disk and memory.

It would also be great to have metrics exposing the max_queue_disk_usage and max_queue_memory_usage config settings.

Configuration:
Version: v0.8.1

node.yaml:

data_dir: /quickwit/qwdata
default_index_root_uri: s3://prod-<redacted>-quickwit/indexes
gossip_listen_port: 7282
grpc:
  max_message_size: 80 MiB
indexer:
  enable_otlp_endpoint: true
ingest_api:
  max_queue_disk_usage: 32GiB
  max_queue_memory_usage: 4GiB
listen_address: 0.0.0.0
metastore:
  postgres:
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d
    max_connections: 50
    min_connections: 10
storage:
  s3:
    region: us-east-1
version: 0.8
fredsig added the bug label on Nov 14, 2024
trinity-1686a (Contributor) commented:

How are you ingesting data?
The WAL is only used when ingesting through the ingest API, not when pulling from a source like Kafka or Kinesis. So if you don't use the ingest API (or are not currently ingesting anything), it's expected that the in-memory WAL will be empty. The on-disk WAL also records queues it thinks exist, and it always works in blocks of 128 MiB, so it will never really be empty.
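
For reference, 128 MiB expressed in bytes is exactly the constant value reported above:

  # 128 MiB in bytes matches the reported quickwit_ingest_wal_disk_used_bytes value
  $ echo $((128 * 1024 * 1024))
  134217728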

fredsig (Author) commented Nov 14, 2024

I'm ingesting using both the ingest API and OTLP (no Kafka or Kinesis). This is related to #5548.

fulmicoton (Contributor) commented:

(I posted this comment on a different issue by mistake.)
I suspect the metric is only plugged in for ingest v2, and @fredsig is using ingest v1.

fredsig (Author) commented Nov 18, 2024

Thanks @fulmicoton. In the meantime, I've created my own WAL watcher to give me stats for the queue directory on ingest v1: I just run a du -h through a pod exec (ugly, but it works for now); see the sketch below.
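
Roughly, the check boils down to the command below (a sketch: the namespace and pod names are placeholders, and the queues path under data_dir is my assumption for where ingest v1 keeps its WAL):

  # Placeholder namespace/pod names; ingest v1 WAL assumed to live under <data_dir>/queues
  kubectl -n quickwit exec quickwit-indexer-0 -- du -sh /quickwit/qwdata/queues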

[Screenshot attached: 2024-11-18 at 09:12:30]
