Prometheus metrics for ingest wal usage are not working #5547

Open
fredsig opened this issue Nov 14, 2024 · 4 comments
Labels: bug (Something isn't working)

fredsig commented Nov 14, 2024

Describe the bug

Both quickwit_ingest_wal_disk_used_bytes and quickwit_ingest_wal_memory_used_bytes are not working as expected. quickwit_ingest_wal_disk_used_bytes displays a constant value of 134217728 (max queue disk usage is set to 32 GiB and the total disk size is 250 GB), and quickwit_ingest_wal_memory_used_bytes always reports 0.
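
For reference, this is how I'm checking the raw gauge values directly on a node (a minimal sketch, assuming the default REST port 7280; the hostname is a placeholder for your deployment):

  # Scrape the node's Prometheus endpoint and filter the WAL gauges.
  # The hostname is a placeholder; 7280 is Quickwit's default REST port.
  curl -s http://quickwit-indexer.example:7280/metrics | grep ingest_wal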

Steps to reproduce (if applicable)
I'm using the default Prometheus scraping configuration provided by the Helm chart. These are my ingest_api values:

  ingest_api:
    max_queue_memory_usage: 4GiB
    max_queue_disk_usage: 32GiB

Expected behavior
I expect both metrics to report WAL usage for both disk and memory.

It would also be great to have metrics exposing the max_queue_disk_usage and max_queue_memory_usage config settings.

Configuration:
Version: v0.8.1

node.yaml:

data_dir: /quickwit/qwdata
default_index_root_uri: s3://prod-<redacted>-quickwit/indexes
gossip_listen_port: 7282
grpc:
  max_message_size: 80 MiB
indexer:
  enable_otlp_endpoint: true
ingest_api:
  max_queue_disk_usage: 32GiB
  max_queue_memory_usage: 4GiB
listen_address: 0.0.0.0
metastore:
  postgres:
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d
    max_connections: 50
    min_connections: 10
storage:
  s3:
    region: us-east-1
version: 0.8
fredsig added the bug label on Nov 14, 2024
trinity-1686a (Contributor) commented:

How are you ingesting data?
The WAL is only used when ingesting through the ingest API, not when pulling from a source like Kafka or Kinesis. So if you don't use the ingest API (or are not currently ingesting anything), it's expected that the in-memory WAL will be empty. The on-disk WAL also records queues it thinks exist, and it always works in blocks of 128 MiB, so it will never really be empty.
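
For reference, 128 MiB expressed in bytes is exactly the constant value reported above:

  # 128 MiB in bytes matches the reported quickwit_ingest_wal_disk_used_bytes value
  $ echo $((128 * 1024 * 1024))
  134217728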

fredsig (Author) commented Nov 14, 2024

I'm ingesting using both the ingest API and OTLP (no Kafka or Kinesis). This is related to #5548.

fulmicoton (Contributor) commented:

(I posted this comment on a different issue by mistake.)
I suspect the metric is only plugged in for ingest v2, and @fredsig is using ingest v1.

fredsig (Author) commented Nov 18, 2024

Thanks @fulmicoton. In the meantime, I've created my own WAL watcher to give me stats for the queue directory on ingest v1: I just run a du -h through a pod exec (ugly, but it works for now); see the sketch below.
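
Roughly, the check boils down to the command below (a sketch: the namespace and pod names are placeholders, and the queues path under data_dir is my assumption for where ingest v1 keeps its WAL):

  # Placeholder namespace/pod names; ingest v1 WAL assumed to live under <data_dir>/queues
  kubectl -n quickwit exec quickwit-indexer-0 -- du -sh /quickwit/qwdata/queues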

[Screenshot attached: 2024-11-18 at 09:12:30]
