[spring-cloud-gcp-autoconfigure] PubSubHealthIndicator Fails Under Large GCP Pub/Sub Backlogs, Triggering Negative Feedback Loop #3438

tangcent · 2025-01-03T08:10:58Z

Describe the bug

We’re experiencing periodic spikes in latency from the PubSubHealthIndicator in Spring Cloud GCP whenever there’s a large backlog in Google Cloud Pub/Sub. Although the backlog itself isn’t being pulled by the health check, the overall Pub/Sub system (or network) slows down enough that the “quick pull” call hangs or times out. This marks our service as DOWN in /actuator/health, which can trigger restarts in Kubernetes, creating a negative feedback loop.

Logs:

Health contributor org.springframework.cloud.gcp.autoconfigure.pubsub.health.PubSubHealthIndicator (pubSub) took 10936ms to respond
Health contributor org.springframework.cloud.gcp.autoconfigure.pubsub.health.PubSubHealthIndicator (pubSub) took 89844ms to respond

I'm not sure whether tuning the health-check settings (e.g., timeouts) would resolve this issue, or whether we should exclude the PubSubHealthIndicator from the main /actuator/health endpoint.
Since Pub/Sub is designed to buffer backlogs so that our service isn't overwhelmed during high-traffic periods, relying on a “quick pull” to measure health may not be best practice in production.
I'm also not sure why the “quick pull” call hangs or times out when other topics start accumulating backlogs.
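
For reference, a minimal sketch of what the exclusion option could look like. It assumes Spring Boot 2.3 or later, where the liveness and readiness health groups exist, and has not been verified against the older org.springframework.cloud.gcp artifacts that appear in the logs above:

```properties
# Expose dedicated liveness/readiness groups (enabled automatically when Spring Boot detects Kubernetes).
management.endpoint.health.probes.enabled=true

# Keep the probe groups limited to application state, so pubSub is not part of them.
management.endpoint.health.group.liveness.include=livenessState
management.endpoint.health.group.readiness.include=readinessState

# Alternative (assumption: supported by the version in use): switch the Pub/Sub indicator off entirely.
# management.health.pubsub.enabled=false
```

With this, the Kubernetes probes would point at /actuator/health/liveness and /actuator/health/readiness instead of /actuator/health, so a slow pubSub contributor would still be visible in the full health output without triggering restarts.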

Any guidance or recommended patterns on handling these scenarios while still monitoring Pub/Sub health would be greatly appreciated.
