We’re experiencing periodic latency spikes from the PubSubHealthIndicator in Spring Cloud GCP whenever there’s a large backlog in Google Cloud Pub/Sub. Although the health check does not pull the backlog itself, the overall Pub/Sub system (or network) slows down enough that the “quick pull” call hangs or times out. The service is then marked DOWN in /actuator/health, which can trigger restarts in Kubernetes and creates a vicious cycle.
Logs:
Health contributor org.springframework.cloud.gcp.autoconfigure.pubsub.health.PubSubHealthIndicator (pubSub) took 10936ms to respond
Health contributor org.springframework.cloud.gcp.autoconfigure.pubsub.health.PubSubHealthIndicator (pubSub) took 89844ms to respond
I’m not sure whether updating the health-check settings (e.g., timeouts) would resolve this issue, or whether we should exclude the PubSubHealthIndicator from the core /actuator/health group.
Since Pub/Sub is designed to handle backlogs to protect our service from being overwhelmed during high traffic periods, relying on a “quick pull” to measure health may not be best practice in production.
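To make the timeout idea concrete, here is a minimal sketch of what I have in mind: bound the probe with a hard timeout and report a timeout as UNKNOWN rather than DOWN, so a slow pull alone does not fail the health endpoint. This is not how PubSubHealthIndicator is implemented. `BoundedHealthCheck`, `check`, and the `probe` supplier are hypothetical names using only the JDK, and the probe stands in for whatever call the real indicator makes.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of Spring Cloud GCP.
final class BoundedHealthCheck {

    // Runs the probe asynchronously and waits at most `timeout` for a result.
    // Returns "UP" or "DOWN" for a definite answer, "UNKNOWN" when the probe is too slow.
    static String check(Supplier<Boolean> probe, Duration timeout) {
        CompletableFuture<Boolean> result = CompletableFuture.supplyAsync(probe);
        try {
            Boolean ok = result.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return Boolean.TRUE.equals(ok) ? "UP" : "DOWN";
        } catch (TimeoutException e) {
            // Marks the future as cancelled; the probe thread itself is not interrupted.
            result.cancel(true);
            return "UNKNOWN";
        } catch (ExecutionException e) {
            // The probe threw, which is a definite failure.
            return "DOWN";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "UNKNOWN";
        }
    }

    public static void main(String[] args) {
        // Simulated slow pull: sleeps longer than the allowed timeout.
        Supplier<Boolean> slowPull = () -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return true;
        };
        System.out.println(check(slowPull, Duration.ofMillis(100)));
        System.out.println(check(() -> true, Duration.ofMillis(100)));
    }
}
```

As far as I understand, Spring Boot maps only DOWN and OUT_OF_SERVICE to HTTP 503 by default, so an UNKNOWN status would not fail a probe on its own. The alternative I mentioned, keeping the indicator out of the probe path, should be possible with health groups (`management.endpoint.health.group.<name>.include`), but I have not verified how that interacts with this indicator in our version.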
I’m also not entirely sure why the “quick pull” call hangs or times out when other topics start accumulating backlogs.
Any guidance or recommended patterns on handling these scenarios while still monitoring Pub/Sub health would be greatly appreciated.