Keda RabbitMQ Autoscaler Not Working and K8S API Server Timeout #6268

Open
oguzhansrky opened this issue Oct 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@oguzhansrky

oguzhansrky commented Oct 24, 2024

Report

E1024 14:34:07.470495       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E1024 14:34:07.471060       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.471715       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.472363       1 timeout.go:141] post-timeout activity - time-elapsed: 8.018879ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.472951       1 timeout.go:141] post-timeout activity - time-elapsed: 4.440412ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.473569       1 timeout.go:141] post-timeout activity - time-elapsed: 9.043293ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
W1024 14:34:12.113708       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "keda-operator.keda.svc.cluster.local:9666",
  "ServerName": "keda-operator.keda.svc.cluster.local:9666",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.43.71.178:9666: connect: connection refused"
E1024 14:34:18.480827       1 wrap.go:53] timeout or abort while handling: method=GET URI="/apis/external.metrics.k8s.io/v1beta1" audit-ID="9a09bcaf-3a0e-41fc-8642-aadb2298f2c3"
E1024 14:34:18.480968       1 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
E1024 14:34:18.482389       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout

Expected Behavior

.

Actual Behavior

.

Steps to Reproduce the Problem

Logs from KEDA operator

2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162
2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162

KEDA Version

2.11.2

Kubernetes Version

1.28

Platform

Other

Scaler Details

RabbitMQ

Anything else?

I have been using Keda for a long time. Recently, the system has started failing to respond. There are nearly 3000 ScaledObjects in the system; maybe this is the cause of the problem, but I cannot find where the timeout occurs and I cannot intervene in Keda.

oguzhansrky added the bug label on Oct 24, 2024
@JorTurFer
Member

Hello
There have been a lot of performance improvements since KEDA v2.11.2; I'd suggest upgrading to a recent version, and the issues in the metrics server may disappear.

About the RabbitMQ issue, does the queue exist in the vhost? The error reports a 404 when looking up the queue.
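
For reference, a minimal sketch of a RabbitMQ-triggered ScaledObject (the names, host, and credentials below are placeholders, not taken from this issue) showing the fields that 404 points at: the queueName in the trigger metadata must match a queue that already exists in the vhost encoded in the connection string, otherwise the scaler fails exactly like the operator log above.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: bug
  namespace: bug
spec:
  scaleTargetRef:
    name: bug                   # workload being scaled
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        # no vhost in the URI means the default vhost "/", the one named in the NOT_FOUND error
        host: amqp://user:password@rabbitmq.example.svc.cluster.local:5672
        queueName: bug          # must already exist in that vhost
        mode: QueueLength
        value: "20"
```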

About the load (~3k ScaledObjects), it's quite normal and KEDA should be able to handle it by just configuring the kube-client parameters to allow more requests in its own local rate limiter -> https://keda.sh/docs/2.15/operate/cluster/#kubernetes-client-parameters
The improvements I mentioned above also include changes in how KEDA handles the status of the ScaledObject, reducing the number of requests to the k8s API server.
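
As a rough illustration of those kube-client parameters (the flag names are taken from the linked page; the numbers are arbitrary placeholders, not a recommendation), they are plain flags on the keda-operator container, so in the operator Deployment they end up looking roughly like this:

```yaml
# excerpt of a keda-operator Deployment spec; only the args matter here
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          args:
            - --kube-api-qps=50     # client-side QPS limit towards the k8s API server
            - --kube-api-burst=100  # burst allowance on top of that QPS limit
```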
