Keda RabbitMQ Autoscaler Not Working and K8S API Server Timeout #6268

Open
oguzhansrky opened this issue Oct 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@oguzhansrky

oguzhansrky commented Oct 24, 2024

Report

E1024 14:34:07.470495       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E1024 14:34:07.471060       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.471715       1 writers.go:131] apiserver was unable to write a fallback JSON response: http: Handler timeout
E1024 14:34:07.472363       1 timeout.go:141] post-timeout activity - time-elapsed: 8.018879ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.472951       1 timeout.go:141] post-timeout activity - time-elapsed: 4.440412ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
E1024 14:34:07.473569       1 timeout.go:141] post-timeout activity - time-elapsed: 9.043293ms, GET "/apis/external.metrics.k8s.io/v1beta1" result: <nil>
W1024 14:34:12.113708       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {
  "Addr": "keda-operator.keda.svc.cluster.local:9666",
  "ServerName": "keda-operator.keda.svc.cluster.local:9666",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 10.43.71.178:9666: connect: connection refused"
E1024 14:34:18.480827       1 wrap.go:53] timeout or abort while handling: method=GET URI="/apis/external.metrics.k8s.io/v1beta1" audit-ID="9a09bcaf-3a0e-41fc-8642-aadb2298f2c3"
E1024 14:34:18.480968       1 writers.go:118] apiserver was unable to write a JSON response: http: Handler timeout
E1024 14:34:18.482389       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout

Expected Behavior

.

Actual Behavior

.

Steps to Reproduce the Problem

Logs from KEDA operator

2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162
2024-10-24T14:39:51Z	ERROR	scalers_cache	error getting scale decision	{"scaledobject.Name": "bug", "scaledObject.Namespace": "bug", "scaleTarget.Name": "bug", "error": "error inspecting rabbitMQ: Exception (404) Reason: \"NOT_FOUND - no queue 'bug' in vhost '/'\""}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetScaledObjectState
	/workspace/pkg/scaling/cache/scalers_cache.go:155
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:360
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:162

KEDA Version

2.11.2

Kubernetes Version

1.28

Platform

Other

Scaler Details

RabbitMQ

Anything else?

I have been using Keda for a long time. Recently, the system has started failing to respond. There are nearly 3000 ScaledObjects in the system; maybe this is the cause of the problem, but I cannot find where the timeout occurs and I cannot intervene in Keda.

oguzhansrky added the bug label on Oct 24, 2024
@JorTurFer
Member

Hello
There have been a lot of performance improvements since KEDA v2.11.2; I'd suggest upgrading to a recent version, and the issues in the metrics server may disappear.

About the RabbitMQ issue, does the queue exist in the vhost? The error reports a 404 when looking up the queue.
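
For reference, a minimal sketch of a RabbitMQ-triggered ScaledObject (the names, host, and credentials below are placeholders, not taken from this issue) showing the fields that 404 points at: the queueName in the trigger metadata must match a queue that already exists in the vhost encoded in the connection string, otherwise the scaler fails exactly like the operator log above.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: bug
  namespace: bug
spec:
  scaleTargetRef:
    name: bug                   # workload being scaled
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        # no vhost in the URI means the default vhost "/", the one named in the NOT_FOUND error
        host: amqp://user:password@rabbitmq.example.svc.cluster.local:5672
        queueName: bug          # must already exist in that vhost
        mode: QueueLength
        value: "20"
```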

About the load (~3k ScaledObjects), it's quite normal and KEDA should be able to handle it by just configuring the kube-client parameters to allow more requests in its own local rate limiter -> https://keda.sh/docs/2.15/operate/cluster/#kubernetes-client-parameters
The improvements I mentioned above also include changes in how KEDA handles the status of the ScaledObject, reducing the number of requests to the k8s API server.
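
As a rough illustration of those kube-client parameters (the flag names are taken from the linked page; the numbers are arbitrary placeholders, not a recommendation), they are plain flags on the keda-operator container, so in the operator Deployment they end up looking roughly like this:

```yaml
# excerpt of a keda-operator Deployment spec; only the args matter here
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          args:
            - --kube-api-qps=50     # client-side QPS limit towards the k8s API server
            - --kube-api-burst=100  # burst allowance on top of that QPS limit
```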
