CPU stuck at 100% upon network drops to Azure #850
Comments
Two questions here to clarify the specific code-segments that are involved:
For the past ~2 hours, this has been broken on |
@swapneils Apologies for the delayed response.
No, we completely dropped the aws-for-fluent-bit image and are purely using the standard fluent-bit 3.1.4 image.
We are using the Azure Blob plugin |
@guidoiaquinti Are you saying you tested this case ~2 hours ago, or that this case was previously working for you and is now failing with the In the latter case, is the |
Maybe this is completely unrelated, and to be honest, I'm not sure what has changed (I'm currently on mobile with limited connectivity), but all our deployments started failing approximately two hours ago with the following errors:
The timeframe aligns with the update of the |
This seems unrelated seeing as my issue is not exclusively on the new |
Thanks Bradley (and sorry for this additional ping :) ) @guidoiaquinti After making the new Issue, could you pin to The first point is because we plan to update our stable image later this week unless we see issues in stability testing (which I don't expect). The account ID is so I can share test aws-for-fluent-bit images with you to facilitate investigation. |
Describe the question/issue
After we rolled out the aws-for-fluent-bit image at scale, using our own Fluent Bit configuration with new Azure Blob outputs, we occasionally see these errors:
After enough of these we see the container reach a point of no return where CPU spikes to 100% and stays there until the ALB finally marks the task as unhealthy.
We had to move off the aws-for-fluent-bit image and onto Fluent Bit v3.1.4, the latest version.
Configuration
Fluent Bit Log Output
We enabled debug logs, and nothing in them indicates why the CPU would be pegged.
Fluent Bit Version Info
amazon/aws-for-fluent-bit:2.32.2
which uses v1.9.10 of fluent bit under the hood.
Cluster Details
We're running ECS Fargate w/ sidecar deployment of aws-for-fluent-bit.
(This repros locally btw)
Application Details
I was able to repro this locally with the following throughput:
Steps to reproduce issue
Related Issues
No related issues, but a suspected fix is in fluent/fluent-bit#5918
My suggestion would be to consider upgrading to the latest fluent bit version.