[BUG] Agent falls back to TCP log submission even if the site does not support it #31014

Open
ollien opened this issue Nov 12, 2024 · 0 comments

Agent Environment

Agent 7.55.1 - Commit: 8ec9dff - Serialization version: v5.0.119 - Go version: go1.21.11

Describe what happened:

In production, a number of our agent instances failed to resolve agent-intake.logs.us5.datadoghq.com. That host does not exist because US5 does not support TCP log submission, so our logs were not ingested.

2024-11-12 14:32:20 UTC | CORE | WARN | (pkg/logs/client/tcp/connection_manager.go:108 in NewConnection) | dial tcp: lookup agent-intake.logs.us5.datadoghq.com: no such host

After some investigation, it seems that our agents failed the HTTP health check at startup and fell back to TCP:

2024-11-12 14:32:20 UTC | CORE | WARN | (pkg/logs/client/http/destination.go:442 in CheckConnectivity) | HTTP connectivity failure: Post "https://agent-http-intake.logs.us5.datadoghq.com/api/v2/logs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024-11-12 14:32:20 UTC | CORE | WARN | (comp/logs/agent/config/config.go:120 in BuildEndpointsWithConfig) | You are currently sending Logs to Datadog through TCP (either because logs_config.force_use_tcp or logs_config.socks5_proxy_address is set or the HTTP connectivity test has failed) To benefit from increased reliability and better network performances, we strongly encourage switching over to compressed HTTPS which is now the default protocol.

Describe what you expected:

I would not expect the agent to fall back to a TCP endpoint that does not exist. If US5 does not support TCP, then the agent should act as if logs_config.force_use_http is enabled, or fail loudly in some other way.
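To make the expectation concrete, here is a minimal sketch of the decision I have in mind. It is not the agent's actual code: `chooseTransport`, the `Transport` type, and the `siteSupportsTCP` flag are hypothetical names used only for illustration, and how the agent would know whether a site has a TCP intake is left open.

```go
package main

import "fmt"

// Transport is a hypothetical label for the log submission protocol.
type Transport string

const (
	TransportHTTP Transport = "http"
	TransportTCP  Transport = "tcp"
)

// chooseTransport is an illustrative decision function, not the agent's
// real logic. It only falls back to TCP when the HTTP connectivity check
// failed AND the site is known to have a TCP intake. Otherwise it stays
// on HTTP, as if HTTP had been forced in the configuration.
func chooseTransport(httpOK, forceHTTP, siteSupportsTCP bool) Transport {
	if forceHTTP || httpOK {
		return TransportHTTP
	}
	if siteSupportsTCP {
		return TransportTCP
	}
	// Assumption: with no TCP intake for this site, retrying over HTTP
	// is preferable to dialing a host that does not exist.
	return TransportHTTP
}

func main() {
	// Case from this report: HTTP check failed, HTTP not forced,
	// and the site (US5) has no TCP intake.
	fmt.Println(chooseTransport(false, false, false))
}
```

An alternative with the same intent would be to return an error in the last branch so the agent fails loudly at startup instead of silently retrying over HTTP.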

Steps to reproduce the issue:
I don't have exact reproduction steps, but if you make the HTTP probe fail at startup (for example, with an iptables rule that drops its traffic so the request times out), the agent ends up in this state.

Additional environment details (Operating System, Cloud provider, etc): The container image used is gcr.io/datadoghq/agent:7.55.1
