Describe the bug
This is not a bug report but an attempt to warn about harmful behavior in the logrus logger hook: https://github.com/elastic/apm-agent-go/blob/main/module/apmlogrus/hook.go. In our case, this code produced a 5-second performance penalty on API endpoints that return errors (such as HTTP 404). Moreover, since the hook was part of a custom legacy library, it was hard to find the cause of this behavior. Switching off the tracer integration removed the lag, so the APM agent and tracing were the first suspects.
To Reproduce
Steps to reproduce the behavior:
1. Switch on the hook.
2. Create an HTTP middleware or gRPC interceptor that writes a critical log entry from an error returned by the main server function. I know this is a silly design, but please keep in mind that I'm talking about legacy code and a legacy library.
3. Point the agent at a broken APM hostname, or at a heavily loaded APM Server, on a busy production service.
4. Observe that requests which produce errors incur a 5-second performance penalty.
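A minimal sketch of the setup described in the steps above (the middleware, handler, and field names here are illustrative, not taken from the actual legacy library; it assumes the standard apmlogrus hook, whose import path may differ by agent version):

```go
package main

import (
	"net/http"

	"github.com/sirupsen/logrus"
	"go.elastic.co/apm/module/apmlogrus/v2"
)

// statusRecorder captures the response status code so the middleware
// can tell whether the handler produced an error.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func main() {
	// Step 1: switch on the hook.
	logrus.AddHook(&apmlogrus.Hook{})

	// Step 2: a middleware that turns handler errors into high-severity
	// log entries.
	withErrorLogging := func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
			next.ServeHTTP(rec, r)
			if rec.status >= 400 {
				// Logging at FatalLevel via Log (which, unlike Fatal,
				// does not call os.Exit) makes the hook flush the
				// tracer synchronously, blocking this response for up
				// to 5 seconds when the APM Server is unreachable or
				// overloaded.
				logrus.StandardLogger().Log(logrus.FatalLevel, "request failed")
			}
		})
	}

	// Step 3: point ELASTIC_APM_SERVER_URL at a broken or overloaded
	// APM Server, then request any endpoint that returns an error
	// (here every request gets a 404).
	http.ListenAndServe(":8080", withErrorLogging(http.NotFoundHandler()))
}
```

With tracing disabled the 404 responses return immediately; with the hook enabled and an unreachable APM Server, each one stalls on the flush.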
Expected behavior
I think this hook should be deprecated and eventually abandoned. My reasons:

- The hook works synchronously, so using it in server middlewares and interceptors slows down responses (when the APM Server is busy).
- Because calls to this hook are unpredictable, the flushing load it puts on the APM Server is also unpredictable. This could nullify the Elastic team's efforts to optimise communication between the APM Server and the Go agent.
- The 5-second flush timeout is too long.
- The hook breaks the isolation between the service's infrastructure subsystems: the logger, for some reason, decides to flush data to an APM server. This is quite unexpected behavior.
```go
// In its default configuration, logrus will exit the process
// following a fatal log message, so we flush the tracer.
```
Given that logrus's default behaviour on a fatal-level log message is to exit the process, flushing the tracer first seems reasonable. You can override the timeout by setting Hook.FatalFlushTimeout:
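A minimal sketch of overriding the timeout (the 1-second value is only an example, not a recommendation; the import path assumes the v2 module and may differ by agent version):

```go
package main

import (
	"time"

	"github.com/sirupsen/logrus"
	"go.elastic.co/apm/module/apmlogrus/v2"
)

func main() {
	// Construct the hook with a shorter flush timeout than the
	// 5-second default applied on fatal-level log entries.
	hook := &apmlogrus.Hook{
		FatalFlushTimeout: time.Second, // example value
	}
	logrus.AddHook(hook)
}
```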
I admit that we hit this performance issue because of some silly decisions in an old library and a misconfigured logger. But I also want to point out that it is bad system design for one infrastructure subsystem to control another through hook code like this. For someone debugging the same problem, it is not obvious that the root cause of the lag is the logger rather than the tracer or the APM agent itself.