💡 [REQUEST] - Collect Cloud Trace data to troubleshoot latency issues and timeouts #442

Open
1 task done
lvaylet opened this issue Apr 4, 2024 · 4 comments

Comments

@lvaylet
Collaborator

lvaylet commented Apr 4, 2024

Summary

Cloud Trace can collect spans exported from OpenTelemetry. See https://cloud.google.com/trace/docs/setup/python-ot.

Basic Example

These traces and spans will help troubleshoot latency and timeout issues, for example the 504 errors described in #441.

Screenshots

No response

Drawbacks

Collecting traces may add a small latency overhead, but it is probably negligible.

Unresolved questions

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lvaylet lvaylet added the question Further information is requested label Apr 4, 2024
@lvaylet lvaylet self-assigned this Apr 4, 2024
@lvaylet
Collaborator Author

lvaylet commented Apr 4, 2024

The Google documentation mentions that "OpenTelemetry's Flask instrumentation is designed to simplify capturing trace content related to HTTP requests". That should cover most (all?) of the user journeys when using the API.

Reference: https://cloud.google.com/trace/docs/setup/python-ot#sample_flask_application

@lvaylet
Collaborator Author

lvaylet commented Apr 4, 2024

The sample Flask application collects traces and sends them directly to Cloud Trace by default. Would it make sense to instrument the application in a more agnostic way, and use a Cloud Run sidecar to collect the traces and decide where to send them? That would make the instrumentation more portable and independent of Google Cloud. The SLO Generator should be able to run anywhere, and OpenTelemetry is platform-agnostic by nature.

See https://github.com/GoogleCloudPlatform/opentelemetry-cloud-run for an example of a Cloud Run sidecar that collects traces exposed by OpenTelemetry and sends them to Cloud Trace.

@lvaylet
Collaborator Author

lvaylet commented Apr 4, 2024

Steps:

  1. Implement and test https://opentelemetry.io/docs/languages/python/getting-started/ locally
  2. Deploy to Cloud Run, in single container mode
  3. Add a Cloud Run sidecar that collects the traces and forwards them to Cloud Trace, as demonstrated in https://github.com/GoogleCloudPlatform/opentelemetry-cloud-run

@lvaylet
Collaborator Author

lvaylet commented Apr 6, 2024

I managed to get a basic Flask API instrumented with OpenTelemetry and hosted on Cloud Run. The traces are exported over OTLP, captured by an OpenTelemetry Collector running in a sidecar, and forwarded to Cloud Trace. That makes for a nice, simple, agnostic approach: any user can run their own OpenTelemetry Collector and analyze the traces in their preferred piece of software.
