Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose more data in middleware to improve X-Ray and OTel traces #4902

Closed
2 tasks
bilalq opened this issue Jul 1, 2023 · 7 comments
Closed
2 tasks

Expose more data in middleware to improve X-Ray and OTel traces #4902

bilalq opened this issue Jul 1, 2023 · 7 comments
Assignees
Labels
feature-request New feature or enhancement. May require GitHub community feedback.

Comments

@bilalq
Copy link

bilalq commented Jul 1, 2023

Describe the feature

The X-Ray SDK generates much more useful traces with AWS SDK v2 than it does with v3 (e.g., table/bucket specific nodes rather than generic "DynamoDB" and "S3" nodes). The X-Ray team has claimed that this is due to the v3 SDK exposing less data in its middleware.

mattfysh commented on Nov 29, 2022:

In the recent release of the Node.js v18 runtime on Lambda, the embedded SDK has been upgraded to v3: https://aws.amazon.com/blogs/compute/node-js-18-x-runtime-now-available-in-aws-lambda/

The current state of v3 support in aws-xray-sdk-node appears to be rather patchy. e.g.

  1. Much of the command data no longer appears in the trace
  2. Wrapping most clients with captureAWSv3Client throws TypeScript errors. In fact, the only client I've found that works without a TS error is AWSXRay.captureAWSv3Client(new S3Client({}))

I was wondering if the team were aware of the shift to v3 in the latest lambda runtime environmnent, and whether this may translate to improved support for v3 by the Xray SDK? Thanks for reading!

willarmiros commented on Dec 19, 2022:

Hello,

Thanks for raising this issue, I'll try to address both issues:

Much of the command data no longer appears in the trace

This is unfortunately a limitation of the AWS SDK v3 itself. The V3 middleware exposes much less data than v2 did, and as such our instrumentation can only read the basic attributes you see today. This limitation exists in ADOT as well, and I would suggest opening an issue against the aws-sdk-js-v3 repo to suggest exposing this info more readily.

Wrapping most clients with captureAWSv3Client throws TypeScript errors

This is a known issue and aws/aws-xray-sdk-node#449 was an attempt to fix it, but unfortunately there were issues with the fix and we were not able to prioritize it since. Once we are able to take another look we can track it here.

Could the SDK and X-Ray/OTel teams collaborate here to improve tracing? Now that SDK v2 is being deprecated, this has essentially become a service degradation for customers.

@willarmiros, Could you maybe chime in here on what specifically is missing from the data exposed in the middleware?

Use Case

Traces are noticeably worse when using AWS SDK v3 over v2.

Proposed Solution

Expose more data in the middleware so that tracing libraries can be at parity with SDK v2.

Other Information

It's also notable that capturing v3 clients fails typechecks in TypeScript. The X-Ray team has suggested casting to any as a workaround, but that is unacceptable and incredibly error-prone. I don't know if interface compatibility changes are needed or if this is just a matter of using a peerDep instead of a normal dep. Maybe an adjustment of version range? Whatever the case, the current experience is troublesome.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

N/A

Environment details (OS name and version, etc.)

N/A

@bilalq bilalq added feature-request New feature or enhancement. May require GitHub community feedback. needs-triage This issue or PR still needs to be triaged. labels Jul 1, 2023
@RanVaknin
Copy link
Contributor

RanVaknin commented Jul 3, 2023

Hi @bilalq ,

JS SDK v3 exposes the same exact parameters that v2 does, the only difference is that those parameters are accessible via the middleware stack, which will require the Xray team to make adjustments on how the Xray clients accesses the data.

Im sorry to ping pong you around back to the Xray team, but I did have a conversation with their PM @willarmiros (who now is no longer at Amazon) on March 2023, about this issue and had proven that all the relevant info is available via middleware stack.

Since there's no action needed from the JS SDK team so Im going to close this. I encourage you to open the same feature request in their repo, so it can get traction and prioritized.

All the best,
Ran~

@RanVaknin RanVaknin self-assigned this Jul 3, 2023
@RanVaknin RanVaknin removed the needs-triage This issue or PR still needs to be triaged. label Jul 3, 2023
@RanVaknin RanVaknin closed this as not planned Won't fix, can't repro, duplicate, stale Jul 3, 2023
@bilalq
Copy link
Author

bilalq commented Jul 4, 2023

Thanks for clarifying that, @RanVaknin. An issue for this has been open for over half a year now. Could you help surface visibility of this to stakeholders on the X-Ray team? Without proper tracing support, it's not safe for us to migrate to SDK v3, despite deprecation of v2.

@RanVaknin
Copy link
Contributor

Hi @bilalq ,

You can use OTEL instead of xray for instrumenting v3. It is now supported there.

Thanks,
Ran~

@carolabadeer
Copy link

carolabadeer commented Jul 6, 2023

Hi @bilalq, v2 and v3 metadata is captured in opentelemetry JS (additional attributes will be part of the next opentelemetry release). We recommend using the OpenTelemetry JS SDK for instrumenting AWS SDK v3 clients as it provides a consistent experience across v2 and v3

@bilalq
Copy link
Author

bilalq commented Jul 7, 2023

Hi @RanVaknin and @carolabadeer,

Unfortunately, the OpenTelemetry JS SDK is not suitable in Lambda for most production use-cases. It comes with a massive performance and latency penalty. It's a complete non-starter to add tracing through it when the overhead is so large.

Related issues showing that this is still a problem today:

aws-observability/aws-otel-lambda#228
open-telemetry/opentelemetry-lambda#263

This is even highlighted in the official service docs for X-Ray:

Note
AWS Distro for OpenTelemetry offers a simpler getting started experience for instrumenting your Lambda functions. However, due to the flexibility OpenTelemetry offers, your Lambda function will require additional memory and invocations may experience cold start latency increases, which can lead to additional charges. If you're optimizing for low-latency and do not require OpenTelemetry's advanced capabilities such as dynamically configurable backend destinations, you may want to use the AWS X-Ray SDK to instrument your application.

Source: https://docs.aws.amazon.com/xray/latest/devguide/xray-instrumenting-your-app.html#xray-instrumenting-choosing

@bilalq
Copy link
Author

bilalq commented Jul 11, 2023

Hey @RanVaknin and @carolabadeer. Just pinging on this. The current situation is that existing tools are outdated and deprecated and the recommended replacements have show-stopping bugs and usability regressions.

@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature-request New feature or enhancement. May require GitHub community feedback.
Projects
None yet
Development

No branches or pull requests

3 participants