Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added OTLP Attributes and testing #10

Merged
merged 9 commits into from
Mar 21, 2022
Merged

Conversation

leggebroten
Copy link
Contributor

@leggebroten leggebroten commented Feb 24, 2022

Since CQRS is considered a messaging system, changed emitted attributes to follow that semantic and added tests.

Assuming the testing of tracing events is adequate, all that's needed is to assert that we've attached correctly to the telemetry callbacks and that the spans are correctly constructed in the three cases: success, error, exception

closes: #1 , #2 , #3 , #4

* Changed mix dependency for Commanded to get unreleased merge DavyDog made for event store
* Changed callback module to capture correct set of events
* Modified attributes to match messaging semantics like others w/i this library
@@ -34,22 +34,28 @@ defmodule OpentelemetryCommanded.EventHandler do

def handle_start(_event, _measurements, meta, _) do
event = meta.recorded_event

# TODO: do we want to trap this with helpful message about needing Middleware?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davydog187 and/or @bryannaegele
If the user hasn't added the middleware to their app this will cause an exception and kills this process.

Should I trap here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the design philosophy behind Otel is to fail gracefully, i.e. it shouldn't impact the uptime of the system. So catching/rescuing any issues here and downgrading the system with a helpful error or warning log seems like a good move

Comment on lines 64 to 65
# TODO does this need to change based on the event? Or maybe `internal`?
"messaging.operation": "receive",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryannaegele
For spans created by a messaging system all have operation types from that pool (send, receive, subscribe)?

This doesn't seem to fit well. It seems an "internal" span w/i the CQRS messaging system.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where subscribe comes from. Messaging operations are send, receive, process. Span type of internal is only appropriate where a function is calling another function. If there's a message or operation crossing a system boundary, e.g. to the database, then it's not internal.

CQRS is a bit of a hybrid between Messaging and RPC, so I'm not sure of all the specifics of how it should be treated. You can always solicit opinions in the #opentelemetry channel of the CNCF slack.

"messaging.operation": "receive",
"messaging.message_id": event.causation_id,
"messaging.conversation_id": event.correlation_id,
# "messaging.destination": meta.handler_module,
Copy link
Contributor Author

@leggebroten leggebroten Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davydog187 and/or @bryannaegele
The above question about the span kind being "internal" is supported by the fact there is no known source or destination for this span.

Comment on lines +56 to +59
# TODO add back
# consistency: meta.consistency,
# TODO add this back into commanded
# "event.last_seen": meta.last_seen_event
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left @davydog187 's comments.

Copy link
Contributor Author

@leggebroten leggebroten Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryannaegele
After reading a ton, I think that I should remove the messaging. prefix from the Commanded attributes.
This would also allow easier query building too I would think
Thoughts?

e.g.

      "messaging.commanded.application": meta.application,
      "messaging.commanded.event": recorded_event.event_type,
      "messaging.commanded.event_id": recorded_event.event_id,
      "messaging.commanded.event_number": recorded_event.event_number,
      "messaging.commanded.process_uuid": meta.process_uuid,
      "messaging.commanded.stream_id": recorded_event.stream_id,
      "messaging.commanded.stream_version": recorded_event.stream_version,
      "messaging.commanded.handler_name": meta.process_manager_name

should be

      "commanded.application": meta.application,
      "commanded.event": recorded_event.event_type,
      "commanded.event_id": recorded_event.event_id,
      "commanded.event_number": recorded_event.event_number,
      "commanded.process_uuid": meta.process_uuid,
      "commanded.stream_id": recorded_event.stream_id,
      "commanded.stream_version": recorded_event.stream_version,
      "commanded.handler_name": meta.process_manager_name

application: meta.application,
"causation.id": context.causation_id,
"correlation.id": context.correlation_id,
"aggregate.function": context.function,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still want these two fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregate.function": context.function is still there, just massaged to fit naming for messaging attributes:

      "messaging.commanded.function": context.function

I can restore context.lifespan but while testing, didn't see it's value as an attribute since it doesn't effect the processing of the current span. Thoughts?

… boundaries.

Fixed and added tests.

Safely handle case where the user hasn't added middleware `middleware(OpentelemetryCommanded.Middleware)` to the router

attributes = [
"messaging.system": "commanded",
"messaging.protocol": "cqrs",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cqrs isn't a transport protocol, it's a pattern. This attribute should probably not be included.

Comment on lines 64 to 65
# TODO does this need to change based on the event? Or maybe `internal`?
"messaging.operation": "receive",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where subscribe comes from. Messaging operations are send, receive, process. Span type of internal is only appropriate where a function is calling another function. If there's a message or operation crossing a system boundary, e.g. to the database, then it's not internal.

CQRS is a bit of a hybrid between Messaging and RPC, so I'm not sure of all the specifics of how it should be treated. You can always solicit opinions in the #opentelemetry channel of the CNCF slack.

Copy link
Collaborator

@davydog187 davydog187 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the work here, @leggebroten

@leggebroten leggebroten merged commit 4118aca into main Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests!
3 participants