Use find_package instead of adding include directories 1 by 1
ricardoamaro committed Apr 19, 2024
1 parent 8eb43eb commit 3fd5934
Showing 30 changed files with 735 additions and 183 deletions.
76 changes: 76 additions & 0 deletions .github/component-label-map.yml
@@ -0,0 +1,76 @@
blog:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/blog/**
sig:cpp:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/cpp/**
sig:erlang:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/erlang/**
sig:go:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/go/**
sig:java:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/java/**
sig:js:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/js/**
sig:dotnet:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/net/**
sig:php:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/php/**
sig:python:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/python/**
sig:ruby:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/ruby/**
sig:rust:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/rust/**
sig:swift:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/languages/swift/**
sig:operator:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/kubernetes/operator/**
sig:helm:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/kubernetes/helm/**
sig:security:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/security/**
sig:demo:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/demo/**
sig:collector:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/collector/**
sig:enduser:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/community/end-user/**
sig:spec:
  - changed-files:
      - any-glob-to-any-file:
          - content/en/docs/specs/**
16 changes: 16 additions & 0 deletions .github/workflows/label-prs.yml
@@ -0,0 +1,16 @@
name: 'Label PR'
on:
- pull_request_target

jobs:
  labeler:
    name: 'Add component labels'
    permissions:
      contents: read
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/labeler@v5
        with:
          configuration-path: '.github/component-label-map.yml'
2 changes: 2 additions & 0 deletions .htmltest.yml
@@ -35,6 +35,8 @@ IgnoreURLs: # list of regexs of paths or URLs to be ignored
- ^https://x.com
# OTel Google calendar - curl returns 200, but the link checker gets a 401:
- ^https://calendar.google.com/calendar/embed\?src=google.com_b79e3e90j7bbsa2n2p5an5lf60%40group.calendar.google.com
# YouTube playlists sometimes give a 404, although they give a 200 when accessed via browser:
- ^https://www.youtube.com/playlist\?list=

# Ignore Docsy-generated GitHub links for now
- ^https?://github\.com/.*?/.*?/(new|edit)/ # view-page, edit-source etc
Binary file added content/en/blog/2024/otel-errors/OTel-spans.png
293 changes: 293 additions & 0 deletions content/en/blog/2024/otel-errors/index.md
@@ -0,0 +1,293 @@
---
title: Dude, where's my error? How OpenTelemetry records errors
linkTitle: Understanding OTel Errors
date: 2024-04-19
author: >-
[Reese Lee](https://github.com/reese-lee) (New Relic), [Adriana
Villela](https://github.com/avillela) (ServiceNow)
cSpell:ignore: Dalle
canonical_url: https://newrelic.com/blog/how-to-relic/dude-wheres-my-error
---

![A confused penguin trying to learn about errors and exceptions. Image generated with AI using Dalle3 via Bing Copilot](penguin-chalkboard.jpg)

Depending on the language you’re used to developing in, you may have certain
ideas about what an error is, as well as what constitutes an exception and how
it should be handled. For example, Go doesn't have exceptions, partly to
discourage programmers from labeling too many ordinary errors as exceptional. On
the other hand, languages such as Java and Python provide built-in support for
throwing and catching exceptions.

When different languages disagree about what errors or exceptions are and how to
handle them, what do you use when you need standardized telemetry and error
reporting across microservices written in those languages? OpenTelemetry is the
tool we’ll use to address the following, and more:

- How an error is visualized in a backend might not be where you think it’ll be,
or how you expect it to look.
- How span kind affects error reporting.
- Errors reported by spans versus logs.

## Errors versus exceptions

Before we get into how OTel deals with errors and exceptions, let’s establish
what they are, and how they differ from each other. While there are variations
on the definitions of these terms, we’ve landed on the following ones, which
we’ll be using in this article. Note that this is **not** official OTel
language; these are general industry definitions:

An **error** is an unexpected issue in a program that hinders its execution.
Examples include syntax errors, such as a missing semicolon or incorrect
indentation, and runtime errors, resulting from errors in logic.

An **exception** is a type of runtime error that disrupts the normal flow of a
program. Examples include dividing by zero or accessing an invalid memory
address.

Some languages, such as Python and JavaScript, treat errors and exceptions as
synonyms; others, such as PHP and Java, do not. Understanding the distinction
between errors and exceptions is crucial for effective error handling, because
it empowers you to adopt more nuanced strategies for handling and recovering
from failures in your applications.

## Handling errors in OTel

So how does OTel deal with all these conceptual differences across languages?
This is where the [specification](/docs/specs/otel/) (or “spec” for short) comes
in. The spec provides a blueprint for developers working on various parts of the
project, and standardizes implementation across all languages.

Since language APIs and SDKs are implementations of the spec, there are general
rules against implementing anything that isn’t covered in the spec. This
provides a guiding principle to help organize contributions to the project. In
practice, there are a few exceptions; for example, a language might prototype a
new feature as part of adding it to the spec, and the feature may be published
(usually as alpha or experimental) before the corresponding change lands in the
spec.

Another exception is when a language might decide to diverge from the spec.
Although it is generally not advised, sometimes there are strong
language-specific reasons to do something different. In this way, the spec
allows for some flexibility for each language to implement features as
idiomatically as possible. For example, most languages have implemented
`RecordException` (for example,
[Python](https://opentelemetry-python.readthedocs.io/en/latest/_modules/opentelemetry/sdk/trace.html#Span.record_exception)),
while Go has implemented
[`RecordError`](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span.go),
which does the same thing.

You can view this
[compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md)
of the spec across all languages, but you’ll get the most up-to-date information by
checking the individual language repository. Now we can begin figuring out how
to handle errors in OTel, starting with how to report them:

- Spans
- Logs

### Errors in spans

In OTel, spans are the building blocks of distributed traces, representing
individual units of work within a distributed system. Spans are related to each
other and to a trace through context. Put simply, context is the glue that turns
a collection of individual spans into a unified trace. Context propagation allows us to pass
information across multiple systems, therefore tying them together. Traces can
tell us all sorts of things about our applications through metadata and span
events.

![Graphic that shows the spans within a trace](OTel-spans.png)
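
To make this concrete, here is a minimal sketch, assuming the Python SDK with a
console exporter (the tracer and span names are illustrative), of how nested
spans share context and therefore end up in the same trace:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Configure a tracer provider that prints finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("example.tracer")  # illustrative instrumentation name

# The child span picks up the parent's context automatically,
# so both spans belong to the same trace.
with tracer.start_as_current_span("handle-request"):
    with tracer.start_as_current_span("query-database"):
        pass  # do some work here
```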

### Enhancing spans with metadata

OTel enables you to enhance spans with metadata
([attributes](/docs/concepts/signals/traces/#attributes)) in the form of
key-value pairs. By attaching relevant information to spans, such as user IDs,
request parameters, or environment variables, you can gain deeper insights into
the circumstances surrounding an error and quickly identify its root cause. This
metadata-rich approach to error handling can significantly reduce the time and
effort required to diagnose and resolve issues, ultimately improving the
reliability and maintainability of your applications.
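
For example, here is a minimal sketch, assuming Python and the currently active
span (the attribute keys and values are illustrative), of attaching such
metadata:

```python
from opentelemetry import trace

# Grab the span that is currently active and attach key-value metadata to it.
current_span = trace.get_current_span()
current_span.set_attribute("app.user.id", "user-123")  # illustrative keys
current_span.set_attribute("app.cart.item_count", 3)
current_span.set_attribute("app.environment", "staging")
```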

Spans also have a [span kind](/docs/concepts/signals/traces/#span-kind) field,
which gives us some additional metadata that can help developers troubleshoot
errors. OTel defines several span kinds, each of which has unique implications
for error reporting:

- **client**: For outgoing synchronous remote calls (for example, outgoing HTTP
request or DB call)
- **server**: For incoming synchronous remote calls (for example, incoming HTTP
request or remote procedure call)
- **internal**: For operations that do not cross process boundaries (for
example, instrumenting a function call)
- **producer**: For the creation of a job which may be asynchronously processed
later (for example, job inserted into a job queue)
- **consumer**: For the processing of a job created by a producer, which may
start long after the producer span has ended

Span kind is typically determined automatically by the instrumentation libraries
you use, but you can also set it explicitly when creating spans manually, as in
the sketch below.
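
Here is a minimal sketch, assuming Python (the span name and the choice of
`CLIENT` are illustrative), of setting the kind on a manually created span:

```python
from opentelemetry import trace
from opentelemetry.trace import SpanKind

tracer = trace.get_tracer("example.tracer")

# Mark this span as an outgoing remote call so backends can treat it
# as a client-side operation when attributing errors.
with tracer.start_as_current_span("fetch-inventory", kind=SpanKind.CLIENT):
    pass  # make the outgoing HTTP or database call here
```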

Spans can be further enhanced with
[span status](/docs/concepts/signals/traces/#span-status). By default, span
status is marked as `Unset` unless otherwise specified. You can mark a span
status as `Error` if the resulting span depicts an error, and `Ok` if the
resulting span is error-free.
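
As a minimal sketch, assuming Python (the failing function and the status
description are illustrative), marking the current span as an error looks
roughly like this:

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode


def do_work():
    raise RuntimeError("inventory service unavailable")  # simulate a failure


current_span = trace.get_current_span()

try:
    do_work()
    current_span.set_status(Status(StatusCode.OK))
except RuntimeError:
    # Mark the span as an error; the description is optional.
    current_span.set_status(Status(StatusCode.ERROR, "do_work failed"))
```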

### Enhancing spans with span events

A [span event](/docs/concepts/signals/traces/#span-events) is a structured log
message embedded within a span. Span events help enhance spans by providing
descriptive information about a span.
[Span events can also have attributes of their own](/docs/languages/ruby/instrumentation/#add-span-events).

When a span status is set to `Error`, a span event is created automatically,
capturing the span’s resulting error message and stack trace as an event on that
span. You can further enhance this span error by adding attributes to it.
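
A minimal sketch, assuming Python (the event name and attributes are
illustrative), of adding a span event with its own attributes:

```python
from opentelemetry import trace

current_span = trace.get_current_span()

# Attach a structured, timestamped note to the span, with its own attributes.
current_span.add_event(
    "cache.miss",  # illustrative event name
    attributes={"cache.key": "user-123", "cache.region": "eu-west-1"},
)
```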

Earlier, we mentioned a method called `RecordException`. Per
[the spec](/docs/specs/otel/trace/api/#record-exception) (emphasis our own), “To
facilitate recording an exception languages SHOULD provide a RecordException
method **if the language uses exceptions**… The signature of the method is to be
determined by each language and can be overloaded as appropriate.”

Since Go doesn’t support the “conventional” concept of exceptions, it instead
supports
[`RecordError`](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/trace/span.go#L445-L467),
which essentially does the same thing idiomatically. You have to make an
additional call to set its status to `Error` if that’s what it should be, as it
won’t automatically be set to that. Similarly, `RecordException` can be used to
record span events without setting the span’s status to `Error`, which means you
can use it to record any additional data about a span.

By decoupling the span status from being automatically set to `Error` when an
exception occurs, OTel supports the use case of having an exception event on a
span whose status is `Ok` or `Unset`. This gives instrumentation authors the
most flexibility.
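
For example, here is a minimal sketch in Python (the span name, exception, and
attribute are illustrative) of recording an exception as a span event and then
explicitly deciding the span’s status; Go’s `RecordError` followed by
`SetStatus` follows the same pattern:

```python
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("example.tracer")

with tracer.start_as_current_span("charge-credit-card") as span:
    try:
        raise ValueError("card declined")  # simulate a failure
    except ValueError as exc:
        # Capture the exception as a span event, optionally with extra attributes...
        span.record_exception(exc, attributes={"app.payment.retryable": False})
        # ...and separately decide whether the span as a whole is an error.
        span.set_status(Status(StatusCode.ERROR, str(exc)))
```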

### Errors in logs

In OTel, a log is a structured, timestamped message emitted by a service or
other component. The recent addition of logs to OTel gives us yet another way of
reporting errors. Logs have traditionally had different severity levels for
representing the type of message being emitted, such as `DEBUG`, `INFO`,
`WARNING`, `ERROR`, and `CRITICAL`.

OTel allows for the correlation of logs to traces, in which a log message can be
associated with a span within a trace via trace context correlation. Hence,
looking for a log message with a log level of `ERROR` or `CRITICAL` can yield
further information about what led to that error, by pulling up the correlated
trace.

To record an error on a log, either `exception.type` or `exception.message` is
required, while `exception.stacktrace` is recommended. For more information, see
[Semantic Conventions for Exceptions in Logs](/docs/specs/semconv/exceptions/exceptions-logs/).
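
As a sketch of what this can look like in practice, the snippet below uses
Python’s standard `logging` module (the logger name and message are
illustrative); when such a record is routed through an OTel logging
bridge/handler, the captured exception information is what populates the
`exception.type`, `exception.message`, and `exception.stacktrace` attributes
described above:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("example.payments")  # illustrative logger name

try:
    raise ValueError("card declined")  # simulate a failure
except ValueError:
    # logger.exception() logs at ERROR severity and captures the exception
    # type, message, and stack trace from the currently handled exception.
    logger.exception("Failed to charge credit card")
```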

## Logs or spans to capture errors?

After all this, you might be wondering which signal to use to capture errors:
spans or logs? The answer is: "It depends!" Perhaps your team primarily uses
traces; perhaps it primarily uses logs.

Spans can be great for capturing errors, because if the operation errors out,
marking a span as an error makes it stand out and therefore easier to spot. On
the other hand, if you’re not filtering or tail sampling your traces and your
system is producing thousands of spans per minute, you could miss errors that
aren’t occurring frequently, but that still need to be handled.

What about using span events versus logs? Again, this depends. It may be
convenient to use span events, because when a span status is set to `Error`, a
span event with the exception message (and other metadata you may wish to
capture) is automatically created.

Another consideration is your observability backend. Does your backend render
both logs and traces? How easily queryable or discoverable are your logs, spans,
and span events? Is log and trace correlation supported?

## Visualizing errors in different backends

While OTel provides us with the raw telemetry data emitted by our systems, it
doesn’t provide data visualization or interpretation. This is done by an
observability backend. Because OTel is vendor-neutral, the same emitted
information can be visualized and interpreted by different backends without
re-instrumenting your application.

### Jaeger

Let’s take a look at what an OTel error looks like in
[Jaeger](https://www.jaegertracing.io/). The error data was generated by the
code in [this repository](https://github.com/avillela/otel-errors-talk). Here is
a trace view for the service
[py-otel-server](https://github.com/avillela/otel-errors-talk/blob/main/src/python/server.py).
As you can see below, the error spans show up as red dots:

![List of traces in the Jaeger UI](jaeger-high-level-view.png)

And if we drill down and zero in on the error span, we can click into `Logs`,
which is how span events are expressed in Jaeger, and view the information that
was captured on it:

![Attributes and other metadata for an error span in Jaeger](jaeger-error.png)

The span is clearly marked as an error and includes a span event with the
exception captured. Jaeger expresses the span event as a log, but does not
visualize logs outside of spans.

### Proprietary backends

If you’ve been using a proprietary agent to monitor your applications and have
recently migrated to OTel, you might notice that an OTel error may not be
expressed the way you expect in your observability backend, as compared to the
same error captured by the proprietary agent. This is most likely because OTel
models errors differently from how vendors have traditionally modeled them.

As a broad example, vendors might have their own notion of what constitutes a
logical unit of work in an application. You may be familiar with the term
`transaction`, which means something slightly different from vendor to vendor.
In OTel, this is represented by a trace. You’ve likely noticed differences in
your data visualization experiences as vendors make their own adjustments to
their platforms to accommodate OTLP as a first-class data type.

As a more specific example, OTel’s notion of span kinds may affect how your OTel
error is expressed in your backend. For instance, if you have a trace that has
one exception and it’s on an internal span with its status set to `Error`, you
should see the trace marked with an error, but it may not be counted toward your
overall app error rate. This is because the vendor might have an opinion that
only errors on entry point spans (server spans) and consumer spans should be
counted toward your error rate.

If your backend supports trace and log correlation, you should be able to
navigate to the associated trace from the log, and vice versa. Furthermore,
while Jaeger visualizes span events as logs, some vendors might synthesize span
events as their own data type instead of as a log data type, which would impact
the way you query that data.

## Conclusion

We’ve just explored the challenges of handling errors and exceptions across
different programming languages within a microservices architecture, and
introduced OTel as a solution for standardized telemetry and error reporting.
The OTel specification serves as a blueprint for standardizing error handling
across various languages, providing guidelines for implementation, but allowing
for a degree of flexibility.

You can record errors on spans by making use of your language SDK’s
`RecordException` or its equivalent, and enrich the span events further by
adding custom attributes. You can also record errors on logs by adding
`exception.type` or `exception.message`, and capture the stack trace by adding
`exception.stacktrace` to yield further information about what happened.

Once that data is in your observability backend, if you have previously used
that vendor's proprietary monitoring agent, you might notice a difference
in how OTel-instrumented errors are visualized versus how the agent-instrumented
errors were visualized. This is mainly because OTel models errors differently
than vendors might have previously done.

By leveraging OTel's capabilities to record errors through logs and spans and to
enhance them with metadata, you can gain deeper insights into your applications'
behavior and more effectively troubleshoot issues. You'll be better equipped to
build and maintain resilient, reliable, and high-performing software
applications in today's dynamic and demanding environments. To learn more, see
[Error handling in OpenTelemetry](/docs/specs/otel/error-handling/).

_A version of this article was
[originally posted](https://newrelic.com/blog/how-to-relic/dude-wheres-my-error)
on the New Relic blog._