Telemetry Schema Resolution Process

Resolution Process Status:

  • Semantic Convention Registry: Fully Implemented, Partially Tested
  • Application Telemetry Schema: Partially Implemented

This crate describes the resolution process for the OpenTelemetry telemetry schema. The resolution process takes a telemetry schema and/or a semantic convention registry and produces a resolved telemetry schema. The resolved telemetry schema is a self-contained and consistent schema that can be used to validate telemetry data, generate documentation and code, and perform other tasks.

Important Note: Two versions of the resolution process currently coexist in the weaver_resolver crate. The first version is incomplete but is still used by the weaver CLI (feature 'experimental'). The second version is under active development and is expected to replace the first in the near future.

Semantic Conventions - Parsing, Resolution and Validation

The parsing is implemented using the serde library. A collection of serde-annotated Rust structures is used to represent semantic convention entities. Optional fields are represented as Option types.

The serde container attribute #[serde(deny_unknown_fields)] is used to reject unknown fields in the semantic convention entities. Combined with the distinction between optional and required fields, this ensures that the structure of the semantic conventions is validated during parsing.

The resolution process for semantic conventions consists of the following steps:

  • Load all semantic conventions from the registry.
  • Iteratively resolve all semantic conventions, maintaining a list of unresolved semantic conventions and a list of resolved ones. Each iteration performs the following sub-steps:
    • Iteratively resolve all extends (parent/child) clauses until no resolvable extends clause remains. The extending entity inherits the prefix, attributes, and constraints of its parent entity.
    • Iteratively resolve all attribute ref clauses until no resolvable ref remains.
  • Apply the any_of and include constraints.
  • Validate the resolved semantic conventions:
    • No unresolved ref or extends clauses remain; the unresolved list must be empty.
    • All constraints are satisfied.
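
The iterative resolution of extends clauses described above can be sketched in a few lines of Rust. This is a minimal illustration under simplifying assumptions, not the weaver_resolver implementation: the Group type and the resolve_extends function are hypothetical names, and only attribute inheritance is modeled (prefix and constraints are left out).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified group; real semantic convention groups carry
/// many more fields (prefix, constraints, brief, ...).
#[derive(Debug, Clone)]
struct Group {
    id: String,
    /// ID of the parent group named in the `extends` clause, if any.
    extends: Option<String>,
    /// Attribute IDs defined on, or inherited by, this group.
    attributes: BTreeSet<String>,
}

/// Moves groups from the unresolved list to the resolved map, pass after
/// pass, until a full pass makes no progress. Returns the resolved groups
/// and whatever is still unresolved (a non-empty list means validation
/// should fail).
fn resolve_extends(mut unresolved: Vec<Group>) -> (BTreeMap<String, Group>, Vec<Group>) {
    let mut resolved: BTreeMap<String, Group> = BTreeMap::new();
    loop {
        let mut progress = false;
        let mut still_unresolved = Vec::new();
        for mut group in unresolved {
            // A group is resolvable if it has no parent, or if its parent
            // has already been resolved.
            let parent_attrs = match &group.extends {
                None => Some(BTreeSet::new()),
                Some(parent_id) => resolved.get(parent_id).map(|p| p.attributes.clone()),
            };
            match parent_attrs {
                Some(attrs) => {
                    // Inherit the parent's attributes, keeping local ones.
                    group.attributes.extend(attrs);
                    resolved.insert(group.id.clone(), group);
                    progress = true;
                }
                None => still_unresolved.push(group),
            }
        }
        unresolved = still_unresolved;
        if !progress || unresolved.is_empty() {
            return (resolved, unresolved);
        }
    }
}
```

Given a base group, a child group that extends base, and an orphan group that extends a missing parent, the sketch resolves base on the first pass and child on the second. The orphan group stays in the unresolved list, which is the condition the validation step would report as an error.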

Lineage (experimental)

Note: The lineage feature is experimental and has not yet been fully discussed with the OpenTelemetry community. The intent is to include lineage information in the resolved application telemetry schema so that the origin of each field in the schema can be traced. A discussion will be opened in the OpenTelemetry community to decide whether this feature should be an optional or a mandatory part of the resolved schema.

The resolution process can optionally compute the lineage for each attribute of a semantic convention registry. The lineage as such is not part of the syntax and structure of a semantic convention; rather, it's an extension produced by the weaver tool, intended for use in scenarios such as:

  • A semconv author wishes to verify the exact path followed by the resolution process in the case of a complex cascade of inheritance across multiple levels between groups.
  • A documentation process aims to add lineage information to the documentation of each attribute so that readers can see where each field was defined.
  • The lineage information of a semantic convention could eventually feed into an enterprise data catalog to improve the data governance process.

The general structure of the lineage generated by the resolution process is as follows:

{
  "registry_url": "<registry url>",
  "groups": [
    {
      // Standard semconv fields 
      "id": "<group id>",
      "type": "<group type>",
      "brief": "<group brief>",
      "attributes": [ /* ... */ ],
      "lineage": {
        // `source_file` is either:
        // - a relative path in the case of a registry that has been explicitly
        //   resolved from a directory in the file system. This type of registry
        //   is mainly used in unit tests, where using a relative path makes
        //   these tests independent of where they are executed.
        // - an absolute URL when the resolution applies to a registry
        //   accessible from a GitHub repository, for example.
        "source_file": "<path or url>",
        "attributes": {
          "<attribute id>": {
            // The group ID where the attribute fields were defined before a
            // possible local redefinition. In the case of an inherited
            // attribute, this field contains the ID of the group defined in
            // the `extends` clause of the initial definition. In the case of
            // a reference to a root attribute, this field contains the ID of
            // the group where this attribute was defined globally.
            "source_group": "<group id>",
            // The field names inherited from the source group. This field is
            // present only if the attribute is inherited.
            // We assume all are inherited unless overridden.
            "inherited_fields": [ "<field name>", /* ... */ ],
            // The field names overridden in the local group.
            "locally_overridden_fields": [ "<field name>", /* ... */ ]
          }
        }
      }
    }
  ]
}

Important note: By convention, attributes that appear in a group but are not mentioned in its lineage have the lineage of the current group, i.e. they are considered to be defined in that group.

With this information, it is possible to trace, for each field of an attribute, its inheritance relationships, its overrides or local redefinitions, and any global references to attributes. The definition of an attribute's fields can be a complex mix of definitions coming from several attribute groups.
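
As an illustration of how the inherited_fields and locally_overridden_fields entries could be derived, the following Rust sketch merges the fields of a source-group attribute with a local redefinition and records the lineage as it goes. The names Fields, AttributeLineage, and merge_with_lineage are hypothetical and do not reflect the weaver data model. The sketch also assumes that a field defined only locally (absent from the source group) appears in neither list, which is a simplification rather than documented behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened view of an attribute: field name -> value.
type Fields = BTreeMap<String, String>;

/// Mirrors the per-attribute lineage entry shown in the structure above.
#[derive(Debug, PartialEq)]
struct AttributeLineage {
    source_group: String,
    inherited_fields: Vec<String>,
    locally_overridden_fields: Vec<String>,
}

/// Merges the fields of the source-group attribute with the local
/// redefinition. Every source field is considered inherited unless the
/// local group sets it, in which case it is recorded as overridden.
fn merge_with_lineage(
    source_group: &str,
    parent: &Fields,
    local: &Fields,
) -> (Fields, AttributeLineage) {
    let mut inherited_fields = Vec::new();
    let mut locally_overridden_fields = Vec::new();
    for name in parent.keys() {
        if local.contains_key(name) {
            locally_overridden_fields.push(name.clone());
        } else {
            inherited_fields.push(name.clone());
        }
    }

    // Local values win over inherited ones in the merged attribute.
    let mut merged = parent.clone();
    for (name, value) in local {
        merged.insert(name.clone(), value.clone());
    }

    let lineage = AttributeLineage {
        source_group: source_group.to_string(),
        inherited_fields,
        locally_overridden_fields,
    };
    (merged, lineage)
}
```

Because BTreeMap iterates in key order, both lists come out sorted by field name, which keeps the generated lineage stable from one run to the next.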

Open question/concern: Handling of relative paths versus absolute URLs in application telemetry schema resolution. As a schema passes from its producer to a consumer, it is unclear whether the consumer should keep file references as relative paths or convert them to absolute URLs based on the location of the original source. For example, the semantic convention registry (semconv) resolves its metrics with lineage expressed as relative paths, and Application A then creates its own schema on top of semconv. Should the lineage references in the resulting schema contain relative paths, URLs, or both? A related concern is whether a single source_file string field scales as a way to track origins as resolution progresses from the semconv level to the application level.