new blog by OTel CI/CD SIG - Repost from cncf.io/blogs #5718

143 changes: 143 additions & 0 deletions content/en/blog/2024/otel-cicd-sig/index.md
---
title: OpenTelemetry Is Expanding Into CI/CD Observability
linkTitle: OpenTelemetry Is Expanding Into CI/CD Observability
date: 2024-12-01
author: >-
  [Dotan Horovits](https://github.com/horovits/) (CNCF Ambassador),
  [Adriel Perkins](https://github.com/adrielp) (Liatrio)
canonical_url: https://www.cncf.io/blog/2024/11/04/opentelemetry-is-expanding-into-ci-cd-observability/
issue: 5546
sig: CI/CD Observability
cSpell:ignore: horovits liatrio
---

## OpenTelemetry Is Expanding Into CI/CD Observability


We’ve been talking about the need for a common “language” for reporting and observing CI/CD pipelines for years, and finally, we see the first “words” of this language entering the “dictionary” of observability—the OpenTelemetry open specification. With the recent release of OpenTelemetry’s Semantic Conventions, v1.27.0, you can find [designated attributes for reporting CI/CD pipelines](https://opentelemetry.io/docs/specs/semconv/attributes-registry/cicd/).

This is the result of the hard work of the [CI/CD Observability Special Interest Group (SIG) within OpenTelemetry](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md). As we accomplish the core milestone for the first phase, we thought it’d be a good time to share it with the world.

## Engineers need observability into their CI/CD pipelines

[CI/CD observability](https://medium.com/@horovits/fcc6c10c4987) is essential for ensuring that software is released to production efficiently and reliably. Well-functioning CI/CD pipelines directly impact business outcomes by shortening the Lead Time for Changes, one of the [DORA metrics](https://horovits.medium.com/improving-devops-performance-with-dora-metrics-918b9604f8e2), and by enabling fast identification and resolution of broken or flaky processes. By integrating observability into CI/CD workflows, teams can monitor the health and performance of their pipelines in real time, gaining insights into bottlenecks and areas that require improvement.

Organizations can extend their observability capabilities to the release cycle with the same well-established tools they already use to monitor production environments, fostering a holistic approach to software delivery. Whether those tools are open source or proprietary, there’s no need to reinvent the wheel when choosing the observability toolchain for CI/CD pipelines.

## The need for standardization

However, the diverse landscape of CI/CD tools creates challenges in achieving consistent end-to-end observability. With each tool having its own means, format, and semantic conventions for reporting the pipeline execution status, fragmentation within the toolchain can hinder seamless monitoring. Migrating between tools becomes painful, as it requires reimplementing existing dashboards, reports, and alerts.

Things become even more challenging when you need to monitor multiple tools involved in the release pipeline in a uniform manner. This is where [open standards and specifications become critical](https://horovits.medium.com/the-rise-of-open-standards-in-observability-highlights-from-kubecon-13694e732c97). They create a common uniform language, one which is tool- and vendor-agnostic, enabling cohesive observability across different tools and allowing teams to maintain a clear and comprehensive view of their CI/CD pipeline performance.

Standardization is needed in two places. The first is the semantic conventions mentioned above: the language for reporting what goes on in the pipeline. The second is the means by which this reporting is propagated through the system, for example when a pipeline step spawns child processes. This led us to promote a standard for using environment variables to propagate context and baggage between processes, another important milestone that was recently approved and merged.
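
To make the propagation idea concrete, here is a minimal Python sketch of a parent pipeline step handing its trace context to a spawned child process through the environment. It is an illustration under assumptions, not the approved specification: the `TRACEPARENT` variable name and the helper functions are assumed for this example, and only the value layout (`version-traceid-spanid-flags`) follows the W3C Trace Context format.

```python
import os
import re
import secrets
import subprocess
import sys

# W3C Trace Context "traceparent" layout: version-traceid-spanid-flags.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a traceparent value for the current pipeline step."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str):
    """Return (trace_id, parent_span_id, sampled), or None if malformed."""
    match = TRACEPARENT_RE.match(value.strip().lower())
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    # All-zero identifiers are invalid in the W3C format.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def run_child_step(traceparent: str) -> str:
    """Spawn a child process that receives the context via its environment."""
    # Assumption: the variable is called TRACEPARENT; check the merged OTEP.
    env = dict(os.environ, TRACEPARENT=traceparent)
    child_code = "import os; print(os.environ.get('TRACEPARENT', ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


trace_id = secrets.token_hex(16)  # 32 hex characters
span_id = secrets.token_hex(8)    # 16 hex characters
parent_header = make_traceparent(trace_id, span_id)
seen_by_child = run_child_step(parent_header)
print(parse_traceparent(seen_by_child))
```

The child only echoes the variable here; a real step would parse it and start its own span as a child of the parent span, so the whole pipeline run shows up as a single trace.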

## OpenTelemetry: the natural home for CI/CD observability specification

This realization drove us to look for the right way to approach creating a specification. OpenTelemetry has emerged as the standard for telemetry generation and collection. The OpenTelemetry specification is tasked with exactly this problem: creating a common, uniform, and vendor-agnostic specification for telemetry. Being housed under the Cloud Native Computing Foundation (CNCF) helps ensure it remains open and vendor-neutral. As long-standing advocates of OpenTelemetry, we found it only natural to extend OpenTelemetry to cover this important DevOps use case.

Review suggestion (Contributor), replacing "And housed under the Cloud Native Computing Foundation (CNCF) can ensure it remains open and vendor-neutral.":

> And its support from the Cloud Native Computing Foundation (CNCF) ensures it remains open and vendor-neutral.

Reply (Contributor Author): this is a syndicated article from the CNCF.io blog, I'd rather not change wording unless meaningful. In particular, when positioning the project under the CNCF, we should align with cncf.io.


We started with an [OpenTelemetry extension proposal (OTEP #223)](https://github.com/open-telemetry/oteps/pull/223) a couple of years ago, proposing our idea to extend OpenTelemetry to cover the CI/CD observability use case. In parallel, we started a channel on the CNCF Slack to gather fellow enthusiasts behind the idea and start brainstorming what that should look like. The Slack channel grew, and we quickly discovered that the problem is common across many organizations.

With the feedback from the Technical Oversight Committee and others within the CNCF, we took the path of asking for a mandate to start a dedicated Working Group for the topic under OpenTelemetry’s Semantic Conventions SIG (SIG SemConv for short). With their blessing, we [launched the formal CI/CD Observability SIG](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md) to formalize our previous Slack group discussions and goals.

## OpenTelemetry’s CI/CD Observability SIG

Since November of 2023, the SIG has been actively working to develop the standard for semantics around CI/CD observability in collaboration with experts from multiple companies and open source projects. At its inception, we decided to focus on a few key areas for 2024:

* Define an initial set of common attributes across CI/CD systems.
* Develop prototype(s) to include both holistic and signal-specific attributes.
* Carry forward the proposal to add environment variables as context propagators to the OpenTelemetry specification (OTEP #258).
* Create a strategy for bridging OpenTelemetry conventions with [CDEvents](https://cdevents.dev/docs/) and [Eiffel](https://eiffel-community.github.io/).

At first, our SIG met during the larger Semantic Conventions Working Group meetings every Monday. This provided a good opportunity for us to get our bearings as we researched and discussed how we would accomplish the goals on our roadmap. This also enabled us to get to know many members of the larger OpenTelemetry community, solicit feedback on our designs, and get direction on how to proceed. The OpenTelemetry Semantic Convention Working Group has been extraordinarily supportive of the CI/CD initiative.

Upon completion and release of its initial milestone (see below), our SIG was granted its own [dedicated meeting slot](https://github.com/open-telemetry/community/pull/2293) on the [OpenTelemetry calendar](https://github.com/open-telemetry/community#calendar), every Thursday at 0600 PT. The group gets together here to discuss current and future work before bringing it to the larger Semantic Conventions meetings on Monday. We greatly look forward to the continued support and participation of the community as we continue to drive forward this critical area of standardization.

## CI/CD is part of the latest OpenTelemetry Semantic Conventions

Over the course of months of iteration and feedback, the [first set of Semantic Conventions was merged](https://github.com/open-telemetry/semantic-conventions/pull/1075) for the v1.27.0 release. This change brought forth the first set of foundational semantics for CI/CD under the `cicd`, `artifact`, `vcs`, `test`, and `deployment` namespaces. This was a significant milestone for the CI/CD Observability SIG and the industry as a whole. It creates the foundation on which all of our group’s other goals can begin to take form and reach implementation.

But what does that actually mean? What value does it provide? Let’s consider real world examples for two of the namespaces.

### Tracking release revisions from Version Control Systems (VCS)

[Version Control System (VCS) attributes](https://opentelemetry.io/docs/specs/semconv/attributes-registry/vcs/) cover multiple areas common in a VCS, like refs and changes (pull/merge requests). The `vcs.repository.ref.revision` attribute is a key piece of metadata. As Version Control Systems like GitHub and GitLab emit events, they can now include this semantically compliant attribute. That means that systems integrating code, releasing it, and deploying it to environments can carry the same attribute and trace a code revision across system boundaries more easily. If a deployment fails, you can quickly look up the revision of code and track it back to the buggy release. This attribute is also a key piece of metadata for [DORA metrics](https://dora.dev/guides/dora-metrics-four-keys/), as you calculate Change lead time and Failed deployment recovery time.
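
As a rough illustration of the paragraph above, the following Python sketch joins commit and deployment events on `vcs.repository.ref.revision` to find the revision behind a failed deployment and to compute Change lead time. The event shape, the event names, the timestamps, and the use of a `deployment.status` attribute with `failed`/`succeeded` values are assumptions made for the example; check the attribute registry for the exact names in the release you target.

```python
from datetime import datetime, timezone

REVISION_KEY = "vcs.repository.ref.revision"  # attribute named in the text
STATUS_KEY = "deployment.status"              # assumed; verify in the registry
REVISION = "9d59409acf479dfa0df1aa568182e43e43df8bbe"  # made-up commit SHA

# Hypothetical events, each carrying the revision as an attribute.
commit_events = [
    {
        "name": "commit.pushed",  # hypothetical event name
        "timestamp": datetime(2024, 11, 4, 9, 0, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: REVISION},
    },
]
deploy_events = [
    {
        "name": "deployment.finished",  # hypothetical event name
        "timestamp": datetime(2024, 11, 4, 10, 0, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: REVISION, STATUS_KEY: "failed"},
    },
    {
        "name": "deployment.finished",
        "timestamp": datetime(2024, 11, 4, 11, 30, tzinfo=timezone.utc),
        "attributes": {REVISION_KEY: REVISION, STATUS_KEY: "succeeded"},
    },
]


def failed_revisions(deployments):
    """List the code revisions whose deployments failed."""
    return [
        d["attributes"][REVISION_KEY]
        for d in deployments
        if d["attributes"].get(STATUS_KEY) == "failed"
    ]


def change_lead_time(commits, deployments):
    """Map each revision to the time from commit to first successful deployment."""
    committed_at = {c["attributes"][REVISION_KEY]: c["timestamp"] for c in commits}
    lead_times = {}
    for d in sorted(deployments, key=lambda e: e["timestamp"]):
        revision = d["attributes"][REVISION_KEY]
        if d["attributes"].get(STATUS_KEY) != "succeeded":
            continue
        if revision in committed_at and revision not in lead_times:
            lead_times[revision] = d["timestamp"] - committed_at[revision]
    return lead_times


failed = failed_revisions(deploy_events)
lead_times = change_lead_time(commit_events, deploy_events)
print(failed, lead_times[REVISION])
```

Because every system emits the same attribute key, the join needs no per-tool mapping: the VCS event and the deployment event can come from different vendors and still line up on the revision.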

### Artifacts for supply chain security, aligned with the SLSA specification

The [artifact attribute namespace](https://opentelemetry.io/docs/specs/semconv/attributes-registry/artifact/) included multiple attributes in its first implementation. One key set of attributes within this namespace covers [attestations](https://slsa.dev/attestation-model) that closely align with the [SLSA](https://slsa.dev/spec/v1.0/about) model. This is really the first time a direct connection is being made between Observability and Software Supply Chain Security. Consider the following [supply chain threat model](https://slsa.dev/spec/v1.0/threats) defined by SLSA:

Review suggestion (Contributor), replacing "between Observability and Software Supply Chain Security":

> between observability and software supply chain security

Reply (Contributor Author): These are terms and should IMO be capitalized. Please revisit and refer to guidance if otherwise stated.

{{< figure class="figure" src="SLSA-supply-chain-model.png" attr="SLSA Community Specification License 1.0" attrlink="https://github.com/slsa-framework/slsa?tab=License-1-ov-file" >}}

These new attributes for artifacts and attestations help observe the sequence of events modeled in the above diagram in real time. Really, the conventions that exist today and those that will be added in the future enable interoperability between core software delivery capabilities like security and platform engineering via observability semantics.
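
A small Python sketch of what this can look like in practice: a build step records the artifact and its attestation as attributes, and a later deploy step checks that what it is about to ship matches what was built. The attribute keys (`artifact.filename`, `artifact.hash`, `artifact.attestation.filename`, `artifact.attestation.hash`) reflect our reading of the artifact namespace and should be verified against the registry; the file names, contents, and helper functions are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def artifact_attributes(filename, content, attestation_filename, attestation_content):
    """Build the attributes a build step could attach to its span or event."""
    # Keys are assumed from the artifact namespace; verify before relying on them.
    return {
        "artifact.filename": filename,
        "artifact.hash": sha256_hex(content),
        "artifact.attestation.filename": attestation_filename,
        "artifact.attestation.hash": sha256_hex(attestation_content),
    }


def matches_recorded_artifact(attributes, candidate_content: bytes) -> bool:
    """Check that the content about to be deployed is the content that was built."""
    return attributes["artifact.hash"] == sha256_hex(candidate_content)


# Hypothetical build output and attestation document.
built = b"binary produced by the build step"
attestation = b'{"predicateType": "example"}'
attrs = artifact_attributes(
    "app-1.2.3.tar.gz", built, "app-1.2.3.intoto.json", attestation
)

print(matches_recorded_artifact(attrs, built))
print(matches_recorded_artifact(attrs, b"tampered content"))
```

With the hashes carried as telemetry attributes, a mismatch between the build and deploy stages becomes something you can query and alert on with the same tooling used for the rest of the pipeline.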

Review suggestion (Contributor), replacing "via observability semantics":

> using observability semantics

Reply (Contributor Author): this is a syndicated article from the CNCF.io blog, I'd rather not change wording unless meaningful, and "via" vs "using" seems pure styling IMO.


## What’s next for the CI/CD Observability Working Group

The first major milestone, which we shared above, was the merge of the OTEP for extending the semantic conventions with the new attributes, which is now part of the latest OpenTelemetry Semantic Conventions release.

Review suggestion (Contributor), replacing "The first major milestone we shared above, was the merge":

> As already mentioned, the first major milestone we reached was the merge

Reply (Contributor Author): this is a syndicated article from the CNCF.io blog, I'd rather not change wording unless meaningful.


The other important milestone was [OTEP #258](https://github.com/open-telemetry/oteps/pull/258) for Environment Variable Context Propagation, which was just approved and merged. This OTEP lays the groundwork for writing the specification.

Review suggestion (Contributor), replacing the paragraph above:

> The second important milestone is [OTEP #258](https://github.com/open-telemetry/oteps/pull/258) for Environment Variable Context Propagation, which was just approved and merged. This OTEP sets the foundation for writing the specification.

Reply (Contributor Author): this is a syndicated article from the CNCF.io blog, I'd rather not change wording unless meaningful.


Since we’ve made progress on our initial milestones, we’ve updated the [CI/CD Observability SIG milestones for the remainder of 2024](https://github.com/open-telemetry/community/blob/main/projects/ci-cd.md). Our goal is to finish out as many of the defined milestones as possible by the end of the year. Notably, we’re focused on:

* Adding [metric conventions for version control systems](https://github.com/open-telemetry/semantic-conventions/pull/1383).
* Building tracing prototypes in CI/CD systems (for example, ArgoCD, GitHub, GitLab, Jenkins).
* Getting [OTEP #258](https://github.com/open-telemetry/oteps/pull/258) ready for implementation and addition to the specification.
* Adding attributes to the registry covering more domains like:
  * [Software outage incidents](https://github.com/open-telemetry/semantic-conventions/issues/1185)
  * [System attributes around CI/CD runners](https://github.com/open-telemetry/semantic-conventions/issues/1184)
* Beginning work on trace and event (log) signal specifics to build the bridge for interoperability with other specifications.
* Adopting the changes from the [Entity and Resource OTEP](https://github.com/open-telemetry/oteps/pull/264).
* [Enabling vendor-specific extension(s)](https://github.com/open-telemetry/semantic-conventions/issues/1193).
* Open source community outreach strategy for semantic adoption.

All that has been mentioned thus far is just the beginning! We have lots of work defined on our [CI/CD Project Board](https://github.com/orgs/open-telemetry/projects/79), and we have work in progress! We’ll continue to iterate on the above milestones that we’ve set out for the remainder of 2024. Here are a couple of things to look out for.

* Version Control System metrics: leading indicators for DORA
* Traces from GitHub Actions and Audit Logs
  * Special thanks to the following people who are making this component possible:
    * Tyler Helmuth – Honeycomb
    * Andrzej Stencel – Elastic
    * Curtis Robert – Splunk
    * Justin Voss
    * Kristof Kowalski – Anz Bank
    * Mike Sarahan – Nvidia
* A corresponding version of the GitHub Receiver Component but implemented in GitLab

And much more!

## It takes a village to extend OpenTelemetry

Whoa, that’s a lot to do! Most certainly this SIG will continue beyond 2024 and through 2025. Standards are hard, but essential. And we have some amazing folks who are part of the SIG and contributing to these standards! Who, you may ask?

First, we’d like to acknowledge key members of OpenTelemetry leadership committees who have heavily enabled the work we’ve done thus far, and will continue to do.

From the OpenTelemetry Technical Committee we have two core sponsors, Carlos Alberto from Lightstep and Josh Suereth from Google. Both Carlos and Josh have been so supportive of the CI/CD work, really guiding us through the process and details we need to be successful.

From the OpenTelemetry Governance Committee we’ve had Trask Stalnaker from Microsoft act as an exceptional ally, and Daniel Blanco from Skyscanner who now acts as our current Liaison. Both Trask and Daniel have been instrumental in supporting the SIG and enabling us to have our own meeting in the OpenTelemetry community.

In addition to those folks, we’ve had significant feedback, support, and contributions from the following key folks:

* Yuri Shkuro – Creator of Jaeger, Co-Founder of OpenTelemetry
* Andrea Frittoli – Tekton CD Maintainer, CDEvents Co-creator, IBM
* Emil Bäckmark – CDEvents and Eiffel Maintainer, Ericsson
* Magnus Bäck – Eiffel, Axis Communications
* Liudmila Molkova – Microsoft
* Christopher Kamphaus – Jemmic, Jenkins
* Giordano Ricci – Grafana Labs
* Giovanni Liva – Dynatrace, Keptn
* Ivan Calvo – Elastic, Jenkins
* Armin Ruech – Dynatrace
* Michael Safyan – Google
* Robb Kidd – Honeycomb
* Pablo Chacin – Grafana Labs
* Alexandra Konrad – Elastic
* Alexander Wert – Elastic
* Joao Grassi – Dynatrace
* DJ Gregor – Discover

That was a lot of names to name! We greatly appreciate everyone who has supported this initiative and helped bring it to fruition! It takes significant thought and time to build industry-wide standards. Hard problems are hard, but these folks have risen to the challenge to make the world of observability and CI/CD systems a better, more interoperable place!

## Join the Working Group discourse and make an impact

Want to learn more? Want to get involved in shaping CI/CD Observability?

We invite developers and practitioners to participate in the discussions, contribute ideas, and help shape the future of CI/CD observability and the OpenTelemetry semantic conventions. Discussion takes place in the CNCF Slack workspace under the #cicd-o11y channel, and you can chime in on GitHub and join the CI/CD SIG weekly calls every Thursday at 0600 PT.