Data reliability and persistence story for SDK #633

reyang · 2019-04-29T20:16:17Z

This is a follow-up to #632.

In the SDK, we need a clear story for the following situations. We can either decide to support them in the core OpenCensus SDK or leave them to specific exporters.

When the SDK fails to export data to the backend system due to networking issues, we need to keep it from consuming all available memory: either discard excess data (depending on the case, either the latest or the oldest), or store it locally (e.g. file, log, reliable pipe, ETW).
When the application exits, restarts, or crashes, we want to reduce data loss. Some loss is unavoidable because this is not a fully transactional system (e.g. your code writes traces to a queue and the process is killed before the queue item is processed, so the data is lost). Still, the ability to store data locally and pick it up later (after a machine or application restart) would be useful in some cases.
Console applications (backend jobs, periodic tasks, command-line tools) might need to store traces locally during the exit grace period, since sending all the data over the network might not be possible within that period.
Some developers need more reliability, for example for auditing logs and QoS logs. We might need to provide an alternative path so developers can trade performance for reliability (e.g. bypass the queue and synchronously persist the log to local storage, or even transmit the data across the network).
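
To make the first situation concrete, here is a minimal sketch of a capped in-memory buffer with a configurable discard policy. `BoundedBuffer`, its `drop` parameter, and the `dropped` counter are hypothetical names used for illustration only; they are not part of the OpenCensus SDK or any exporter.

```python
import collections
import threading


class BoundedBuffer:
    """Hypothetical buffer that caps memory use by discarding items."""

    def __init__(self, capacity, drop="oldest"):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if drop not in ("oldest", "latest"):
            raise ValueError("drop must be 'oldest' or 'latest'")
        self._items = collections.deque()
        self._capacity = capacity
        self._drop = drop
        self._lock = threading.Lock()
        self.dropped = 0  # number of items discarded so far

    def put(self, item):
        """Add an item; return False if the incoming item was discarded."""
        with self._lock:
            if len(self._items) >= self._capacity:
                self.dropped += 1
                if self._drop == "latest":
                    return False  # keep what we have, discard the new item
                self._items.popleft()  # make room by discarding the oldest
            self._items.append(item)
            return True

    def drain(self):
        """Remove and return everything currently buffered."""
        with self._lock:
            items = list(self._items)
            self._items.clear()
            return items
```

Instead of discarding, the same overflow branch could hand the item to local storage, which is the other option described above. Which policy is right probably depends on the exporter, so it may make sense to keep it configurable.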

The design principles:

Need to work in a multithreaded environment.
Need to work in a multiprocess environment (e.g. one application has multiple process instances running at the same time).
Should leverage existing solutions where possible rather than reinventing the wheel.
Need to have a solution for both agent and agentless scenarios.
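
One possible way to meet the first two principles with plain files is sketched below, assuming the filesystem provides atomic renames within a single directory. `LocalStore`, the `.blob` / `.claimed` suffixes, and the JSON encoding are all assumptions made for illustration, not an existing SDK API.

```python
import json
import os
import tempfile
import uuid


class LocalStore:
    """Hypothetical directory-backed store shared by threads and processes."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def put(self, batch):
        """Persist one batch under a unique name."""
        name = "{}.blob".format(uuid.uuid4().hex)
        tmp = os.path.join(self.path, name + ".tmp")
        with open(tmp, "w") as f:
            json.dump(batch, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before publishing
        # Atomic rename: readers never observe a partially written blob.
        os.replace(tmp, os.path.join(self.path, name))

    def take(self):
        """Claim and return one stored batch, or None if none is available."""
        for name in os.listdir(self.path):
            if not name.endswith(".blob"):
                continue
            src = os.path.join(self.path, name)
            claimed = src + ".claimed"
            try:
                os.rename(src, claimed)  # only one thread or process wins
            except OSError:
                continue  # someone else claimed it first
            with open(claimed) as f:
                batch = json.load(f)
            os.remove(claimed)
            return batch
        return None


# Usage: a second instance stands in for the application after a restart.
directory = tempfile.mkdtemp()
LocalStore(directory).put([{"name": "span-1"}])
recovered = LocalStore(directory).take()
```

This sketch deletes the file as soon as it is read, so a crash between `take()` and a successful export would still lose that batch. A real implementation would more likely keep the claimed file until the export succeeds and reclaim stale claims after a timeout.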

reyang · 2019-04-29T20:17:16Z

@bogdandrutu @c24t @songy23

reyang added the enhancement label Apr 29, 2019

reyang self-assigned this Apr 29, 2019

reyang added the P2 label Apr 29, 2019

reyang mentioned this issue Apr 29, 2019

Introduce persistent storage to Azure exporter #632

Merged

rajkumar-rangaraj mentioned this issue Sep 16, 2020

Add persistent storage to exporter open-telemetry/opentelemetry-dotnet#1278

Closed
