-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce persistent storage to Azure exporter #632
Conversation
contrib/opencensus-ext-azure/opencensus/ext/azure/common/schedule.py
Outdated
Show resolved
Hide resolved
The implementation looks good, but if you have the option to export to the agent without buffering and then exporting from the agent to some persistent sink (a file, a message queue, etc.), I think this would be better than building filesystem access into the exporters. I don't know if this would solve your use cases though. It would solve (3), but not (2), although I'm not sure that this solves (2) either. I agree that we need a better general solution for (1), e.g. letting the user configure whether to drop messages when the backend is down, which messages to drop when we hit the memory limit, etc. @songy23 can speak to persistence in the agent. |
Why do you not have a FileExporter, and run a demon on that machine (oc-agent for example) to read from these files and export traces/metrics? |
Why I think this is better:
|
For Microsoft, there are client scenario (e.g. command line tools) which doesn't allow agent. |
Don't know what commanline tools you have there but usually they run for a short period of time. I think having a small document where we try to capture all the requirements and possible implementations and analyze tradeoffs will be great for this feature. I am not against having this capability but I would like to understand what design fits best our requirements. |
I've described the requirement in this PR. Let's use this PR for now, if the conversation goes too long to fit here, I'm okay to create a separate document.
Yep, totally understand. |
Couple of questions about the design:
|
@bogdandrutu these are great questions! Here goes the definition of the
|
Definitely. For Windows we have ETW. If we can have a cross-platform mechanism in OpenCensus, that'll be fantastic!
For (2), it is tricky, given OpenCensus is not designed to be full-transactional (e.g. When we end a span, we put it in the memory queue and wait for the exporter to pick it up later. This means the span data could get lost if application crashed in the middle). Making it fully transactional is way too expensive for the major scenarios, and probably is not what we want to shoot for. (@c24t and I discussed about this, and so far we only see auditing scenario which might require this) In this PR, I want to provide a certain level of solution to 2), instead of a 100% guarantee.
|
@reyang thank you! Persistent storage will definitely help in specific scenarios and is a valuable addition to Azure exporter. It's great that you are working on it in a generic fashion so it may be reused. On this path, you will probably hit the need to introduce SpanData exporting feature census-instrumentation/opencensus-specs#255 that is currently implemented in C# and OpenConsensus project. And then to solve a problem of a potentially writing span to disk again. Long term we will need to post guidance on how persistence can be achieved via the combination of Some things to remember while implementing:
|
We're using
We have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but otherwise LGTM. We can revisit before moving this out of the azure exporter.
}, | ||
timeout=self.options.timeout, | ||
) | ||
except Exception as ex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to catch RequestException
specifically here to tell retryable from non-retryable errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO comment for now, will revisit when refactoring the exporter specific context (e.g. instead of having exporter manipulating the blacklist, having a dedicated context flag in exporter logic, so that integrations like requests would not intercept such activities and cause dead loop).
contrib/opencensus-ext-azure/opencensus/ext/azure/trace_exporter/__init__.py
Outdated
Show resolved
Hide resolved
:param args: The kwargs passed in while calling `function`. | ||
""" | ||
|
||
def __init__(self, interval, function, args=None, kwargs=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature change here means we lose access to the other Thread
constructor args, but that's probably fine.
elapsed_time = time.time() - start_time | ||
wait_time = max(self.interval - elapsed_time, 0) | ||
|
||
def cancel(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called this stop
instead of cancel
originally because it's still possible to call it after the function has run, but the change makes sense if you want this to mimic Timer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is hard to name things, here we take the easy approach to mimic Timer
.
My personal thinking: stop
sounds like a synchronous operation, which notifies the thread and wait for it to join. (e.g. I would expect the thread to be stopped after stop
returns)
start_time = time.time() | ||
self.function(*self.args, **self.kwargs) | ||
elapsed_time = time.time() - start_time | ||
wait_time = max(self.interval - elapsed_time, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better than the original.
contrib/opencensus-ext-azure/opencensus/ext/azure/common/storage.py
Outdated
Show resolved
Hide resolved
contrib/opencensus-ext-azure/opencensus/ext/azure/common/storage.py
Outdated
Show resolved
Hide resolved
contrib/opencensus-ext-azure/opencensus/ext/azure/trace_exporter/__init__.py
Outdated
Show resolved
Hide resolved
else: | ||
yield LocalFileBlob(path) | ||
|
||
def get(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is unused, what do you need it for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently not used except as an util for test cases. I was thinking to keep it as symmetrical to put
?
Discussion of data reliability and persistence will continue in #633. |
NOTE: discussion of data reliability and persistence will continue in #633.
@c24t @songy23 I hope to get your early feedback before I finish the docs/tests. And to see if we could take the same approach for agents / other SDKs.
I plan to add local file persistency for Azure trace exporter, to help the following cases:
The
LocalFileStorage
is designed to take minimum dependency on operating system level synchronization primitives, the only requirement is file rename should be mutual exclusive. This makes it easier to port the logic to other languages/platforms.