refactor sidecar/interface.rs into smaller files #395
sidecar/src/service/session_info.rs
Outdated
/// session_info.shutdown_running_instances().await;
/// }
/// ```
pub async fn shutdown_running_instances(&self) {
Calling this function out for review as it has not only moved, but changed. The previous implementation was cloning the runtimes and never actually shutting them down. This implementation drains the runtimes hashmap and then shuts them down.
I doubt this is the optimal implementation, but it at least works as advertised now.
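For readers following along, here is a minimal sketch of the general difference between cloning the entries of a shared map and draining them. It is not the sidecar code: `RuntimeSketch`, `SessionSketch`, and all method names are hypothetical stand-ins, and the real functions are async, whereas this uses only the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a runtime handle; not the sidecar's real type.
#[derive(Clone)]
struct RuntimeSketch {
    id: String,
}

impl RuntimeSketch {
    /// Consumes the handle and reports which runtime was shut down.
    fn shutdown(self) -> String {
        self.id
    }
}

/// Hypothetical session holding its runtimes behind a shared mutex.
#[derive(Default)]
struct SessionSketch {
    runtimes: Arc<Mutex<HashMap<String, RuntimeSketch>>>,
}

impl SessionSketch {
    fn add_runtime(&self, id: &str) {
        let runtime = RuntimeSketch { id: id.to_string() };
        self.runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .insert(id.to_string(), runtime);
    }

    fn runtime_count(&self) -> usize {
        self.runtimes.lock().expect("runtimes mutex poisoned").len()
    }

    /// Clone-based variant: only copies are consumed, so every entry
    /// stays registered in the map afterwards.
    fn shutdown_clones(&self) -> usize {
        let copies: Vec<RuntimeSketch> = self
            .runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .values()
            .cloned()
            .collect();
        let mut stopped = 0;
        for runtime in copies {
            let _id = runtime.shutdown();
            stopped += 1;
        }
        stopped
    }

    /// Drain-based variant: entries are moved out of the map, the lock is
    /// released, and the owned values are then shut down.
    fn shutdown_drained(&self) -> usize {
        let drained: Vec<RuntimeSketch> = self
            .runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .drain()
            .map(|(_, runtime)| runtime)
            .collect();
        let mut stopped = 0;
        for runtime in drained {
            let _id = runtime.shutdown();
            stopped += 1;
        }
        stopped
    }
}
```

In this sketch, the clone-based variant leaves the map untouched, while the drain-based variant leaves it empty. Collecting into a `Vec` first also means the lock is not held while the shutdown work runs.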
sidecar/src/service/session_info.rs
Outdated
/// session_info.shutdown_runtime(&"runtime1".to_string()).await;
/// }
/// ```
pub async fn shutdown_runtime(&self, runtime_id: &String) {
Calling this function out as there is a minor change (in addition to being moved). This was the only public function that took self instead of &self. This meant that after shutting down a single runtime the session_info object was moved and you couldn't do anything else with it. I'm not that familiar with how SessionInfo should be used, but since it supports multiple runtimes it should probably be usable after a single one is shut down.
It didn't matter, because it implements Clone and all the relevant data within is Arc. But yes, using &self is correct.
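To illustrate both points in this exchange, here is a small sketch under stated assumptions: `SharedSession` and its methods are hypothetical, the real functions are async, and the map value is a placeholder `u32`. It shows that a `self` receiver still works on a clone when the state lives behind an `Arc`, and that a `&self` receiver avoids the need to clone at all.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical session type: cloning it only copies the Arc, so every
/// clone sees the same underlying runtimes map.
#[derive(Clone, Default)]
struct SharedSession {
    runtimes: Arc<Mutex<HashMap<String, u32>>>,
}

impl SharedSession {
    fn add_runtime(&self, id: &str) {
        self.runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .insert(id.to_string(), 0);
    }

    fn runtime_count(&self) -> usize {
        self.runtimes.lock().expect("runtimes mutex poisoned").len()
    }

    /// Takes `self` by value: the caller's handle is moved, so it has to
    /// clone first if it wants to keep using the session.
    fn shutdown_runtime_by_value(self, id: &str) -> bool {
        let removed = self
            .runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .remove(id)
            .is_some();
        removed
    }

    /// Takes `&self`: the caller keeps its handle without cloning.
    fn shutdown_runtime_by_ref(&self, id: &str) -> bool {
        let removed = self
            .runtimes
            .lock()
            .expect("runtimes mutex poisoned")
            .remove(id)
            .is_some();
        removed
    }
}
```

With the by-value receiver, a caller would write `session.clone().shutdown_runtime_by_value("runtime1")` and the removal is still visible through `session`, because both handles share one map. The `&self` version expresses the same thing without the extra clone.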
a499487 to b658e1c
ed4b03e to d2cf389
// TODO: APM-1076 - This file contains a fair amount of expects. While in most cases it is unlikely
// we will ever hit these, we should consider adding more robust error handling.
I think in these cases the error handling we have is quite fine. In fact, the error handling is tokio catching panics in async tasks and turning them into an error Result, which then ends up logged.
The important part is that these panics on expect() don't bring the application into a deadlocked state where e.g. no further traces are processed at all.
In general all these expects() should actually never be hit and it's a coding mistake. I'm not a big proponent of having a proper error handling code for things which are not supposed to be possible (unless the code itself is broken).
On top of that a crash of the whole sidecar, while a minor data loss, is not catastrophic either as it will be restarted by the PHP processes quickly.
I'm not a big proponent of having a proper error handling code for things which are not supposed to be possible
I tend to agree as long as 1) we are confident that these errors should never happen and 2) If 1) is wrong, we don't put the application in a bad state. Your comment indicates that we satisfy both these points so I'm ok with leaving these expects as is. I can remove the TODO.
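The behaviour discussed in this thread can be sketched with the standard library alone. This is an analogy, not the sidecar's code: it swaps tokio tasks for `std::thread`, whose `join()` likewise returns an `Err` when the spawned work panics (tokio's `JoinHandle` reports a panicking task as a `JoinError` in a similar way). The function name `run_isolated` and the `expect` message are hypothetical.

```rust
use std::thread;

/// Runs a unit of work on its own thread. A failed `expect` panics only
/// that thread; the caller receives an `Err` it can log and move on.
fn run_isolated(input: Option<u32>) -> Result<u32, String> {
    let handle = thread::spawn(move || input.expect("input should always be present"));
    handle.join().map_err(|payload| {
        // The panic payload is usually a &str or a String; handle both.
        if let Some(message) = payload.downcast_ref::<&str>() {
            (*message).to_string()
        } else if let Some(message) = payload.downcast_ref::<String>() {
            message.clone()
        } else {
            "task panicked".to_string()
        }
    })
}
```

A caller can invoke `run_isolated(None)`, receive an `Err`, and then keep processing further inputs, which matches the two conditions named above: the panic is contained, and the application is not left in a bad state.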
use crate::tracer;

use crate::service::InstanceId;
/// `SessionInfo` holds information about a session.
Explaining what sessions and runtimes are here would be neat.
The very dumb answer is: a session is identified by a session id, and a runtime by a runtime id :-P
Now, what exactly these are depends a bit on the language, but session ids are supposed to be shared for the main process and all its forks while runtimes are mostly per process.
In the context of PHP it's: one session+runtime id tuple = one sidecar connection.
@bwoebi Thanks for the clarification. I can update the doc comment to reflect this better.
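As a rough illustration of the relationship described above, the sketch below models a connection as a (session id, runtime id) pair and groups runtimes under their session. `ConnectionKey` and `Registry` are hypothetical names invented for this example; they are not the sidecar's `InstanceId` or `SessionInfo`, whose fields are not shown in this conversation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical key: one session id + runtime id pair per connection.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ConnectionKey {
    session_id: String,
    runtime_id: String,
}

impl ConnectionKey {
    fn new(session_id: &str, runtime_id: &str) -> Self {
        ConnectionKey {
            session_id: session_id.to_string(),
            runtime_id: runtime_id.to_string(),
        }
    }
}

/// Hypothetical registry: a session (main process plus its forks) owns
/// a set of runtimes (roughly one per process).
#[derive(Default)]
struct Registry {
    sessions: HashMap<String, HashSet<String>>,
}

impl Registry {
    /// Registers a connection; returns false if the pair already exists.
    fn connect(&mut self, key: &ConnectionKey) -> bool {
        self.sessions
            .entry(key.session_id.clone())
            .or_default()
            .insert(key.runtime_id.clone())
    }

    fn runtimes_in(&self, session_id: &str) -> usize {
        self.sessions
            .get(session_id)
            .map_or(0, |runtimes| runtimes.len())
    }
}
```

Under this model, a parent process and a fork would connect with the same session id but different runtime ids, so they end up as two runtimes inside one session.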
67807b8 to 4af0b0d
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
+ Coverage   64.08%   64.83%   +0.74%
==========================================
  Files         169      183      +14
  Lines       22073    22359     +286
==========================================
+ Hits        14146    14496     +350
+ Misses       7927     7863      -64
d25e068 to d35bbe5
882a4c9 to 19f3249
sidecar/src/service/runtime_info.rs
Outdated
// TODO-EK: Reduce the type complexity before merging
#[allow(clippy::type_complexity)]
// TODO-EK - Investigate why this needs to be exposed this way before merging
The goal is simply to simplify the code a bit where every access would otherwise take multiple lines; see #392 (comment) for context.
Reducing the type complexity here sort of defeats the point of just being a simple helper function.
@bwoebi your comments are from an outdated version of this file, so I'm not sure if they still apply.
The complexity isn't due to the MutexGuard, it's due to the app HashMap type of HashMap<(String, String), Shared<ManualFuture<Option<AppInstance>>>>. I created the type AppMap and use that throughout RuntimeInfo and was able to remove the clippy ignore.
I think we should strive to ignore linter errors only when it is absolutely necessary. I can't speak for @bantonsson on #392 (comment), but I would argue that a helper function isn't all that helpful if you have to suppress valid linter errors. The current implementation resolves the clippy error and maintains the helpfulness Bjorn was suggesting...at least in my opinion.
I added a TODO (with a jira ticket) to just investigate if it makes sense / is possible to not mutably expose the hashmap at a later time. In the past, I've found that this pattern can lead to problems but I understand there may be no reasonable way around it.
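A minimal sketch of the alias approach follows, with loudly labelled assumptions: `Shared` and `ManualFuture` are not reproduced here, so `PendingApp` is a std-only stand-in for `Shared<ManualFuture<Option<AppInstance>>>`, and `RuntimeInfoSketch`, `lock_apps`, and `register` are hypothetical names. Only the `AppMap` alias name and the `(String, String)` key come from the comment above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical placeholder for the real AppInstance.
struct AppInstance {
    name: String,
}

/// Stand-in (assumption) for Shared<ManualFuture<Option<AppInstance>>>.
type PendingApp = Arc<Mutex<Option<AppInstance>>>;

/// The alias keeps signatures short enough that clippy's
/// type_complexity lint is not triggered by the helper below.
type AppMap = HashMap<(String, String), PendingApp>;

#[derive(Default)]
struct RuntimeInfoSketch {
    apps: Arc<Mutex<AppMap>>,
}

impl RuntimeInfoSketch {
    /// One-line helper that mutably exposes the map behind its lock.
    fn lock_apps(&self) -> MutexGuard<'_, AppMap> {
        self.apps.lock().expect("apps mutex poisoned")
    }

    /// Registers an app under a (String, String) key.
    fn register(&self, key: (&str, &str), name: &str) {
        let app = AppInstance { name: name.to_string() };
        self.lock_apps().insert(
            (key.0.to_string(), key.1.to_string()),
            Arc::new(Mutex::new(Some(app))),
        );
    }

    /// Looks up the registered app's name, if any.
    fn app_name(&self, key: (&str, &str)) -> Option<String> {
        let pending = self
            .lock_apps()
            .get(&(key.0.to_string(), key.1.to_string()))
            .cloned()?;
        let guard = pending.lock().expect("app mutex poisoned");
        guard.as_ref().map(|app| app.name.clone())
    }
}
```

The trade-off raised in the following reply still applies: the alias shortens the signature of `lock_apps`, but a reader has to look up `AppMap` to see the full type.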
To be honest, if it were me, I would just disable the type complexity clippy error for the whole repository. I don't think that's a good lint.
Sometimes types are simply complex, and using a type is basically just a workaround, requiring you to search for the type definition just to see the actual type.
sidecar/src/service/runtime_info.rs
Outdated
// TODO-EK: Can this be refactored more before merging? We may be able to encapsulate some of the
// functionality in session_info to RuntimeInfo.
Like for example?
This TODO was just a reminder for myself while the PR was in draft. At the moment, I don't think this is possible in the short-term and isn't really blocking progress. I created a jira ticket and updated the TODO comment to reflect it.
6e3d0c4 to 13e8c7f
ceeddc3 to 2da5a21
@bwoebi I did this for two reasons:
Your assumption is correct. Functionally, we don't need to be this verbose. If there is a general consensus against this, I can revert in a follow-up PR.
interface MSRV is greater than 1.60, so we can remove this allow.
Also, add basic test and doc comments
And add tests and doc comments
And rename to RuntimeMetadata, add basic doc comments and tests.
* Fix typo for intitial_acitons input parameter to get_app
* Reorder impl to match trait order
* Remove redundant prefixes
… file Also, the From SerializedTracerHeaderTags and From TracerHeaderTags trait impls were changed to try_into trait impls as it is possible (however unlikely) that the code within the trait impls could return errors and it is preferable to let the caller decide how to handle those errors rather than unwrap a Result and potentially panic.
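The motivation in this commit message can be sketched as follows. The types `RawTagsSketch` and `TagsSketch` are hypothetical and much simpler than the real header-tag types, and the UTF-8 check is an assumed failure mode chosen only to have something fallible. The point is that a `TryFrom` impl returns the error to the caller, where a `From` impl would have to unwrap it.

```rust
use std::convert::TryFrom;
use std::string::FromUtf8Error;

/// Hypothetical serialized form: raw bytes that may not be valid UTF-8.
struct RawTagsSketch {
    service: Vec<u8>,
}

/// Hypothetical decoded form.
struct TagsSketch {
    service: String,
}

impl TryFrom<RawTagsSketch> for TagsSketch {
    type Error = FromUtf8Error;

    /// Propagates the decoding error instead of unwrapping it, so the
    /// caller decides whether to log, retry, or fail.
    fn try_from(raw: RawTagsSketch) -> Result<Self, Self::Error> {
        Ok(TagsSketch {
            service: String::from_utf8(raw.service)?,
        })
    }
}
```

A caller would then write something like `match TagsSketch::try_from(raw) { Ok(tags) => ..., Err(error) => ... }`, keeping the panic decision out of the conversion itself.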
and add doc comments
Also, add rustdoc comments and tests. Uncovered a bug in the shutdown_running_instances function where it never shut down the running instances.
Refactor into separate function. Also, replace unwraps() with expects().
Moving from interface.rs to separate files
…ts to separate files
interface.rs has been refactored into a "service" module within sidecar. During the refactor the access level of some functions and types was increased while they were being moved around. This should "fix" the access level to the most restrictive possible.
After refactor of sidecar::interface.rs into multiple files AppOrQueue belongs in the telemetry namespace and SidecarStats belongs closer to the code it is generating stats for in SidecarServer.
382cb18 to 64b271d
What does this PR do?
Motivation
Prep work for introducing retry logic sending traces via the sidecar
Additional Notes
This is a fairly disruptive refactor. If there are non-critical things we can address in follow-up PRs I would prefer to do that versus keeping this open for a long time. Obviously, any critical issues should be addressed ASAP.
How to test the change?
Unit tests are being added, but some sort of integration test with PHP is probably a good idea.
For Reviewers
@DataDog/security-design-and-guidance.