Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor sidecar/interface.rs into smaller files #395

Merged
merged 24 commits into from
May 8, 2024

Conversation

ekump
Copy link
Contributor

@ekump ekump commented Apr 17, 2024

What does this PR do?

  • Break up the various types and implementations in sidecar/interface.rs into separate files within a sidecar::service module. (I'm very open to a better name than "service").
  • Introduce some tests and some documentation. I only added some tests that didn't require much mocking or setup. APMSP-1079 has been created to track a more comprehensive unit testing solution.
  • Fixes some minor bugs uncovered through addition of tests.
  • Miscellaneous small fixes for typos, cleanup, and Rust idiomaticness.
  • As I am fairly new to the sidecar I made my best guesses at rustdoc comments, tests, and naming. Please call out any errors.

Motivation

Prep work for introducing retry logic sending traces via the sidecar

Additional Notes

This is a a fairly disruptive refactor. If there are non-critical things we can address in follow up PRs I would prefer to do that versus keeping this open for a long time. Obviously, any critical issues should be addressed ASAP.

How to test the change?

Unit tests are being added, but some sort of integration test with PHP is probably a good idea.

For Reviewers

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

/// session_info.shutdown_running_instances().await;
/// }
/// ```
pub async fn shutdown_running_instances(&self) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this function out for review as it has not only moved, but changed. The previous implementation was cloning the runtimes and never actually shutting them down. This implementation drains the runtimes hashmap and then shuts them down.

I doubt this is the optimal implementation, but it at least works as advertised now.

/// session_info.shutdown_runtime(&"runtime1".to_string()).await;
/// }
/// ```
pub async fn shutdown_runtime(&self, runtime_id: &String) {
Copy link
Contributor Author

@ekump ekump Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this function out as there is a minor change (in addition to being moved). This was the only public function that took self instead of &self. This meant that after shutting down a single runtime the session_info object was moved and you couldn't do anything else with it. I'm not that familiar with how SessionInfo should be used, but since it supports multiple runtimes it should probably be usable after a single one is shutdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't matter, because it implements Clone and all the relevant data within is Arc. But yes, using &self is correct.

@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch from a499487 to b658e1c Compare April 17, 2024 22:36
@ekump ekump changed the title Refactor sidecar/interface.rs into smaller files WIP - refactor sidecar/interface.rs into smaller files Apr 18, 2024
@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch from ed4b03e to d2cf389 Compare April 30, 2024 17:11
Comment on lines 4 to 5
// TODO: APM-1076 - This file contains a fair amount of expects. While in most cases it is unlikely
// we will ever hit these, we should consider adding more robust error handling.
Copy link
Contributor

@bwoebi bwoebi May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in these cases the error handling we have is quite fine. In fact, the error handling is tokio catching panics in async tasks and transforming the tasks into an error Result, which then end up logged.
The important part is that these panics on expect() don't bring the application into a deadlocked state where e.g. no further traces are processed at all.
In general all these expects() should actually never be hit and it's a coding mistake. I'm not a big proponent of having a proper error handling code for things which are not supposed to be possible (unless the code itself is broken).

On top of that a crash of the whole sidecar, while a minor data loss, is not catastrophic either as it will be restarted by the PHP processes quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwoebi

I'm not a big proponent of having a proper error handling code for things which are not supposed to be possible

I tend to agree as long as 1) we are confident that these errors should never happen and 2) If 1) is wrong, we don't put the application in a bad state. Your comment indicates that we satisfy both these points so I'm ok with leaving these expects as is. I can remove the TODO.

use crate::tracer;

use crate::service::InstanceId;
/// `SessionInfo` holds information about a session.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explaining what are sessions and runtimes here would be neat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The very dumb answer is: a session is identified by a session id, and a runtime by a runtime id :-P
Now, what exactly these are depends a bit on the language, but session ids are supposed to be shared for the main process and all it's forks while runtimes are mostly per process.
In context of PHP it's: one session+runtime id tuple = one sidecar connection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwoebi Thanks for the clarification. I can update the doc comment to reflect this better.

@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch from 67807b8 to 4af0b0d Compare May 3, 2024 20:50
@codecov-commenter
Copy link

codecov-commenter commented May 3, 2024

Codecov Report

Attention: Patch coverage is 27.99130% with 993 lines in your changes are missing coverage. Please review.

Project coverage is 64.83%. Comparing base (5a69bdb) to head (64b271d).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #395      +/-   ##
==========================================
+ Coverage   64.08%   64.83%   +0.74%     
==========================================
  Files         169      183      +14     
  Lines       22073    22359     +286     
==========================================
+ Hits        14146    14496     +350     
+ Misses       7927     7863      -64     
Components Coverage Δ
crashtracker 20.37% <ø> (ø)
data-pipeline 51.45% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 81.23% <ø> (ø)
ddcommon-ffi 74.93% <ø> (ø)
ddtelemetry 53.72% <ø> (ø)
ipc 79.30% <ø> (ø)
profiling 77.05% <ø> (ø)
profiling-ffi 60.09% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 28.66% <27.99%> (+10.67%) ⬆️
sidecar-ffi 0.00% <0.00%> (ø)
spawn-worker 50.49% <ø> (ø)
trace-mini-agent 69.12% <ø> (ø)
trace-normalization 97.79% <ø> (ø)
trace-obfuscation 95.74% <ø> (ø)
trace-protobuf 25.64% <ø> (ø)
trace-utils 68.85% <ø> (+0.15%) ⬆️

@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch 5 times, most recently from d25e068 to d35bbe5 Compare May 6, 2024 20:47
@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch 3 times, most recently from 882a4c9 to 19f3249 Compare May 7, 2024 18:43

// TODO-EK: Reduce the type complexity before merging
#[allow(clippy::type_complexity)]
// TODO-EK - Investigate why this needs to be exposed this way before merging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is simply simplifying the code a bit whether every access is multiple lines, see #392 (comment) for context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reducing the type complexity here sort of defeats the point of just being a simple helper function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bwoebi your comments are from an outdated version of this file, so I'm not sure if still applies.

The complexity isn't due to the MutexGuard, it's due to the app HashMap type of HashMap<(String, String), Shared<ManualFuture<Option<AppInstance>>>>. I created the type AppMap and use that throughout RuntimeInfo and was able to remove the clippy ignore.

I think we should strive to ignore linter errors only when it is absolutely necessary. I can't speak for @bantonsson on #392 (comment), but I would argue that a helper function isn't all that helpful if you have to suppress valid linter errors. The current implementation resolves the clippy error and maintains the helpfulness Bjorn was suggesting...at least in my opinion.

I added a TODO (with a jira ticket) to just investigate if it makes sense / is possible to not mutably expose the hashmap at a later time. In the past, I've found that this pattern can lead to problems but I understand there may be no reasonable way around it.

Copy link
Contributor

@bwoebi bwoebi May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, if it were me, I would just disable the type complexity clippy error for the whole repository. I don't think that's a good lint.
Sometimes types are simply complex and using a type is basically just a workaround and requiring you to search for the type definition to just see the actual type.

Comment on lines 42 to 43
// TODO-EK: Can this be refactored more before merging? We may be able to encapsulate some of the
// functionality in session_info to RuntimeInfo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO was just a reminder for myself while the PR was in draft. At the moment, I don't think this is possible in the short-term and isn't really blocking progress. I created a jira ticket and updated the TODO comment to reflect it.

@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch 3 times, most recently from 6e3d0c4 to 13e8c7f Compare May 7, 2024 22:21
@ekump ekump marked this pull request as ready for review May 7, 2024 22:24
@ekump ekump requested a review from a team as a code owner May 7, 2024 22:24
@ekump ekump changed the title WIP - refactor sidecar/interface.rs into smaller files refactor sidecar/interface.rs into smaller files May 8, 2024
sidecar/src/service/mod.rs Outdated Show resolved Hide resolved
@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch 2 times, most recently from ceeddc3 to 2da5a21 Compare May 8, 2024 15:29
@ekump
Copy link
Contributor Author

ekump commented May 8, 2024

Is there a reason why you’ve aggressively put pub(crate) on everything instead of simply pub within the individual rust files?
Looks easier to read if it's just pub.

And I assume it’s enough to just make the modules then pub(crate) in the mod.rs files? What you’re anyway already doing.
Feels cleaner, and gives the ability to just toggle the visibility of a whole module from pub(crate) to pub when needed instead of touching every single codeline in that mod.

@bwoebi I did this for two reasons:

  1. Consistency and safety (not rust "safety"). There are situations where we don't want to mark an entire file as pub(crate) but we do mix exposing certain functions, types, or struct members as either pub(crate) or just pub. In those cases we have no choice but to be verbose in the file. It seems awkward to me that sometimes we set pub(crate) on just the mod.rs level and sometimes we do it at the implementation. I also don't find it hard to read and I like knowing the access level with the implementation details. But if the consensus is that this is superfluous I can revert this in another PR.

  2. It's related to reason 1, but I generally believe that we need to lock down the public surface area of the library as much as possible. We should only expose things publicly that we actually want public, and we're willing to support long-term and properly document. At the moment, I don't think we have any tools in place to help us not "leak" things we don't want public. Until we have something to do this for us (via static analysis or testing or something else) I'd prefer to err on the side of caution.

Your assumption is correct. Functionally, we don't need to be this verbose. If there is a general consensus against this, I can revert in a follow-up PR.

ekump added 24 commits May 8, 2024 13:53
interface

MSRV is greater than 1.60, so we can remove this allow.
Also, add basic test and doc comments
And rename to RuntimeMetadata, add basic doc comments and tests.
* Fix typo for intitial_acitons input paramater to get_app
* Reorder impl to match trait order
* remove redundant prefixes
… file

Also, The From SerializedTracerHeaderTags and From TracerHeaderTags trait
impls were changed to try_into trait impls as it is possible (however
unlikely) that the code within the the trait impls could return errors
and it is preferable to let the caller decide how to handle those errors
rather than unwrap a Result and potentially panic.
Also, add rustdoc comments and tests. Uncovered a bug in the
shtudown_running_instances function where it never shutdown the running
instances.
Refactor into separate function. Also, replace unwraps() with expects().
interface.rs has been refactored in to a "service" module within
sidecar. During refactor the access level of some functions and types
was increased while they were being moved around. This should "fix" the
access level to the most restrictive possible.
After refactor of sidecar::interface.rs into multiple files AppOrQueue belongs in the telemetry namespace and SidecarStats belongs closer to the code it is generating stats for in SidecarServer.
@ekump ekump force-pushed the ekump/refactor-sidecar-interface-pt1 branch from 382cb18 to 64b271d Compare May 8, 2024 17:53
@ekump ekump merged commit d3f8544 into main May 8, 2024
20 checks passed
@ekump ekump deleted the ekump/refactor-sidecar-interface-pt1 branch May 8, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants