-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ConfigurationCallbackStorage
Crash
#1385
Conversation
ConfigurationCallbackStorage
CrashConfigurationCallbackStorage
Crash
…flag Signed-off-by: Jax DesMarais-Leder <[email protected]>
25f95c3
to
8f8b5ab
Compare
Co-authored-by: Victoria Park <[email protected]>
ConfigurationCallbackStorage
CrashConfigurationCallbackStorage
Crash
} | ||
|
||
if let error { | ||
// TODO: - Consider updating all feature clients to use async version of this method? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created DTMOBILES-980 for this TODO. Want to focus on tackling the crash first, then we can follow up with this change
UnitTests/BraintreeCoreTests/Configuration/ConfigurationLoader_Tests.swift
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing!!!! 🚀 👏
if let error { | ||
// TODO: - Consider updating all feature clients to use async version of this method? | ||
|
||
Task { @MainActor in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure that these don't cause issues with nested calls to fetchOrReturnConfig.
Checking if removing @mainactor here and in the async fetchConfiguration function and just running completion handler and setupHTTPCredentials on main would preserve all the analytics calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MainActor
is equivalent to running on the main dispatch queue: https://developer.apple.com/documentation/swift/mainactor. So should behave the exact same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think main difference for our purposes is that @mainactor can have suspension points so we wouldn't have deadlocks with nested completion handlers that call fetchOrReturnRemoteConfiguration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just puzzled why there are differences in analytics output with this commit as opposed to what's on main or previous commits on this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we are just doing same on two call sites, so coordinating the two configurationLoader.getConfig calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test you quoted above, testCallbacks_useMainDispatchQueue()
, it expects the completion handler to return on main.
So entire Task block of fetchOrReturnRemoteConfiguration doesn't need to be annotated @mainactor,
just the setupHTTPCredentials and completion invocation need to be wrapped in await MainActor.run to make this test pass, as I mentioned at top of this thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't personally seen weirdness or difference in the FPTI between what was on main before, before this crash fix was implemented, after adding it, with or without MainActor annotations for fetchConfig functions.
But I trust that you saw something, there is a lot going on with sending analytics.
I do understand your concerns that fetchOrReturnRemoteConfiguration
previously had expectation to be called from main. But I am not sure if this was true in case in analytics. In BTAnalyticsService
, timer's event handler that calls sendQueuedAnalyticsEvents
is awaited in a Task block and will run in the background thread.
This particular implementation detail was same before the crash fix.
Some completion handlers were returned on main in BTHTTP
callCompletionAsync
.
One small issue I can see right now is if network request fails, events will be dropped.
But these are not related to @mainactor annotation on fetchConfiguration.
My main concern is potential for more latency for fetching configuration for sendQueuedAnalyticsEvents
with @mainactor annotation. But it's hard to tell.
Let's keep an eye on latency metrics and then we can revisit. What are your thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test you quoted above, testCallbacks_useMainDispatchQueue(), it expects the completion handler to return on main.
So entire Task block of fetchOrReturnRemoteConfiguration doesn't need to be annotated @mainactor,
just the setupHTTPCredentials and completion invocation need to be wrapped in await MainActor.run to make this test pass, as I mentioned at top of this thread.
imo, adding await MainActor.run
for just the completions/setupHTTPCredentials complicates the callsite vs running the task on MainActor
. There is no observed latency since getConfig
is a global isolated actor already. Having a cleaner callsite is more important as there is no difference in wrapping the completions in a MainActor.run
vs what exists today, but this method will also entirely be removed in a future PR.
I haven't personally seen weirdness or difference in the FPTI between what was on main before, before this crash fix was implemented, after adding it, with or without MainActor annotations for fetchConfig functions.
If you look at ticket DTMOBILES-998 you will see this same behavior where we make multiple network called and the config isn't cached properly. There are screenshots in that ticket with additional details from FPTI.
My main concerns are potential for more latency for fetching configuration for sendQueuedAnalyticsEvents with @mainactor annotation. But it's hard to tell. Let's keep an eye on latency metrics and then we can revisit.
There is clearly a need to annotate as is, removing it causes the issues we can clearly see in FPTI noted above. Removing the duplicate config call and just relying on the async method should resolve any remaining confusion. There is not additional latency added that did not exist previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the screenshots. Do these extra calls happen in 6.23.2? I can check.
I believe you that you observed these, I just didn't see it myself. I am trying to figure out why.
I want to make sure I understand what @mainactor annotation is fixing.
Looks like the a lot of extra v1/config are coming from analytics sent from fetchAPITiming function.
I'll look into this. But I'm ok to leave the @mainactor annotation and observing any effects on latency,
I just want to make sure we understand why this is happening.
This is strange, I got different results from FPTI. I will post on your PR after doing a few more runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made comments on your jira ticket with screenshots. I tested 6.23.3 with @mainactor annotation as it is currently on main wit no changes, 6.23.3 without @mainactor annotation but with await MainActor.run on setHTTPCredentials and completion calls and 6.23.2 with no changes.
I didn't incorporate your changes on this branch, I am going to try that as well.
I found that there are no duplicate calls to v1/configuration in any of them, they are all fetching from cache but 6.23.3 without @mainactor annotation results in analytic events being listed more out of order, this was also true for 6.23.2, the analytics were out of order.
I just ran through each once, so I will try several times for verification.
Ran through another set of manual tests on sand and prod while triggering memory warnings and all looks good from my side. Let me know once you complete going through the flows as well @KunJeongPark and I'll merge this in. |
Roger! running tests and will let you know. |
I want to point out for future that |
} | ||
} | ||
|
||
func fetchConfiguration() async throws -> BTConfiguration { | ||
@MainActor func fetchConfiguration() async throws -> BTConfiguration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar ❓ - why did we need these MainActor attributes?
@@ -1,15 +1,20 @@ | |||
import Foundation | |||
|
|||
@globalActor actor ConfigurationActor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat! Can we add a docstring for what purpose this serves?
func getConfig(completion: @escaping (BTConfiguration?, Error?) -> Void) { | ||
@ConfigurationActor | ||
func getConfig() async throws -> BTConfiguration { | ||
if let existingTask { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we check the cache before bothering checking if there are any pending tasks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jax had the same question. I was thinking of scenario where there may be read/write cache race condition if there is a pending task. The task includes writing into the cache.
notifyCompletions(nil, BTAPIClientError.configurationUnavailable) | ||
return | ||
|
||
let task = Task { [weak self] in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do existingTask = Task { [weak self] in ...
instead?
Line 75 is a little confusing why we're doing the await on the task
constant and not the re-assigned existingTask
variable.
Summary of changes
ConfigurationCallbackStorage
async/throws
get
methodgetConfig()
to useasync/throws
- use task blockasync/await
methodsTo Test
Checklist
Authors
@scannillo @jaxdesmarais @KunJeongPark