Fix `ConfigurationCallbackStorage` Crash #1385

scannillo · 2024-08-06T14:55:50Z

Summary of changes

Fixes Crash in ConfigurationCallbackStorage After Upgrading to 6.23.2 #1382
Remove ConfigurationCallbackStorage
Add async/throws get method
Update getConfig() to use async/throws - use task block
Update tests to use async/await methods
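
For illustration, the shape of this change can be sketched as below: an async/throws getter plus a completion-based wrapper that bridges into it with a Task block. This is a minimal sketch and not the SDK's code; `Config`, `Loader`, and `Client` are hypothetical stand-ins, and the getter returns a fixed value instead of consulting a cache or the network.

```swift
// Hypothetical stand-in for the SDK's configuration type.
struct Config: Equatable, Sendable {
    let json: String
}

// Hypothetical loader exposing only an async/throws getter.
final class Loader: Sendable {
    func getConfig() async throws -> Config {
        // Assumption: a real loader would check a cache, then the network.
        Config(json: "{}")
    }
}

final class Client: Sendable {
    let loader = Loader()

    // Completion-based wrapper kept for existing callers. It bridges into the
    // async method with a Task block and invokes the completion on the main actor.
    func fetchOrReturnRemoteConfiguration(
        _ completion: @escaping @Sendable (Config?, Error?) -> Void
    ) {
        Task { @MainActor in
            do {
                let config = try await self.loader.getConfig()
                completion(config, nil)
            } catch {
                completion(nil, error)
            }
        }
    }
}
```

Callers that are already async can call `getConfig()` directly; the wrapper exists only so completion-handler call sites keep working.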

To Test

The configuration is cached, and the cache is used when available (verify via FPTI)
Triggering a memory warning does not cause a crash
Instantiating multiple instances at the same time does not cause a crash

Checklist

Added a changelog entry

Authors

@scannillo @jaxdesmarais @KunJeongPark

jaxdesmarais · 2024-08-08T19:59:42Z

Sources/BraintreeCore/BTAPIClient.swift

-            }
-
-            if let error {
+        // TODO: - Consider updating all feature clients to use async version of this method?


Created DTMOBILES-980 for this TODO. Want to focus on tackling the crash first, then we can follow up with this change

UnitTests/BraintreeCoreTests/Configuration/ConfigurationLoader_Tests.swift

richherrera

Amazing!!!! 🚀 👏

KunJeongPark · 2024-08-09T17:38:54Z

Sources/BraintreeCore/BTAPIClient.swift

-            if let error {
+        // TODO: - Consider updating all feature clients to use async version of this method?
+
+        Task { @MainActor in


Just making sure that these don't cause issues with nested calls to fetchOrReturnConfig.
I'm checking whether removing @MainActor here and in the async fetchConfiguration function, and instead running only the completion handler and setupHTTPCredentials on main, would preserve all the analytics calls.

@MainActor is equivalent to running on the main dispatch queue: https://developer.apple.com/documentation/swift/mainactor. So it should behave exactly the same.

I think the main difference for our purposes is that @MainActor code can have suspension points, so we wouldn't get deadlocks with nested completion handlers that call fetchOrReturnRemoteConfiguration.

I'm just puzzled why there are differences in analytics output with this commit as opposed to what's on main or previous commits on this PR.

I guess we are just doing the same thing at two call sites, so we are coordinating the two configurationLoader.getConfig calls.

That test you quoted above, testCallbacks_useMainDispatchQueue(), expects the completion handler to return on main.
So the entire Task block of fetchOrReturnRemoteConfiguration doesn't need to be annotated @MainActor;
only setupHTTPCredentials and the completion invocation need to be wrapped in await MainActor.run to make this test pass, as I mentioned at the top of this thread.
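
The two options being compared in this thread can be sketched side by side. This is a hedged illustration only: `loadValue` is a hypothetical stand-in for the configuration fetch, and both function names are invented for the example.

```swift
// Hypothetical stand-in for the async configuration fetch.
func loadValue() async -> Int {
    42
}

// Option A: the whole Task body is isolated to the main actor.
// The await is a suspension point, so the main actor is free while loading.
func fetchWholeTaskOnMain(_ completion: @escaping @Sendable (Int) -> Void) {
    Task { @MainActor in
        let value = await loadValue()
        completion(value) // runs on the main actor
    }
}

// Option B: the Task is not main-actor isolated; only the completion call
// hops to the main actor through MainActor.run.
func fetchHopToMainForCompletion(_ completion: @escaping @Sendable (Int) -> Void) {
    Task {
        let value = await loadValue()
        await MainActor.run {
            completion(value) // runs on the main actor
        }
    }
}
```

In both variants the completion handler is invoked on the main actor; they differ in where the rest of the Task body runs and in how much ceremony the call site carries.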

I haven't personally seen weirdness or differences in FPTI between what was on main before this crash fix was implemented and what is there after adding it, with or without MainActor annotations on the fetchConfig functions.

But I trust that you saw something; there is a lot going on with sending analytics.
I do understand your concern that fetchOrReturnRemoteConfiguration was previously expected to be called from main. But I am not sure that was true in the case of analytics. In BTAnalyticsService, the timer's event handler that calls sendQueuedAnalyticsEvents is awaited in a Task block and will run on a background thread.
This particular implementation detail was the same before the crash fix.
Some completion handlers were returned on main in BTHTTP callCompletionAsync.

One small issue I can see right now is that if a network request fails, events will be dropped.
But this is not related to the @MainActor annotation on fetchConfiguration.

My main concern is the potential for more latency when fetching configuration for sendQueuedAnalyticsEvents with the @MainActor annotation. But it's hard to tell.

Let's keep an eye on latency metrics and then we can revisit. What are your thoughts on that?

> That test you quoted above, testCallbacks_useMainDispatchQueue(), expects the completion handler to return on main.
> So the entire Task block of fetchOrReturnRemoteConfiguration doesn't need to be annotated @MainActor;
> only setupHTTPCredentials and the completion invocation need to be wrapped in await MainActor.run to make this test pass, as I mentioned at the top of this thread.

imo, adding await MainActor.run for just the completions/setupHTTPCredentials complicates the call site compared with running the task on MainActor. There is no observed latency, since getConfig is already isolated to a global actor. Having a cleaner call site is more important, as there is no difference between wrapping the completions in MainActor.run and what exists today; this method will also be removed entirely in a future PR.

> I haven't personally seen weirdness or differences in FPTI between what was on main before this crash fix was implemented and what is there after adding it, with or without MainActor annotations on the fetchConfig functions.

If you look at ticket DTMOBILES-998 you will see this same behavior, where we make multiple network calls and the config isn't cached properly. There are screenshots in that ticket with additional details from FPTI.

> My main concern is the potential for more latency when fetching configuration for sendQueuedAnalyticsEvents with the @MainActor annotation. But it's hard to tell. Let's keep an eye on latency metrics and then we can revisit.

There is clearly a need to annotate as is; removing the annotation causes the issues we can clearly see in FPTI, noted above. Removing the duplicate config call and relying only on the async method should resolve any remaining confusion. No additional latency is added that did not exist previously.

Thank you for the screenshots. Do these extra calls happen in 6.23.2? I can check.
I believe you that you observed these; I just didn't see it myself, and I am trying to figure out why.
I want to make sure I understand what the @MainActor annotation is fixing.

It looks like a lot of the extra v1/config calls are coming from analytics sent from the fetchAPITiming function.
I'll look into this. I'm OK with leaving the @MainActor annotation and observing any effects on latency;
I just want to make sure we understand why this is happening.

This is strange, I got different results from FPTI. I will post on your PR after doing a few more runs.

I made comments on your Jira ticket with screenshots. I tested three setups: 6.23.3 with the @MainActor annotation as it currently is on main, with no changes; 6.23.3 without the @MainActor annotation but with await MainActor.run around the setupHTTPCredentials and completion calls; and 6.23.2 with no changes.

I didn't incorporate your changes on this branch; I am going to try that as well.

I found that there are no duplicate calls to v1/configuration in any of them; they all fetch from the cache. However, 6.23.3 without the @MainActor annotation results in analytics events being listed more out of order. This was also true for 6.23.2: the analytics were out of order.
I just ran through each once, so I will try several times for verification.

jaxdesmarais · 2024-08-09T18:53:54Z

Ran through another set of manual tests on sand and prod while triggering memory warnings and all looks good from my side. Let me know once you complete going through the flows as well @KunJeongPark and I'll merge this in.

KunJeongPark · 2024-08-09T19:09:59Z

Roger! running tests and will let you know.

KunJeongPark · 2024-08-12T15:13:11Z

I want to point out for the future that ConfigurationCache, being a singleton class, might need to ensure thread safety as well.
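
As a hedged illustration of what that could look like, here is one common way to guard a singleton cache with a lock. `SafeCache` and its string-keyed storage are hypothetical and do not reflect the SDK's ConfigurationCache; an actor would be another reasonable choice.

```swift
import Foundation

// Hypothetical lock-protected singleton cache (not the SDK's ConfigurationCache).
final class SafeCache: @unchecked Sendable {
    static let shared = SafeCache()

    private let lock = NSLock()
    private var storage: [String: String] = [:]

    private init() {}

    // Reads take the lock so they never overlap with a write.
    func value(forKey key: String) -> String? {
        lock.lock()
        defer { lock.unlock() }
        return storage[key]
    }

    // Writes take the same lock, which serializes all access to `storage`.
    func set(_ value: String, forKey key: String) {
        lock.lock()
        defer { lock.unlock() }
        storage[key] = value
    }
}
```

`@unchecked Sendable` is only justified here because every access to `storage` goes through the lock.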

scannillo · 2024-08-14T15:33:02Z

Sources/BraintreeCore/BTAPIClient.swift

        }
    }

-    func fetchConfiguration() async throws -> BTConfiguration {
+    @MainActor func fetchConfiguration() async throws -> BTConfiguration {


Similar ❓ - why did we need these MainActor attributes?

scannillo · 2024-08-14T15:33:28Z

Sources/BraintreeCore/Configuration/ConfigurationLoader.swift

@@ -1,15 +1,20 @@
 import Foundation

+@globalActor actor ConfigurationActor {


This is neat! Can we add a docstring for what purpose this serves?
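
One possible wording for such a docstring is sketched below. The doc comment text is a suggestion and not taken from the PR; the `shared` instance is what the `GlobalActor` protocol requires, and `RequestCounter` is a hypothetical type added only to show what the annotation does.

```swift
/// A global actor used to serialize configuration loading.
///
/// Work annotated with `@ConfigurationActor` never runs in parallel with other
/// work isolated to this actor, so state touched only from such code is
/// protected from concurrent access.
@globalActor actor ConfigurationActor {
    static let shared = ConfigurationActor()
}

// Hypothetical example type (not from the SDK).
// `@unchecked Sendable` is acceptable only because `count` is touched
// exclusively from the actor-isolated method below.
final class RequestCounter: @unchecked Sendable {
    private var count = 0

    // Isolated to ConfigurationActor, so concurrent callers are serialized.
    @ConfigurationActor
    func increment() -> Int {
        count += 1
        return count
    }
}
```

Calling `increment()` from outside the actor requires `await`, which is the visible cost of the isolation at the call site.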

scannillo · 2024-08-14T15:38:23Z

Sources/BraintreeCore/Configuration/ConfigurationLoader.swift

-    func getConfig(completion: @escaping (BTConfiguration?, Error?) -> Void) {
+    @ConfigurationActor
+    func getConfig() async throws -> BTConfiguration {
+        if let existingTask {


Shouldn't we check the cache before bothering checking if there are any pending tasks?

@jaxdesmarais @KunJeongPark

Jax had the same question. I was thinking of a scenario where there may be a read/write race condition on the cache if there is a pending task, since the task includes writing to the cache.
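
The ordering described here (join a pending task first, then consult the cache, then start a fetch) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain actor instead of the PR's global-actor-annotated method, the network call is simulated with a short sleep, and `ExampleConfig`, `CoalescingLoader`, and `fetchCount` are hypothetical names.

```swift
// Hypothetical stand-in for the configuration type.
struct ExampleConfig: Equatable, Sendable {
    let value: String
}

actor CoalescingLoader {
    private var cache: ExampleConfig?
    private var existingTask: Task<ExampleConfig, Error>?
    private(set) var fetchCount = 0

    func getConfig() async throws -> ExampleConfig {
        // 1. If a fetch is already in flight, wait for it instead of starting another.
        if let existingTask {
            return try await existingTask.value
        }

        // 2. No pending fetch, so a cached value (if any) is fully written.
        if let cache {
            return cache
        }

        // 3. Start a single fetch that later callers can join.
        fetchCount += 1
        let task = Task<ExampleConfig, Error> {
            try await Task.sleep(nanoseconds: 10_000_000) // simulated network delay
            return ExampleConfig(value: "remote")
        }
        existingTask = task
        defer { existingTask = nil }

        let config = try await task.value
        cache = config
        return config
    }
}
```

Because the cache write and the clearing of `existingTask` happen without a suspension point between them, a concurrent caller sees either the pending task or the populated cache, never a half-finished state.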

scannillo · 2024-08-15T14:17:32Z

Sources/BraintreeCore/Configuration/ConfigurationLoader.swift

-                    notifyCompletions(nil, BTAPIClientError.configurationUnavailable)
-                    return
+
+        let task = Task { [weak self] in


Can we just do existingTask = Task { [weak self] in ... instead?

Line 75 is a little confusing: why are we doing the await on the task constant and not on the re-assigned existingTask variable?
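
One possible reason for the local constant, shown here as a hedged sketch and not as the PR's actual rationale: a stored `existingTask` property is optional, so awaiting it directly needs unwrapping, while a local `task` constant is non-optional. `TaskHolder` and both method names are hypothetical.

```swift
actor TaskHolder {
    private var existingTask: Task<Int, Never>?

    // Variant 1: create a local constant, store it, and await the constant.
    // The awaited value is a plain Int.
    func viaLocalConstant() async -> Int {
        let task = Task { 1 }
        existingTask = task
        defer { existingTask = nil }
        return await task.value
    }

    // Variant 2: assign directly to the optional property and await through it.
    // Optional chaining makes the result an Int?.
    func viaDirectAssignment() async -> Int? {
        existingTask = Task { 2 }
        defer { existingTask = nil }
        return await existingTask?.value
    }
}
```

Both variants behave the same at runtime; the difference is the optional handling at the await.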

scannillo added 2 commits August 6, 2024 09:51

Move from DispatchQueue to actor for thread-safety

43de2b6

CHANGELOG

8f51234

scannillo changed the title ~~Fix ConfigurationCallbackStorage Crash~~ [DO NOT REVIEW] Fix ConfigurationCallbackStorage Crash Aug 6, 2024

scannillo added 2 commits August 6, 2024 10:10

Revert actor changes; don't use any thread-safety wrapper on config c…

d546f2e

…allback array

WIP - Use while loop to prevent duplicate config GET requests & bool …

8f8b5ab

…flag Signed-off-by: Jax DesMarais-Leder <[email protected]>

scannillo force-pushed the fix-config-callback-crash branch from 25f95c3 to 8f8b5ab Compare August 7, 2024 20:35

jaxdesmarais and others added 8 commits August 7, 2024 16:21

remove ConfigurationCallbackStorage; cleanup

3b68a60

WIP - start updating tests

97d769f

run configuration fetch on main thread

5c7ae5c

update tests; cleanup logic

2951226

combine task approach vs bool for config cache

ecf9759

Co-authored-by: Victoria Park <[email protected]>

clear existingTask at end of Task block and add weak self

cf9990c

address todo; update test

cb3bf33

remove redundant self; check config first

ad83d5c

jaxdesmarais changed the title ~~[DO NOT REVIEW] Fix ConfigurationCallbackStorage Crash~~ Fix ConfigurationCallbackStorage Crash Aug 8, 2024

jaxdesmarais marked this pull request as ready for review August 8, 2024 19:48

jaxdesmarais requested a review from a team as a code owner August 8, 2024 19:48

jaxdesmarais reviewed Aug 8, 2024

KunJeongPark added 2 commits August 8, 2024 13:02

revert cache get/ existingTask check order swap for cache read/write …

da5da37

…safety

make tests parallel and add globalActor on getConfig

498b348

jaxdesmarais reviewed Aug 9, 2024

richherrera approved these changes Aug 9, 2024

run fetchConfiguration for analytics on @MainActor

20d7a46

dane-thomas-vs mentioned this pull request Aug 9, 2024

Crash in ConfigurationCallbackStorage After Upgrading to 6.23.2 #1382

Closed

KunJeongPark reviewed Aug 9, 2024

KunJeongPark approved these changes Aug 9, 2024

jaxdesmarais merged commit 6c2ddb8 into main Aug 9, 2024
8 checks passed

jaxdesmarais deleted the fix-config-callback-crash branch August 9, 2024 20:40

scannillo commented Aug 14, 2024

scannillo commented Aug 15, 2024

jaxdesmarais mentioned this pull request Aug 20, 2024

Configuration Cleanup #1394

Merged
