Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry PoolNeedsConnection Errors #1010

Merged
merged 7 commits into from
Aug 29, 2024
Merged

Conversation

nplasterer
Copy link
Contributor

@nplasterer nplasterer commented Aug 28, 2024

Makes PoolNeedsConnection Errors retryable for actions that require backgrounding the app while doing an action like receiving a signature on adding and revoking installations and applying a signature request.

Also adds a new error type that can be caught for the group create issue.

Fixes #996

@nplasterer nplasterer self-assigned this Aug 28, 2024
@nplasterer nplasterer marked this pull request as ready for review August 28, 2024 23:32
@nplasterer nplasterer requested a review from a team as a code owner August 28, 2024 23:32
@@ -68,6 +68,7 @@ impl RetryableError for StorageError {
Self::Pool(_) => true,
Self::Lock(_) => true,
Self::SqlCipherNotLoaded => true,
Self::PoolNeedsConnection => true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means the error will retry 5 times before erroring. That might be okay? I also took a pass at just doing it in the code and retrying 3 times here 45adbfe open to others thoughts on how we should handle this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely feels like it should be retryable. Thanks for fixing

@nplasterer nplasterer requested a review from insipx August 28, 2024 23:59
@insipx
Copy link
Contributor

insipx commented Aug 29, 2024

Just reposting here for visibility :

Was just thinking about this a little

During these sessions, is the XMTP client still in memory/we're not re-instantiating the struct?

Instead of retrying, which may still fail depending on the threading situation -- sortof feels like we're just waiting for a race which reconnects the db in another thread -- we could instead add a flag on the struct and a boolean passed to release_connection like reconnect_on_reentry, that would auto-re-establish a database connection instead of throwing an error. This seems to check if you're already calling to release the db connection. I don't fully understand what the app is back active callback is, though. would be happy to look through the sdk code for that if someone can link it.

also, this retry would only work if the reconnect is already in a codepath that is in a retry!() block. We could surround this in another retry block, but since this is a function used in lots of other places, may inadvertently cause double-retry times

@nmalzieu
Copy link

Just reposting here for visibility :

Was just thinking about this a little

During these sessions, is the XMTP client still in memory/we're not re-instantiating the struct?

Instead of retrying, which may still fail depending on the threading situation -- sortof feels like we're just waiting for a race which reconnects the db in another thread -- we could instead add a flag on the struct and a boolean passed to release_connection like reconnect_on_reentry, that would auto-re-establish a database connection instead of throwing an error. This seems to check if you're already calling to release the db connection. I don't fully understand what the app is back active callback is, though. would be happy to look through the sdk code for that if someone can link it.

also, this retry would only work if the reconnect is already in a codepath that is in a retry!() block. We could surround this in another retry block, but since this is a function used in lots of other places, may inadvertently cause double-retry times

Hey @insipx , I'm not 100% sure I get what you mean, but one thing we DON'T want is for libxmtp tu auto-re-establish a database connection. Because libxmtp has no idea if the app is active or backgrounded, and on iOS opening a connection to the libxmtp db when the app gets suspended will lead to a crash

@@ -187,6 +187,8 @@ pub enum GroupError {
SqlKeyStore(#[from] sql_key_store::SqlKeyStoreError),
#[error("No pending commit found")]
MissingPendingCommit,
#[error("Sync failed to wait for intent: {0}")]
SyncFailedToWait(String),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neekolas adding this so it's easier to catch the error that is causing group creates to fail.

Comment on lines +270 to +277

let current_state = retry_async!(
Retry::default(),
(async {
self.get_association_state(&self.store().conn()?, &inbox_id, None)
.await
})
)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cover the case converse is hitting when the database is trying to reconnect after getting a signature from a different app for revoking installations.

Copy link
Contributor

@insipx insipx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

When writing errors, the #[error("display impl")] is the Display impl for to_string, so an extra string is redundant

xmtp_mls/src/groups/mod.rs Outdated Show resolved Hide resolved
xmtp_mls/src/groups/mod.rs Outdated Show resolved Hide resolved
xmtp_mls/src/groups/sync.rs Outdated Show resolved Hide resolved
xmtp_mls/src/groups/sync.rs Outdated Show resolved Hide resolved
xmtp_mls/src/groups/sync.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry Pool Disconnection Errors
5 participants