Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect assumption about PUBLISH errors in autopaho can cause indefinite head-of-queue looping #214

Closed
vishnureddy17 opened this issue Dec 25, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@vishnureddy17
Copy link
Contributor

vishnureddy17 commented Dec 25, 2023

In autopaho, managePublishQueue assumes that if paho.PublishWithOptions() returns an error that is not paho.ErrNetworkErrorAfterStored, the error is either temporary or the connection will drop:
https://github.com/eclipse/paho.golang/blob/a6def521ee1aecf0b1cef6bfc1a7022102457807/autopaho/auto.go#L531-L553

However, this is not always the case. All the errors returned in the code snippet below for paho.PublishWithOptions() are neither temporary nor imply a pending disconnection:
https://github.com/eclipse/paho.golang/blob/a6def521ee1aecf0b1cef6bfc1a7022102457807/paho/client.go#L748-L762

As a result, managePublishQueue will loop indefinitely on the first entry in the queue in these cases.

Proposed Solution
To solve this, I propose creating a custom error type called PahoArgumentError, which will be returned by paho.PublishWithOptions() in these cases (this error type will likely be relevant elsewhere in paho). Then autopaho can do a type assertion and quarantine any publish in the queue that returned PahoArgumentError.

@MattBrittan
Copy link
Contributor

Agreed; I did consider doing something similar when introducing this, but felt that it was OK for an initial PR (hard to know when to stop and submit the PR). My thought was something like paho.FatalError (indicating that it's pointless to retry) because this could also apply if the broker returns something like "Payload format invalid"). Happy with either option (will try to get a bit more work done on this over the next couple of days).

@vishnureddy17
Copy link
Contributor Author

That sounds good, I also think it's worth returning such an error from Subscribe and Unsubscribe, which would be used to distinguish between errors that are due to a client failure/disconnection and errors that are due to something fatally wrong with the subscribe/unsubscribe

@MattBrittan
Copy link
Contributor

I've had an initial go at this; went with your solution (it's probably useful to be able to differentiate between
a range of "fatal errors"). Ideally this would be extended to look at broker response codes (but I don't think that
is as urgent).

@MattBrittan MattBrittan added the enhancement New feature or request label Jan 20, 2024
@vishnureddy17
Copy link
Contributor Author

closing due to #226, @MattBrittan feel free to reopen if you still think it's important to keep around

@vishnureddy17
Copy link
Contributor Author

Ideally this would be extended to look at broker response codes (but I don't think that
is as urgent).

This is really only doable if #216 is addressed.

Also, even it was possible, I don't think that autopaho should retry if the ack came back with an error reason code. What if we're close to the server's receive maximum or we are about to run out of packet IDs? Then retrying could cause issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants