
Consistency Issues with WebSub #174

Open
kevincox opened this issue Dec 14, 2021 · 4 comments

Comments

@kevincox
Contributor

Right now there are a number of gaps in WebSub that make it possible to miss updates without hacky workarounds. It would be nice if reliable subscriptions could be provided without extra feed fetches.

Consider the following scenarios:

Fetch - Subscribe Race

  1. Subscriber fetches document.
  2. Document updates.
  3. Subscriber subscribes to hub.

In this case the subscriber would never be notified about the item posted in step 2.
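A toy, in-memory model makes the three steps concrete. `Subscriber`, `run_race` and the plain list standing in for the feed are illustrative names, not part of WebSub; the hub is reduced to a list of current subscribers that it notifies on each update.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Illustrative subscriber that only tracks which entry IDs it has seen."""
    known: set = field(default_factory=set)

    def fetch(self, feed: list) -> None:
        # Snapshot whatever the feed contains at this moment.
        self.known.update(feed)

    def notify(self, entry_id: str) -> None:
        # Called by the hub for each new entry while we are subscribed.
        self.known.add(entry_id)


def run_race() -> set:
    feed = ["post-1"]
    hub_subscribers = []   # the hub only notifies subscribers it already knows about
    sub = Subscriber()

    sub.fetch(feed)                 # 1. Subscriber fetches document.
    feed.append("post-2")           # 2. Document updates; the hub fans out...
    for s in hub_subscribers:       #    ...to nobody, because step 3 hasn't happened.
        s.notify("post-2")
    hub_subscribers.append(sub)     # 3. Subscriber subscribes to hub.
    return sub.known


print(sorted(run_race()))  # ['post-1']: post-2 was never delivered
```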

Workaround - Refetch

One workaround is to simply refetch the feed a short while after the subscription is confirmed. This still relies on some consistency between HTTP caches and the hub, but with a long enough delay it should be sufficient.
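A minimal subscriber-side sketch of this workaround, assuming each entry carries a stable `id` (as an Atom `<id>` or RSS `<guid>` would) and that the subscriber calls `schedule_refetch` from the code that handles the hub's verification of intent. `fetch`, `on_new` and the delay are placeholders, not anything WebSub defines.

```python
import threading
from typing import Callable, Iterable, List, Set


def merge_entries(known_ids: Set[str], fetched: Iterable[dict]) -> List[dict]:
    """Return entries the subscriber has not seen yet, recording their IDs as seen."""
    new = []
    for entry in fetched:
        if entry["id"] not in known_ids:
            known_ids.add(entry["id"])
            new.append(entry)
    return new


def schedule_refetch(
    delay_s: float,
    fetch: Callable[[], List[dict]],
    known_ids: Set[str],
    on_new: Callable[[List[dict]], None],
) -> threading.Timer:
    """After the hub confirms the subscription, refetch the feed once `delay_s`
    seconds later and hand any entries that slipped into the gap to `on_new`."""
    def run() -> None:
        new = merge_entries(known_ids, fetch())
        if new:
            on_new(new)

    timer = threading.Timer(delay_s, run)
    timer.start()
    return timer
```

For example, `schedule_refetch(300, fetch_feed, seen_ids, store_entries)` would check once, five minutes after confirmation; `fetch_feed` and `store_entries` are hypothetical helpers. Deduplicating by ID means the extra fetch is harmless when nothing was missed.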

Solution - If-Match

If the subscriber could include the ETag or Last-Modified value from when it fetched the document, the hub could notify the subscriber of any entries it missed.
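On the wire this could be one or two extra form parameters on the ordinary subscription request. A standard-library sketch: `hub.mode`, `hub.topic` and `hub.callback` are the spec's parameters, while `hub.etag` and `hub.last_modified` are hypothetical extension names that no current hub is known to understand.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_subscribe_request(hub_url: str, topic: str, callback: str,
                            etag: Optional[str] = None,
                            last_modified: Optional[str] = None) -> Request:
    """Build (but do not send) a WebSub subscription POST."""
    params = {
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
    }
    # HYPOTHETICAL extension parameters: not defined by the WebSub spec.
    # They tell the hub which version of the topic the subscriber already has.
    if etag is not None:
        params["hub.etag"] = etag
    if last_modified is not None:
        params["hub.last_modified"] = last_modified
    body = urlencode(params).encode("ascii")
    return Request(hub_url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

A hub that ignores unknown parameters would simply treat this as a normal subscription, which keeps the extension backwards compatible.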

Solution - Proactive Push

Another option is for the hub, once a subscription is confirmed, to push the full "current" state of the document from its point of view. Future delta updates are then sent relative to that initial full push.
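A minimal hub-side sketch of proactive push, assuming the hub tracks entries by ID and remembers, per callback, which IDs it has already pushed. `HubTopic`, `confirm_subscription` and `publish` are illustrative names; real delivery (signed POSTs to each callback) is omitted, and edits to existing entries are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HubTopic:
    entries: Dict[str, str] = field(default_factory=dict)   # entry id -> content (the hub's view)
    sent: Dict[str, Set[str]] = field(default_factory=dict)  # callback -> entry ids already pushed

    def confirm_subscription(self, callback: str) -> Dict[str, str]:
        # Proactive push: the first payload is the hub's full current state.
        self.sent[callback] = set(self.entries)
        return dict(self.entries)

    def publish(self, entry_id: str, content: str) -> Dict[str, Dict[str, str]]:
        # Later pushes are deltas relative to what each subscriber already received.
        self.entries[entry_id] = content
        pushes = {}
        for callback, seen in self.sent.items():
            delta = {i: c for i, c in self.entries.items() if i not in seen}
            if delta:
                seen.update(delta)
                pushes[callback] = delta
        return pushes
```

Because the first push defines the subscriber's base, any entry the subscriber's own fetch missed arrives in that push.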

Resubscribe Race

  1. Subscription expires.
  2. Document updates.
  3. Subscriber "re-subscribes".

IIUC there is currently no indication of whether a subscription request updated an existing subscription or created a new one, so the subscriber cannot tell whether it missed an update in between.

Workaround - Resubscribe Early

If the subscriber resubscribes sufficiently early, it is probably safe to assume the clocks are close enough in sync that the subscription never lapses.
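In practice "sufficiently early" can be a simple rule applied to the `hub.lease_seconds` value the hub reports. The 50% fraction and one-hour floor below are arbitrary illustrative choices, not values from the spec, and `renewal_delay` is a hypothetical helper.

```python
def renewal_delay(lease_seconds: int, fraction: float = 0.5, min_margin_s: int = 3600) -> float:
    """Seconds to wait after confirmation before re-subscribing."""
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    # Renew at `fraction` of the lease, but never later than `min_margin_s` before
    # expiry, so clock skew or a failed attempt still leaves time to retry.
    return float(max(0, min(lease_seconds * fraction, lease_seconds - min_margin_s)))
```

With a ten-day lease (`864000` seconds) this renews after five days (`432000.0`); with a one-hour lease it renews immediately.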

Workaround - Refetch

Much like the initial subscription race you can simply refetch the feed a short while after the subscription is confirmed.

Solution - Proactive Push

If the proactive push above is used, it could be sent for new subscriptions only; for extended subscriptions the push would not recur.

Solution - Resubscription Confirmation

The hub can respond to the subscription request with some sort of indicator confirming that the subscription has been uninterrupted.
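On the hub, detecting an uninterrupted subscription only requires checking whether a live lease already exists for the same topic and callback. A sketch with an injectable clock; `SubscriptionStore` is a hypothetical name, and the returned flag is what a hub would need to expose (the spec defines no such field). The same flag could decide whether a proactive push is needed.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SubscriptionStore:
    """Hub-side lease table that reports whether a subscribe extended a live lease."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._expires: Dict[Tuple[str, str], float] = {}   # (topic, callback) -> expiry time
        self._clock = clock

    def subscribe(self, topic: str, callback: str, lease_seconds: int) -> bool:
        """Record the lease (simplified: a real hub activates it only after
        verification of intent). Return True if renewed, False if newly created."""
        key = (topic, callback)
        now = self._clock()
        previous: Optional[float] = self._expires.get(key)
        renewed = previous is not None and previous > now
        self._expires[key] = now + lease_seconds
        return renewed
```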

@dissolve
Contributor

I'm not sure the resubscribe case even makes sense. The time between unsubscribe and resubscribe could be significant, so it's sort of assumed that the client should pull a full update. It basically reduces to the same case as the initial subscribe.

That said, I'm not sure consistency is ever promised with WebSub. It's sort of not the intent of the spec. One could easily implement with just a full pull of the document on every notification and it would still serve as a significant improvement over continual polling for updates.

@kevincox
Contributor Author

kevincox commented Dec 14, 2021

If a resubscribe is expected to fetch the full feed first, that is fine. I thought it was desirable that a subscriber could just keep resubscribing without needing to refetch the feed, but I agree it is not a major concern if it should start with a refetch.

"One could easily implement with just a full pull of the document on every notification and it would still serve as a significant improvement over continual polling for updates."

This isn't quite true. If a feed updates infrequently (say, a blog that posts about monthly) and I set up a long subscription (a couple of weeks), I may miss a post until I poll again at the end of the subscription. The chance of this happening is low, but it would be nice if there were no possibility of a very long delay, bounded only by the subscription length. This is my biggest concern: the gap between the fetch and the subscription becoming active. Especially with CDN caching and similar, this gap is non-trivial (though usually not huge), and any update that happens to fall into it will be delayed for a very long time.

Or is the intent that the subscriber keeps polling as normal, and WebSub is just for faster updates between polls rather than for reducing the required work?

@sandhawke
Collaborator

As I read the spec, the expectation is that one will re-subscribe well before expiration, to prevent this gap you mention:

"This is required so subscribers can renew their subscriptions before the lease seconds period is over without any interruption."

at the end of https://www.w3.org/TR/websub/#subscriber-sends-subscription-request

I don't see (or remember) anything meant to address the "Subscribe Race". Doing a conditional request (using If-None-Match) as you suggest seems fairly reasonable, but I agree that including an ETag in the subscription request could be nice. I don't recall that being discussed, but it wouldn't surprise me if it were seen as putting complexity in the hub that can be handled by the subscriber. Keeping hubs as simple as possible was seen as a priority here. If there were a thriving market for hubs, that might be different.
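A conditional refetch needs nothing new from the hub. A standard-library sketch: `If-None-Match` and `If-Modified-Since` are standard HTTP request headers, and `urllib` reports a `304 Not Modified` answer as an `HTTPError`, which is mapped to `None` here. The function names are illustrative.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_conditional_get(url: str, etag: Optional[str] = None,
                          last_modified: Optional[str] = None) -> urllib.request.Request:
    """GET request that lets the server answer 304 if the feed is unchanged."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def refetch_if_changed(url: str, etag: Optional[str] = None,
                       last_modified: Optional[str] = None,
                       timeout: float = 10.0) -> Optional[bytes]:
    """Return the new feed body, or None if the server replied 304 Not Modified."""
    request = build_conditional_get(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:   # unchanged since the subscriber's copy
            return None
        raise
```

The refetch stays cheap when nothing changed, though, as noted above, a stale CDN copy can still answer 304 for a version the hub has already moved past.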

@kevincox
Contributor Author

"As I read the spec, the expectation is that one will re-subscribe well before expiration, to prevent this gap you mention:"

This is a good idea but not a strong guarantee. Also, if you want to rely on the hub for long periods of time, it is nice to have confirmation that the subscription was continued, not forgotten somewhere. I suspect it would be fairly easy for most hubs to include something like hub.resubscribe=true, which would remove this race condition completely and also reassure the subscriber that the subscription wasn't forgotten for some other reason.
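A subscriber-side sketch of how such an indicator might be consumed, assuming (hypothetically) that the hub adds `hub.resubscribe=true` to the verification-of-intent GET it already sends to the callback. `hub.mode`, `hub.topic`, `hub.challenge` and `hub.lease_seconds` are real WebSub verification parameters, and refusing with a 404 comes from the spec; `handle_verification` and `pending_topics` are illustrative.

```python
from typing import Set, Tuple
from urllib.parse import parse_qs


def handle_verification(query: str, pending_topics: Set[str]) -> Tuple[int, str, bool]:
    """Handle a hub's verification-of-intent GET.

    Returns (status, body, needs_refetch). Echoing hub.challenge with a 200
    confirms the subscription; a 404 refuses it.
    """
    params = {key: values[0] for key, values in parse_qs(query).items()}
    topic = params.get("hub.topic")
    if params.get("hub.mode") != "subscribe" or topic not in pending_topics:
        return 404, "", False
    # HYPOTHETICAL extension: hub.resubscribe is not part of WebSub. If a hub does
    # not send it (or sends anything but "true"), assume the lease may have lapsed
    # and refetch the feed to cover the gap.
    uninterrupted = params.get("hub.resubscribe") == "true"
    return 200, params.get("hub.challenge", ""), not uninterrupted
```

Treating a missing flag as "refetch" is what keeps the idea safe with hubs that never implement it.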

I agree that the current state is ok, but it would definitely be a nice-to-have for a very small implementation cost.

"it wouldn't surprise me if it was seen as putting complexity in the hub which can be handled by the subscriber"

I don't think this can really be handled by the subscriber. Other than doing one extra poll and hoping for the best, there is no good solution here. The core of the problem is that the hub may be sending delta updates, but the subscriber doesn't know what the base version is! There is an assumption that the subscriber's fetch was a superset of the hub's current state, but that isn't a great assumption if you ask me.
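A toy model of the base-version problem, using sets of entry IDs and assuming the hub pushes only entries that are new relative to the last version it saw. Here the subscriber's fetch was answered from a stale cache, so applying the hub's delta silently loses `post-2`. All names are illustrative.

```python
def apply_delta(base: set, delta: set) -> set:
    # A delta only carries what changed since the hub's previous version.
    return base | delta


def stale_base_demo() -> set:
    v1 = {"post-1"}
    v2 = v1 | {"post-2"}
    v3 = v2 | {"post-3"}

    subscriber_view = v1   # the subscriber's fetch hit a cache still serving v1
    hub_view = v2          # the hub had already seen v2 when the subscription began
    delta = v3 - hub_view  # so the next push is just {"post-3"}
    return apply_delta(subscriber_view, delta)


print(sorted(stale_base_demo()))  # ['post-1', 'post-3']: post-2 is never delivered
```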

I agree that it would be good to keep complexity low on the hub, but it would be nice to have an optional solution here that can be used to close this gap.

My current thinking would be something like:

  1. An optional subscribe parameter hub.etag which can be used to pass the etag that the subscriber is aware of.
  2. (maybe) An optional subscribe parameter hub.last_modified which can be used to pass the last modified date that the subscriber is aware of.
  3. After confirming the subscription the hub will send an update with the full contents that it is aware of to the subscriber.
  • If the ETag or Last-Modified value passed is known, this can be a delta push based on that version. If the delta is empty, it can be skipped entirely.
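Steps 1 to 3 could look roughly like this on the hub side, as a sketch only: it handles the `hub.etag` path (`hub.last_modified` would work the same way), models a version as a set of entry IDs, and only considers added entries. `TopicHistory` and its methods are hypothetical; the small version cap reflects the "remember a couple versions" cost listed below.

```python
from collections import OrderedDict
from typing import Optional, Set


class TopicHistory:
    """Hypothetical hub-side cache of the last few versions of one topic, keyed by ETag."""

    def __init__(self, max_versions: int = 3) -> None:
        self._versions = OrderedDict()   # etag -> frozenset of entry ids, oldest first
        self._max = max_versions

    def record(self, etag: str, entry_ids: Set[str]) -> None:
        """Remember a freshly fetched version; it becomes the hub's current state."""
        self._versions[etag] = frozenset(entry_ids)
        self._versions.move_to_end(etag)
        while len(self._versions) > self._max:
            self._versions.popitem(last=False)   # forget the oldest version

    def initial_push(self, subscriber_etag: Optional[str]) -> Optional[Set[str]]:
        """Entry ids to push right after confirming a subscription, or None to skip."""
        if not self._versions:
            # A hub with no cache would have to fetch the topic before answering.
            raise LookupError("no cached version of this topic")
        current = self._versions[next(reversed(self._versions))]
        base = self._versions.get(subscriber_etag) if subscriber_etag else None
        if base is None:
            return set(current)          # no or unknown ETag: full push
        delta = set(current - base)
        return delta or None             # empty delta: skip the push entirely
```

An unknown ETag falls back to a full push, which is what makes the scheme correct even for feeds that never send ETags.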

Pros:

  • An updated hub fixes this gap for all subscribers.
  • Works correctly even for feeds with no etag or last-modified.
  • For the ETag case it is just as efficient as today, since an empty delta is skipped.
  • Simple to implement the basics (non-delta push).
  • Can be implemented in a very resource-efficient manner on all sides.
  • Forward and backward compatible. It may just need an indicator so that subscribers know whether the hub supports this, or whether they should perform another poll to check if something appeared in "the gap".

Cons:

  • For optimal deltas the hub needs to remember a couple versions.
  • With no ETag, the push adds extra work for the hub and subscriber (which would affect every new subscription from older subscribers that don't send one).
  • A hub with no cache will need to refetch the feed for every subscription.


3 participants