Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PULP-214] Added docs about on-demand content streaming caveats #6101

Merged

Conversation

pedro-psb
Copy link
Member

We have some problems we can't solve with our current design of how RemoteArtifact and on-demand works.

We are documenting known caveats and limitation to help avoiding surprises.

Closes #1975 #3212

@pedro-psb pedro-psb changed the title Added docs about on-demand content streaming caveats [PULP-214] Added docs about on-demand content streaming caveats Dec 2, 2024

Unless a plugin has enabled either the 'on_demand' or 'streamed' values for the `policy` attribute
you will receive an error. Check that plugin's documentation also.

Example of the "Create Remote" endpoints for some plugins that supports that:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"these features" sounds better than "that"


* Given User A and User B both synced the same on-demand content from their separate remotes (there are two different sources for the same content).
* When User B request the content
* Then the credentials used for the download could be User's A.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User A's

* When User B request the content
* Then the credentials used for the download could be User's A.

If a user don't want their registered Remotes to be indirectly used by other users, they should use a separate domain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user "doesn't"

In a worst case, the Content is shared across multiple Repositories and the Remote's removal
can invalidate all those repositories at once.

In either cases, proceed the deletion of a remote with great care.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In either case, proceed with the deletion of a remote with great care.

@@ -39,25 +41,82 @@ instance, syncing from a nightly repo would cause Pulp to store every nightly ev
is likely not valuable. Units created from this mode are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Also download all metadata is likely not valuable" is nearly incomprehensible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Also download all metadata" and "is likely not valuable" are 17 lines apart 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, whoops :)

### External dependency and error handling

The content might become unavailable or corrupted on the remote server.
This makes it hard for Pulp to provide an accurate error messages.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with "provide an accurate error message" or "provide accurate error messages"

### Implicit credential sharing within a domain

In the same domain, a request to a on-demand content may use any available Remotes associated with that content,
regardless of who created it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who -> which user


* Given User A and User B both synced the same on-demand content from their separate remotes (there are two different sources for the same content).
* When User B request the content
* Then the credentials used for the download could be User A's.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could potentially be User A's

* Then some request for the content might fail with close connection errors* and future request will try the next ones, eventually reaching the good remote. Even though the content might be reachable, the failures can be confusing.

!!! note "* Why we close the connection?"
The connection closes happens because Pulp streams content directly from remote, so if the content is bad
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection close happens (not "closes")

@pedro-psb
Copy link
Member Author

I've ran the new added section through a grammar checker this time.

* Given there is more than one remote for the content and at least one of them is good.
* When the user requests that content through a distribution
* Then some requests for the content might fail with close connection errors* and future requests will try the next ones, eventually reaching the good remote.
Even though the content might be reachable, the failures can be confusing.
Copy link
Contributor

@dralley dralley Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could be structured better. For instance, if all remote sources are "corrupted" then you'll get connection closed failures, but those aren't mentioned in the first example, only in the second example.

I also have a mild preference for normal paragraphs here I think? Like:

In the case where the selected remote source for the content is unreachable, then when a user requests that content through a distribution they will receive a 404 error message

In the case where the selected remote source for the content is corrupt, then when a user requests that content through a distribution they will receive a connection closed error <..expand..>

In the case where multiple remote sources exist, then Pulp will cycle through potential remote sources trying each of them in turn until a valid one is found. Users may therefore see failures <..expand..> before a request for the content is successful.

If all remote sources are found to be invalid <....>

Feel free to adjust or rewrite it differently. Making them bullet points is fine too


In the first case, Pulp will try all the available remote sources for the requested content and will return a 404 if all of them fail *with this same type of error*.

In the second case, Pulp already sent the corrupted data to the client and can't recover from it, so it will close the connection to prevent the client from consolidation the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consolidation -> consolidating

In the second case, Pulp already sent the corrupted data to the client and can't recover from it, so it will close the connection to prevent the client from consolidation the file.
When this happens, the content-app will ignore that remote source for a certain amount of time, which will enable future requests to select a different remote source.
A 404 is returned if no more remote source is available (e.g., they're all ignored).
Pulp doesnt't permanently invalidate the remote because it can't know if the error is transient or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo doesnt't

The previous sentence should read "If all remote sources are ignored due to prior failure, then a 404 will be returned for all requests of that content until the cooldown period for one of those sources has expired"

Pulp doesnt't permanently invalidate the remote because it can't know if the error is transient or not.

The second case is complex and can be confusing to the user.
The core reason for this complexity lies on the very nature of on-demand serving, which imposes that Pulp must fetch and stream the content on request time, and has no way to know anything about the remote before that.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lies on the very nature of -> lies in the very nature of

@dralley
Copy link
Contributor

dralley commented Dec 4, 2024

This is much better! Thanks 👍

Copy link
Contributor

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dralley
Copy link
Contributor

dralley commented Dec 5, 2024

@pedro-psb Please squash + merge

@pedro-psb pedro-psb force-pushed the docs/add-caveats-about-on-demand-content branch from a56b79c to d32103b Compare December 5, 2024 14:53
We have some problems we can't solve with our current design of how
RemoteArtifact and on-demand works.

We are documenting known caveats and limitation to help avoiding
surprises.

Closes pulp#1975 pulp#3212
@pedro-psb pedro-psb force-pushed the docs/add-caveats-about-on-demand-content branch from d32103b to 8974864 Compare December 5, 2024 14:54
@pedro-psb pedro-psb enabled auto-merge (rebase) December 5, 2024 14:54
@pedro-psb pedro-psb merged commit f7efeb2 into pulp:main Dec 5, 2024
12 checks passed
@pedro-psb pedro-psb deleted the docs/add-caveats-about-on-demand-content branch December 5, 2024 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deleting a remote used as source for live content corrupts ContentArtifact records
2 participants