Enforce pagination of resource listings from the server-side #550

Open
soxofaan opened this issue Dec 3, 2024 · 12 comments
Labels
breaking Breaking changes, requires a major-version (2.0.0 for example)
Milestone

Comments

@soxofaan
Member

soxofaan commented Dec 3, 2024

Context: we're battling performance issues caused by job listings from users with a lot of jobs, so we're looking into implementing/improving pagination of job listings.

Some notes about what the spec currently states about pagination of job listings:

GET /jobs
Lists all batch jobs submitted by a user
...
query param limit:
This parameter enables pagination
Pagination is OPTIONAL: back-ends or clients may not support it. Therefore it MUST be implemented in a way that clients not supporting pagination get all resources regardless. Back-ends not supporting pagination MUST return all resources

In a nutshell: pagination is optional and should be opt-in from the client side (by setting the limit param).

I'm afraid, however (as we currently see in practice), that this is not sustainable behavior: the backend must also be able to limit the job listing and enable pagination. It's not uncommon for larger use cases to have users with multiple thousands of jobs, and regularly fetching all the related metadata/status of all these jobs is quite expensive, especially as most users just look at, say, the last 10 jobs anyway.

Some more concrete questions:

  • Is there any wiggle room to fine-tune the spec to also allow the backend to enable pagination in some way?
  • What should a backend do, under the existing spec, if it wants to enable pagination even if the user/client didn't use limit? There doesn't seem to be a solution that is both pragmatic and follows the spec to the letter.
    • E.g. can a backend use a default limit value, e.g. 100, which is different from "unset by default" in the spec?
    • Raise an error if the user has too many jobs?
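
To make the "default limit" option concrete, here is a minimal sketch of what a back-end could do. It is an illustration only, not something the current spec allows: `DEFAULT_LIMIT`, the `page` query parameter, the `list_jobs_page` helper and the `example.test` URL are all hypothetical names, and the response shape (`jobs` plus `links` with a `rel="next"` entry) is assumed to follow the job listing response.

```python
from typing import Optional

DEFAULT_LIMIT = 100  # hypothetical back-end default, not defined by the spec


def list_jobs_page(jobs: list, limit: Optional[int] = None, page: int = 1,
                   base_url: str = "https://example.test/jobs") -> dict:
    """Return one page of job metadata plus pagination links."""
    # Fall back to the back-end default when the client did not set `limit`.
    effective_limit = limit if limit is not None else DEFAULT_LIMIT
    if effective_limit < 1 or page < 1:
        raise ValueError("limit and page must be positive integers")
    start = (page - 1) * effective_limit
    chunk = jobs[start:start + effective_limit]
    links = []
    # Only advertise a next page when there are jobs left after this one.
    if start + effective_limit < len(jobs):
        links.append({
            "rel": "next",
            "href": f"{base_url}?limit={effective_limit}&page={page + 1}",
        })
    return {"jobs": chunk, "links": links}
```

A client that ignores `links` would silently see only the first 100 jobs here, which is exactly the conflict with the "clients not supporting pagination get all resources" requirement.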
@m-mohr
Member

m-mohr commented Dec 4, 2024

Unfortunately, it seems this is only possible in v2.0 in the spec. What we can do in the meantime is to implement pagination in clients so that if a back-end supports it, it is automatically used.

I see issues in Web Editor and Python Client, but we should probably open issues in JS and R (and Julia if they have job listings implemented). Opened one for JS: Open-EO/openeo-js-client#64
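
As a rough idea of what "automatically used" could look like on the client side, here is a small sketch. It assumes the listing response carries `jobs` and a `links` array with a `rel="next"` entry when more pages exist; `iter_jobs` and the injected `fetch` callable are hypothetical names, not part of any existing client.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(fetch: Callable[[str, Optional[dict]], dict],
              url: str, limit: int = 100) -> Iterator[dict]:
    """Yield job metadata from all pages, following `rel="next"` links."""
    params: Optional[dict] = {"limit": limit}
    while url:
        data = fetch(url, params)
        yield from data.get("jobs", [])
        # A back-end without pagination returns everything and no "next" link,
        # so the loop ends after the first request.
        next_links = [l for l in data.get("links", []) if l.get("rel") == "next"]
        url = next_links[0]["href"] if next_links else ""
        # Assumption: the "next" href already carries all needed query parameters.
        params = None
```

With the `requests` library, `fetch` could be something like `lambda url, params: requests.get(url, params=params).json()` (authentication omitted). Because this is a generator, a UI can show the first page right away and only request more when the user scrolls.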

@m-mohr m-mohr added the "breaking" label (breaking changes, requires a major version, 2.0.0 for example) Dec 4, 2024
@m-mohr m-mohr added this to the 2.0.0 milestone Dec 4, 2024
@m-mohr
Member

m-mohr commented Dec 4, 2024

We discussed potential default values for the clients and thought 100 or maybe 50 seem to be reasonable defaults.

@soxofaan
Member Author

soxofaan commented Dec 6, 2024

Another pagination thing that might deserve some clarification in the current spec is how jobs are partitioned into pages.
I guess it makes most sense to order by creation date from recent to older, so that the first page (the initial listing request) has the most recently created jobs.

@m-mohr
Member

m-mohr commented Dec 6, 2024

That's up to the backend to decide, but I guess it makes most sense for backends to sort by creation or last update date. I guess we can add a recommendation, but I'm not sure what the best recommendation is :-)

@soxofaan
Member Author

soxofaan commented Dec 6, 2024

Usability- and user-experience-wise, I would require showing the most recent jobs first. Those are the jobs users are most interested in in practice, so you don't want them to have to wait for possibly tens of HTTP roundtrips just to get the most interesting page.

@m-mohr
Member

m-mohr commented Dec 6, 2024

Yeah, but most recent for me would mean sorted by "updated".
But updated is optional, we only require "created".

@soxofaan
Member Author

soxofaan commented Dec 6, 2024

with recent I mean "by created" here (not "updated"), indeed because it is a required value

@m-mohr
Member

m-mohr commented Dec 6, 2024

created is probably also better because it's stable so that entries don't jump between pages... okay, convinced ;-)
But then, what's the recommended value for the other paginated resources?

@m-mohr m-mohr changed the title from "Pagination of batch job listings" to "Pagination of resource listings" Dec 6, 2024
@m-mohr m-mohr changed the title from "Pagination of resource listings" to "Enforce pagination of resource listings from the server-side" Dec 6, 2024
@soxofaan
Member Author

soxofaan commented Dec 6, 2024

> But then, what's the recommended value for the other paginated resources?

For "user-created content" (batch jobs, UDPs, ...) I would go for "more recently created first" as much as possible, because that gives best user experience.
For back-end managed resources (collections, processes, ...) I think it's fine to leave that open to back-ends (or recommend alphabetical by id).

@m-mohr
Member

m-mohr commented Dec 17, 2024

That opens up the question whether we should add a sorting parameter ;-)

For example, for UDPs I could see that they would also be sorted alphabetically by ID instead of by creation date. UDPs are not that different from processes. Similarly, files could also be sorted by path, which I would find more user-friendly ;-) So I'm not sure I necessarily agree with sorting by creation for all user-submitted resources.

@soxofaan
Member Author

Yes, in the long term a configurable sorting parameter (and direction) might become important, but I don't think we should wait for that to settle, we can already set some recommendations without that.
I also think that clients can play a role here to adapt the low-level, limited API aspects to a more user friendly experience (e.g. show UDPs alphabetically even if they are paged on creation date)

The main triggering issue here is that there are users that create/handle a lot of jobs (hundreds, thousands, ...). With all other resources there is not a comparable scaling issue as far as I know.

From a pragmatic, incremental viewpoint, I see it like this:

  • recommend pagination of jobs by creation date from newer to older (as mentioned above, this gives the best UX I think)
  • for other resources: let backends decide on page order for now, that gives some time to experiment and collect feedback on what feels best
  • leave it to clients to fill in the UX gaps (e.g. show alphabetical even if pagination is not like that)
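
A minimal sketch of how these points could fit together, under the assumption that `created` is a UTC timestamp string in one consistent RFC 3339 format (so plain string comparison orders it correctly). Both helper names are hypothetical, and `page` is an illustrative parameter rather than something from the spec.

```python
def page_by_created(jobs: list, limit: int, page: int = 1) -> list:
    """Server side: newest-first page, ordered by the required `created` field."""
    # `created` never changes, so entries do not jump between pages
    # the way they could when ordering by the optional `updated` field.
    ordered = sorted(jobs, key=lambda j: j["created"], reverse=True)
    start = (page - 1) * limit
    return ordered[start:start + limit]


def sort_for_display(items: list) -> list:
    """Client side: re-sort already fetched items alphabetically by id."""
    return sorted(items, key=lambda item: item["id"].lower())
```

The client-side re-sort only covers what has been fetched so far, so an alphabetical view is complete only once all pages have been loaded.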

@m-mohr
Member

m-mohr commented Dec 18, 2024

Opened a separate issue for sorting: #555


2 participants