Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't get over 10000 results from the API #310

Closed
jaanisoe opened this issue Jan 8, 2018 · 15 comments
Closed

Can't get over 10000 results from the API #310

jaanisoe opened this issue Jan 8, 2018 · 15 comments
Assignees
Labels
API Concerns the bio.tools API. bug A bug or suspected bug. critical priority Our top priorities, including most of the reported bugs. fix verified An issue labelled "done - staged for release" has been independently verified as working OK.

Comments

@jaanisoe
Copy link

jaanisoe commented Jan 8, 2018

Or more precisely, the maximum working value of the page parameter is 1000:
https://bio.tools/api/tool?page=1000

When the next page is tried (as listed by "next": "?page=1001"), I get a "Server Error (500)":
https://bio.tools/api/tool?page=1001

@albangaignard
Copy link
Contributor

I'm writing a crawler for data exchange and the same happens when page = 1001.

@hansioan
Copy link
Member

In the case of bio.tools it would make sense to have page 1001 since there are 10,000+ tools and 10 tools per page. But I think this is a hard limit in the code/framework/server which gives the server error.

I've tried the same on dev.bio.tools:
https://dev.bio.tools/api/tool?page=1001
and get the same error, but since on dev.bio.tools there are only 9,582 resources we can enforce this hard limit assumption by checking that page 999 and 1000:
https://dev.bio.tools/api/tool?page=999
https://dev.bio.tools/api/tool?page=1000
actually give a 404 which is supposed to happen after the page number is too large to hold any results, which should be the case of 1001 as well.

@joncison joncison added bug A bug or suspected bug. API Concerns the bio.tools API. critical priority Our top priorities, including most of the reported bugs. labels Jan 11, 2018
@redmitry
Copy link
Contributor

It looks DJango limit:
http://www.django-rest-framework.org/api-guide/pagination/

one should modify the limit:

class LargeResultsSetPagination(PageNumberPagination):
    page_size = 1000
    page_size_query_param = 'page_size'
    max_page_size = 10000

class StandardResultsSetPagination(PageNumberPagination):
    page_size = 100
    page_size_query_param = 'page_size'
    max_page_size = 1000

@joncison joncison assigned piotrgithub1 and unassigned ekry Feb 14, 2018
@joncison
Copy link
Member

Reassigning to you @piotrgithub1 - this looks like it's easy to fix

@piotrgithub1
Copy link

Hi @jaanisoe
Apologies for the inconvenience, we honestly did not expect anyone to go so deep in the page number as supporting ontology is supposed to be helpful in quickly get what you are looking for.
The fix is not as easy as one would expect, we'll put it on top of the new todo list after we are done with current batch of issues.

Best regards,
Piotr

@joncison
Copy link
Member

joncison commented Feb 14, 2018

thanks @piotrgithub1 - we have already quite a few groups / dependencies (in USA, France, Spain ...) which are taking the whole bio.tools data, so there's a real use-case here.

@piotrgithub1
Copy link

Made a temporary workaround for the bug.
Long-term we need to take a closer look at how this is used by the dependant services and make a more use specialized functionality for getting larger amount of content out of the registry as tight coupling with ontologies is clearly insufficient.

NOTE: leaving this open until we get a confirmation from @jaanisoe

Best regards,
Piotr

@albangaignard
Copy link
Contributor

albangaignard commented Feb 16, 2018 via email

@jaanisoe
Copy link
Author

Yes, it's working now. Thanks!

@piotrgithub1
Copy link

Awesome :)
As it is a temporary workaround while we are in the process of making a permanent fix, I will monitor it daily if it works.

@albangaignard
Copy link
Contributor

albangaignard commented Feb 23, 2018 via email

@joncison
Copy link
Member

joncison commented Sep 4, 2018

We need to check this still works when we make the next big release

@joncison joncison added the done - staged for release Issue is implemented in dev.bio.tools. label Sep 4, 2018
@joncison
Copy link
Member

joncison commented Sep 4, 2018

See also #355

@hmenager
Copy link
Member

hmenager commented Dec 2, 2018

fix confirmed

@joncison joncison added fix verified An issue labelled "done - staged for release" has been independently verified as working OK. and removed done - staged for release Issue is implemented in dev.bio.tools. labels Dec 14, 2018
@joncison
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Concerns the bio.tools API. bug A bug or suspected bug. critical priority Our top priorities, including most of the reported bugs. fix verified An issue labelled "done - staged for release" has been independently verified as working OK.
Projects
None yet
Development

No branches or pull requests

8 participants