Can't get over 10000 results from the API #310
Or more precisely, the maximum working value of the `page` parameter is 1000: https://bio.tools/api/tool?page=1000
When the next `page` is tried (as listed by `"next": "?page=1001"`), I get a "Server Error (500)": https://bio.tools/api/tool?page=1001

Comments
I'm writing a crawler for data exchange, and the same happens when page = 1001.
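For reference, a minimal sketch of such a crawl loop, following the `"next"` field shown in the issue description above; the `list` field name and the `format=json` parameter are assumptions about the response shape, not something confirmed in this thread.

```python
# Minimal crawler sketch for the paginated bio.tools API. The "next"
# field is taken from the issue description; the "list" field and the
# format=json parameter are assumptions about the response shape.
import requests

BASE_URL = "https://bio.tools/api/tool"

def crawl_all_tools():
    tools = []
    page = 1
    while page is not None:
        resp = requests.get(BASE_URL, params={"page": page, "format": "json"})
        resp.raise_for_status()  # before the fix, page 1001 answered HTTP 500
        data = resp.json()
        tools.extend(data.get("list", []))
        nxt = data.get("next")  # e.g. "?page=1001", or null on the last page
        page = int(nxt.split("=")[-1]) if nxt else None
    return tools
```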
In the case of bio.tools it would make sense to have page 1001, since there are 10,000+ tools and 10 tools per page. But I think this is a hard limit in the code/framework/server, which gives the server error. I've tried the same on dev.bio.tools:
It looks like a Django limit: one should modify the limit.
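To illustrate, a hedged sketch of where such a cap could live if the API uses Django REST Framework's pagination; the class name and the hard-coded 10,000-row window are illustrative assumptions, not the actual bio.tools code.

```python
# Hedged sketch: a Django REST Framework pagination class with a hard
# 10,000-row window. All names here are illustrative, not from bio.tools.
from rest_framework.pagination import PageNumberPagination

class CappedPagination(PageNumberPagination):
    page_size = 10  # matches the 10 tools per page observed on bio.tools

    def paginate_queryset(self, queryset, request, view=None):
        # Only the first 10,000 rows are ever paginable, so with 10 items
        # per page any request for page > 1000 cannot be served.
        return super().paginate_queryset(queryset[:10000], request, view)
```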
Reassigning to you @piotrgithub1 - this looks like it's easy to fix.
Hi @jaanisoe
Best regards,
Piotr
Thanks @piotrgithub1 - we already have quite a few groups / dependencies (in the USA, France, Spain, ...) that are taking the whole bio.tools data, so there's a real use case here.
Made a temporary workaround for the bug. Long-term we need to take a closer look at how this is used by the dependent services and build more specialized functionality for getting larger amounts of content out of the registry, as tight coupling with ontologies is clearly insufficient.
NOTE: leaving this open until we get a confirmation from @jaanisoe
Best regards,
Piotr
Thanks a lot, Piotr!
I will try it right now.
The use case is the following:
- iterate over all entries and get their JSON serialization
- transform it into RDF (JSON-LD); a sketch of this step follows below
- populate a knowledge base for further semantic web querying
Alban
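As a sketch of the JSON-LD step in this pipeline: the schema.org mapping, the chosen `@type`, and the entry field names (`biotoolsID`, `name`, `description`) are my assumptions, not the actual transformation used in Alban's pipeline.

```python
# Hedged sketch of the JSON -> JSON-LD step; the schema.org mapping and
# the bio.tools field names used here are assumptions, not the actual
# transformation used in the pipeline described above.
def tool_to_jsonld(tool):
    """Wrap one bio.tools JSON entry as a JSON-LD node."""
    return {
        "@context": {"schema": "https://schema.org/"},
        "@id": "https://bio.tools/" + tool.get("biotoolsID", ""),
        "@type": "schema:SoftwareApplication",
        "schema:name": tool.get("name"),
        "schema:description": tool.get("description"),
    }
```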
Yes, it's working now. Thanks!
Awesome :)
Hi all, I confirm:
I've been able to crawl all 10059 entries!
Thanks again,
Alban
We need to check this still works when we make the next big release.
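A minimal regression check for that, written against the production endpoint; running it against a staging host instead is an obvious variation, and the `format=json` parameter is an assumption.

```python
# Hedged regression-check sketch: page 1001 used to answer HTTP 500;
# after the fix it should never be a server error again.
import requests

def test_page_beyond_1000_is_not_a_server_error():
    resp = requests.get("https://bio.tools/api/tool",
                        params={"page": 1001, "format": "json"})
    # Past-the-end pages may legitimately be 404, but never 500.
    assert resp.status_code != 500
```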
See also #355.
Fix confirmed.