Upper bound of 10000 queries means I can't access the entirety of INSPIRE institutions #20
Comments
You're doing everything right; this is an unfortunate limitation on our side (Elasticsearch is used as the search engine, and the API we're using for pagination has a limit of 10000 results). I hope we can improve this soon by switching to a different pagination mechanism, but in the meantime you can use the following workaround. Add to the search query (which is empty in your case) an additional filter ensuring that a single search returns fewer than 10000 results, then manually change the values you're filtering on until you have covered everything. It's convenient to use a range of
For Literature, which has a much higher density of records and uses a custom query parser, you'd do something like
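The windowing idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `search` is a hypothetical caller-supplied function that runs one query and pages through all of its results, `field` is a placeholder for whatever numeric field is filtered on, and the `field:[start TO end]` form is the usual Elasticsearch query-string range syntax, which may need adapting for collections that use a custom query parser.

```python
from typing import Callable, Dict, Iterator, List, Tuple

MAX_WINDOW = 10_000  # pagination limit mentioned in this thread


def windows(lo: int, hi: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) ranges that together cover lo..hi."""
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        yield start, end
        start = end + 1


def fetch_all(
    search: Callable[[str], List[Dict]],  # hypothetical: runs one query, returns all its hits
    field: str,                           # hypothetical: numeric field to filter on
    lo: int,
    hi: int,
    step: int,
) -> List[Dict]:
    """Collect records window by window so no single search hits the limit."""
    records: List[Dict] = []
    for start, end in windows(lo, hi, step):
        query = f"{field}:[{start} TO {end}]"
        hits = search(query)
        if len(hits) >= MAX_WINDOW:
            # The window is too wide: results may have been cut off.
            raise RuntimeError(f"window {query!r} reached the limit; use a smaller step")
        records.extend(hits)
    return records
```

Choosing `step` so that each window stays well below 10000 results leaves room for uneven record density across the range.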
Link to 10000 results workaround in #20
What you write here is working very well. Thank you for adding it to the documentation. I think it will be clear how to circumvent this issue if someone starts using the API for their own project and runs into the same limit. Please keep up the great work on developing this infrastructure; it is crucial for meta-analyses, and I look forward to sharing the results of our project with you when they come to fruition!
Hi @michamos
Perhaps there could be a natural sort order for all the objects, so that users can get the top 10, 1000, etc.
I am trying to use the API to collect geographical information on publications worldwide, to get a handle on how publications differ between institutions located in different regions of the world. As such, I am trying to make calls to URLs like
which allows me to query the metadata associated with the institutional publication records.
This works well and allows me to get all the information I need. However, there seems to be an upper limit on being able to access all of the data because when I try a call like
I get a return of
Now, I see that at most 1000 results can be requested in a single call, but this upper bound of 10000 is causing issues because it means I can't access the data for the full set of 11791 institutions that have publications in HEP via this API.
Is there some reason why this upper bound exists? Or am I misusing the API?
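To make the numbers above concrete, here is a small back-of-the-envelope calculation. It assumes the limit applies to offset plus page size, which is how Elasticsearch's result window usually behaves; that detail is an assumption and is not confirmed in this thread.

```python
import math

TOTAL = 11_791       # institutions reported in this issue
PAGE_SIZE = 1_000    # maximum results per call
MAX_WINDOW = 10_000  # upper bound on paginated results

pages_needed = math.ceil(TOTAL / PAGE_SIZE)  # pages required to see everything
pages_reachable = MAX_WINDOW // PAGE_SIZE    # pages that fit under the bound
unreachable = max(TOTAL - MAX_WINDOW, 0)     # records past the bound

print(f"pages needed: {pages_needed}, reachable: {pages_reachable}, "
      f"records out of reach: {unreachable}")
```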