Directly harvest oai sources #197
Comments
Yeah, I was writing about it to @david-caro on Gitter. For the OAI-PMH crawler, see #198.
Also remove old tests. Fixes inspirehep#197. Co-authored-by: Samuele Kaplun <[email protected]> Signed-off-by: Szymon Łopaciuk <[email protected]>
Also remove old tests. Fixes inspirehep#197. Signed-off-by: Szymon Łopaciuk <[email protected]>
Closed by #203
Currently, to harvest OAI services we rely on invenio-oaiharvester running in an app such as inspire: it saves the results as XML files on disk, and we then harvest those files. It would be much better if we could harvest directly from the OAI services instead.
Expected Behavior
For example, instead of harvesting from a file, the arXiv spider should accept a server URL (such as http://export.arxiv.org/oai2) and the harvest parameters (such as the from and until dates and the sets to include).
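A rough sketch of what such a direct harvest could look like, assuming Sickle handles the OAI-PMH protocol. The helper names `build_oai_params` and `harvest` are hypothetical and not part of any existing spider; the `arXiv` metadata prefix and the `physics:hep-th` set are only illustrative values.

```python
from datetime import date


def build_oai_params(metadata_prefix, from_date=None, until_date=None, oai_set=None):
    """Build the OAI-PMH ListRecords arguments, leaving out unset ones.

    Hypothetical helper, shown for illustration only.
    """
    params = {"metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date.isoformat()
    if until_date is not None:
        params["until"] = until_date.isoformat()
    if oai_set is not None:
        params["set"] = oai_set
    return params


def harvest(url, **kwargs):
    """Yield the raw XML of each record served by the OAI-PMH endpoint at `url`.

    Hypothetical helper; accepts the same keyword arguments as build_oai_params.
    """
    # Third-party dependency, imported lazily so the parameter helper
    # above can be used without it.
    from sickle import Sickle

    client = Sickle(url)
    # "from" is a Python keyword, so the arguments are passed as a dict.
    params = build_oai_params(**kwargs)
    for record in client.ListRecords(ignore_deleted=True, **params):
        yield record.raw
```

A spider could then call something like `harvest("http://export.arxiv.org/oai2", metadata_prefix="arXiv", from_date=date(2017, 1, 1), until_date=date(2017, 1, 2), oai_set="physics:hep-th")` and parse each yielded XML string. OAI-PMH takes a single set per request, so harvesting several sets would presumably mean one call per set.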
As an example of similar functionality, you can check the
inspirehep oaharvester harvest
command for its parameters. We can also use Sickle (https://sickle.readthedocs.io/en/latest/index.html) to handle the OAI protocol.
Current Behavior
There is no support for harvesting directly from OAI-enabled services.