Daily updates #24
TSV files are always grouped by PPN. The set of rows for each PPN is either as known, e.g.:
resulting in rows
See method
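Because all rows for one PPN are adjacent, a reader can process a TSV file one PPN at a time without loading the whole file. The following is a minimal sketch in Python, assuming (hypothetically) that each row has the columns PPN, vocabulary and notation in that order; `rows_by_ppn` is an illustrative helper, not a function of this project.

```python
import csv
import io
from itertools import groupby


def rows_by_ppn(lines):
    """Yield (ppn, rows) pairs from TSV lines that are grouped by PPN.

    Assumed (hypothetical) column order: PPN, vocabulary, notation.
    itertools.groupby only merges *consecutive* rows with the same key,
    which is exactly what the "grouped by PPN" property guarantees.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    non_empty = (row for row in reader if row)  # skip blank lines
    for ppn, group in groupby(non_empty, key=lambda row: row[0]):
        # Materialize the group before groupby advances to the next PPN.
        yield ppn, [tuple(row[1:]) for row in group]


if __name__ == "__main__":
    # Made-up example data for illustration only.
    sample = "12345\tddc\t001\n12345\trvk\tAB 100\n67890\tbk\t01.00\n"
    for ppn, rows in rows_by_ppn(io.StringIO(sample)):
        print(ppn, rows)
```

`csv.QUOTE_NONE` keeps quote characters in notations from being interpreted as CSV quoting.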
So the next step would be to add an update script that calls methods in the SQLite backend, and that also allows both partial and full updates? Something like:
Full updates would clear the whole table instead of deleting records for single PPNs, so we would likely need an additional method in the backend. Also needs a
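A rough sketch of the two update modes against SQLite, using Python's standard `sqlite3` module. The table `subjects` with columns `ppn`, `voc` and `notation` is a hypothetical schema, and `full_update` / `partial_update` are illustrative names, not existing backend methods. Each mode runs in a single transaction, so an exception during the import rolls back and leaves the previous data in place.

```python
import sqlite3

# Hypothetical schema: one row per (ppn, voc, notation).
SCHEMA = "CREATE TABLE IF NOT EXISTS subjects (ppn TEXT, voc TEXT, notation TEXT)"


def full_update(con, records):
    """Full update: clear the whole table, then insert all records."""
    with con:  # commits on success, rolls back on exception
        con.execute("DELETE FROM subjects")
        con.executemany("INSERT INTO subjects VALUES (?, ?, ?)", records)


def partial_update(con, records):
    """Partial update: replace only the PPNs that occur in the update."""
    records = list(records)
    ppns = {(ppn,) for ppn, _, _ in records}
    with con:
        # Delete the old rows of every PPN mentioned in the update...
        con.executemany("DELETE FROM subjects WHERE ppn = ?", ppns)
        # ...then insert the new set of rows for those PPNs.
        con.executemany("INSERT INTO subjects VALUES (?, ?, ?)", records)
```

The only difference between the two modes is the scope of the `DELETE`, which is why a full update needs its own backend method.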
Currently only supports full import via TSV file. No documentation yet.
@nichtich I feel like partial imports are not yet 100% clear. My suggestion for the TSV format for partial import would be this:
= delete all records for PPN 12345
= delete all RVK records for PPN 12345
= add record for PPN 12345 (but do not delete anything)

For example, if the update were to 1) remove the existing DDC record, 2) replace the one existing RVK record, and 3) add an additional BK record, it would look like this:
Or would you prefer to do it differently? I think this would cover all cases, even though removing a single record would mean that all other records for that PPN/vocabulary need to be listed again. (I think in your case, removing a single record would mean that ALL other records for that PPN, regardless of vocabulary, need to be listed again.)
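The proposed format could be applied roughly as follows. This sketch assumes the three kinds of rows are encoded as a row with only the PPN, a row with PPN and vocabulary but no notation, and a complete row; that encoding, the `subjects` table and `apply_partial` are assumptions for illustration. Running all deletions before all additions makes "delete all RVK records" plus "add RVK record" act as a replacement regardless of row order.

```python
import sqlite3


def apply_partial(con, rows):
    """Apply a partial update given as (ppn, voc, notation) tuples.

    Assumed (hypothetical) encoding of the proposed row kinds:
      (ppn, "", "")        -> delete all records for ppn
      (ppn, voc, "")       -> delete all records of vocabulary voc for ppn
      (ppn, voc, notation) -> add this record, delete nothing
    """
    rows = list(rows)
    with con:  # one transaction for the whole update
        # Pass 1: all deletions.
        for ppn, voc, notation in rows:
            if not voc:
                con.execute("DELETE FROM subjects WHERE ppn = ?", (ppn,))
            elif not notation:
                con.execute(
                    "DELETE FROM subjects WHERE ppn = ? AND voc = ?", (ppn, voc)
                )
        # Pass 2: all additions.
        for ppn, voc, notation in rows:
            if voc and notation:
                con.execute(
                    "INSERT INTO subjects VALUES (?, ?, ?)", (ppn, voc, notation)
                )
```

With this encoding, the DDC/RVK/BK example above becomes a "delete DDC" row, a "delete RVK" row, one new RVK row and one new BK row.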
There's now a basic working implementation of the import script. It will be finished in #27.
This is not part of the software but of its deployment and configuration, so I am closing this issue.
Related to #17, there should also be an update script that can handle partial updates. The update could also be a .tsv or .tsv.gz file, but it may include rows with an empty vocabulary (just the PPN) to indicate removal of a record:
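Reading such an update file could look like the sketch below, which handles both plain and gzip-compressed TSV via Python's `gzip` module. The column order (PPN, vocabulary, notation) and the helper name `read_update` are assumptions; PPN-only rows are padded to `(ppn, "", "")` so that callers can recognise removals.

```python
import csv
import gzip


def read_update(path):
    """Yield (ppn, voc, notation) tuples from a .tsv or .tsv.gz update file.

    Rows with only a PPN (empty vocabulary) are yielded as (ppn, "", "")
    and signal that the record with this PPN has been removed.
    The column order PPN, vocabulary, notation is an assumption.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row or not row[0]:
                continue  # skip blank lines
            row = (row + ["", ""])[:3]  # pad PPN-only rows to three fields
            yield tuple(row)
```

Opening in text mode (`"rt"`) lets the same CSV code read both the compressed and the uncompressed variant.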
Alternatively, keep a full dump as a file and apply the update to this file to get an updated full dump (this may even be faster, depending on the size of the updates).
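Since the dump is grouped by PPN, an update can be applied to it in one streaming pass. This sketch assumes the update is small enough to hold in memory, that it lists the complete new set of rows for every PPN it mentions, and that a PPN-only row means the record was removed; `apply_to_dump` is a hypothetical helper working on `(ppn, voc, notation)` tuples.

```python
from itertools import groupby


def apply_to_dump(dump_rows, update_rows):
    """Yield the rows of an updated full dump.

    dump_rows:   (ppn, voc, notation) tuples grouped by PPN (the old dump).
    update_rows: tuples of the same shape; each PPN in the update replaces
                 its group in the dump, and a PPN-only row (empty
                 vocabulary) removes the record entirely.
    """
    # Collect the new rows per PPN; PPN-only rows leave an empty list.
    update = {}
    for ppn, voc, notation in update_rows:
        rows = update.setdefault(ppn, [])
        if voc:
            rows.append((ppn, voc, notation))
    # Stream the old dump, swapping in updated groups.
    for ppn, group in groupby(dump_rows, key=lambda r: r[0]):
        if ppn in update:
            yield from update.pop(ppn)
        else:
            yield from group
    # PPNs that only occur in the update are appended at the end.
    for rows in update.values():
        yield from rows
```

New PPNs end up at the end of the output, which keeps the dump grouped by PPN but not necessarily sorted.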
Use case: There are daily jobs in the K10plus CBS database that pass updated records to LBS and to the K10plus central Solr index.