Daily updates #24

nichtich · 2022-08-25T10:13:24Z

Related to #17 there should also be an update script that can handle partial updates. The update could be a .tsv or .tsv.gz file as well but it may include rows with empty vocabulary (just the PPN) to indicate removal of a record:

awk '{print $1}' update.tsv | uniq > ppns # filter out affected records
# TODO: remove rows with PPN in file ppns
awk -F'\t' '$2{print}' update.tsv > import.tsv # filter out rows with PPN only (records without subject indexing)
# TODO: import import.tsv into database without purging database

Alternatively, keep a full dump as a file and apply the update to this file to get an updated full dump (this may even be faster, depending on the size of the updates).
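
A minimal Python sketch of the full-dump alternative, not code from the repository. The function name `apply_update` and the in-memory list-of-lines representation are assumptions made for illustration; it assumes tab-separated rows with the PPN in the first column.

```python
def apply_update(full_lines, update_lines):
    """Return the lines of an updated full dump.

    Every PPN mentioned in the update is first dropped from the full dump;
    update rows with a non-empty vocabulary column are then appended.
    """
    def fields(line):
        return line.rstrip("\n").split("\t")

    # PPNs touched by the update (first column of every non-blank row)
    affected = {fields(line)[0] for line in update_lines if line.strip()}
    # Keep only records of the full dump that the update does not touch
    kept = [line for line in full_lines if fields(line)[0] not in affected]
    # Rows with PPN only signal removal, so they are not re-added
    added = [
        line for line in update_lines
        if len(fields(line)) > 1 and fields(line)[1]
    ]
    return kept + added


# Illustrative data only
full = ["111\trvk\tXY 333\n", "222\tbk\t33.33\n", "333\tddc\t004\n"]
update = ["111\tbk\t54.32\n", "222\n"]
updated = apply_update(full, update)
```

Updated rows are appended at the end, so the result stays grouped by PPN but does not preserve the original PPN order.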

Use case: There are daily jobs at the K10plus CBS database that pass updated records to LBS and to the K10plus central Solr index.

nichtich · 2022-08-30T13:38:28Z

TSV files are always grouped by PPN. The set of rows for each PPN is either in the known form, e.g.:

12345   rvk      XY 333
12345   bk       33.33

resulting in rows [{voc: "rvk", notation: "XY 333"}, {voc: "bk", notation: "33.33"}], or it is just one row with only the PPN (empty voc and notation), which only deletes the record (rows = []).

See the updateRecord method in the SQLite backend (dev branch), which is meant to be passed this parsed TSV data.
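
The parsing described above could look roughly like this. It is a Python sketch under assumptions, not the backend's actual code: `parse_grouped_tsv` is a hypothetical helper, and it relies on the input being tab-separated and already grouped by PPN.

```python
from itertools import groupby


def parse_grouped_tsv(lines):
    """Yield (ppn, rows) pairs from TSV lines that are grouped by PPN."""
    parsed = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # groupby only merges adjacent rows, which is enough for grouped input
    for ppn, group in groupby(parsed, key=lambda cols: cols[0]):
        rows = [
            {"voc": cols[1], "notation": cols[2]}
            for cols in group
            if len(cols) >= 3 and cols[1]  # skip rows with PPN only
        ]
        yield ppn, rows


records = list(parse_grouped_tsv([
    "12345\trvk\tXY 333\n",
    "12345\tbk\t33.33\n",
    "67890\n",  # PPN only: rows = [], i.e. delete the record
]))
```

Each `(ppn, rows)` pair is the kind of value that a method such as updateRecord would then receive.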

stefandesu · 2022-08-31T08:03:43Z

So the next step would be to add an update script that calls methods in the SQLite backend, and that also allows both partial and full updates? Something like:

# partial update by default
./bin/import update.tsv
# full update with flag
./bin/import --full subjects.tsv

Full updates would clear the whole table instead of deleting records for single PPNs, so we would likely need an additional method in the backend.
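
To make the difference between the two modes concrete, here is a small Python sketch using the standard `sqlite3` module. The table name `subjects`, its columns, and the function `import_rows` are assumptions for illustration and do not reflect the backend's real schema or API.

```python
import sqlite3


def import_rows(conn, records, full=False):
    """Write (ppn, rows) pairs; clear the whole table first when full=True."""
    with conn:  # one transaction: commit on success, roll back on error
        if full:
            conn.execute("DELETE FROM subjects")
        for ppn, rows in records:
            if not full:
                # Partial update: only remove what belongs to this PPN
                conn.execute("DELETE FROM subjects WHERE ppn = ?", (ppn,))
            conn.executemany(
                "INSERT INTO subjects (ppn, voc, notation) VALUES (?, ?, ?)",
                [(ppn, row["voc"], row["notation"]) for row in rows],
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subjects (ppn TEXT, voc TEXT, notation TEXT)")

# Full import of two records (illustrative data)
import_rows(conn, [("1", [{"voc": "rvk", "notation": "XY 333"}]),
                   ("2", [{"voc": "bk", "notation": "33.33"}])], full=True)
# Partial update: PPN 1 is replaced, PPN 2 is left untouched
import_rows(conn, [("1", [{"voc": "ddc", "notation": "004"}])])

remaining = conn.execute(
    "SELECT ppn, voc, notation FROM subjects ORDER BY ppn").fetchall()
```

Running everything in one transaction means a failed import leaves the previous data in place.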

It also needs a --modified flag for #25 and should update the modified metadata in the database.

stefandesu · 2022-09-02T09:47:52Z

@nichtich I feel like partial imports are not yet 100% clear. My suggestion for the TSV format for partial import would be this:

12345

= delete all records for PPN 12345

12345	rvk

= delete all RVK records for PPN 12345

12345	rvk	XY 333

= add record for PPN 12345 (but do not delete anything)

For example, if the update would 1) remove the existing DDC record, 2) replace the one existing RVK record, and 3) add an additional BK record, it would look like this:

12345	ddc
12345	rvk
12345	rvk	XY 333
12345	bk	33.33

Or would you prefer to do it differently? I think this would cover all cases, even though removal of a single record would mean all other records for that PPN/vocabulary would need to be listed again. (I think in your case, removal of a single record would mean ALL other records for that PPN, regardless of vocab, would need to be listed again.)
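
The semantics of this proposed format can be sketched in Python as follows. This only illustrates the suggestion above, not the implemented import script: `apply_partial` and the dict-based `store` are hypothetical, and the pre-existing notations in the example are made up.

```python
def apply_partial(store, lines):
    """Apply update rows to store, a dict mapping ppn -> list of (voc, notation)."""
    for line in lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        ppn = cols[0]
        voc = cols[1] if len(cols) > 1 else ""
        notation = cols[2] if len(cols) > 2 else ""
        current = store.setdefault(ppn, [])
        if not voc:
            # PPN only: delete all records for the PPN
            current.clear()
        elif not notation:
            # PPN and vocabulary: delete all records of that vocabulary
            current[:] = [r for r in current if r[0] != voc]
        else:
            # Full row: add a record without deleting anything
            current.append((voc, notation))
    return store


# Existing state (illustrative): one DDC and one RVK record
store = {"12345": [("ddc", "004"), ("rvk", "AB 111")]}
apply_partial(store, [
    "12345\tddc\n",
    "12345\trvk\n",
    "12345\trvk\tXY 333\n",
    "12345\tbk\t33.33\n",
])
```

Rows are applied in file order, so a deletion row has to come before the rows that re-add records for the same vocabulary.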

stefandesu · 2022-09-05T08:00:22Z

There's now a basic working implementation of the import script. It will be finished in #27.

nichtich · 2023-08-08T06:49:41Z

This is not part of the software but its deployment and configuration, so closing this issue.

This was referenced Aug 25, 2022

Include modification date #25

Closed

Add import script #27

Closed

stefandesu added a commit that referenced this issue Sep 2, 2022

Small fix for updateRecord method in SQLite backend (#24)

74294f8

stefandesu added a commit that referenced this issue Sep 2, 2022

SQLite backend: Add batchImport method (#24)

2a21a7d

stefandesu added a commit that referenced this issue Sep 2, 2022

Add first implementation of import script (#24)

4a6f4f2

Currently only supports full import via TSV file. No documentation yet.

stefandesu added a commit that referenced this issue Sep 5, 2022

Import script: Support partial import (#24)

93f6caf

stefandesu added a commit that referenced this issue Sep 5, 2022

Import script: Update modified metadata value (#24)

b4ab0cd

nichtich closed this as completed Aug 8, 2023
