Releases: jpkanter/Spcht
Reordering
This is the first complete version of the Solr2Triplestore project. While there are still many ideas and minor things that could use some attention, I think the overall procedure and functionality are finally where I wanted them to be.
In this release the README, and especially the tutorial on how to use the SpchtDescriptor, was massively overhauled. There is now also a definitive JSON Schema for the correct formatting of any SpchtDescriptor.
In the same change some renaming was done: the field 'graph' is now correctly called 'predicate', and overall the name 'graph' was replaced by more appropriate and correct words. Things that I called 'graph' before should now be called subject, predicate, object or URI. The joined_graph function is now called joined_map.
The way mapping is configured was updated; it is now clearer because $default and $inherit no longer share a key. There is a new regex validation on load, and it is possible to use regex keys in maps, with the minor caveat that all keys then have to be regex and that there might be performance impacts.
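To illustrate why regex keys are all-or-nothing and may cost performance, here is a minimal sketch. The function names map_value and validate_regex_keys, the use_regex flag and the example URIs are assumptions made for this illustration, not the project's actual API. A plain map is a single dictionary lookup, while a regex map has to try each pattern in turn.

```python
import re


def validate_regex_keys(mapping):
    """Compile every key once at load time so a broken pattern fails early."""
    for pattern in mapping:
        re.compile(pattern)  # raises re.error for an invalid pattern


def map_value(value, mapping, default=None, use_regex=False):
    """Translate value through mapping (hypothetical helper).

    Without use_regex this is one hash lookup. With use_regex every key is
    treated as a pattern and tested in order, which is a linear scan.
    """
    if not use_regex:
        return mapping.get(value, default)
    for pattern, target in mapping.items():
        if re.search(pattern, value):
            return target
    return default


# Hypothetical map in which every key is a regular expression.
language_map = {
    r"^ger|^deu": "http://example.org/lang/de",
    r"^eng?$": "http://example.org/lang/en",
}
validate_regex_keys(language_map)
```

Because every key is run through the regex engine, a literal key that happens to contain characters such as a dot or parenthesis would be interpreted as a pattern, which is one plausible reason for requiring that all keys in such a map be regex.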
Internally, the processing was written anew; all functions should now work in concert with all others (except insert_into in joined_map, which makes no logical sense). The order in which procedures are now applied is:
- if condition
- pre processing (match)
- post processing (cut, replace, append, prepend)
- mappings
- insert_into strings
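The order above can be sketched as a small pipeline. This is only an illustration under assumptions: process_value is a hypothetical function, the node dictionary keys are placeholders chosen for this sketch, the condition is simplified to a callable, and the "{}" placeholder for insert_into is assumed rather than taken from the project code.

```python
import re


def process_value(value, node):
    """Apply the steps in the documented order; return None if the value is dropped."""
    # 1. if condition (simplified here to a callable returning True or False)
    condition = node.get("if")
    if condition is not None and not condition(value):
        return None
    # 2. pre processing: the value must match the pattern to continue
    pattern = node.get("match")
    if pattern is not None and not re.search(pattern, value):
        return None
    # 3. post processing: cut/replace, then prepend and append
    if "cut" in node:
        value = re.sub(node["cut"], node.get("replace", ""), value)
    value = node.get("prepend", "") + value + node.get("append", "")
    # 4. mappings: translate the already processed value
    if "mapping" in node:
        value = node["mapping"].get(value)
        if value is None:
            return None
    # 5. insert_into: place the final value into a template string
    if "insert_into" in node:
        value = node["insert_into"].format(value)
    return value


node = {
    "match": r"^\d+",
    "cut": r"\D",
    "prepend": "id-",
    "mapping": {"id-123": "mapped"},
    "insert_into": "http://example.org/{}",
}
result = process_value("123abc", node)
```

Note that in this ordering the mapping sees the value after cut, prepend and append have been applied, so map keys have to be written against the processed form.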
There is also no difference between marc and dict data sources anymore; both use the exact same procedures and all effects should work exactly the same. The only remaining caveat is the function insert_into, which does not allow mixing of sources for now.
Other parts were not touched, WorkOrder still works exactly as before.
The former external project SpchtCheckerGui was integrated into the project code; it now uses a simple i18n implementation. Because of this there is a new dependency on PySide2. If the GUI is not needed, PySide2 is not necessary.
WorkOrder complete
This release overhauls the entire way the bridge worked compared to before. New is the concept of WorkOrder.
As I noticed, processing data from one source into another database can take a significant amount of time, especially if the insert processes are inefficient or capped by external factors. The previous iteration had no way to continue a process once it had started, and if something went wrong or temporarily broke in any phase, an entire multi-hour process might have been unrecoverable.
WorkOrder is a fancy name for a JSON file that breaks the big process into multiple parts, mainly determined by the size of the downloaded files. There are some disk space concerns to consider here: in addition to the downloaded files, the fully processed turtle files are also saved to disk until they are inserted. After that step they are deleted, but in the meantime it is something to keep in mind.
A WorkOrder contains a meta description and a file list in which the processed and raw files are linked. Additionally, some statistics are added to the meta and file-specific parts; this metadata remains after completion.
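The resumable idea behind such a file can be sketched as follows. The key names (meta, file_list, raw_file, status), the status values and the functions are assumptions invented for this example and do not reflect the actual WorkOrder layout. The point is only that the state is written back to disk after every step, so an interrupted run can pick up where it stopped.

```python
import json


def save_work_order(path, order):
    """Write the whole work order back to disk."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(order, handle, indent=2)


def load_work_order(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def resume(path, steps):
    """Run the remaining steps for every file entry.

    steps is an ordered list of (status_name, action) pairs. An entry whose
    status already names a step continues with the step after it; any other
    status starts from the first step.
    """
    order = load_work_order(path)
    names = [name for name, _ in steps]
    for entry in order["file_list"]:
        done = names.index(entry["status"]) + 1 if entry["status"] in names else 0
        for name, action in steps[done:]:
            action(entry)
            entry["status"] = name
            save_work_order(path, order)  # persist progress after each step
    return order
```

A run that dies during the insert step would leave the affected entry at its last saved status, so calling resume again repeats only the unfinished steps for that file and the untouched ones after it.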
Additionally, the CLI was overhauled: it contains almost no debug options anymore and was moved from main.py to an external JSON file to keep the logic separated from the data.
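One way to keep command line definitions in data rather than code is to build an argparse parser from a JSON document. This is a sketch of the general technique only; the option names and the JSON layout below are made up for illustration and are not the project's actual file.

```python
import argparse
import json

# Hypothetical argument definitions, as they might be stored in a JSON file.
DEFINITIONS_JSON = """
{
    "work-order": {"help": "path to a work order file", "metavar": "FILE"},
    "force": {"action": "store_true", "help": "ignore saved progress"}
}
"""


def build_parser(definitions):
    """Create a parser with one --option per entry in definitions."""
    parser = argparse.ArgumentParser()
    for name, options in definitions.items():
        # Each value holds keyword arguments for argparse's add_argument.
        parser.add_argument(f"--{name}", **options)
    return parser


definitions = json.loads(DEFINITIONS_JSON)
args = build_parser(definitions).parse_args(["--work-order", "order.json", "--force"])
```

With this layout, adding or removing an option only touches the JSON data, while the parsing logic stays unchanged.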
Version 0.4 - Beta release
General Spcht functionality and the class work as expected. Tests barely exist; other functions need more actual testing. Transfer from Solr/Marc to Virtuoso works in main.py. Automatic updates work as well.
Some work is left before this is an actual release.