Releases · jpkanter/Spcht

06 Aug 10:38

jpkanter

e5c493b

Reordering (Latest)

This is the first complete version of the Solr2Triplestore project. While there are still a lot of ideas and other minor things that could use some attention, I think the overall procedure and functionality are finally where I wanted them to be.

In this release the README, and especially the tutorial on how to use the SpchtDescriptor, was massively overhauled. There is now also a definitive JSON Schema for the correct formatting of SpchtDescriptors.

In the same change some renaming was done: the field 'graph' is now correctly called 'predicate', and overall the name graph was replaced by more appropriate and correct words. Things that I called 'graph' before should now be called subject, predicate, object or URI. The joined_graph function is now called joined_map.

The way mapping is configured was updated; it is now clearer because $default and $inherit no longer share a key. There is a new regex validation on load, and it is possible to use regex keys in maps, with the minor caveat that all keys then have to be regex and that there might be performance impacts.
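To illustrate why regex keys can cost performance, here is a minimal Python sketch of a map whose keys are all regular expressions. The function names `validate_regex_keys` and `regex_lookup` and the sample map are hypothetical and are not taken from the Spcht code; the sketch only shows the general idea of validating patterns on load and then trying them one by one.

```python
import re


def validate_regex_keys(mapping):
    """Compile every key once at load time; raises re.error on an invalid pattern."""
    return {re.compile(key): target for key, target in mapping.items()}


def regex_lookup(value, compiled_map, default=None):
    """Return the target of the first pattern that matches `value`.

    Unlike a plain dict lookup, this may have to try every key, which is
    where a performance impact can come from.
    """
    for pattern, target in compiled_map.items():
        if pattern.search(value):
            return target
    return default


# Hypothetical map: every key is a regex, as the caveat above requires.
language_map = validate_regex_keys({r"^ger": "German", r"^eng": "English"})
print(regex_lookup("ger-DE", language_map, default="unknown"))
```

A plain map needs a single hash lookup per value, while the regex variant is linear in the number of keys, so large maps are the case to watch.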

Internally the way processing takes place was rewritten; all functions should now work in concert with all others (except insert_into in joined_map, which logically makes no sense). The order in which procedures are now applied is:

1. if condition
2. pre processing (match)
3. post processing (cut, replace, append, prepend)
4. mappings
5. insert_into strings
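The order above can be sketched in Python as follows. This is not the actual Spcht implementation: the function `process_value`, the shape of `node`, the callable used for the `if` condition, and the `{}` placeholder in the insert_into template are all assumptions made for illustration.

```python
import re


def process_value(value, node):
    """Apply the five stages in the listed order to a single string value.

    `node` is a plain dict with hypothetical keys; returns None when the
    value is filtered out by the condition or by the match pattern.
    """
    # 1. if condition: skip the value when the condition does not hold
    condition = node.get("if")
    if condition is not None and not condition(value):
        return None
    # 2. pre processing: keep the value only if it matches the pattern
    match = node.get("match")
    if match is not None and not re.search(match, value):
        return None
    # 3. post processing: cut/replace first, then prepend and append
    if "cut" in node:
        value = re.sub(node["cut"], node.get("replace", ""), value)
    value = node.get("prepend", "") + value + node.get("append", "")
    # 4. mappings: translate the value, keep it unchanged if unmapped
    mapping = node.get("mapping")
    if mapping is not None:
        value = mapping.get(value, value)
    # 5. insert_into: place the result into a template string
    template = node.get("insert_into")
    if template is not None:
        value = template.replace("{}", value)
    return value


# Hypothetical node description exercising every stage.
example_node = {
    "if": lambda v: v != "",
    "match": r"^\d+",
    "cut": r"\D",
    "prepend": "id-",
    "mapping": {"id-123": "record-123"},
    "insert_into": "http://example.org/{}",
}
print(process_value("123abc", example_node))
```

Because mappings run after the post processing, the map keys in this sketch have to be written against the already cut and prefixed value, not against the raw field content.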

There is also no difference between marc and dict data sources anymore; both use the exact same procedures and all effects should work exactly the same. The only remaining caveat is the function insert_into, which does not allow mixing of sources for now.

Other parts were not touched; WorkOrder still works exactly as before.

The former external project SpchtCheckerGui was integrated into the project code; it now uses a simple i18n implementation. Because of this there is a new dependency on PySide2. If the GUI is not needed, PySide2 is not necessary.


14 Jul 08:46

jpkanter

v0.6

3547632

WorkOrder complete

This release overhauls the entire way the bridge works. New is the concept of a WorkOrder.

As I noticed, processing data from one source into another database can take a significant amount of time, especially if the insert processes are inefficient or just capped by some external factors. The previous iteration had no way to continue a process once it had started, and if something went wrong or just temporarily broke in any phase, an entire multi-hour process might have been unrecoverable.

WorkOrder is a fancy name for a JSON file that breaks the big process into multiple parts, mainly determined by the size of the downloaded files. There are some disk space concerns to consider here: in addition to the downloaded files, the processed turtle files are also saved to disk until they are inserted. After that process they are deleted, but for the time being it is something to consider.

A WorkOrder contains a meta description and a file list in which the processed and raw files are linked. Additionally, some statistics are added to the meta and file-specific parts; this metadata remains after completion.
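As a rough illustration of how such a file makes a process resumable, the following Python sketch builds a WorkOrder-like structure and finds the parts that still need work. All keys, file names and status values here are invented for the example and do not reflect the actual WorkOrder layout.

```python
import json

# Hypothetical WorkOrder: a meta block plus a file list that links each
# raw download to its processed turtle file and tracks a per-file status.
work_order = {
    "meta": {"status": "in progress", "inserted_files": 1},
    "file_list": {
        "0": {"file": "raw_0.json", "rdf_file": "raw_0.ttl", "status": "inserted"},
        "1": {"file": "raw_1.json", "rdf_file": "raw_1.ttl", "status": "processed"},
        "2": {"file": "raw_2.json", "rdf_file": None, "status": "downloaded"},
    },
}


def pending_files(order):
    """Return the keys of all parts that have not been inserted yet."""
    return [
        key
        for key, entry in order["file_list"].items()
        if entry["status"] != "inserted"
    ]


# The structure survives a round trip through JSON, so an interrupted run
# can reload it from disk and continue with the remaining parts.
restored = json.loads(json.dumps(work_order))
print(pending_files(restored))
```

Keeping the status per file rather than for the whole run is what allows a restart to skip parts that are already done.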

Additionally the CLI was overhauled; it contains almost no debug options anymore, and it was moved from main.py to an external JSON file to keep the logic separated from the data.


07 May 08:22

jpkanter

0.4

ed99a30

Version 0.4 - Beta release

General Spcht functionality and the class work as expected. Testing barely exists; other functions need more actual testing. Transfer from Solr/Marc to Virtuoso works in main.py. Automatic updates work as well.

Some work is left before this can be an actual release.

