Integrating with EIDF
A key goal of the overarching battery data archive project is to make data available for large-scale analysis. This will be enabled by storing data within Edinburgh International Data Facility systems and providing Galvanalyser data provision services within the same systems.
There are three key components to EIDF integration:
- Sending data to EIDF
- Processing data within EIDF
- Providing access to data stored in EIDF systems
Galvanalyser will be responsible for aspects of all three of these components.
Requirements
Sending data to EIDF
Galvanalyser should:
- monitor battery data directories
- extract file and dataset metadata
- allow metadata to be checked and edited by authorised users
- send raw files and metadata to third parties (i.e. the EIDF's CKAN ingestion service), as sketched below
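As a concrete illustration of that last step, the sketch below pushes a harvested file and its metadata to a CKAN instance using CKAN's standard package_create and resource_create actions. The EIDF endpoint URL, API key handling and metadata fields are assumptions for illustration, not the confirmed EIDF ingestion interface.

```python
"""Sketch of the upload step using CKAN's standard package_create and
resource_create actions. The EIDF URL, API key handling and metadata
fields are assumptions for illustration."""
import requests

CKAN_URL = "https://eidf.example/ckan"  # placeholder EIDF CKAN endpoint
API_KEY = "..."                         # authorised user's CKAN API token


def send_to_ckan(file_path: str, metadata: dict) -> dict:
    """Create a dataset carrying the harvested metadata, then attach the
    raw file to it as a resource."""
    headers = {"Authorization": API_KEY}

    # Register the dataset and its checked/edited metadata as CKAN extras.
    dataset = requests.post(
        f"{CKAN_URL}/api/3/action/package_create",
        json={
            "name": metadata["dataset_name"],  # assumed metadata field
            "extras": [{"key": k, "value": str(v)}
                       for k, v in metadata.items()],
        },
        headers=headers,
    ).json()["result"]

    # Upload the raw file itself, linked to the dataset just created.
    with open(file_path, "rb") as f:
        return requests.post(
            f"{CKAN_URL}/api/3/action/resource_create",
            data={"package_id": dataset["id"]},
            files={"upload": f},
            headers=headers,
        ).json()["result"]
```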
Processing data within EIDF
Galvanalyser should:
- search for unprocessed uploaded files on CKAN
- 'check out' a file by registering it in a database (see the sketch after this list)
- process a file to extract its data into the generic Galvanalyser format
- send processed data to CKAN, linked to the raw data as a related record
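A minimal sketch of this checkout-and-process loop follows. CKAN's package_search action is real, but the 'processed' flag, the checkout table and process_file are hypothetical names for this sketch, not existing Galvanalyser or EIDF interfaces.

```python
"""Illustrative checkout-and-process loop. CKAN's package_search action is
real, but the 'processed' flag, the checkout table and process_file are
hypothetical names used for this sketch only."""
import psycopg2
from psycopg2 import errors as pg_errors
import requests

CKAN_URL = "https://eidf.example/ckan"          # placeholder EIDF endpoint
conn = psycopg2.connect("dbname=galvanalyser")  # placeholder DSN


def find_unprocessed() -> list[dict]:
    """Ask CKAN for datasets not yet flagged as processed (the flag is an
    assumed convention, not a built-in CKAN field)."""
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_search",
        params={"fq": "-processed:true"},
    ).json()
    return resp["result"]["results"]


def checkout(ckan_id: str) -> bool:
    """Claim a file by inserting it into a checkout table; a UNIQUE
    constraint on ckan_id guarantees only one worker claims each file."""
    try:
        with conn, conn.cursor() as cur:
            cur.execute("INSERT INTO checkout (ckan_id) VALUES (%s)",
                        (ckan_id,))
        return True
    except pg_errors.UniqueViolation:
        return False


def process_file(dataset: dict) -> None:
    """Placeholder for the real work: download the raw file, extract it
    into the generic Galvanalyser format, write it to the EIDF database,
    then push the processed record back to CKAN linked to the raw data."""
    ...


if __name__ == "__main__":
    for ds in find_unprocessed():
        if checkout(ds["id"]):
            process_file(ds)
```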
Providing access to data
Galvanalyser should:
- allow users to log in via the CKAN authorisation service (see the sketch below)
- provide a data overview to authorised users via a web interface
- provide data to authorised users via a Python API
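One way to satisfy these requirements together is a thin read-only service that delegates authorisation to CKAN by forwarding the caller's API token. In the sketch below, the endpoint path, the database schema and the use of package_show as the authorisation check are all assumptions, not the real Galvanalyser API.

```python
"""Sketch of a read-only access endpoint that delegates authorisation to
CKAN by forwarding the caller's API token. Endpoint path, table and column
names are illustrative, not the real Galvanalyser schema."""
from flask import Flask, abort, jsonify, request
import psycopg2
import requests

CKAN_URL = "https://eidf.example/ckan"          # placeholder
app = Flask(__name__)
conn = psycopg2.connect("dbname=galvanalyser")  # placeholder DSN


@app.get("/datasets/<ckan_id>")
def dataset(ckan_id: str):
    # Authorisation check: proceed only if CKAN says the caller's token
    # is allowed to see this dataset.
    resp = requests.get(
        f"{CKAN_URL}/api/3/action/package_show",
        params={"id": ckan_id},
        headers={"Authorization": request.headers.get("Authorization", "")},
    )
    if resp.status_code != 200:
        abort(403)

    # Read-only query against the EIDF Galvanalyser database.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT name, sample_count FROM datasets WHERE ckan_id = %s",
            (ckan_id,),
        )
        row = cur.fetchone()
    if row is None:
        abort(404)
    return jsonify({"name": row[0], "samples": row[1]})


if __name__ == "__main__":
    app.run()
```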
Restructure
To perform the various roles it is called upon to play in the EIDF-integrated setup, Galvanalyser will have to be restructured.
- The current implementation of Galvanalyser is suitable for lab environments where tens of directories are being monitored. For sending data to the EIDF, this complete version of Galvanalyser is more than sufficient; its data processing functionality is optional (and will in fact slow the pipeline down, because datasets are not available for metadata editing until their content has been processed).
- For processing data on the EIDF, Galvanalyser's harvesters should have a parallel implementation whose inputs and outputs are not monitored directories and the Galvanalyser backend, but the CKAN data lake and a Galvanalyser-style database to which they can write directly (see the sketch after this list). Once files are completely processed, records should be sent to the CKAN data lake and linked to the raw data.
- Note: the EIDF harvesters will still need some of Galvanalyser's core information about column types, units, etc. When sending processed records back to the CKAN data lake, it may be necessary to expand some of that generic information to provide complete files.
- To provide access to data, a READ ONLY version of the Galvanalyser REST API, together with its Python API and web frontend counterparts, should be plugged into the EIDF Galvanalyser database.
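The sketch below illustrates the parallel-implementation idea from the second bullet: the harvester core is written against abstract input/output backends, so the same parsing code can run against monitored directories in the lab and against the CKAN data lake on the EIDF. All class and function names here are hypothetical.

```python
"""Sketch of the shared-core restructure: one parsing core behind abstract
input/output backends. All class and function names are hypothetical."""
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Iterable


class InputBackend(ABC):
    @abstractmethod
    def pending_files(self) -> Iterable[Path]:
        """Yield raw files that have not yet been processed."""


class OutputBackend(ABC):
    @abstractmethod
    def write(self, dataset: dict) -> None:
        """Persist a dataset in the generic Galvanalyser format."""


class DirectoryInput(InputBackend):
    """Lab setup: read from a monitored directory."""

    def __init__(self, root: Path):
        self.root = root

    def pending_files(self) -> Iterable[Path]:
        # The real harvester tracks per-file state; this just lists files.
        return (p for p in self.root.rglob("*") if p.is_file())


class CKANDataLakeInput(InputBackend):
    """EIDF setup: pull unprocessed files from the CKAN data lake."""

    def pending_files(self) -> Iterable[Path]:
        ...  # package_search + download, as in the earlier sketch


def harvest(source: InputBackend, sink: OutputBackend,
            parse: Callable[[Path], dict]) -> None:
    """The shared core: parsing is identical in both environments."""
    for path in source.pending_files():
        sink.write(parse(path))
```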