Code for instantiating and migrating Materials Provenance Store (MPS) and Materials Experiment Knowledge Graph (MEKG) databases. Please see the corresponding MEKG and MPS papers for additional details.
The code in this repository was used for several key database setup and metric generation tasks:
- Downloading and restoring MPS PostgreSQL dump files
- Downloading and restoring MEKG Neo4j dump files
- Migrating an instantiated PostgreSQL MPS database to a Neo4j MEKG database
- Timing queries against each database to measure performance
- Instantiating a Jupyter notebook connected to each database for easy scripting access
To download the PostgreSQL and Neo4j database dump files, use the scripts/download-releases.sh script provided in this repository. It can download the dump files for either or both databases.
To download the PostgreSQL dump file, run the following command from the root of the repository:
scripts/download-releases.sh --postgresql
This will download the PostgreSQL dump file to the data/releases/postgres directory.
To download the Neo4j dump file, run the following command from the root of the repository:
scripts/download-releases.sh --neo4j
This will download the Neo4j dump file to the data/releases/neo4j directory.
By default, the dump files will be saved to the data/releases directory at the root of the repository. If you want to change the output directory, you can modify the download-releases.sh script accordingly.
Note that the download links may expire after some time. If the links are no longer valid, you may need to obtain new links from the authors.
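If you prefer to script the download step from Python instead of the shell script, the snippet below is a minimal sketch of what download-releases.sh does for the PostgreSQL dump. The RELEASE_URL value is a hypothetical placeholder, not a real link; valid URLs come from the repository's release page or the authors.

```python
# Minimal sketch of the download step using only the Python standard library.
# RELEASE_URL is a hypothetical placeholder; substitute a valid release link.
from pathlib import Path
from urllib.request import urlretrieve

RELEASE_URL = "https://example.com/mps/public-release.dump"  # placeholder URL
OUT_DIR = Path("data/releases/postgres")

OUT_DIR.mkdir(parents=True, exist_ok=True)
urlretrieve(RELEASE_URL, OUT_DIR / "public-release.dump")
print(f"Saved dump to {OUT_DIR / 'public-release.dump'}")
```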
The PostgreSQL MPS database can be restored into a PostgreSQL docker container using a loading container. Once the relevant PostgreSQL dump has been downloaded, it can be loaded into the PostgreSQL docker container volume with the following command:
docker-compose run --rm postgres-loader
If the release dump file is named something other than public-release.dump, the filename can be configured by setting the POSTGRES_BACKUP_FILE environment variable prior to loading:
POSTGRES_BACKUP_FILE=public_release_20230228_eighth.tar docker-compose run --rm postgres-loader
The MPS PostgreSQL database can be connected to directly after loading if you have the psql PostgreSQL client utility installed:
PGPASSWORD=mps_password psql -p 7777 -d postgres -Umps_user -h localhost
This requires the postgres container to be running (it will be in the running state after loading the database). If the container has stopped, you can restart it with the following command:
docker-compose up postgres
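If you would rather connect from Python than from psql, the snippet below is a minimal sketch using the psycopg2 driver (not part of this repository's tooling), reusing the connection settings shown above. The information_schema query is just a sanity check that the dump was restored.

```python
# Minimal sketch of connecting to the restored MPS database from Python,
# assuming the psycopg2-binary package is installed.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=7777,
    dbname="postgres",
    user="mps_user",
    password="mps_password",
)
with conn, conn.cursor() as cur:
    # List the tables loaded from the dump as a quick sanity check.
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public'"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()
```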
Once the relevant neo4j.dump file has been downloaded, the MEKG Neo4j database can be instantiated with the following docker-compose command:
docker-compose run --rm neo4j-loader
This will run the neo4j-admin database load command and instantiate the Neo4j database in a docker volume.
Due to a limitation of the neo4j-admin utility, the file must be named neo4j.dump and placed in the data/releases/neo4j folder.
Once the MEKG dump has been restored, the Neo4j database can be started with the following command:
docker-compose up neo4j -d
Once the Neo4j container is running, you can navigate to http://localhost:7474/ to view the database in the neo4j-browser application. Authentication is turned off by default, so you can simply click the Connect button without filling in any credentials.
The database can also be interacted with programmatically through the Bolt endpoint at bolt://localhost:7687.
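For example, the snippet below is a minimal sketch of connecting from Python with the official neo4j driver (pip install neo4j). Since authentication is disabled in this setup, no credentials are passed; the node-count query is just a sanity check.

```python
# Minimal sketch of connecting to the MEKG database over Bolt from Python.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=None)
with driver.session() as session:
    # Count the nodes in the graph as a quick sanity check.
    record = session.run("MATCH (n) RETURN count(n) AS node_count").single()
    print(record["node_count"])
driver.close()
```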
Once the databases have successfully been loaded, the Jupyter server can be started with a simple docker-compose up command:
docker-compose up jupyter -d
The -d flag is appended here to start the server in detached mode so that it runs in the background.
The server can be found at http://localhost:8888. The mps-client library has been pre-seeded with the credentials required to connect to both databases. An example notebook for running queries against each database is provided at src/QueryTimings.ipynb.
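The notebook relies on the mps-client helpers; since that API is not documented here, the sketch below illustrates the same timing idea with the raw drivers instead. The samples table and Sample node label are hypothetical placeholders for whichever entity you want to time.

```python
# Rough sketch of timing the same logical query against each database,
# in the spirit of src/QueryTimings.ipynb (which uses mps-client helpers).
import time

import psycopg2
from neo4j import GraphDatabase


def time_call(fn, repeats=5):
    """Return the average wall-clock time of fn() over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


pg = psycopg2.connect(host="localhost", port=7777, dbname="postgres",
                      user="mps_user", password="mps_password")
neo = GraphDatabase.driver("bolt://localhost:7687", auth=None)


def pg_query():
    with pg.cursor() as cur:
        # 'samples' is a hypothetical table name used for illustration only.
        cur.execute("SELECT count(*) FROM samples")
        cur.fetchone()


def neo_query():
    with neo.session() as session:
        # 'Sample' is a hypothetical node label used for illustration only.
        session.run("MATCH (s:Sample) RETURN count(s)").consume()


print(f"postgres: {time_call(pg_query):.4f} s")
print(f"neo4j:    {time_call(neo_query):.4f} s")
```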
For details on migrating the PostgreSQL MPS database into the Neo4j MEKG database, see the Migration documentation.
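As a rough illustration of the idea only (this is not the repository's actual migration code), a relational-to-graph migration step might copy rows from an MPS table into MEKG nodes as sketched below; the samples table and Sample label are hypothetical placeholders.

```python
# Simplified sketch of migrating relational rows into graph nodes; the real
# migration handles the full MPS schema and relationships.
import psycopg2
from neo4j import GraphDatabase

pg = psycopg2.connect(host="localhost", port=7777, dbname="postgres",
                      user="mps_user", password="mps_password")
neo = GraphDatabase.driver("bolt://localhost:7687", auth=None)

with pg.cursor() as cur, neo.session() as session:
    # 'samples', 'id', and 'name' are hypothetical illustrations only.
    cur.execute("SELECT id, name FROM samples")
    for sample_id, name in cur.fetchall():
        # MERGE keeps the load idempotent if the migration is re-run.
        session.run(
            "MERGE (s:Sample {id: $id}) SET s.name = $name",
            id=sample_id, name=name,
        )

pg.close()
neo.close()
```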
All containers can be turned off with a full docker-compose down command:
docker-compose down
This will preserve the database volumes so that data does not need to be reloaded on restart. To fully delete the database volumes and the storage they consume, add the -v flag:
docker-compose down -v