Merge pull request #24 from NRGI/taglifter
Merge in work based on Tim's taglifter library. Includes multiple transform scripts and a transform/load workflow that works entirely within Docker containers.
Showing 52 changed files with 10,046 additions and 21,711 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,8 @@ | ||
__pycache__ | ||
data | ||
data/* | ||
*.swp | ||
*~ | ||
.ve | ||
.ipynb_checkpoints | ||
process/*/data | ||
ontology/catalog-v001.xml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
FROM python:3.4-onbuild | ||
CMD "./transform_all.sh" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,45 @@ | ||
# resource-projects-etl | ||
Extract, Transform and Load processes for rp.org | ||
|
||
# Requirements | ||
This repository contains a library for Extract, Transform and Load processes for ResourceProjects.org. | ||
|
||
Python 3 | ||
You can report issues with current transformations, or suggest sources which should be added to this library using the GitHub issue tracker. | ||
|
||
# Getting started | ||
|
||
## Processes | ||
Each process, located in the **process** folder, consists of a collection of files that either (a) document a manual transformation of the data or (b) perform an automated transformation. | ||
|
||
Folders may contain: | ||
|
||
* A README.md file describing the transformation | ||
* An extract.sh or extract.py file to fetch the file | ||
* A data/ subfolder where the extracted data is stored during conversion (ignored by git) | ||
* A transform.py file which runs the transformations | ||
* A meta.json file containing the metadata which transform.py will use | ||
* A prov.ttl file containing provenance information (using [PROV-O](https://www.w3.org/TR/prov-o/)) to be merged into the final graph | ||
|
||
The output of each process should be written to the root /data/ folder, from where it can be loaded onto the ResourceProjects.org platform. | ||
|
||
|
||
|
||
## Requirements | ||
|
||
* Python 3 | ||
* Bash | ||
|
||
### Getting started | ||
|
||
``` | ||
virtualenv .ve --python=/usr/bin/python3 | ||
source .ve/bin/activate | ||
pip install -r requirements.txt | ||
``` | ||
|
||
### Running with docker | ||
|
||
``` | ||
docker rm -f rp-etl rp-load | ||
docker run --name rp-etl -v /usr/src/app/data -v /usr/src/app/ontology bjwebb/resource-projects-etl | ||
docker run --name rp-load --link virtuoso:virtuoso --volumes-from virtuoso --volumes-from rp-etl --rm bjwebb/resource-projects-etl-load | ||
``` | ||
|
||
To run the last command, you will need a [virtuoso container running](https://github.com/NRGI/resourceprojects.org-frontend/#pre-requisites). |
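The folder layout listed in the README above can be checked with a short script. The following is a minimal sketch, not part of this commit: the file names come from the README list, while `describe_process` and `is_automated` are hypothetical helpers, and treating the presence of transform.py as the marker of an automated process is an assumption.

```python
from pathlib import Path


def describe_process(folder):
    """Report which of the items listed in the README exist in a process folder.

    Hypothetical helper: only the file names are taken from the README.
    """
    folder = Path(folder)
    return {
        "readme": (folder / "README.md").is_file(),
        # The README allows either a shell or a Python extract script.
        "extract": any((folder / name).is_file()
                       for name in ("extract.sh", "extract.py")),
        "data_dir": (folder / "data").is_dir(),
        "transform": (folder / "transform.py").is_file(),
        "meta": (folder / "meta.json").is_file(),
        "prov": (folder / "prov.ttl").is_file(),
    }


def is_automated(folder):
    """Assumption: a process counts as automated when it ships a transform.py."""
    return describe_process(folder)["transform"]
```

Calling `describe_process("process/some-source")` on a folder of your own (the path here is only a placeholder) returns a dictionary of booleans, which makes it easy to spot a process that lacks its meta.json or prov.ttl.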
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Ignore everything in this directory | ||
* | ||
# Except this file | ||
!.gitignore |
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
FROM caprenter/automated-build-virtuoso | ||
ADD load.sh /load.sh | ||
ADD import.sql /import.sql | ||
CMD /load.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Docker container definition for loading data into virtuoso. | ||
|
||
You will need a [virtuoso container running](https://github.com/NRGI/resourceprojects.org-frontend/#pre-requisites). | ||
|
||
Then from this directory: | ||
|
||
``` | ||
docker build -t rp-load . | ||
cd .. | ||
docker run --name rp-load --link virtuoso:virtuoso --volumes-from virtuoso -v `pwd`/data:/data --rm rp-load | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
SPARQL CLEAR GRAPH <http://resourceprojects.org/>; | ||
delete from db.dba.load_list; | ||
ld_dir_all('/usr/local/var/lib/virtuoso/db/import', '*', 'http://resourceprojects.org/'); | ||
rdf_loader_run(); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
cd /usr/local/var/lib/virtuoso/db/ | ||
rm -r import | ||
mkdir import | ||
cp /usr/src/app/data/* import | ||
cp /usr/src/app/ontology/*.rdf import | ||
isql virtuoso dba dba /import.sql |
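For readers who want to try the staging step of the load script above outside the container, here is a rough Python equivalent. It is a sketch under stated assumptions, not the code this commit ships: `stage_import` is a hypothetical function, the three directories are passed in as parameters instead of being hard-coded, and it tolerates a missing import directory. It does not run `isql`; it only prepares the files.

```python
import shutil
from pathlib import Path


def stage_import(data_dir, ontology_dir, db_dir):
    """Recreate db_dir/import, then copy in the data files and ontology *.rdf files.

    Hypothetical helper that mirrors the rm/mkdir/cp steps of the shell script.
    Returns the names of the copied files.
    """
    import_dir = Path(db_dir) / "import"
    # Like `rm -r import` followed by `mkdir import`, but a missing
    # directory is not treated as an error here.
    shutil.rmtree(str(import_dir), ignore_errors=True)
    import_dir.mkdir(parents=True)

    def visible_files(paths):
        # A shell `*` glob skips dotfiles, and `cp` without -r skips directories.
        return [p for p in sorted(paths)
                if p.is_file() and not p.name.startswith(".")]

    sources = visible_files(Path(data_dir).iterdir())
    sources += visible_files(Path(ontology_dir).glob("*.rdf"))

    copied = []
    for src in sources:
        shutil.copy(str(src), str(import_dir / src.name))
        copied.append(src.name)
    return copied
```

Skipping dotfiles matters in this repository because the data directory carries its own .gitignore, which should not end up in the folder that Virtuoso loads from.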