OpenDataDiscovery + Great Expectations = <3
This demo project shows how OpenDataDiscovery works with Data QA.
Start the OpenDataDiscovery Platform and Postgres services to store Great Expectations results. By default, OpenDataDiscovery starts on http://localhost:8080
docker compose up -d
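The containers may need a little time before the platform accepts requests. A minimal sketch of a readiness check, assuming curl is installed; the helper name wait_for_url and its default attempt count and delay are hypothetical and not part of this project:

```shell
# Poll a URL until it returns any HTTP response or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper, not shipped with odd-cli or this demo.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress output, -o /dev/null discards the response body.
    # Without -f, curl exits 0 on any HTTP response and non-zero
    # when the connection itself fails.
    if curl -s -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not reachable: $url" >&2
  return 1
}
```

After docker compose up -d, calling wait_for_url http://localhost:8080 blocks until the platform responds or gives up after roughly a minute with the defaults above.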
The next commands create and activate a virtual environment and install 3 libraries:
- great_expectations - To work with Great Expectations.
- odd-cli - Provides useful commands, e.g. reading and collecting metadata from local files and creating OpenDataDiscovery tokens.
- odd-great-expectations - Contains ODDAction, which catches validation results, maps them, and sends the metadata to OpenDataDiscovery Platform.
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
The next command creates a token named data_qa and prints it to the console.
odd tokens create data_qa -h http://localhost:8080
Store the host and token in environment variables to avoid repeating them in later CLI commands.
export ODD_PLATFORM_HOST=http://localhost:8080
export ODD_PLATFORM_TOKEN=<token_from_previous_step>
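Later commands rely on both variables, so it can help to confirm they are set in the current shell before continuing. A minimal sketch in POSIX shell; the function name check_odd_env is hypothetical and only the two variable names come from the steps above:

```shell
# Report which of the two ODD variables are unset or empty.
# Returns 0 when both are set, 1 otherwise.
# Hypothetical helper, not part of odd-cli.
check_odd_env() {
  missing=0
  for name in ODD_PLATFORM_HOST ODD_PLATFORM_TOKEN; do
    # eval reads the variable whose name is stored in $name
    # (indirect expansion that works in plain POSIX sh).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_odd_env && odd collect data would then skip the collection step when either variable is missing.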
For demo purposes we prepared 2 files: data/BankChurners.csv and data/BankChurners_Bad.csv.
The next CLI command reads the files from the data folder, gathers their metadata, and ingests it into OpenDataDiscovery Platform.
odd collect data
For demo purposes we prepared expectations (/great_expectations/expectations/validate_bank_data.json) and 2 checkpoints (/great_expectations/checkpoints/*) to run data quality tests against the BankChurners files:
succeeded_checkpoint - Validates the data/BankChurners.csv file.
great_expectations checkpoint run succeeded_checkpoint
failed_checkpoint - Validates the data/BankChurners.csv and data/BankChurners_Bad.csv files.
great_expectations checkpoint run failed_checkpoint
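Both checkpoints can also be run in one pass with a short summary. The sketch below assumes the great_expectations CLI exits with a non-zero status when a checkpoint's validation fails; verify this for your installed version. The function name run_checkpoints is hypothetical:

```shell
# Run each named checkpoint and print a PASS/FAIL line per checkpoint.
# Returns 0 only if every checkpoint succeeded.
# Assumption: `great_expectations checkpoint run` exits non-zero
# when validation fails.
run_checkpoints() {
  failed=0
  for name in "$@"; do
    if great_expectations checkpoint run "$name"; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Calling run_checkpoints succeeded_checkpoint failed_checkpoint runs both demo checkpoints; the second is intended to fail in this demo, so a non-zero return is expected there.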
Go to http://localhost:8080 to see results.