Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 9 changed files with 197 additions and 67 deletions.
@@ -1,54 +1,83 @@
# OpenDataDiscovery dbt tests metadata collecting

[![PyPI version](https://badge.fury.io/py/odd-dbt.svg)](https://badge.fury.io/py/odd-dbt)

A CLI tool that helps run dbt tests, then parse their results and ingest them into the OpenDataDiscovery Platform.

It can be used as a standalone CLI tool or within the [ODD CLI](https://github.com/opendatadiscovery/odd-cli) package,
which provides some useful additional features for working with OpenDataDiscovery.

## Supported adapters

| Adapter   | Version |
|-----------|---------|
| Snowflake | ^1.6    |
| Postgres  | ^1.6    |

Profiles in the dbt `profiles.yml` file look different for each type of data source.

**Snowflake**: the `host_settings` value is created from the `account` field, whose value should be `<account_identifier>`.
For example, the URL for an account uses the format `<account_identifier>.snowflakecomputing.com`.
An example Snowflake account identifier is `hj1234.eu-central-1`.
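
A minimal sketch of the convention above, assuming only the string format shown: `snowflake_account_url` is a hypothetical helper for illustration and is not part of odd-dbt, which builds `host_settings` from the profile's `account` field.

```python
def snowflake_account_url(account_identifier: str) -> str:
    """Return the Snowflake account host for an account identifier.

    Hypothetical helper for illustration only; it is not part of odd-dbt.
    """
    # Tolerate stray whitespace copied from profiles.yml
    identifier = account_identifier.strip()
    if not identifier:
        raise ValueError("account identifier must not be empty")
    return f"{identifier}.snowflakecomputing.com"


print(snowflake_account_url("hj1234.eu-central-1"))
# hj1234.eu-central-1.snowflakecomputing.com
```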

## Supported test types

1. [x] Generic tests
2. [ ] Singular tests (currently not supported)
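
In dbt's `manifest.json`, dbt itself tells the two kinds of test nodes apart by whether a node carries a `test_metadata` entry: generic tests (such as `unique`) have one, singular tests do not. The stdlib-only sketch below shows that split; `split_test_nodes` and the sample node IDs are hypothetical, and this is not how odd-dbt loads the manifest.

```python
import json
from typing import Any, Dict, List, Tuple


def split_test_nodes(manifest: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split test node IDs from a parsed manifest.json into (generic, singular)."""
    generic: List[str] = []
    singular: List[str] = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "test":
            continue  # skip models, seeds, snapshots, ...
        # Generic tests carry test_metadata (e.g. {"name": "unique"}); singular tests do not
        if "test_metadata" in node:
            generic.append(unique_id)
        else:
            singular.append(unique_id)
    return sorted(generic), sorted(singular)


# Hand-written manifest fragment, for illustration only
manifest = json.loads("""
{"nodes": {
  "model.shop.orders": {"resource_type": "model"},
  "test.shop.unique_orders_id": {"resource_type": "test", "test_metadata": {"name": "unique"}},
  "test.shop.assert_total_is_positive": {"resource_type": "test"}
}}
""")
print(split_test_nodes(manifest))
# (['test.shop.unique_orders_id'], ['test.shop.assert_total_is_positive'])
```

Only the first list corresponds to the test type marked as supported above.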

## Installation
```bash
pip install odd-dbt
```

## Command options
To see all available commands and options, run:
```commandline
odd_dbt_test --help
```

```
╭─ Options ─────────────────────────────────────────────────────────────╮
│ --project-dir PATH [default: Path().cwd()odd-dbt] │
│ --target TEXT [default:None] │
│ --profile-name TEXT [default:None] │
│ * --host -h TEXT [env var: ODD_PLATFORM_HOST] │
│ * --token -t TEXT [env var: ODD_PLATFORM_TOKEN] │
│ --dbt-host TEXT [default: localhost] │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────╯
```

## Usage
Every command that sends information to the OpenDataDiscovery platform uses the following environment variables:
1. `ODD_PLATFORM_HOST` - the URL of your platform
2. `ODD_PLATFORM_TOKEN` - a token for ingesting data into the platform (see [how to create a collector token](https://docs.opendatadiscovery.org/configuration-and-deployment/trylocally#create-collector-entity))
3. `DBT_DATA_SOURCE_ODDRN` - a unique ODDRN string describing the dbt data source, e.g. `//dbt/host/localhost`

It is recommended to set them as environment variables; the platform host and token can also be passed as flags (`--host`, `--token`) to each command:
```bash
export ODD_PLATFORM_HOST=http://localhost:8080
export ODD_PLATFORM_TOKEN=token***
export DBT_DATA_SOURCE_ODDRN=//dbt/host/localhost
```
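
As a pre-flight check before running a command, the sketch below verifies that the three variables are set and reports all missing ones at once. It is a hypothetical stdlib-only helper (`load_settings`, `missing_env_vars` and `PlatformSettings` are illustrative names); odd-dbt's own `config.Config` reads the same values through pydantic `BaseSettings`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import List

REQUIRED_VARS = ("ODD_PLATFORM_HOST", "ODD_PLATFORM_TOKEN", "DBT_DATA_SOURCE_ODDRN")


@dataclass(frozen=True)
class PlatformSettings:
    """Hypothetical stand-in for the three values odd-dbt's Config expects."""

    odd_platform_host: str
    odd_platform_token: str
    dbt_data_source_oddrn: str


def missing_env_vars(env: Mapping) -> List[str]:
    # Collect every unset or empty variable so they can be reported together
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings(env: Mapping = os.environ) -> PlatformSettings:
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return PlatformSettings(
        odd_platform_host=env["ODD_PLATFORM_HOST"],
        odd_platform_token=env["ODD_PLATFORM_TOKEN"],
        dbt_data_source_oddrn=env["DBT_DATA_SOURCE_ODDRN"],
    )


# An explicit mapping is passed here; by default the real process environment is read
settings = load_settings({
    "ODD_PLATFORM_HOST": "http://localhost:8080",
    "ODD_PLATFORM_TOKEN": "token***",
    "DBT_DATA_SOURCE_ODDRN": "//dbt/host/localhost",
})
print(settings.odd_platform_host)  # http://localhost:8080
```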

### Commands
`create-datasource` - registers dbt as a data source in the OpenDataDiscovery platform; this data source is later used when ingesting metadata.
```commandline
odd_dbt_test create-datasource --name=my_local_dbt --dbt-host=localhost
```

**Requirements:** ingesting quality test entities requires the corresponding dataset entities to already be present in the platform.
For example, to ingest a data quality test for a Snowflake table, the entity for that table must already exist in the platform.

`ingest-test` - reads the `run_results.json` file in the target folder, then parses and ingests the test results.
```commandline
odd_dbt_test ingest-test --profile=my_profile
```

**Supported tests:** the library supports the basic tests provided by dbt:
- `unique`: values in the column must be unique
- `not_null`: the column must not contain null values
- `accepted_values`: the column must only contain values from the list specified in the test config
- `relationships`: each value in the selected column of the model must exist in the specified field of the referenced table (also known as referential integrity)
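
To make the four checks concrete, here is a minimal in-memory sketch of what each one asserts over a single column. It is only an illustration: dbt compiles these tests to SQL that selects failing rows, and the Python function names are hypothetical. NULLs are skipped by every check except `not_null`, which, as far as we understand it, mirrors dbt's default test SQL.

```python
from typing import Any, Iterable, Sequence


def unique(values: Sequence[Any]) -> bool:
    # `unique`: no non-null value may appear more than once
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def not_null(values: Sequence[Any]) -> bool:
    # `not_null`: every value must be present
    return all(v is not None for v in values)


def accepted_values(values: Sequence[Any], accepted: Iterable[Any]) -> bool:
    # `accepted_values`: every non-null value must come from the configured list
    allowed = set(accepted)
    return all(v in allowed for v in values if v is not None)


def relationships(values: Sequence[Any], referenced: Iterable[Any]) -> bool:
    # `relationships`: every non-null value must exist in the referenced field
    parent_keys = set(referenced)
    return all(v in parent_keys for v in values if v is not None)


order_customer_ids = [1, 2, 2, None]
customer_ids = [1, 2, 3]
print(unique(order_customer_ids))    # False: 2 appears twice
print(not_null(order_customer_ids))  # False: contains None
print(accepted_values(["placed", "shipped"], ["placed", "shipped", "returned"]))  # True
print(relationships(order_customer_ids, customer_ids))  # True
```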

`test` - proxies the `dbt test` command, then reads the `run_results.json` file in the target folder and parses and ingests the test results.
```commandline
odd_dbt_test test --profile=my_profile
```

**ODDRN generation for datasets:** the `host_settings` of the ODDRN generators required for source datasets are loaded from `.dbt/profiles.yml`.

### Run commands programmatically
You can run the following script to read and parse the test results and ingest them into the platform.
```python
# ingest_test_result.py
from odd_dbt import config
from odd_dbt.domain.cli_args import CliArgs
from odd_dbt.service.dbt import get_context
from odd_dbt.service.odd import ingest_entities
from odd_dbt.mapper.test_results import DbtTestMapper

cfg = config.Config()  # All fields can be set manually or read from ENV variables
client = config.create_odd_client(host=cfg.odd_platform_host, token=cfg.odd_platform_token)
generator = config.create_dbt_generator_from_oddrn(oddrn=cfg.dbt_data_source_oddrn)

# Build the dbt context from default CLI arguments, map test results to ODD entities and ingest them
cli_args = CliArgs.default()
context = get_context(cli_args=cli_args)
data_entities = DbtTestMapper(context=context, generator=generator).map()
ingest_entities(data_entities, client)
```
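
The `ingest-test` and `test` commands start from the test results that dbt writes to `run_results.json` in the target folder. As a stdlib-only sketch of that input (this is not odd-dbt's parser, and `summarize_test_results` is a hypothetical helper), test statuses can be tallied like this:

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_test_results(run_results_path: Path) -> Counter:
    """Count statuses (pass, fail, warn, error, ...) of test nodes in run_results.json."""
    run_results = json.loads(run_results_path.read_text())
    return Counter(
        result["status"]
        for result in run_results.get("results", [])
        # dbt test node IDs start with "test."; other nodes (models, seeds) are skipped
        if result.get("unique_id", "").startswith("test.")
    )


# Hand-written sample file containing only the fields the sketch reads
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "run_results.json"
    sample.write_text(json.dumps({"results": [
        {"unique_id": "test.shop.unique_orders_id", "status": "pass"},
        {"unique_id": "test.shop.not_null_orders_id", "status": "fail"},
        {"unique_id": "model.shop.orders", "status": "success"},
    ]}))
    counts = summarize_test_results(sample)

print(dict(counts))  # {'pass': 1, 'fail': 1}
```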
@@ -0,0 +1,25 @@
import oddrn_generator as odd
from odd_models.api_client.v2.odd_api_client import Client
# pydantic v1 API; in pydantic v2, BaseSettings moved to the separate pydantic-settings package
from pydantic import BaseSettings


class Config(BaseSettings):
    # Each field is read from the environment variable of the same name
    # (case-insensitive, e.g. ODD_PLATFORM_HOST) unless passed explicitly
    odd_platform_host: str
    odd_platform_token: str
    dbt_data_source_oddrn: str


def create_odd_client(host: str = None, token: str = None) -> Client:
    return Client(host=host, token=token)


def create_dbt_generator_from_oddrn(oddrn: str) -> odd.DbtGenerator:
    return odd.DbtGenerator(host_settings=extract_host_from_oddrn(oddrn))


def create_dbt_generator(host: str) -> odd.DbtGenerator:
    return odd.DbtGenerator(host_settings=host)


def extract_host_from_oddrn(oddrn: str) -> str:
    # e.g. "//dbt/host/localhost" -> "localhost"
    return oddrn.split("//dbt/host/")[-1]
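
The helpers above convert between a dbt host and the data source ODDRN. A minimal sketch of that round trip, assuming the `//dbt/host/<host>` format from the README example; `build_dbt_oddrn` and `extract_host_strict` are hypothetical names, not part of odd-dbt or `oddrn_generator` (the real ODDRN comes from `DbtGenerator.get_data_source_oddrn()`).

```python
DBT_HOST_PREFIX = "//dbt/host/"  # assumed prefix, taken from the //dbt/host/localhost example


def build_dbt_oddrn(host: str) -> str:
    """Hypothetical inverse of extract_host_from_oddrn."""
    return f"{DBT_HOST_PREFIX}{host}"


def extract_host_strict(oddrn: str) -> str:
    """Like extract_host_from_oddrn, but rejects strings that lack the prefix."""
    if not oddrn.startswith(DBT_HOST_PREFIX):
        raise ValueError(f"not a dbt host ODDRN: {oddrn!r}")
    host = oddrn[len(DBT_HOST_PREFIX):]
    if not host:
        raise ValueError("ODDRN has an empty host")
    return host


oddrn = build_dbt_oddrn("localhost")
print(oddrn)                       # //dbt/host/localhost
print(extract_host_strict(oddrn))  # localhost

# The split-based helper silently passes malformed input through unchanged:
print("localhost".split(DBT_HOST_PREFIX)[-1])  # localhost
```

The stricter variant trades convenience for an early, explicit error when `DBT_DATA_SOURCE_ODDRN` is set to something that is not a dbt host ODDRN.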
@@ -1,12 +1,14 @@
from pathlib import Path

from odd_dbt.utils import load_json
from funcy import walk_values
from dbt.contracts.graph.nodes import ParsedNode


class Manifest:
    def __init__(self, file: Path) -> None:
        self._manifest = load_json(file)

    @property
    def nodes(self) -> dict[str, ParsedNode]:
        # walk_values keeps the unique_id keys and deserializes each raw node dict
        return walk_values(ParsedNode._deserialize, self._manifest["nodes"])
@@ -0,0 +1,22 @@
from odd_models import DataEntityList
from odd_models.api_client.v2.odd_api_client import Client

from odd_dbt import config
from odd_dbt.logger import logger


def create_datasource(name: str, dbt_host: str, client: Client) -> str:
    # Register dbt as a data source in the platform and return its ODDRN
    generator = config.create_dbt_generator(host=dbt_host)
    oddrn = generator.get_data_source_oddrn()
    client.create_data_source(
        data_source_name=name,
        data_source_oddrn=oddrn,
    )
    return oddrn


def ingest_entities(data_entities: DataEntityList, client: Client) -> None:
    client.ingest_data_entity_list(data_entities=data_entities)
    logger.success(
        f"Injecting test results finished. Ingested {len(data_entities.items)} entities"
    )