Data quality checks in your dbt flow.
This repository helps you understand the different data quality checks available in dbt.
You can clone the repository, create a Supabase account, write the .env file, and run the environment setup commands below to get a free environment where you can practice testing with dbt.
I'm Bruno Gonzalez from 🇺🇾, working as a senior data engineer, and writing about data quality and data engineering.
Postgres database: Supabase. Create a free account and get the credentials to create the .env file.
Create a .env file with the following structure:
POSTGRES_HOST=<postgres_host>
POSTGRES_USER=<postgres_user>
POSTGRES_PASSWORD=<postgres_password>
POSTGRES_DATABASE=<postgres_database>
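Before exporting the variables, it can help to confirm that the file defines all four keys. The following is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the sample text stands in for a real .env file.

```python
# Keys listed in the .env structure above
REQUIRED_KEYS = [
    "POSTGRES_HOST",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DATABASE",
]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "="
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


# Made-up sample content; replace with the text of your own .env file
sample = "POSTGRES_HOST=db.example.com\nPOSTGRES_USER=postgres\nPOSTGRES_PASSWORD=\n"
print(missing_keys(sample))  # ['POSTGRES_PASSWORD', 'POSTGRES_DATABASE']
```

To check a real file, pass the contents of .env (for example, read with `pathlib.Path(".env").read_text()`) to `missing_keys`.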
Commands to set up the environment:
conda create -n dbtdq python=3.9
conda activate dbtdq
pip install -r requirements.txt
export $(cat .env | xargs)
dbt deps
dbt seed
dbt run
Modified from dbt-labs jaffle_shop.
Changes:
seeds/raw_customers.csv
- Added customer 101 without first_name.
- Added customer 102 without last_name.
- Added customer 103 with a different last_name pattern.
- Added customer 104 with inconsistent case in first_name.
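The customer changes above target completeness and format checks. As a rough illustration of the logic behind such checks (in dbt they would normally be written as tests on a model, which this README does not show), the Python sketch below flags the same four kinds of problems. The rows are invented for illustration and are not the actual seed values. The last_name pattern (one capital letter followed by a period) is an assumption based on the original jaffle_shop data.

```python
import re

# Illustrative rows only; the real values in seeds/raw_customers.csv may differ.
customers = [
    {"id": 1, "first_name": "Michael", "last_name": "P."},
    {"id": 101, "first_name": None, "last_name": "S."},
    {"id": 102, "first_name": "Ana", "last_name": None},
    {"id": 103, "first_name": "Luis", "last_name": "Suarez"},
    {"id": 104, "first_name": "mARIA", "last_name": "G."},
]

# Assumed convention: a single capital letter followed by a period, e.g. "P."
LAST_NAME_PATTERN = re.compile(r"^[A-Z]\.$")


def null_ids(rows, column):
    """IDs of rows where the column is missing or empty."""
    return [r["id"] for r in rows if not r[column]]


def pattern_mismatch_ids(rows, column, pattern):
    """IDs of rows whose non-empty value does not match the pattern."""
    return [r["id"] for r in rows if r[column] and not pattern.match(r[column])]


def inconsistent_case_ids(rows, column):
    """IDs of rows whose non-empty value is not capitalized like 'Michael'."""
    return [
        r["id"]
        for r in rows
        if r[column] and r[column] != r[column].capitalize()
    ]


print(null_ids(customers, "first_name"))                               # [101]
print(null_ids(customers, "last_name"))                                # [102]
print(pattern_mismatch_ids(customers, "last_name", LAST_NAME_PATTERN)) # [103]
print(inconsistent_case_ids(customers, "first_name"))                  # [104]
```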
seeds/raw_orders.csv
- Duplicated order with id = 98.
- Added order 100 with an order_date in the future.
- Added order 101 with a nonexistent user_id.
- Added order 102 with a wrong status.
- Added order 103 without issues.
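The order changes above map onto uniqueness, freshness, referential integrity, and accepted-value checks. The Python sketch below illustrates that logic only; it is not how the repository implements its tests. The rows, the `known_customer_ids` set, and the function names are invented for illustration. The status list is the one accepted by the original jaffle_shop project, assumed to apply here as well.

```python
from collections import Counter
from datetime import date

# Illustrative rows only; the real values in seeds/raw_orders.csv may differ.
orders = [
    {"id": 98, "user_id": 1, "order_date": date(2018, 4, 7), "status": "placed"},
    {"id": 98, "user_id": 1, "order_date": date(2018, 4, 7), "status": "placed"},
    {"id": 100, "user_id": 2, "order_date": date(2999, 1, 1), "status": "placed"},
    {"id": 101, "user_id": 999, "order_date": date(2018, 4, 8), "status": "placed"},
    {"id": 102, "user_id": 2, "order_date": date(2018, 4, 8), "status": "lost"},
    {"id": 103, "user_id": 1, "order_date": date(2018, 4, 9), "status": "placed"},
]

# Stands in for the ids present in the customers seed (hypothetical values)
known_customer_ids = {1, 2}

# Assumed: status values accepted by the original jaffle_shop project
VALID_STATUSES = {"placed", "shipped", "completed", "return_pending", "returned"}


def duplicate_ids(rows):
    """IDs that appear more than once (uniqueness check)."""
    counts = Counter(r["id"] for r in rows)
    return sorted(i for i, n in counts.items() if n > 1)


def future_order_ids(rows, today):
    """IDs of orders dated after the given day."""
    return [r["id"] for r in rows if r["order_date"] > today]


def orphan_order_ids(rows, customer_ids):
    """IDs of orders whose user_id has no matching customer."""
    return [r["id"] for r in rows if r["user_id"] not in customer_ids]


def invalid_status_ids(rows, valid):
    """IDs of orders whose status is outside the accepted set."""
    return [r["id"] for r in rows if r["status"] not in valid]


print(duplicate_ids(orders))                         # [98]
print(future_order_ids(orders, date.today()))        # [100]
print(orphan_order_ids(orders, known_customer_ids))  # [101]
print(invalid_status_ids(orders, VALID_STATUSES))    # [102]
```

Order 103 passes all four checks, which matches its role as a clean control row.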
seeds/raw_payments.csv
- Added payment 114 for order 100 with a wrong payment_method.
- Added payment 115 for order 103 with a huge amount (outlier).
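The two payment changes above correspond to an accepted-values check and an outlier check. The Python sketch below illustrates one possible version of each. The rows and function names are invented, and the payment-method list is taken from the original jaffle_shop project as an assumption. The outlier rule (an amount more than ten times the median) is a deliberately crude heuristic chosen for this example, not the method used by the repository.

```python
from statistics import median

# Illustrative rows only; the real values in seeds/raw_payments.csv may differ.
payments = [
    {"id": 1, "order_id": 1, "payment_method": "credit_card", "amount": 1000},
    {"id": 2, "order_id": 2, "payment_method": "coupon", "amount": 2000},
    {"id": 3, "order_id": 3, "payment_method": "bank_transfer", "amount": 1500},
    {"id": 114, "order_id": 100, "payment_method": "cash", "amount": 1700},
    {"id": 115, "order_id": 103, "payment_method": "gift_card", "amount": 1000000},
]

# Assumed: payment methods accepted by the original jaffle_shop project
VALID_METHODS = {"credit_card", "coupon", "bank_transfer", "gift_card"}


def invalid_method_ids(rows, valid):
    """IDs of payments whose method is outside the accepted set."""
    return [r["id"] for r in rows if r["payment_method"] not in valid]


def outlier_ids(rows, factor=10):
    """IDs of payments whose amount exceeds factor times the median amount."""
    threshold = factor * median(r["amount"] for r in rows)
    return [r["id"] for r in rows if r["amount"] > threshold]


print(invalid_method_ids(payments, VALID_METHODS))  # [114]
print(outlier_ids(payments))                        # [115]
```

A median-based threshold is used here because, with only a handful of rows, a single extreme value can distort the mean or the quartiles enough to hide itself.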