dbt-bigquery-monitoring

🚧 This package is still at an early stage and may change significantly 🚧

dbt-bigquery-monitoring is a dbt package that provides models for monitoring BigQuery performance and costs.

Getting started

Tip: click "Watch" → "Custom" → "Releases" on the repository to get an email with the changelog whenever a new version is released.

dbt compatibility

The package is actively used with the latest stable dbt version, which is 1.8.2 at the time of writing.

Granting rights

To use this package, you will need to grant permissions to the Service Account that dbt uses to connect to BigQuery.

There are several ways to grant the permissions required by the package.

"YOLO" mode

The simplest way is to grant the BigQuery Admin role:

  • BigQuery Admin can do virtually everything in BigQuery (so more than enough for this package)

It's great for testing but not recommended for production where you'd rather follow the principle of least privilege.

Finer-grained basic roles

Google provides some predefined roles that can be used to grant the necessary permissions to the service account that dbt uses to connect to BigQuery.

Here's the list of predefined roles that can be combined to cover the package's needs:

Custom roles

If you prefer to use custom roles, you can use the following permissions.

This list might not be exhaustive and you might need to add more permissions depending on your use case, but it should be a good start (a smoke-test query follows the list):

  • bigquery.jobs.create - to create BigQuery jobs (run queries)

  • bigquery.tables.get - to access BigQuery table data

  • bigquery.tables.list - to list BigQuery tables

  • bigquery.jobs.listAll - to access BigQuery jobs data

    • At the organization or project level, depending on the desired scope
    • Note that JOBS_BY_ORGANIZATION is only available to users with a defined Google Cloud organization. More information on permissions and access control can be found in the BigQuery documentation.
  • bigquery.reservations.list - to access BigQuery reservations data

  • bigquery.capacityCommitments.list - to access BigQuery capacity commitments data

  • bigquery.reservationAssignments.list - to access BigQuery reservation assignments data
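
To check the job-related grants, a minimal smoke test (a sketch, assuming the us region) is to run a query like the following as the service account; it should return recent jobs rather than a permission error:

-- Smoke test: succeeds only if bigquery.jobs.create and
-- bigquery.jobs.listAll are effective on the project.
SELECT job_id, user_email, total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
LIMIT 10;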

Installing the package to your dbt project

Add the following to your packages.yml file:

packages:
  - package: bqbooster/dbt_bigquery_monitoring
    version: 0.14.3
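
Then run `dbt deps` to fetch and install the package.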

Set up an output dataset

In your dbt_project.yml file, add the following configuration:

models:
  ## dbt-bigquery-monitoring models will be created in the schema '<your_schema>_dbt_bigquery_monitoring' (or similar if you override the output schema behavior through a macro)
  dbt_bigquery_monitoring:
    +schema: "dbt_bigquery_monitoring"

Configure the package

Settings have default values that can be overridden using:

  • dbt project variables (and therefore also by CLI variable override)
  • environment variables
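
For instance, exporting the environment variable DBT_BQ_MONITORING_REGION=europe-west1 overrides the bq_region variable without editing dbt_project.yml.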

Please note that the default region is us and that, at the time of writing, there is no way to query tables across regions. You can, however, run this project in each region you want to monitor and then replicate the resulting tables to a central region to build an aggregated view (see the sketch below).

To find out which region a job ran in, open the Job history panel (at the bottom of the BigQuery UI), click a job, and check its Location field. Similarly, the region of a dataset or table is shown in the Data location field of its details panel.
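
As a minimal sketch of the aggregation step, assuming you have replicated each region's output of the daily_spend model into hypothetical central datasets named monitoring_us and monitoring_eu, an aggregated view could look like:

-- Hypothetical aggregation over tables replicated from two regions.
-- Project, dataset and table names are placeholders, not package defaults.
CREATE OR REPLACE VIEW `central-project.monitoring.daily_spend_all_regions` AS
SELECT 'us' AS region, * FROM `central-project.monitoring_us.daily_spend`
UNION ALL
SELECT 'europe-west1' AS region, * FROM `central-project.monitoring_eu.daily_spend`;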

Modes

Region mode (default)

In this mode, the package will monitor all the GCP projects in the region specified in the dbt_project.yml file.

vars:
  # dbt bigquery monitoring vars
  bq_region: 'us'

Requirements

  • The execution project needs to be the same as the storage project; otherwise you'll need to use the second mode.
  • If you have multiple GCP projects in the same region, you should use "project mode" (with the input_gcp_projects setting to specify them), or you will run into errors such as: Within a standard SQL view, references to tables/views require explicit project IDs unless the entity is created in the same project that is issuing the query, but these references are not project-qualified: "region-us.INFORMATION_SCHEMA.JOBS".

Project mode

To enable "project mode", you'll need to explicitly define one mandatory setting in the dbt_project.yml file:

vars:
  # dbt bigquery monitoring vars
  input_gcp_projects: [ 'my-gcp-project', 'my-gcp-project-2' ]

GCP Billing export

GCP Billing export is a feature that allows you to export your billing data to BigQuery. It allows the package to track the real cost of your queries and storage over time.

To enable it on the GCP side, follow the official documentation to set up the export.

Then, to enable GCP billing export monitoring in the package, define the following settings in the dbt_project.yml file:

vars:
  # dbt bigquery monitoring vars
  enable_gcp_billing_export: true
  gcp_billing_export_storage_project: 'my-gcp-project'
  gcp_billing_export_dataset: 'my_dataset'
  gcp_billing_export_table: 'my_table'

Customizing the package configuration

Settings details

The following settings are documented using this template: dbt_project_variable (Environment variable): description (default, if any).

Optional settings

Environment
  • input_gcp_projects (DBT_BQ_MONITORING_GCP_PROJECTS) : list of GCP projects to monitor (default: [])
  • bq_region (DBT_BQ_MONITORING_REGION) : region where the monitored projects are located (default: us)
Pricing (a worked cost example follows the settings list)
  • use_flat_pricing (DBT_BQ_MONITORING_USE_FLAT_PRICING) : whether to use flat pricing or not (default: true)
  • per_billed_tb_price (DBT_BQ_MONITORING_PER_BILLED_TB_PRICE) : price in US dollars per billed TB of data processed (default: 6.25)
  • free_tb_per_month (DBT_BQ_MONITORING_FREE_TB_PER_MONTH) : free on demand compute quota TB per month (default: 1)
  • hourly_slot_price (DBT_BQ_MONITORING_HOURLY_SLOT_PRICE) : hourly price in US dollars per slot per hour (default: 0.04)
  • active_logical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_LOGICAL_STORAGE_GB_PRICE) : monthly price in US dollars per active logical storage GB (default: 0.02)
  • long_term_logical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_LOGICAL_STORAGE_GB_PRICE) : monthly price in US dollars per long term logical storage GB (default: 0.01)
  • active_physical_storage_gb_price (DBT_BQ_MONITORING_ACTIVE_PHYSICAL_STORAGE_GB_PRICE) : monthly price in US dollars per active physical storage GB (default: 0.04)
  • long_term_physical_storage_gb_price (DBT_BQ_MONITORING_LONG_TERM_PHYSICAL_STORAGE_GB_PRICE) : monthly price in US dollars per long term physical storage GB (default: 0.02)
  • bi_engine_gb_hourly_price (DBT_BQ_MONITORING_BI_ENGINE_GB_HOURLY_PRICE): hourly price in US dollars per BI engine GB of memory (default: 0.0416)
  • free_storage_gb_per_month (DBT_BQ_MONITORING_FREE_STORAGE_GB_PER_MONTH) : free storage GB per month (default: 10)
Package
  • lookback_window_days (DBT_BQ_MONITORING_LOOKBACK_WINDOW_DAYS) : number of days to look back for monitoring (default: 7)
  • lookback_incremental_billing_window_days (DBT_BQ_MONITORING_LOOKBACK_INCREMENTAL_BILLING_WINDOW_DAYS) : number of days to look back when incrementally refreshing billing models (default: 3)
  • output_limit_size (DBT_BQ_MONITORING_OUTPUT_LIMIT_SIZE) : limit size to use for the models (default: 1000)
  • output_partition_expiration_days (DBT_BQ_MONITORING_OUTPUT_PARTITION_EXPIRATION_DAYS) : default partition expiration in days for incremental models (default: 365)
  • use_copy_partitions (DBT_BQ_MONITORING_USE_COPY_PARTITIONS) : whether to use copy partitions or not (default: true)
GCP Billing export
  • enable_gcp_billing_export (DBT_BQ_MONITORING_ENABLE_GCP_BILLING_EXPORT) : toggle to enable GCP billing export monitoring (default: false)
  • gcp_billing_export_storage_project (DBT_BQ_MONITORING_GCP_BILLING_EXPORT_STORAGE_PROJECT) : the GCP project where billing export data is stored (default: 'placeholder' if enable_gcp_billing_export is true; otherwise None)
  • gcp_billing_export_dataset (DBT_BQ_MONITORING_GCP_BILLING_EXPORT_DATASET) : the dataset for GCP billing export data (default: 'placeholder' if enable_gcp_billing_export is true; otherwise None)
  • gcp_billing_export_table (DBT_BQ_MONITORING_GCP_BILLING_EXPORT_TABLE) : the table for GCP billing export data (default: 'placeholder' if enable_gcp_billing_export is true; otherwise None)
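
As a worked example of the on-demand pricing settings above (a sketch that assumes the default per_billed_tb_price of 6.25 US dollars and ignores the free 1 TB per month and flat-rate reservations), a job's cost can be estimated directly from INFORMATION_SCHEMA:

-- Estimated on-demand cost per job: billed TB x 6.25 USD.
-- Ignores the free monthly quota and any flat-rate reservations.
SELECT
  job_id,
  total_bytes_billed / POW(1024, 4) AS billed_tb,
  total_bytes_billed / POW(1024, 4) * 6.25 AS estimated_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY estimated_cost_usd DESC;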

Add metadata to queries (Recommended but optional)

To enhance your query metadata with dbt model information, the package provides a dedicated macro that leverages dbt "query comments" (the header set at the top of each query). To configure the query comments, add the following config to dbt_project.yml:

query-comment:
  comment: '{{ dbt_bigquery_monitoring.get_query_comment(node) }}'
  job-label: True # Use query comment JSON as job labels
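
With job labels enabled, each BigQuery job carries its dbt node metadata, so jobs can be attributed back to the dbt models that issued them (used, for instance, by models datamarts such as most_expensive_models).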

Running the package

The package is designed to be run as a daily or hourly job. To do so, you can use the following dbt command:

dbt run -s tag:dbt-bigquery-monitoring

Tags

The package provides the following tags that can be used to filter the models:

  • compute: tag:dbt-bigquery-monitoring-compute
  • storage: tag:dbt-bigquery-monitoring-storage

Those models may rely on shared base models, which means the base models have to be run at least once. To be safe, simply include the upstream dependencies in your selection and run, for instance:

dbt run -s +tag:dbt-bigquery-monitoring-compute

Using the package

Google INFORMATION_SCHEMA tables

The following models are available to query the INFORMATION_SCHEMA tables. They are materialized as ephemeral models in dbt, so each acts as a "source" while letting you access the tables of multiple projects through a single ref.

Example

You can query those models like this:

SELECT query FROM {{ ref('information_schema_jobs') }}
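
For a slightly richer example (a sketch, assuming the model exposes the standard INFORMATION_SCHEMA.JOBS columns such as user_email and total_bytes_billed):

-- Total TB billed per user, via the ephemeral information_schema_jobs model.
SELECT
  user_email,
  SUM(total_bytes_billed) / POW(1024, 4) AS billed_tb
FROM {{ ref('information_schema_jobs') }}
GROUP BY user_email
ORDER BY billed_tb DESC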

Here's the list (**don't forget to prefix each name with `information_schema_` in your `ref` call**):
  • access_control

    • object_privileges
  • bi_engine

    • bi_capacities
    • bi_capacity_changes
  • configuration

    • effective_project_options
    • organization_options
    • organization_options_changes
    • project_options
    • project_options_changes
  • datasets

    • links
    • schemata
    • schemata_options
    • schemata_replicas
    • shared_dataset_usage
  • jobs

    • jobs
    • jobs_by_folder
    • jobs_by_organization
    • jobs_by_project
    • jobs_by_user
  • jobs_timeline

    • jobs_timeline
    • jobs_timeline_by_folder
    • jobs_timeline_by_organization
    • jobs_timeline_by_user
  • recommendations_and_insights

    • insights
    • recommendations_by_organization
    • recommendations
  • reservations

    • assignment_changes
    • assignments
    • capacity_commitment_changes
    • capacity_commitments
    • reservation_changes
    • reservations
    • reservations_timeline
  • routines

    • parameters
    • routine_options
    • routines
  • search_indexes

    • search_index_columns
    • search_indexes
  • sessions

    • sessions
    • sessions_by_project
    • sessions_by_user
  • streaming

    • streaming_timeline
    • streaming_timeline_by_folder
    • streaming_timeline_by_organization
  • tables

    • column_field_paths
    • columns
    • constraint_column_usage
    • key_column_usage
    • partitions
    • table_constraints
    • table_options
    • table_snapshots
    • table_storage
    • table_storage_by_organization
    • table_storage_usage_timeline
    • table_storage_usage_timeline_by_organization
    • tables
  • vector_indexes

    • vector_index_columns
    • vector_index_options
    • vector_indexes
  • views

    • materialized_views
    • views
  • write_api

    • write_api_timeline
    • write_api_timeline_by_folder
    • write_api_timeline_by_organization
  • gcp_billing_export

    • gcp_billing_export_resource_v1

Models

The package provides the following datamarts that can be easily used to build monitoring charts and dashboards:

  • global

    • daily_spend
    • dbt_bigquery_monitoring_options
  • compute

    • billing

      • compute_billing_per_hour
    • bi engine

      • bi_engine_usage_per_minute
      • bi_engine_usage_per_hour
    • cost

      • compute_cost_per_hour
      • compute_cost_per_hour_view (adds computed metrics)
      • compute_cost_per_minute
      • compute_cost_per_minute_view (adds computed metrics)
    • jobs

      • most_expensive_jobs
      • most_repeated_jobs
      • slowest_jobs
      • query_with_better_pricing_using_flat_pricing_view
      • query_with_better_pricing_using_on_demand_view
    • models

      • most_expensive_models
      • most_repeated_models
    • reservations

      • reservation_usage_per_minute
    • users

      • most_expensive_users
  • storage

    • dataset_with_better_pricing_on_logical_billing_model
    • dataset_with_better_pricing_on_physical_billing_model
    • dataset_with_cost
    • most_expensive_tables
    • partitions_monitoring
    • read_heavy_tables
    • storage_billing_per_hour
    • table_with_better_pricing_on_logical_billing_model
    • table_with_better_pricing_on_physical_billing_model
    • unused_tables
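
These datamarts are regular dbt models materialized in the package's output dataset, so they can be referenced like any other model. For instance (a sketch, assuming the default configuration):

-- Feed a dashboard or ad hoc analysis from a datamart.
SELECT *
FROM {{ ref('daily_spend') }}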

Contributing

If you feel like contributing, don't hesitate to open an issue or submit a PR. For more details, please refer to the CONTRIBUTING.md file.