From f8d999965b513d1e688ad6460990bb615b7d19f8 Mon Sep 17 00:00:00 2001 From: Natalie Fiann Date: Fri, 6 Dec 2024 13:17:49 +0000 Subject: [PATCH 1/8] Updated upgrading to v1.9 guide to included parallel batch execution and added links to incremental microbatch page --- website/docs/docs/build/incremental-microbatch.md | 3 ++- .../docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index b50240775f..f51e7d47c5 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,8 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. -- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills). +- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry) and enables [parallel batch execution](/docs/build/incremental-microbatch#parallel-batch-execution). + - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). ### How microbatch works diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 6ade3d5013..432b1de7ff 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -49,6 +49,9 @@ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incr - Simplified query design: Write your model query for a single batch of data. dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details. - Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches. - Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`. 
+- [Parallel batch execution](docs/build/incremental-microbatch#parallel-batch-execution): Multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt determines if a batch can run in parallel, so manual configuration is usually unnecessary. However, the `concurrent_batches` config is available as an override (not a gate), allowing you to specify whether batches should or shouldn’t be run in parallel in specific cases. + +For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of available threads. Currently microbatch is supported on these adapters with more to come: * postgres From 7d5d4b0388f287a8ab354fc3600fc064d5daee31 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:22:00 +0000 Subject: [PATCH 2/8] Update website/docs/docs/build/incremental-microbatch.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index f51e7d47c5..72f5f8d721 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,7 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. -- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry) and enables [parallel batch execution](/docs/build/incremental-microbatch#parallel-batch-execution). +- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), and enables [parallel batch execution](#parallel-batch-execution). - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). 
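To ground the configs referenced in the patches above, here is a minimal sketch of what a microbatch model could look like. The model and column names (`stg_events`, `event_occurred_at`, `raw_events`) are hypothetical, and the exact set of required settings may vary by dbt version; the options shown (`incremental_strategy`, `event_time`, `begin`, `batch_size`, `lookback`) follow the configurations the docs above describe.

```sql
-- models/staging/stg_events.sql -- hypothetical example model
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',  -- time column dbt uses to slice batches
        begin='2024-01-01',              -- assumed earliest date the model should process
        batch_size='day',                -- one batch per day
        lookback=1                       -- also reprocess one prior batch on each run
    )
}}

-- Write the query for a single batch of data; dbt generates the
-- event_time filter for each batch from the configs above.
select
    event_id,
    event_occurred_at,
    payload
from {{ ref('raw_events') }}
```

As the docs above note, the direct upstream model (`raw_events` in this sketch) also needs an `event_time` configured so dbt can filter its rows for each batch.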
From 91ccf1ca904dc047ff39b1c380209f2f6e5b0947 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:27:00 +0000 Subject: [PATCH 3/8] Update website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- .../docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 432b1de7ff..ee552f0165 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -49,7 +49,7 @@ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incr - Simplified query design: Write your model query for a single batch of data. dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details. - Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches. - Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`. -- [Parallel batch execution](docs/build/incremental-microbatch#parallel-batch-execution): Multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt determines if a batch can run in parallel, so manual configuration is usually unnecessary. However, the `concurrent_batches` config is available as an override (not a gate), allowing you to specify whether batches should or shouldn’t be run in parallel in specific cases. +- [Automatic parallel batch execution](#parallel-batch-execution): Process multiple batches at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt intelligently auto-runs batches in parallel, while also allowing you to manually override parallel execution with the `concurrent_batches` config. For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of available threads. 
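As a concrete illustration of the `concurrent_batches` override discussed in this patch, the sketch below pins a hypothetical microbatch model to sequential execution. The model and column names are made up, and treating `concurrent_batches` as a boolean model config is an assumption about its exact shape.

```sql
-- models/marts/fct_sessions.sql -- hypothetical example model
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='session_started_at',
        begin='2024-01-01',
        batch_size='day',
        concurrent_batches=false  -- override: force batches to run one after another
    )
}}

select
    session_id,
    session_started_at
from {{ ref('stg_events') }}
```

Setting the flag to `true` instead asks dbt to run batches in parallel where the adapter allows it, still limited by the number of available threads as noted above; leaving it unset lets dbt decide automatically.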
From 9c48de4dc6db6cc486cf4eb0b3f108760aeb5081 Mon Sep 17 00:00:00 2001 From: nataliefiann <120089939+nataliefiann@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:27:07 +0000 Subject: [PATCH 4/8] Update website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> --- .../docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index ee552f0165..1171c6229a 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -51,7 +51,6 @@ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incr - Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`. - [Automatic parallel batch execution](#parallel-batch-execution): Process multiple batches at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt intelligently auto-runs batches in parallel, while also allowing you to manually override parallel execution with the `concurrent_batches` config. -For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of available threads. Currently microbatch is supported on these adapters with more to come: * postgres From f028892465bf6a367206a56da591dc71a9fb3367 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:34:47 +0000 Subject: [PATCH 5/8] Update website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md --- .../docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md index 1171c6229a..a7d8be0e8a 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md +++ b/website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md @@ -49,7 +49,7 @@ Starting in Core 1.9, you can use the new [microbatch strategy](/docs/build/incr - Simplified query design: Write your model query for a single batch of data. dbt will use your `event_time`, `lookback`, and `batch_size` configurations to automatically generate the necessary filters for you, making the process more streamlined and reducing the need for you to manage these details. - Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches. - Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`. -- [Automatic parallel batch execution](#parallel-batch-execution): Process multiple batches at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. 
dbt intelligently auto-runs batches in parallel, while also allowing you to manually override parallel execution with the `concurrent_batches` config. +- [Automatic parallel batch execution](/docs/build/incremental-microbatch#parallel-batch-execution): Process multiple batches at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. dbt intelligently auto-detects if your batches can run in parallel, while also allowing you to manually override parallel execution with the `concurrent_batches` config. Currently microbatch is supported on these adapters with more to come: From 8c84ad9bc8130cf0a01c72e0649cd2d49fca3d84 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:35:28 +0000 Subject: [PATCH 6/8] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index 72f5f8d721..33cd4b98e4 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,7 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. -- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), and enables [parallel batch execution](#parallel-batch-execution). +- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), and auto-detects [parallel batch execution](#parallel-batch-execution). - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). From a9d132684db32723814431efdbf5a9dc587541d8 Mon Sep 17 00:00:00 2001 From: "Leona B. 
Campbell" <3880403+runleonarun@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:47:43 -0800 Subject: [PATCH 7/8] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index bc070c9f19..a411984dae 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,7 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. -- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills), allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), and auto-detects [parallel batch execution](#parallel-batch-execution). +- Unlike traditional incremental strategies, microbatch allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), auto-detect [parallel batch execution](#parallel-batch-execution), and eliminates the need to implement complex conditional logic for [backfilling](#backfills). - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). From 0d3aa1fb813de8bbb92983d7e3e977b63bf602da Mon Sep 17 00:00:00 2001 From: "Leona B. Campbell" <3880403+runleonarun@users.noreply.github.com> Date: Fri, 6 Dec 2024 13:55:07 -0800 Subject: [PATCH 8/8] Update website/docs/docs/build/incremental-microbatch.md --- website/docs/docs/build/incremental-microbatch.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index cf8d240e37..67b297df2f 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -25,7 +25,7 @@ Incremental models in dbt are a [materialization](/docs/build/materializations) Microbatch is an incremental strategy designed for large time-series datasets: - It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. - It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. 
-- Unlike traditional incremental strategies, microbatch allows you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), auto-detect [parallel batch execution](#parallel-batch-execution), and eliminates the need to implement complex conditional logic for [backfilling](#backfills). +- Unlike traditional incremental strategies, microbatch enables you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), auto-detect [parallel batch execution](#parallel-batch-execution), and eliminate the need to implement complex conditional logic for [backfilling](#backfills). - Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies).
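To tie together the retry and targeted-reprocessing behavior described across these patches, here is a rough sketch of the commands involved. The model selector and dates are placeholders; `dbt retry` and the `--event-time-start` / `--event-time-end` flags are the ones named in the upgrade notes above.

```bash
# Backfill a specific window of batches for one model (hypothetical selector)
dbt run --select fct_sessions \
  --event-time-start "2024-11-01" \
  --event-time-end "2024-11-04"

# If some batches failed in the previous invocation, rerun only those batches
dbt retry
```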