Merge branch 'current' into azure-auth0
matthewshaver authored Nov 27, 2024
2 parents 77325fe + 2591d73 commit aa4bdce
Showing 15 changed files with 283 additions and 25 deletions.
31 changes: 24 additions & 7 deletions website/docs/docs/build/incremental-microbatch.md
@@ -8,7 +8,7 @@ id: "incremental-microbatch"

:::info Microbatch

The `microbatch` strategy is available in beta for [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) and dbt Core v1.9.
The new `microbatch` strategy is available in beta for [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) and dbt Core v1.9.

If you use a custom microbatch macro, set a [distinct behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](#how-microbatch-compares-to-other-incremental-strategies).

@@ -22,17 +22,34 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.
Microbatch is an incremental strategy designed for large time-series datasets:
- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different from `partition_by`, which groups rows into partitions.
- It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing.
- Unlike traditional incremental strategies, microbatch doesn't require implementing complex conditional logic for [backfilling](#backfills).
- Note, microbatch might not be the best strategy for all use cases. Consider other strategies if you don't have a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies).

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.
### How microbatch works

When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />.

This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently.

### Example

A `sessions` model aggregates and enriches data that comes from two other models.
- `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update.
- `customers` is a relatively small dimensional table. Customer attributes update often, and not in a time-based manner — that is, older customers are just as likely to change column values as newer customers.
A `sessions` model aggregates and enriches data that comes from two other models:
- `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update. It uses the `page_view_start` column as its `event_time`.
- `customers` is a relatively small dimensional table. Customer attributes update often, and not in a time-based manner — that is, older customers are just as likely to change column values as newer customers. The customers model doesn't configure an `event_time` column.

The `page_view_start` column in `page_views` is configured as that model's `event_time`. The `customers` model does not configure an `event_time`. Therefore, each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch, and it will not filter `customers` (a full scan for every batch).
As a result:

- Each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch.
- The `customers` table isn't filtered, resulting in a full scan for every batch.

:::tip
In addition to configuring `event_time` for the target table, you should also specify it for any upstream models that you want to filter, even if they have different time columns.
:::
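
A minimal sketch of these configs, using the example's model names (the `session_start` column, the `begin` date, and the `batch_size` value are illustrative assumptions, not the project's actual files):

```yml
version: 2

models:
  - name: page_views
    config:
      # Upstream model: each batch of `sessions` filters page_views on this column
      event_time: page_view_start

  - name: sessions
    config:
      materialized: incremental
      incremental_strategy: microbatch
      event_time: session_start  # assumed time column for the sessions model
      batch_size: day            # each batch covers one day of data
      begin: "2024-01-01"        # assumed start date for the initial build
```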

<File name="models/staging/page_views.yml">

2 changes: 1 addition & 1 deletion website/docs/docs/build/incremental-models.md
@@ -114,7 +114,7 @@ When you define a `unique_key`, you'll see this behavior for each row of "new" d
Please note that if there's a `unique_key` value with more than one row in either the existing target table or the new incremental rows, the incremental model may fail depending on your database and [incremental strategy](/docs/build/incremental-strategy). If you're having issues running an incremental model, double-check that the unique key is truly unique in both your existing database table and your new incremental rows. You can [learn more about surrogate keys here](https://www.getdbt.com/blog/guide-to-surrogate-key).

:::info
While common incremental strategies, such as`delete+insert` + `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy).
While common incremental strategies, such as `delete+insert` and `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy).
:::

#### `unique_key` example
@@ -84,6 +84,7 @@ You can read more about each of these behavior changes in the following links:
- (Introduced, disabled by default) [`skip_nodes_if_on_run_start_fails` project config flag](/reference/global-configs/behavior-changes#behavior-change-flags). If the flag is set and **any** `on-run-start` hook fails, mark all selected nodes as skipped.
- `on-run-start/end` hooks are **always** run, regardless of whether they passed or failed last time.
- (Introduced, disabled by default) [[Redshift] `restrict_direct_pg_catalog_access`](/reference/global-configs/behavior-changes#redshift-restrict_direct_pg_catalog_access). If the flag is set, the adapter will use the Redshift API (through the Python client) if available, or query Redshift's `information_schema` tables instead of using `pg_` tables.
- (Introduced, disabled by default) [`require_nested_cumulative_type_params`](/reference/global-configs/behavior-changes#cumulative-metrics). If the flag is set to `True`, users will receive an error instead of a warning if they're not properly formatting cumulative metrics using the new [`cumulative_type_params`](/docs/build/cumulative#parameters) nesting.
- (Introduced, disabled by default) [`require_batched_execution_for_custom_microbatch_strategy`](/reference/global-configs/behavior-changes#custom-microbatch-strategy). Set to `True` if you use a custom microbatch macro to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the microbatch strategy.

## Adapter specific features and functionalities
46 changes: 40 additions & 6 deletions website/docs/docs/deploy/model-notifications.md
@@ -33,7 +33,9 @@ Create configuration YAML files in your project for dbt to send notifications ab

## Configure groups

Add your group configuration in either the `dbt_project.yml` or `groups.yml` file. For example:
Define your groups in any `.yml` file in your [models directory](/reference/project-configs/model-paths). For example:

<File name='models/groups.yml'>

```yml
version: 2
@@ -42,22 +44,26 @@ groups:
  - name: finance
    description: "Models related to the finance department"
    owner:
      # 'name' or 'email' is required
      # Email is required to receive model-level notifications; additional properties are also allowed.
      name: "Finance Team"
      email: [email protected]
      slack: finance-data
      favorite_food: donuts

  - name: marketing
    description: "Models related to the marketing department"
    owner:
      name: "Marketing Team"
      email: [email protected]
      slack: marketing-data
      favorite_food: jaffles
```
## Set up models
</File>
## Attach groups to models
Set up your model configuration in either the `dbt_project.yml` or `groups.yml` file; doing this automatically sets up notifications for tests, too. For example:
Attach groups to models as you would any other config, in either the `dbt_project.yml` file or any other `.yml` file in your models directory. For example:

<File name='models/marts.yml'>

```yml
version: 2
@@ -74,6 +80,34 @@ models:
group: marketing
```
</File>

By assigning groups in the `dbt_project.yml` file, you can capture all models in a subdirectory at once.

In this example, model notifications related to staging models go to the data engineering group, `marts/sales` models to the finance team, and `marts/campaigns` models to the marketing team.

<File name='dbt_project.yml'>

```yml
config-version: 2
name: "jaffle_shop"
[...]
models:
  jaffle_shop:
    staging:
      +group: data_engineering
    marts:
      sales:
        +group: finance
      campaigns:
        +group: marketing
```

</File>
Attaching a group to a model also encompasses its tests, so you'll receive notifications for a model's test failures as well.

## Enable access to model notifications

8 changes: 4 additions & 4 deletions website/docs/reference/dbt-commands.md
@@ -34,10 +34,10 @@ Commands with a ('❌') indicate write commands, commands with a ('✅') indicat

| Command | Description | Parallel execution | <div style={{width:'250px'}}>Caveats</div> |
|---------|-------------| :-----------------:| ------------------------------------------ |
| [build](/reference/commands/build) | Build and test all selected resources (models, seeds, snapshots, tests) || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [build](/reference/commands/build) | Builds and tests all selected resources (models, seeds, snapshots, tests) || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clean](/reference/commands/clean) | Deletes artifacts present in the dbt project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clone selected models from the specified state || All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [clone](/reference/commands/clone) | Clones selected models from the specified state || All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [compile](/reference/commands/compile) | Compiles (but does not run) the models in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [debug](/reference/commands/debug) | Debugs dbt connections and projects || dbt Cloud IDE, dbt Core <br /> All [supported versions](/docs/dbt-versions/core) |
| [deps](/reference/commands/deps) | Downloads dependencies for a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
@@ -50,9 +50,9 @@ Commands with a ('❌') indicate write commands, commands with a ('✅') indicat
| reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [retry](/reference/commands/retry) | Retries the last run `dbt` command from the point of failure || All tools <br /> Requires [dbt v1.6 or higher](/docs/dbt-versions/core) |
| [run](/reference/commands/run) | Runs the models in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invoke a macro, including running arbitrary maintenance SQL against the database || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [run-operation](/reference/commands/run-operation) | Invokes a macro, including running arbitrary maintenance SQL against the database || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [seed](/reference/commands/seed) | Loads CSV files into the database || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Preview table rows post-transformation || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [show](/reference/commands/show) | Previews table rows post-transformation || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [snapshot](/reference/commands/snapshot) | Executes "snapshot" jobs defined in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
| [source](/reference/commands/source) | Provides tools for working with source data (including validating that sources are "fresh") || All tools<br /> All [supported versions](/docs/dbt-versions/core) |
| [test](/reference/commands/test) | Executes tests defined in a project || All tools <br /> All [supported versions](/docs/dbt-versions/core) |
50 changes: 50 additions & 0 deletions website/docs/reference/global-configs/behavior-changes.md
@@ -76,6 +76,7 @@ When we use dbt Cloud in the following table, we're referring to accounts that h
| [state_modified_compare_more_unrendered_values](#source-definitions-for-state) | 2024.10 | TBD* | 1.9.0 | TBD* |
| [require_yaml_configuration_for_mf_time_spines](#metricflow-time-spine-yaml) | 2024.10 | TBD* | 1.9.0 | TBD* |
| [require_batched_execution_for_custom_microbatch_strategy](#custom-microbatch-strategy) | 2024.11 | TBD* | 1.9.0 | TBD* |
| [require_nested_cumulative_type_params](#cumulative-metrics) | 2024.11 | TBD* | 1.9.0 | TBD* |
When the dbt Cloud Maturity is "TBD," it means we have not yet determined the exact date when these flags' default values will change. Affected users will see deprecation warnings in the meantime, and they will receive emails providing advance warning ahead of the maturity date. In the meantime, if you are seeing a deprecation warning, you can either:
- Migrate your project to support the new behavior, and then set the flag to `True` to stop seeing the warnings.
@@ -175,3 +176,52 @@ Set the flag to `True` if you have a custom microbatch macro set up in yo
If you have a custom microbatch macro and the flag is left as `False`, dbt will issue a deprecation warning.

Previously, users needed to set the `DBT_EXPERIMENTAL_MICROBATCH` environment variable to `True` to prevent unintended interactions with existing custom incremental strategies. But this is no longer necessary, as setting `DBT_EXPERIMENTAL_MICROBATCH` will no longer have an effect on runtime functionality.
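
For example, a minimal sketch of enabling the flag in `dbt_project.yml` (the flag name comes from this page; placing it under the top-level `flags:` key follows dbt's standard convention for behavior-change flags):

```yml
# dbt_project.yml
flags:
  require_batched_execution_for_custom_microbatch_strategy: true
```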

### Cumulative metrics

Parameters for [cumulative-type metrics](/docs/build/cumulative#parameters), such as `window` and `grain_to_date`, are nested under the `cumulative_type_params` field in Versionless dbt Cloud and dbt Core v1.9 and newer. Currently, dbt will warn users if these parameters are improperly nested. To enforce the new format (resulting in an error instead of a warning), set `require_nested_cumulative_type_params` to `True`.
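
As with the other behavior changes, a minimal sketch of opting in to the stricter validation in `dbt_project.yml`:

```yml
# dbt_project.yml
flags:
  require_nested_cumulative_type_params: true
```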

Take the following metric, configured with the pre-v1.9 syntax, as an example:

```yaml
type: cumulative
type_params:
  measure: order_count
  window: 7 days
```

If you run `dbt parse` with that syntax on Core v1.9 or Versionless dbt Cloud, you will receive a warning like:

```bash
15:36:22 [WARNING]: Cumulative fields `type_params.window` and
`type_params.grain_to_date` has been moved and will soon be deprecated. Please
nest those values under `type_params.cumulative_type_params.window` and
`type_params.cumulative_type_params.grain_to_date`. See documentation on
behavior changes:
https://docs.getdbt.com/reference/global-configs/behavior-changes.

```

If you set `require_nested_cumulative_type_params` to `True` and re-run `dbt parse`, you will receive an error like:

```bash
21:39:18 Cumulative fields `type_params.window` and `type_params.grain_to_date` should be nested under `type_params.cumulative_type_params.window` and `type_params.cumulative_type_params.grain_to_date`. Invalid metrics: orders_last_7_days. See documentation on behavior changes: https://docs.getdbt.com/reference/global-configs/behavior-changes.
```

Once the metric is updated, it will work as expected:

```yaml
type: cumulative
type_params:
  measure:
    name: order_count
  cumulative_type_params:
    window: 7 days
```
23 changes: 21 additions & 2 deletions website/docs/reference/project-configs/analysis-paths.md
@@ -13,12 +13,31 @@ analysis-paths: [directorypath]
</File>
## Definition
Specify a custom list of directories where [analyses](/docs/build/analyses) are located.
## Default
Without specifying this config, dbt will not compile any `.sql` files as analyses.

However, the [`dbt init` command](/reference/commands/init) populates this value as `analyses` ([source](https://github.com/dbt-labs/dbt-starter-project/blob/HEAD/dbt_project.yml#L15))
However, the [`dbt init` command](/reference/commands/init) populates this value as `analyses` ([source](https://github.com/dbt-labs/dbt-starter-project/blob/HEAD/dbt_project.yml#L15)).

import RelativePath from '/snippets/_relative-path.md';

<RelativePath
path="analysis-paths"
absolute="/Users/username/project/analyses"
/>

- ✅ **Do**
  - Use a relative path:
    ```yml
    analysis-paths: ["analyses"]
    ```

- ❌ **Don't**
  - Avoid absolute paths:
    ```yml
    analysis-paths: ["/Users/username/project/analyses"]
    ```

## Examples
### Use a subdirectory named `analyses`