How to - dynamically enable or disable a model from running based on query results from another table #1310

jeremyyeo (Collaborator) · 2022-04-06T10:10:09Z

The following applies to all configs (tags, hooks, snowflake_warehouse, incremental_strategy, etc.), not just the enabled config that originally inspired this writeup.

From time to time, we may want to skip the run of a model based on some dynamic condition instead of statically setting the enabled flag:

-- my_table.sql
{{ config(enabled = false) }}
select 1 as user_id

I believe there are only a couple of ways to make the enabled config "dynamic":

Using variables

-- my_table.sql
{{ config(enabled = var('run_flag')) }}
select 1 as user_id

dbt run --vars 'run_flag: True'
dbt run --vars 'run_flag: False'
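
The flag does not have to be typed by hand: a wrapper script can look it up first and then build the command. The following is a minimal sketch, not an established pattern. It assumes a hypothetical check_run flag table (an in-memory SQLite table stands in for the warehouse) and two hypothetical helpers, fetch_run_flag and build_dbt_command. Only the `dbt run --vars` syntax is taken from the commands above.

```python
import sqlite3


def fetch_run_flag(conn: sqlite3.Connection) -> bool:
    """Return True unless the (hypothetical) flag table holds a 0."""
    row = conn.execute("select run_flag from check_run").fetchone()
    # A missing row is treated as "enabled"; that default is an assumption of this sketch.
    return row is None or row[0] != 0


def build_dbt_command(run_flag: bool) -> list[str]:
    """Build the argv for `dbt run`, passing the flag as a project variable."""
    return ["dbt", "run", "--vars", f"run_flag: {run_flag}"]


# SQLite stands in for the warehouse so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("create table check_run (run_flag integer)")
conn.execute("insert into check_run values (0)")

command = build_dbt_command(fetch_run_flag(conn))
print(command)
```

To actually invoke dbt, the list could be handed to `subprocess.run(command, check=True)`. Because the flag is resolved before dbt starts, it applies to the whole invocation rather than to a single model.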

Using jinja / macros with limited logic (i.e. not dependent on the execution of a SQL query)

-- my_table.sql
{{ config(enabled = run_flag()) }}
select 1 as user_id

-- macros/run_flag.sql
{% macro run_flag() %}
  {% if target.name == 'prod' %}
    {% do return(True) %}
  {% else %}
    {% do return(False) %}
  {% endif %}
{% endmacro %}

If the target is prod, the my_table model is run; otherwise it is disabled.

As far as I know, there is currently no way to set the enabled config to be conditional based upon some result being returned by a query.

Let's imagine we have a table that determines whether a particular model should be enabled (1) or not (0):

-- check_run
+───────────+
| run_flag  |
+───────────+
| 0         |
+───────────+

We add a macro that queries that table and returns True or False based on its contents:

{% macro run_flag() %}
    {%- set run_or_not = True -%}

    {%- set query -%}
        select run_flag from development.dbt_jyeo.check_run;
    {%- endset -%}

    {%- if execute -%}
        {%- set results = run_query(query) -%}
        {%- set val = results.columns[0].values()[0] -%}
        {%- if val == 0 -%}
            {%- set run_or_not = False -%}
        {%- endif -%}
    {%- endif -%}

    {% do return(run_or_not) %}
{% endmacro %}

And then we use it in our model like we had previously when calling the run_flag() macro:

-- my_table.sql
{{ config(enabled = run_flag()) }}
select 1 as user_id

You will find that your my_table model runs no matter what is in your check_run table. My best guess is that the enabled config is set before any queries are executed against the database. At that point execute is false, so the run_query branch is skipped and run_flag() above always returns True.
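
The macro's control flow can be mirrored in plain Python to see both outcomes side by side: the default returned while execute is false, and the value read from the table once it is true. This is an illustration only, not dbt's implementation. The names run_flag, execute, fetch_flag, and flag_is_zero are stand-ins chosen for this example.

```python
from typing import Callable


def run_flag(execute: bool, fetch_flag: Callable[[], int]) -> bool:
    """Mirror the macro: default to True, query only when `execute` is set."""
    run_or_not = True
    if execute:
        # Only reached once queries are allowed to run.
        if fetch_flag() == 0:
            run_or_not = False
    return run_or_not


def flag_is_zero() -> int:
    """Stand-in for the check_run query returning run_flag = 0."""
    return 0


# While the config is being resolved (execute is false) the table is never read.
parse_time = run_flag(execute=False, fetch_flag=flag_is_zero)
# Once execute is true, the 0 in check_run is finally seen.
run_time = run_flag(execute=True, fetch_flag=flag_is_zero)
print(parse_time, run_time)
```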

-!  🚨                                          WARNING                                          🚨  !-
Do not use this pattern if you can avoid it. I make no guarantees that it will work well with your
project or even with other core dbt functionality. If you are going down this route, it is likely
better to restructure your project instead. For example, you could add a test to model A, which
model B selects from, and then use the `dbt build` command: if the test on model A fails,
model B is not run.
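
The skip-on-failure behaviour that the warning relies on can be pictured with a toy model. The sketch below is not dbt code and does not use any dbt API. The node names, the build helper, and the dependency map are all hypothetical, and it only imitates the idea that a failed upstream test blocks whatever comes after it.

```python
def build(nodes, depends_on, outcomes):
    """Walk nodes in order; skip a node when anything upstream did not pass."""
    status = {}
    for node in nodes:
        blocked = any(status[up] != "pass" for up in depends_on.get(node, []))
        # A blocked node is never attempted; otherwise use its given outcome.
        status[node] = "skip" if blocked else outcomes.get(node, "pass")
    return status


# model_b selects from model_a, and the test on model_a guards it.
nodes = ["model_a", "test_model_a", "model_b"]
depends_on = {
    "test_model_a": ["model_a"],
    "model_b": ["model_a", "test_model_a"],
}

status = build(nodes, depends_on, {"test_model_a": "fail"})
print(status)
```

In this toy run model_a is built, its test fails, and model_b is skipped, which is the outcome the warning describes for `dbt build`.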

To achieve the above "enabled or not based on query results", one "hack" is to modify the SQL body of the model itself: if it is a table, we make it select from itself (a self-reference), and if it is incremental, we apply a limit 0.

First, let's add a modified version of the macro shown above:

{% macro check_and_apply_limit(model_name) -%}
    {%- set table_limit = '' -%}

    {%- set query -%}
        select run_flag from {{ target.database }}.{{ target.schema }}.check_run where model_name = '{{ model_name }}';
    {%- endset -%}

    {%- if execute -%}
        {%- set results = run_query(query) -%}
        {%- set output = results.columns[0].values()[0] -%}
        {%- if output == 0 -%}
            {%- do log('[check_and_apply_limit] check run failed (run_flag = 0) model <' ~ model_name ~ '> will self-reference or limit.', true) -%}
            {%- set table_limit = 'limit 0' -%}
        {%- endif -%}
    {%- endif -%}

    {{ table_limit }}
{%- endmacro %}

And in our check_run, let's first try enabling (1) all of our models by letting them have the following values:

-- check_run
+───────────+─────────────────+
| run_flag  | model_name      |
+───────────+─────────────────+
| 1         | my_incremental  |
| 1         | my_table        |
+───────────+─────────────────+

Now we add 2 models to demonstrate the above working:

-- my_table.sql
{{ config(materialized = 'table') }}
with source as (
    select 1 as user_id
)
select * from
{% if check_and_apply_limit(this.name) == 'limit 0' %}
    {{ this }}
{% else %}
    source
{% endif %}


-- my_incremental.sql
{{ config(materialized = 'incremental') }}
with source as (
    select 1 as user_id
)
select * from source
{{ check_and_apply_limit(this.name) }}

Now let's do a dbt run and inspect our debug logs:

10:03:45.914153 [debug] [Thread-1  ]: On model.my_dbt_project.my_incremental: /* {"app": "dbt", "dbt_version": "1.0.4", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.my_incremental"} */

      create or replace transient table development.dbt_jyeo.my_incremental  as
      (
with source as (
    select 1 as user_id
)
select * from source

      );

10:03:49.466734 [debug] [Thread-1  ]: On model.my_dbt_project.my_table: /* {"app": "dbt", "dbt_version": "1.0.4", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.my_table"} */


      create or replace transient table development.dbt_jyeo.my_table  as
      (
with source as (
    select 1 as user_id
)
select * from

    source

      );

And then another dbt run for good measure, to check the behaviour of our incremental model:

10:04:55.868895 [debug] [Thread-1  ]: On model.my_dbt_project.my_incremental: /* {"app": "dbt", "dbt_version": "1.0.4", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.my_incremental"} */

      create or replace temporary table development.dbt_jyeo.my_incremental__dbt_tmp  as
      (
with source as (
    select 1 as user_id
)
select * from source

      );

10:04:58.436670 [debug] [Thread-1  ]: On model.my_dbt_project.my_incremental: insert into development.dbt_jyeo.my_incremental ("USER_ID")
        (
            select "USER_ID"
            from development.dbt_jyeo.my_incremental__dbt_tmp
        );

At this point, feel free to check the tables above in your data warehouse to see that they have the expected rows.

Now let's manually set¹ those run_flag values in our check_run table to 0 in order to disable those models from being run²:

-- check_run
+───────────+─────────────────+
| run_flag  | model_name      |
+───────────+─────────────────+
| 0         | my_incremental  |
| 0         | my_table        |
+───────────+─────────────────+

And then a dbt run should log the following to stdout:

09:53:19  Running with dbt=1.0.4
09:53:20  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 181 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
09:53:20  
09:53:26  Concurrency: 1 threads (target='dev')
09:53:26  
09:53:26  1 of 2 START incremental model dbt_jyeo.my_incremental.......................... [RUN]
09:53:28  [check_and_apply_limit] check run failed (run_flag = 0) model <my_incremental> will self-reference or limit.
09:53:32  1 of 2 OK created incremental model dbt_jyeo.my_incremental..................... [SUCCESS 1 in 6.08s]
09:53:32  2 of 2 START table model dbt_jyeo.my_table...................................... [RUN]
09:53:34  [check_and_apply_limit] check run failed (run_flag = 0) model <my_table> will self-reference or limit.
09:53:36  2 of 2 OK created table model dbt_jyeo.my_table................................. [SUCCESS 1 in 4.20s]
09:53:36  
09:53:36  Finished running 1 incremental model, 1 table model in 16.11s.
09:53:36  
09:53:36  Completed successfully
09:53:36  
09:53:36  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

With the more detailed debug logs in our target/logs folder showing:

09:55:15.371513 [debug] [Thread-1  ]: On model.my_dbt_project.my_incremental: /* {"app": "dbt", "dbt_version": "1.0.4", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.my_incremental"} */
      create or replace temporary table development.dbt_jyeo.my_incremental__dbt_tmp  as
      (
with source as (
    select 1 as user_id
)
select * from source
limit 0
      );

09:55:17.574398 [debug] [Thread-1  ]: On model.my_dbt_project.my_incremental: insert into development.dbt_jyeo.my_incremental ("USER_ID")
        (
            select "USER_ID"
            from development.dbt_jyeo.my_incremental__dbt_tmp
        );

...

09:55:21.242456 [debug] [Thread-1  ]: On model.my_dbt_project.my_table: /* {"app": "dbt", "dbt_version": "1.0.4", "profile_name": "snowflake", "target_name": "dev", "node_id": "model.my_dbt_project.my_table"} */
      create or replace transient table development.dbt_jyeo.my_table  as
      (
with source as (
    select 1 as user_id
)
select * from
    development.dbt_jyeo.my_table
      );

In short, both models were left in their prior state by the runs above: a pseudo-dynamic enabled config.
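
The same effect can be reproduced outside dbt. The sketch below uses Python's sqlite3 module as a stand-in for the warehouse (an assumption; the runs above were on Snowflake) and replays the two shapes seen in the debug logs: an insert fed by a select that ends in limit 0, and a table rebuilt from a select on itself. SQLite has no create or replace table, so the rebuild is approximated with a hypothetical my_table__new table that is renamed into place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# State left behind by the earlier, enabled runs: one row in each table.
conn.execute("create table my_incremental as select 1 as user_id")
conn.execute("create table my_table as select 1 as user_id")

# Incremental model with run_flag = 0: the new batch is cut to zero rows.
conn.execute(
    """
    create temporary table my_incremental__dbt_tmp as
    with source as (select 1 as user_id)
    select * from source
    limit 0
    """
)
conn.execute(
    "insert into my_incremental (user_id) "
    "select user_id from my_incremental__dbt_tmp"
)

# Table model with run_flag = 0: the table is rebuilt from its own rows.
conn.execute(
    """
    create table my_table__new as
    with source as (select 1 as user_id)
    select * from my_table
    """
)
conn.execute("drop table my_table")
conn.execute("alter table my_table__new rename to my_table")

counts = {
    name: conn.execute(f"select count(*) from {name}").fetchone()[0]
    for name in ("my_incremental", "my_table")
}
print(counts)
```

Both counts stay at 1, matching the footnote's point that the models still run but leave their destination tables unchanged.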

¹ While we are setting these manually, you can imagine them being set by some other process, perhaps a separate audit run that sets those to 0 or 1 based on some other conditions for those models.
² They are actually being "run"; we just have some logic that makes them appear to not update the actual destination table.

akmalsoliev · 2024-04-24T15:33:53Z

Is there still no way to deal with this issue?
The idea is that config should be executed first with all its macros no?

4 replies

jeremyyeo May 27, 2024
Collaborator Author

> Is there still no way to deal with this issue?

Unfortunately not.

akmalsoliev May 28, 2024

> > Is there still no way to deal with this issue?
>
> Unfortunately not.

There is a workaround, though it is not ideal: set some column to an absurd value that should never occur. But again, this is an absurd workaround.

mahiki Sep 1, 2024

Just want to say thank you, this is helping me understand dbt as a new adopter.

mahiki Sep 4, 2024

I have found my way back here after having built a huge table and wanting not to destroy it on the next run/build command. Seems like I need to convert it to an incremental model, which is another story.

This is not a solution to the use-case above, but I found this helpful for my own troubles, in that the dev/stage runs can be slimmed down

{% if target.name != 'prod' %}
where partition_date_col > current_timestamp() - interval '5' day
{% endif %}

Since we are entertaining terrible hacks, what if the where condition is like where <something> != (select col from other_table). The if condition could just be true I guess. Not sure you can use ref inside an inline macro.
