[Bug] Custom tests whose `fail_calc` is a sum() function cannot run properly when the test produces no rows at all #4248

EPNickBosma · 2021-11-09T06:36:31Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

Running a custom test whose fail_calc is sum(some_col) causes the test to crash if the custom test query itself produces no rows.

We have a custom test that deals with a very important and therefore heavily tested staging model. We wrote a custom test to aggregate the sum of erroneous entries per month, and error when the total sum of erroneous entries over the months returned was > x.

The way the test was written, it had a where clause saying rows should only be returned if they had discrepancies. Any month without a discrepancy doesn't need a row, in other words. No discrepancies, no rows. So when the discrepancies stopped, the rows stopped, the sum() function stopped, and the test stopped working.

To begin with we didn't know that this error was happening instead of a pass. We just knew that the error was happening sometimes. It would also intermittently work just fine, but in hindsight that was when it was failing because there were rows to sum.

Expected Behavior

When we run a custom test we generally expect it to end up in one of a few states. It might pass, it might error because of test failures, it might warn because of test failures, and it might skip.

However, and this is what makes this interesting, we do also accept that tests can fail for other reasons -- database error: being one.

This, not being a database error (and therefore not being bad sql), and not being an explicit pass/fail/warn/skip in everyday dbt terms (and therefore not being an expected outcome of dbt test) strikes me as a strange middle ground worth talking about a bit.

Some things are totally out of dbt's hands, and perhaps how we choose to configure our fail_calcs is one of those things. However if we can anticipate that users will try certain things and those certain things can go wrong in certain predictable ways, then maybe the product can get ahead of that in some cases. (Maybe this is one of those cases.)

Steps To Reproduce

Let's say we have a model: model_that_needs_a_custom_test.sql

with series_cte as (
    {{ dbt_utils.generate_series(upper_bound=100) }}
),

string_cte as (
    select 'a'::text as letter_designation
    union all
    select 'b'::text as letter_designation
    union all
    select 'c'::text as letter_designation
),

combined as (
    select
        series_cte.generated_number,
        string_cte.letter_designation
    from series_cte
    cross join string_cte
)

select * from combined
order by 1,2

Let's say we give that model a test: assert_only_categories_abc_exist.sql

{{
  config(
    error_if = '>=1',
    fail_calc = 'sum(non_abc_letters)'
  )
}}

with source_information as (
    select * from {{ ref('model_that_needs_a_custom_test') }}
)

select
    source_information.generated_number,
    sum(case when source_information.letter_designation not in ('a', 'b', 'c') then 1 else 0 end) as non_abc_letters
from source_information
where 1=2
group by 1

We run the model: dbt run --models model_that_needs_a_custom_test

05:12:56 | Running 1 on-run-start hook
05:12:56 | Concurrency: 1 threads (target='default')
05:13:04 | Running 1 on-run-end hook
05:13:04 | Finished running 1 view model, 2 hooks in 22.23s.
Completed successfully�

Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

We run the test: dbt test --models model_that_needs_a_custom_test

05:16:37 | Concurrency: 1 threads (target='default')
05:16:39 | Finished running 1 test in 14.08s.
Completed with 1 error and 0 warnings:�
None is not of type 'integer'�

Failed validating 'type' in schema['properties']['failures']:
    {'type': 'integer'}

On instance['failures']:
    None

Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The culprit is obvious in this case. We've put an impossible condition on the test and it can never return any rows. Without any rows to sum, the fail_calc is going to try to sum(null) and it's going to complain about it. The test therefore does not pass, ironically, because no erroneous records are found.

In our case it was much less obvious to begin with; we just knew we had an important model and the tests on it weren't working properly.

Relevant log output

Included above

Environment

- OS: Windows 11
- Browser: Firefox 94.0.1 (64-bit)
- dbt Cloud: "dbt_version": "0.20.2"

What database are you using dbt with?

redshift

Additional Context

This is potentially a super duper wontfix from a code perspective.

In some ways it's self-evident. How can you expect to sum something if you can't produce it?

There's also the question of what a fix would look like. Should dbt test try all tests with user-specified fail_calc first, and then revert to some other fail_calc in case of error? Should it instead explicitly skip and/or log something to the effect of you need to address one or more misconfigured tests: assert_only_categories_abc_exist?

The fix might look a lot more like this:

If the docs had a warning section saying make sure you only deviate from count(*) to sum(some_col) if you expect there to be a result set to speak of, otherwise you'll see this error:


Failed validating 'type' in schema['properties']['failures']:
    {'type': 'integer'}

On instance['failures']:
    None

Then anyone googling either fail_calc or the error itself would eventually find what was wrong, with no changes to the code base necessary.

The best outcome here really might be just to put something in the docs about it. That would have helped us when we were trying to figure out what was wrong, and that would lead to this not being an issue because it's in the docs.

The text was updated successfully, but these errors were encountered:

jtcohen6 · 2021-11-16T18:12:10Z

@EPNickBosma Thank you for the incredibly thorough write-up, and the detailed reproduction case!

I completely agree with your ultimate conclusion. The right next step here is to document that fail_calc should always return an integer number of failures, including the case when your test query returns no rows.

That's handled quite gracefully by the default fail_calc of count(*); less so if you're relying on being able to aggregate a specific column value. Given that this error crops up only if you set a custom fail_calc, I feel even more comfortable documenting it, expecting more advanced users to uncover and handle it accordingly.

Given that, I think your move would be to either:

rewrite your test query, to ensure it always returns at least one row
rewrite your fail_calc, to handle the case in which no rows are returned

I don't think the latter should be too tricky at all:

{{
  config(
    error_if = '>=1',
    fail_calc = 'case when count(*) = 0 then 404 else sum(non_abc_letters) end'
  )
}}

$ dbt build
Running with dbt=0.21.1-rc1
Found 1 model, 1 test, 0 snapshots, 0 analyses, 350 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

18:54:48 | Concurrency: 5 threads (target='dev')
18:54:48 |
18:54:48 | 1 of 2 START view model dbt_jcohen.model_that_needs_a_custom_test.... [RUN]
18:54:48 | 1 of 2 OK created view model dbt_jcohen.model_that_needs_a_custom_test [CREATE VIEW in 0.07s]
18:54:48 | 2 of 2 START test assert_only_categories_abc_exist................... [RUN]
18:54:48 | 2 of 2 FAIL 404 assert_only_categories_abc_exist..................... [FAIL 404 in 0.03s]
18:54:48 |
18:54:48 | Finished running 1 view model, 1 test in 0.29s.

Completed with 1 error and 0 warnings:

Failure in test assert_only_categories_abc_exist (tests/assert_only_categories_abc_exist.sql)
  Got 404 results, configured to fail if >=1

  compiled SQL at target/compiled/testy/tests/assert_only_categories_abc_exist.sql

Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

I'm going to close this issue in favor of a proposed change to docs.getdbt.com, specifically this page: https://docs.getdbt.com/reference/resource-configs/fail_calc

Would you be up for opening an issue, or (if you feel moved) making the change yourself?

dbeatty10 · 2024-04-03T14:12:39Z

I got bit by this today, and it took me a long while to figure out what was going on.

As a result, I opened a PR to update the docs: dbt-labs/docs.getdbt.com#5221

It would have been nice if either of the following happened instead:

None failures automatically converted to 0 to bypass any issues
A more understandable error message (like "if you have a custom fail_calc, make sure to handle the case when no rows are returned"

## What are you changing in this pull request and why? The current code example does not always work and may raise an error like this: ``` 14:01:53 None is not of type 'integer' 14:01:53 14:01:53 Failed validating 'type' in schema['properties']['failures']: 14:01:53 {'type': 'integer'} 14:01:53 14:01:53 On instance['failures']: 14:01:53 None ``` As described in dbt-labs/dbt-core#4248, it's known that: - tests whose `fail_calc` is a sum() function raises an error when the test produces no rows at all Well, that explains why the code example doesn't work! Jerco described two options for dealing with this: 1. rewrite the test query, to ensure it always returns at least one row 1. rewrite the `fail_calc`, to handle the case in which no rows are returned This PR adopts the 2nd option and updates the code example to handle the case in which no rows are returned. ## Checklist - [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines. - [x] I've verified that the updated code example works

…no rows at all (#5246) [Preview](https://docs-getdbt-com-git-dbeatty10-patch-1-dbt-labs.vercel.app/reference/resource-configs/fail_calc) ## What are you changing in this pull request and why? Follow-up to #5221 resolves #918 See also: dbt-labs/dbt-core#4248 ## Checklist - [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.

EPNickBosma added bug Something isn't working triage labels Nov 9, 2021

jtcohen6 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Nov 16, 2021

jtcohen6 closed this as completed Nov 16, 2021

EPNickBosma mentioned this issue Nov 20, 2021

fail_calc config docs should warn users not to use a sum() fail_calc in their test query if that query could conceivably return no rows at all. dbt-labs/docs.getdbt.com#918

Closed

dbeatty10 changed the title ~~[Bug] <Custom tests whose fail_calc is a sum() function cannot run properly when the test produces no rows at all>~~ [Bug] Custom tests whose fail_calc is a sum() function cannot run properly when the test produces no rows at all Apr 3, 2024

dbeatty10 mentioned this issue Apr 3, 2024

Update code example for fail_calc dbt-labs/docs.getdbt.com#5221

Merged

2 tasks

dbeatty10 mentioned this issue Apr 8, 2024

Beware custom fail_calc in tests that have the potential to return no rows at all dbt-labs/docs.getdbt.com#5246

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Custom tests whose `fail_calc` is a sum() function cannot run properly when the test produces no rows at all #4248

[Bug] Custom tests whose `fail_calc` is a sum() function cannot run properly when the test produces no rows at all #4248

EPNickBosma commented Nov 9, 2021

jtcohen6 commented Nov 16, 2021

dbeatty10 commented Apr 3, 2024

[Bug] Custom tests whose fail_calc is a sum() function cannot run properly when the test produces no rows at all #4248

[Bug] Custom tests whose fail_calc is a sum() function cannot run properly when the test produces no rows at all #4248

Comments

EPNickBosma commented Nov 9, 2021

Is there an existing issue for this?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

What database are you using dbt with?

Additional Context

jtcohen6 commented Nov 16, 2021

dbeatty10 commented Apr 3, 2024

[Bug] Custom tests whose `fail_calc` is a sum() function cannot run properly when the test produces no rows at all #4248

[Bug] Custom tests whose `fail_calc` is a sum() function cannot run properly when the test produces no rows at all #4248