[Bug] metadata source freshness does not work as intended if table name contains wildcard. #1363

tanukifk · 2024-10-03T13:38:51Z

Is this a new bug in dbt-bigquery?

I believe this is a new bug in dbt-bigquery
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I define a table with a wildcard as a source and then run a freshness test without loaded_at_field to retrieve freshness from metadata.

version: 2

sources:

  - name: dummy_source
    database: your-project
    schema: your-dataset
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: day}
    tables:
      - name: dummy_*

Even if the table was created a few days ago, max_loaded_at_time_ago_in_s is less than a second.

sources.json looks like this:

{
    "metadata": {
        "dbt_schema_version": "https://schemas.getdbt.com/dbt/sources/v3.json",
        "dbt_version": "1.8.3",
        "generated_at": "2024-09-27T10:10:58.326203Z",
        "invocation_id": "xxxxxx",
        "env": {}
    },
    "results": [
        {
            "unique_id": "source.my_new_project.dummy_source.dummy_*",
            "max_loaded_at": "2024-09-27T10:10:57.753000+00:00",
            "snapshotted_at": "2024-09-27T10:10:58.320901+00:00",
            "max_loaded_at_time_ago_in_s": 0.567901,
            "status": "pass",
            "criteria": {
                "warn_after": {
                    "count": 1,
                    "period": "day"
                },
                "error_after": {
                    "count": 2,
                    "period": "day"
                },
                "filter": null
            },
            ...
        }
    ],
    "elapsed_time": 7.178150415420532
}
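As a quick sanity check of the numbers in the excerpt above, the reported age is just the difference between snapshotted_at and max_loaded_at. The minimal sketch below recomputes it from the two timestamps. The variable names are illustrative only and are not taken from dbt's code.

```python
from datetime import datetime

# Timestamps copied from the results entry in the excerpt above.
max_loaded_at = datetime.fromisoformat("2024-09-27T10:10:57.753000+00:00")
snapshotted_at = datetime.fromisoformat("2024-09-27T10:10:58.320901+00:00")

# Age of the source as the freshness check sees it, in seconds.
age_in_s = (snapshotted_at - max_loaded_at).total_seconds()

# warn_after is 1 day in the source definition, i.e. 86400 seconds.
warn_after_s = 1 * 24 * 60 * 60

print(age_in_s)
print("pass" if age_in_s < warn_after_s else "warn or error")
```

Because max_loaded_at is effectively "now", the age stays far below the 1-day warn threshold no matter how old the underlying tables are.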

Expected Behavior

max_loaded_at should be the table creation time, not the current timestamp.

Steps To Reproduce

Define a source table with a wildcard in its name and do not provide loaded_at_field, so that source freshness is computed from metadata.
Run dbt source freshness.

Relevant log output

No response

Environment

- OS: Windows 11
- Python: 3.11.9
- dbt-core: 1.8.3
- dbt-bigquery: 1.8.2

Additional Context

According to the issue, when the table name contains a wildcard, the get_table method in calculate_freshness_from_metadata creates a temp table containing all matched tables and returns that temp table's creation time as modified, instead of the actual latest modified time we'd like to retrieve.

Therefore, we should either raise an error or implement an alternative approach for tables whose names contain a wildcard.

https://github.com/dbt-labs/dbt-bigquery/blob/v1.8.2/dbt/adapters/bigquery/impl.py#L726-L744
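To make the two options above concrete, here is a rough sketch in plain Python. It is not the adapter's code: is_wildcard and latest_modified are hypothetical helpers, and the (table_id, last_modified) pairs stand in for whatever per-table metadata the adapter would actually fetch. It also assumes the wildcard is a single trailing asterisk, as in dummy_*.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def is_wildcard(table_name: str) -> bool:
    """Return True if the source table name contains a wildcard."""
    return "*" in table_name


def latest_modified(
    pattern: str, tables: Iterable[Tuple[str, datetime]]
) -> Optional[datetime]:
    """Return the newest last-modified time among tables matching `pattern`.

    `tables` is a hypothetical iterable of (table_id, last_modified) pairs.
    Only a single trailing "*" is handled (assumption), so matching is a
    simple prefix comparison.
    """
    if pattern.count("*") != 1 or not pattern.endswith("*"):
        raise ValueError(f"unsupported wildcard pattern: {pattern!r}")
    prefix = pattern[:-1]
    matched = [
        modified for table_id, modified in tables if table_id.startswith(prefix)
    ]
    # None means no table matched, so no freshness can be reported.
    return max(matched, default=None)


# Made-up metadata for illustration only.
shards = [
    ("dummy_20241001", datetime(2024, 10, 1, 3, 0, tzinfo=timezone.utc)),
    ("dummy_20241002", datetime(2024, 10, 2, 3, 0, tzinfo=timezone.utc)),
    ("other_table", datetime(2024, 10, 5, 3, 0, tzinfo=timezone.utc)),
]

name = "dummy_*"
if is_wildcard(name):
    # Option 1 would raise an error at this point instead.
    # Option 2: take the newest modified time across the matched shards.
    newest = latest_modified(name, shards)
    print(newest)
```

With this approach, max_loaded_at would reflect the most recently modified shard rather than the creation time of a temp table.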


amychen1776 · 2024-10-28T14:03:35Z

@tanukifk so I can fully understand the use case here: in which cases do you need to use a wildcard in a table name?

tanukifk · 2024-10-29T02:29:04Z

hi @amychen1776,

I typically use a wildcard for sharded tables like dummy_20241029.
(I know both dbt and BigQuery recommend using partitioned tables, but there are still plenty of sharded source tables.
ref: https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard)

If I specify loaded_at_field for a source with a wildcard, the source freshness command successfully fetches the latest timestamp across all matched tables. It is therefore natural for users to assume that the command also works correctly without loaded_at_field. However, it currently does not.

amychen1776 · 2024-10-30T13:24:39Z

Thank you for the information!

tanukifk added the bug and triage labels Oct 3, 2024

amychen1776 added awaiting_response and removed triage labels Oct 28, 2024

github-actions bot added triage and removed awaiting_response labels Oct 29, 2024

amychen1776 removed the triage label Oct 30, 2024

tanukifk mentioned this issue Nov 2, 2024

[Feature] Implement batch metadata freshness using INFORMATION_SCHEMA.TABLE_STORAGE instead of client.get_table #1239

Open

3 tasks

colin-rogers-dbt added the good_first_issue label Nov 21, 2024
