[Bug] metadata source freshness does not work as intended if table name contains wildcard. #1363

Open
2 tasks done
tanukifk opened this issue Oct 3, 2024 · 3 comments
Labels
bug Something isn't working good_first_issue Good for newcomers

Comments

tanukifk commented Oct 3, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I define a source table whose name contains a wildcard, then run a source freshness check without loaded_at_field so that freshness is retrieved from table metadata.

version: 2

sources:

  - name: dummy_source
    database: your-project
    schema: your-dataset
    freshness:
      warn_after: {count: 1, period: day}
      error_after: {count: 2, period: day}
    tables:
      - name: dummy_*

Even though the matched tables were created a few days ago, the reported max_loaded_at_time_ago_in_s is less than a second.

sources.json looks like this:

{
    "metadata": {
        "dbt_schema_version": "https://schemas.getdbt.com/dbt/sources/v3.json",
        "dbt_version": "1.8.3",
        "generated_at": "2024-09-27T10:10:58.326203Z",
        "invocation_id": "xxxxxx",
        "env": {}
    },
    "results": [
        {
            "unique_id": "source.my_new_project.dummy_source.dummy_*",
            "max_loaded_at": "2024-09-27T10:10:57.753000+00:00",
            "snapshotted_at": "2024-09-27T10:10:58.320901+00:00",
            "max_loaded_at_time_ago_in_s": 0.567901,
            "status": "pass",
            "criteria": {
                "warn_after": {
                    "count": 1,
                    "period": "day"
                },
                "error_after": {
                    "count": 2,
                    "period": "day"
                },
                "filter": null
            },
            ...
        }
    ],
    "elapsed_time": 7.178150415420532
}
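
The reported age can be reproduced from the two timestamps in the result above: max_loaded_at_time_ago_in_s is the difference between snapshotted_at and max_loaded_at. A minimal check, using only values copied from the excerpt:

```python
from datetime import datetime

# Values copied verbatim from the freshness result above.
max_loaded_at = datetime.fromisoformat("2024-09-27T10:10:57.753000+00:00")
snapshotted_at = datetime.fromisoformat("2024-09-27T10:10:58.320901+00:00")

# The age is the gap between the snapshot time and the reported load time.
age_in_s = (snapshotted_at - max_loaded_at).total_seconds()
print(age_in_s)  # 0.567901
```

So max_loaded_at was taken about half a second before the snapshot, which is consistent with it being the creation time of something built during the freshness run rather than a timestamp from the source tables.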

Expected Behavior

max_loaded_at should be the table creation time, not the current timestamp.

Steps To Reproduce

  1. Define a source table whose name contains a wildcard, and do not provide loaded_at_field, so that freshness is calculated from metadata.
  2. Run dbt source freshness.

Relevant log output

No response

Environment

- OS: Windows 11
- Python: 3.11.9
- dbt-core: 1.8.3
- dbt-bigquery: 1.8.2

Additional Context

According to the issue, the get_table method called in calculate_freshness_from_metadata creates a temp table containing all matched tables and returns that temp table's creation time as the modified time, instead of the actual latest modified time we want to retrieve.

Therefore, we should either raise an error or implement an alternative approach for sources whose table name contains a wildcard.

https://github.com/dbt-labs/dbt-bigquery/blob/v1.8.2/dbt/adapters/bigquery/impl.py#L726-L744
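
One possible alternative, sketched here as an assumption and not as the dbt-bigquery implementation, is to read the dataset's `__TABLES__` metadata view and take the newest last_modified_time among the tables that share the wildcard prefix. The function name `build_wildcard_freshness_sql` is hypothetical, and the sketch only accepts a single trailing `*` after an ASCII prefix, which is a simplification.

```python
import re


def build_wildcard_freshness_sql(project: str, dataset: str, table_pattern: str) -> str:
    """Build SQL returning the newest modification time among wildcard-matched tables.

    Hypothetical helper for illustration; not part of dbt-bigquery.
    """
    # Simplifying assumption: only patterns like "dummy_*" are supported.
    if not re.fullmatch(r"[A-Za-z0-9_]+\*", table_pattern):
        raise ValueError(f"expected '<prefix>*', got {table_pattern!r}")
    prefix = table_pattern[:-1]
    # __TABLES__ stores last_modified_time as milliseconds since the epoch.
    # STARTS_WITH is used instead of LIKE because "_" is itself a LIKE wildcard.
    return (
        "SELECT TIMESTAMP_MILLIS(MAX(last_modified_time)) AS max_loaded_at\n"
        f"FROM `{project}.{dataset}.__TABLES__`\n"
        f"WHERE STARTS_WITH(table_id, '{prefix}')"
    )


print(build_wildcard_freshness_sql("your-project", "your-dataset", "dummy_*"))
```

The query returns NULL when no table matches, so a caller would still need to decide whether that case is an error.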

tanukifk added the bug and triage labels on Oct 3, 2024
@amychen1776

@tanukifk So that I can fully understand the use case here: in which situations do you need to use a wildcard in a table name?

tanukifk commented Oct 29, 2024

Hi @amychen1776,

I typically use a wildcard for date-sharded tables such as dummy_20241029.
(I know both dbt and BigQuery recommend using partitioned tables, but there are still plenty of sharded source tables.
ref: https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard)

If I specify loaded_at_field for a source with a wildcard, the source freshness command successfully fetches the latest timestamp across all matched tables. It is therefore natural for users to assume that the command also works correctly without loaded_at_field. Currently, it does not.
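
To make the expected behavior concrete, the sketch below picks the newest modification time among the shards that match a wildcard. The table names other than the dummy_ prefix, the timestamps, and the helper `latest_modified` are all made up for illustration. `fnmatchcase` is used as a stand-in for the wildcard match and is more permissive than BigQuery, which is an assumption of this sketch.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Dict, Optional


def latest_modified(tables: Dict[str, datetime], pattern: str) -> Optional[datetime]:
    """Return the newest modification time among tables whose name matches pattern."""
    matched = [ts for name, ts in tables.items() if fnmatchcase(name, pattern)]
    # No matching table means there is nothing to report.
    return max(matched, default=None)


# Hypothetical last-modified times for three daily shards and one unrelated table.
shards = {
    "dummy_20241027": datetime(2024, 10, 27, 6, 0, tzinfo=timezone.utc),
    "dummy_20241028": datetime(2024, 10, 28, 6, 0, tzinfo=timezone.utc),
    "dummy_20241029": datetime(2024, 10, 29, 6, 0, tzinfo=timezone.utc),
    "other_table": datetime(2024, 10, 30, 6, 0, tzinfo=timezone.utc),
}

print(latest_modified(shards, "dummy_*"))  # 2024-10-29 06:00:00+00:00
```

Under this reading, metadata-based freshness for dummy_* would report the time of the newest shard, not the time at which the freshness command ran.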

@amychen1776

Thank you for the information!
