[Regression] support non-literal `batch_id` config for python models on dataproc #1321

maxmckittrick · 2024-08-16T18:26:17Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

currently, the default batch ID that's included for python models submitted to dataproc is simply `str(uuid.uuid4())`; this was last changed in #1020.

this works, and is sufficient to avoid `409 Already exists: Failed to create batch` errors from dataproc when attempting to submit batches with duplicate names, but after the test changes included in #1014, attempting to pass any non-literal `batch_id` in the model config will cause a parsing error, e.g.:

```
18:19:35  Running with dbt=1.8.5
18:19:36  Registered adapter: bigquery=1.8.2
18:19:36  Unable to do partial parsing because of a version mismatch
18:19:39  Encountered an error:
Parsing Error
  Error when trying to literal_eval an arg to dbt.ref(), dbt.source(), dbt.config() or dbt.config.get()
  malformed node or string on line 49: <ast.Name object at 0x169b599f0>
  https://docs.python.org/3/library/ast.html#ast.literal_eval
  In dbt python model, `dbt.ref`, `dbt.source`, `dbt.config`, `dbt.config.get` function args only support Python literal structures
```

this makes passing any non-default batch_id more or less impossible, as using a var to assign a dynamic batch ID at runtime will throw an error from literal_eval, and setting a static batch ID will allow a model to run on dataproc only once before throwing a 409 error.
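
for illustration, the same class of failure can be reproduced outside dbt with `ast.literal_eval` alone. this is a minimal sketch, not dbt's actual parser code; `is_literal_arg` is a hypothetical helper name, and the sample argument strings are made up:

```python
import ast


def is_literal_arg(source: str) -> bool:
    """Return True if `source` is a Python literal that ast.literal_eval accepts.

    Hypothetical helper for illustration only; dbt's parser is more involved.
    """
    try:
        ast.literal_eval(source)
        return True
    except (ValueError, SyntaxError):
        # Names, calls, and f-strings are not literals, so they are rejected.
        return False


# A static string is a literal; anything computed at runtime is not.
for arg in ['"my-static-batch"', "batch_name", "str(uuid.uuid4())", 'f"model-{run_id}"']:
    print(arg, "->", "literal" if is_literal_arg(arg) else "rejected")
```

a bare variable name like `batch_name` parses to an `ast.Name` node, which matches the `<ast.Name object ...>` in the error above.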

Describe alternatives you've considered

one alternative would be to amend the default_batch_id config to prepend the model name with either a uuid, or with a non-static dbt env var, maybe invocation_id (unsure if this would only work on dbt cloud)? this would avoid the previous errors when using created_at as mentioned in #1006
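
a rough sketch of what such a default could look like, combining a sanitized model name with a uuid. `descriptive_batch_id` is a hypothetical function, not existing dbt-bigquery code, and the 63-character limit and lowercase/digit/hyphen character set are my assumptions about dataproc's batch ID rules:

```python
import re
import uuid
from typing import Optional

# Assumption: Dataproc batch IDs are at most 63 characters long and may
# contain only lowercase letters, digits, and hyphens.
MAX_BATCH_ID_LEN = 63


def descriptive_batch_id(model_name: str, suffix: Optional[str] = None) -> str:
    """Build a batch ID of the form <model-slug>-<unique suffix>.

    Hypothetical helper: the suffix defaults to a fresh uuid4 so repeated
    runs of the same model never collide.
    """
    suffix = suffix or str(uuid.uuid4())
    # Replace every run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", model_name.lower()).strip("-")
    # Truncate the slug (never the suffix) so the full ID fits the limit.
    keep = max(MAX_BATCH_ID_LEN - len(suffix) - 1, 0)
    slug = slug[:keep].rstrip("-")
    return f"{slug}-{suffix}" if slug else suffix


print(descriptive_batch_id("My_Python_Model"))
```

truncating the model name rather than the suffix keeps the ID unique even for very long model names, at the cost of a shortened label in the console.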

Who will this benefit?

everyone who wants to see descriptive batch names in dataproc!

Are you interested in contributing this feature?

yes, I'm a regular dbt user but haven't contributed anything here before :)

Anything else?

I've confirmed this is broken in both dbt-core v1.8.5/dbt-bigquery v1.8.2 and dbt-core v1.7.16/dbt-bigquery v1.7.9

amychen1776 · 2024-08-28T14:50:35Z

@maxmckittrick Thank you for opening up the issue.
What are the use cases for which you use the batch ids? (I assume it's to help you identify the queries?)

maxmckittrick · 2024-10-11T20:14:16Z

@amychen1776 yes, it'd be very helpful for us to see descriptive batch names when viewing the dataproc console; we typically run a few dozen python models per day in production, and there's no way to easily identify which batch is associated with which dbt model.

maxmckittrick added enhancement New feature or request triage labels Aug 16, 2024

amychen1776 added python Pull requests that update Python code and removed triage labels Aug 28, 2024

amychen1776 added python_models and removed python Pull requests that update Python code labels Aug 28, 2024

amychen1776 changed the title ~~[Feature] support non-literal batch_id config for python models on dataproc~~ [Feature] [Regression] support non-literal batch_id config for python models on dataproc Aug 28, 2024

amychen1776 added the regression label Aug 28, 2024

amychen1776 changed the title ~~[Feature] [Regression] support non-literal batch_id config for python models on dataproc~~ [Regression] support non-literal batch_id config for python models on dataproc Oct 24, 2024
