[Feature] Optimize for Redshift Serverless #857

jaswanthikolla · 2024-07-15T19:19:35Z (edited by amychen1776)

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

In Redshift Serverless, queries are billed for a minimum of 60 seconds, so it's better to batch queries. For example, say you are running a model with a full dependency chain. Queries against system tables such as pg_namespace and information_schema.tables run at T0, their results are processed, and the model query then runs from T1 to T2. You are billed from T0 to T2 instead of just T1 to T2, and that window includes a lot of I/O time. The same thing happens for every model in the dependency chain.
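To make the billing argument concrete, here is a rough worked calculation under a simplified, assumed billing model: each model's activity counts as one billed window charged for its duration but never less than 60 seconds. The per-model durations are invented for illustration, and real Redshift Serverless charges also scale with RPU capacity, which this sketch ignores.

```python
# Simplified, assumed billing model: each model's activity is one billed
# window, charged for its duration but never less than 60 seconds.
MIN_BILLED_SECONDS = 60


def billed(duration_s: int) -> int:
    """Seconds charged for one window under the simplified model."""
    return max(duration_s, MIN_BILLED_SECONDS)


# Invented numbers: (seconds from T0 to T1, seconds from T1 to T2) per model.
chain = [(20, 90), (15, 120), (25, 70)]

# Metadata queried per model: each billed window runs from T0 to T2.
with_metadata = sum(billed(gap + run) for gap, run in chain)
# Metadata fetched up front: each billed window runs only from T1 to T2.
model_only = sum(billed(run) for _gap, run in chain)

print(with_metadata, model_only)  # 340 280
```

A batched startup query would add one billed window of its own, but only once per run rather than once per model.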

The proposal is to run these system-table queries once at startup, while dbt resolves dependencies, so that they are not queried again when the actual models run (which otherwise leaves Redshift waiting).
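A minimal sketch of the proposal, assuming a hypothetical `RelationCache` class that is not part of dbt or dbt-redshift: one bulk metadata query runs at startup, and later relation lookups are answered from memory. `METADATA_SQL` and `make_fetch_all` are illustrative names; the cache accepts any callable that returns `(schema, name, type)` rows, so the usage below substitutes a stub for a real connection.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Illustrative bulk query; real code would likely filter to the project's schemas.
METADATA_SQL = """
select table_schema, table_name, table_type
from information_schema.tables
where table_schema not in ('pg_catalog', 'information_schema')
"""

Row = Sequence[str]  # (table_schema, table_name, table_type)


def make_fetch_all(conn) -> Callable[[], Iterable[Row]]:
    """Build a fetcher that runs METADATA_SQL over a DB-API connection."""
    def fetch_all() -> Iterable[Row]:
        cur = conn.cursor()
        try:
            cur.execute(METADATA_SQL)
            return cur.fetchall()
        finally:
            cur.close()
    return fetch_all


class RelationCache:
    """Hypothetical cache: one bulk metadata query, then in-memory lookups."""

    def __init__(self, fetch_all: Callable[[], Iterable[Row]]) -> None:
        self._fetch_all = fetch_all
        self._relations: Optional[Dict[Tuple[str, str], str]] = None

    def warm(self) -> None:
        # The only round trip to the warehouse, e.g. during dependency resolution.
        # Keys are lowercased, assuming case-insensitive identifiers.
        self._relations = {
            (schema.lower(), name.lower()): rel_type
            for schema, name, rel_type in self._fetch_all()
        }

    def relation_type(self, schema: str, name: str) -> Optional[str]:
        # Warm lazily if warm() was never called; never re-query afterwards.
        if self._relations is None:
            self.warm()
        assert self._relations is not None
        return self._relations.get((schema.lower(), name.lower()))

    def record(self, schema: str, name: str, rel_type: str) -> None:
        # Keep the cache current when a model creates a relation during the run.
        if self._relations is None:
            self.warm()
        assert self._relations is not None
        self._relations[(schema.lower(), name.lower())] = rel_type


# Usage with a stub fetcher that counts how often the "warehouse" is queried.
calls = []


def fake_fetch():
    calls.append(1)
    return [("analytics", "orders", "BASE TABLE"), ("analytics", "orders_v", "VIEW")]


cache = RelationCache(fake_fetch)
cache.warm()
print(cache.relation_type("analytics", "orders_v"))  # VIEW
```

Relations created by models during the run are written back with `record`, so later models in the chain can check them without another metadata query.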

Pulled from other issues:

[Bug] Default auto_begin causes unnecessary Serverless Billing #856: auto_begin should be configurable, preferably defaulting to False.
[Bug] Connection PID Query Causes unnecessary Serverless Billing #855: This query should run only right before the first query on the connection. If the PID already exists, there is no need to query it again.
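Both linked bugs come down to sending statements before they are needed. The hypothetical `LazySession` wrapper below sketches the idea over any DB-API-style connection: it issues `begin` only when `auto_begin=True`, and it asks for `pg_backend_pid()` once, immediately before the first real query, instead of when the connection opens. The class and its structure are assumptions for illustration, not dbt-redshift's actual connection code; the fake connection only records statements.

```python
from typing import Any, List, Optional


class LazySession:
    """Hypothetical wrapper over a DB-API connection (not dbt-redshift's code)."""

    def __init__(self, conn: Any, auto_begin: bool = False) -> None:
        self._conn = conn
        self._auto_begin = auto_begin
        self._begun = False
        self._pid: Optional[int] = None  # nothing is sent at construction time

    @property
    def pid(self) -> Optional[int]:
        return self._pid

    def execute(self, sql: str) -> Any:
        cur = self._conn.cursor()
        if self._pid is None:
            # Fetch the PID once, right before the first real query,
            # so no idle gap opens between the lookup and actual work.
            cur.execute("select pg_backend_pid()")
            self._pid = cur.fetchone()[0]
        if self._auto_begin and not self._begun:
            # Assumption: with a real driver, whether an explicit BEGIN is
            # needed depends on its autocommit setting.
            cur.execute("begin")
            self._begun = True
        cur.execute(sql)
        return cur


# Stand-in connection that records statements instead of talking to Redshift.
class FakeCursor:
    def __init__(self, log: List[str]) -> None:
        self.log = log

    def execute(self, sql: str) -> None:
        self.log.append(sql)

    def fetchone(self) -> tuple:
        return (4242,)


class FakeConn:
    def __init__(self) -> None:
        self.log: List[str] = []

    def cursor(self) -> FakeCursor:
        return FakeCursor(self.log)


conn = FakeConn()
session = LazySession(conn)
session.execute("select 1")
session.execute("select 2")
print(conn.log)  # ['select pg_backend_pid()', 'select 1', 'select 2']
```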

Describe alternatives you've considered

Multiple workgroups with different RPU capacities, but that's outside the scope of dbt.

Who will this benefit?

All Redshift Serverless users. This could save millions of dollars across the industry.

Are you interested in contributing this feature?

I am 3 days into dbt, but yes, I can!

Anything else?

Maybe you can take this to the next level and use SQLite to cache the system table info locally.
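For the SQLite idea, a minimal sketch using Python's standard `sqlite3` module: rows from a startup metadata query are written to a local table, and later lookups read from it instead of the warehouse. The table layout and the `open_cache`, `store`, and `lookup` names are hypothetical. A local copy like this can go stale if something outside the dbt run changes the warehouse, so it would presumably need refreshing at the start of each invocation.

```python
import sqlite3
from typing import Iterable, Optional, Sequence

# Hypothetical local table mirroring the columns of the metadata query.
DDL = """
create table if not exists relations (
    table_schema text not null,
    table_name   text not null,
    table_type   text not null,
    primary key (table_schema, table_name)
)
"""


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache database."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(DDL)
    return conn


def store(conn: sqlite3.Connection, rows: Iterable[Sequence[str]]) -> None:
    """Insert or replace cached entries with rows fetched from the warehouse."""
    with conn:
        conn.executemany("insert or replace into relations values (?, ?, ?)", rows)


def lookup(conn: sqlite3.Connection, schema: str, name: str) -> Optional[str]:
    """Return the cached table_type, or None if the relation is unknown."""
    row = conn.execute(
        "select table_type from relations where table_schema = ? and table_name = ?",
        (schema, name),
    ).fetchone()
    return row[0] if row else None


cache = open_cache()
store(cache, [("analytics", "orders", "BASE TABLE"), ("analytics", "orders_v", "VIEW")])
print(lookup(cache, "analytics", "orders_v"))  # VIEW
```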

amychen1776 · 2024-07-25T18:32:00Z

@jaswanthikolla Thank you so much for opening the three issues! And welcome to dbt :) In the future, feel free to group these similar requests together!

jaswanthikolla added enhancement New feature or request triage labels Jul 15, 2024

amychen1776 removed the triage label Jul 25, 2024

amychen1776 changed the title ~~[Feature] Batch the System table Queries to Optimize for Serverless~~ [Feature] Optimize for Serverless Jul 25, 2024

amychen1776 changed the title ~~[Feature] Optimize for Serverless~~ [Feature] Optimize for Redshift Serverless Aug 28, 2024
