I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion
Describe the feature
In Redshift Serverless, queries are billed for a minimum of 60 seconds, so it is better to batch queries. For example, suppose you run a model together with its full upstream dependency chain. System table queries such as those against pg_namespace and information_schema.tables run at T0, their results are processed, and the model query itself runs from T1 to T2. You are billed from T0 to T2 instead of just T1 to T2, and the extra T0 to T1 window is mostly I/O time. The same thing happens for every model in the dependency chain.
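To make the cost argument concrete, here is a simplified Python sketch of the billing model as this issue describes it: each model's active window is billed as one contiguous span with a 60-second floor. That model is an assumption for illustration only; actual Redshift Serverless charges are metered in RPU-hours and depend on capacity, which the sketch ignores. The function names and timings are hypothetical.

```python
from typing import List, Tuple

# Assumed floor per billed window, taken from the issue's description.
MIN_BILLED_S = 60.0


def billed_seconds(start: float, end: float, minimum: float = MIN_BILLED_S) -> float:
    """Billed seconds for one contiguous active window, never below `minimum`."""
    if end < start:
        raise ValueError("end must not precede start")
    return max(minimum, end - start)


def chain_cost(windows: List[Tuple[float, float]]) -> float:
    """Total billed seconds when each model's window is billed separately."""
    return sum(billed_seconds(start, end) for start, end in windows)


# Hypothetical timings for one model: metadata queries start at T0 = 0 s,
# the model query runs from T1 = 20 s to T2 = 100 s.
T0, T1, T2 = 0.0, 20.0, 100.0
models = 50

with_metadata = chain_cost([(T0, T2)] * models)     # billed from T0 to T2
metadata_batched = chain_cost([(T1, T2)] * models)  # billed from T1 to T2 only

print(with_metadata, metadata_batched)  # 5000.0 4000.0
```

Under these assumptions the per-model metadata overhead adds 20 s to each of the 50 windows. The single startup batch that replaces it is one more billed window and is not modeled here.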
The proposal is to run these system table queries once at startup, while dependencies are being resolved, so that they are not queried again when the actual models run (re-querying is what makes Redshift wait).
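A minimal Python sketch of the proposal: one batched metadata query per set of schemas at startup, after which every per-model lookup is answered from memory. `MetadataCache`, `fetch_catalog`, and `record_built` are hypothetical names for illustration, not dbt-redshift's actual adapter API, and the fake fetcher stands in for a real query against information_schema.tables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# (schema, relation name, relation type), e.g. one row of information_schema.tables
Row = Tuple[str, str, str]


@dataclass
class MetadataCache:
    """Hypothetical startup-populated cache of relation metadata.

    `fetch_catalog` stands in for a single batched query, for example
    select table_schema, table_name, table_type
    from information_schema.tables where table_schema in (...)
    """

    fetch_catalog: Callable[[List[str]], Iterable[Row]]
    _relations: Dict[Tuple[str, str], str] = field(default_factory=dict)
    _loaded: Set[str] = field(default_factory=set)

    def warm(self, schemas: List[str]) -> None:
        """Fetch metadata for every not-yet-loaded schema in one round trip."""
        missing = sorted({s.lower() for s in schemas} - self._loaded)
        if not missing:
            return
        for schema, name, kind in self.fetch_catalog(missing):
            self._relations[(schema.lower(), name.lower())] = kind
        self._loaded.update(missing)

    def get_relation(self, schema: str, name: str) -> Optional[str]:
        """Answer from memory; no query is sent to Redshift."""
        return self._relations.get((schema.lower(), name.lower()))

    def record_built(self, schema: str, name: str, kind: str) -> None:
        """Keep the cache current after a model creates or replaces a relation."""
        self._relations[(schema.lower(), name.lower())] = kind


# Usage with a fake fetcher that counts catalog round trips.
calls: List[List[str]] = []


def fake_fetch(schemas: List[str]) -> List[Row]:
    calls.append(list(schemas))
    return [("analytics", "orders", "BASE TABLE"), ("analytics", "customers", "VIEW")]


cache = MetadataCache(fake_fetch)
cache.warm(["analytics"])  # one metadata query at startup
for model in ["orders", "customers", "orders_summary"]:
    cache.get_relation("analytics", model)  # served from memory
cache.record_built("analytics", "orders_summary", "BASE TABLE")
```

Keys are lowercased on the assumption that the warehouse uses Redshift's default case-insensitive identifiers. The cache only stays correct if each model run updates it (here via `record_built`) and nothing outside the run changes those schemas mid-run.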
@jaswanthikolla Thank you so much for opening the three issues! And welcome to dbt :) In the future, feel free to group these similar requests together!
amychen1776 changed the title from "[Feature] Batch the System table Queries to Optimize for Serverless" to "[Feature] Optimize for Serverless" on Jul 25, 2024
amychen1776 changed the title from "[Feature] Optimize for Serverless" to "[Feature] Optimize for Redshift Serverless" on Aug 28, 2024
Pulled from the other issues:
Describe alternatives you've considered
Multiple workgroups with different RPU capacities, but that is outside the scope of dbt.
Who will this benefit?
All Redshift Serverless users. This could save millions of dollars across the industry.
Are you interested in contributing this feature?
I am 3 days into dbt, but yes, I can!
Anything else?
Maybe you could take this to the next level and use SQLite to cache the system table information locally.
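A rough sketch of the SQLite idea using Python's built-in `sqlite3` module. The table layout, the function names, and the 300-second freshness window are assumptions for illustration. A persistent local cache can go stale if anything else changes the warehouse between runs, so entries older than the window are treated as misses and the caller is expected to re-query Redshift.

```python
import sqlite3
import time
from typing import Iterable, Optional, Tuple

Row = Tuple[str, str, str]  # (schema, relation name, relation type)


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical local metadata cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        create table if not exists relations (
            schema_name   text not null,
            relation_name text not null,
            relation_type text not null,
            fetched_at    real not null,
            primary key (schema_name, relation_name)
        )
        """
    )
    return conn


def store(conn: sqlite3.Connection, rows: Iterable[Row], now: Optional[float] = None) -> None:
    """Upsert rows from one batched catalog query, stamped with the fetch time."""
    ts = time.time() if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "insert or replace into relations values (?, ?, ?, ?)",
            [(s.lower(), n.lower(), t, ts) for s, n, t in rows],
        )


def lookup(
    conn: sqlite3.Connection,
    schema: str,
    name: str,
    max_age_s: float = 300.0,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the cached relation type, or None on a miss or a stale entry."""
    ts = time.time() if now is None else now
    row = conn.execute(
        "select relation_type, fetched_at from relations"
        " where schema_name = ? and relation_name = ?",
        (schema.lower(), name.lower()),
    ).fetchone()
    if row is None or ts - row[1] > max_age_s:
        return None  # miss or stale: re-query Redshift
    return row[0]


conn = open_cache()  # in-memory; pass a file path to persist across runs
store(conn, [("Analytics", "Orders", "BASE TABLE")], now=1000.0)
print(lookup(conn, "analytics", "orders", now=1100.0))  # BASE TABLE
print(lookup(conn, "analytics", "orders", now=2000.0))  # None (stale)
```

Returning `None` for both a miss and a stale entry keeps the sketch simple; a real implementation would likely need to distinguish "not cached" from "known not to exist".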