-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for InvocationMode.DBT_RUNNER
for local execution mode
#836
Add support for InvocationMode.DBT_RUNNER
for local execution mode
#836
Conversation
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #836 +/- ##
==========================================
- Coverage 94.72% 94.70% -0.02%
==========================================
Files 56 56
Lines 2520 2589 +69
==========================================
+ Hits 2387 2452 +65
- Misses 133 137 +4 ☔ View full report in Codecov by Sentry. |
Invocation.DBT_RUNNER
for local and virtualenv execution modes
Invocation.DBT_RUNNER
for local and virtualenv execution modesInvocationMode.DBT_RUNNER
for local and virtualenv execution modes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! Will try it out locally over the weekend.
One thing I haven't thought a ton about is what config we should pass the performance tests. IMO the perf tests should cover the "best case" scenario (appropriately tune Cosmos for performance) so that we're always pushing the boundary vs the default. Thoughts on including a change in this PR to the perf tests to use this new method?
One other thought: do you think it's worth doing any auto-discovery to infer which invocation method is used? i.e. if you don't explicitly specify one, should we:
- try to import the dbt runner, if it works, great - we can use the more performant method
- if it doesn't work, no problem, we default to subprocess
InvocationMode.DBT_RUNNER
for local and virtualenv execution modesInvocationMode.DBT_RUNNER
for local execution mode
Description
This PR adds
dbtRunner
programmatic invocation forExecutionMode.LOCAL
. I decided to not make a new execution mode for each (e.g.ExecutionMode.LOCAL_DBT_RUNNER
) and all of the child operators but instead added an additional configExecutionConfig.invocation_mode
whereInvocationMode.DBT_RUNNER
could be specified. This is so that users who are already using local execution mode could use dbt runner and see performance improvements.With the
dbtRunnerResult
it makes it easy to know whether the dbt run was successful and logs do not need to be parsed but are still logged in the operator:Performance Testing
After #827 was added, I modified it slightly to use postgres adapter instead of sqlite because the latest dbt-core support for sqlite is 1.4 when programmatic invocation requires >=1.5.0. I got the following results comparing subprocess to dbt runner for 10 models:
InvocationMode.SUBPROCESS
:Ran 10 models in 23.77661895751953 seconds NUM_MODELS=10 TIME=23.77661895751953
InvocationMode.DBT_RUNNER
:Ran 10 models in 8.390100002288818 seconds NUM_MODELS=10 TIME=8.390100002288818
So using
InvocationMode.DBT_RUNNER
is almost 3x faster, and can speed up dag runs if there are a lot of models that execute relatively quickly since there seems to be a 1-2s speed up per task.One thing I found while working on this is that a manifest is stored in the result if you parse a project with the runner, and can be reused in subsequent commands to avoid reparsing. This could be a useful way for caching the manifest if we use dbt runner for dbt ls parsing and could speed up the initial render as well.
I thought at first it would be easy to have this also work for virtualenv execution, since I at first thought the entire
execute
method was run in the virtualenv, which is not the case since the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may be possible to have this work for virtualenv and would be better suited for a follow-up PR.Related Issue(s)
closes #717
Breaking Change?
None
Checklist