Add support for `InvocationMode.DBT_RUNNER` for local execution mode #836

jbandoro · 2024-02-06T02:57:18Z

Description

This PR adds dbtRunner programmatic invocation for `ExecutionMode.LOCAL`. Rather than creating a new execution mode (e.g. `ExecutionMode.LOCAL_DBT_RUNNER`) and duplicating all of the child operators, I added a config option, `ExecutionConfig.invocation_mode`, where `InvocationMode.DBT_RUNNER` can be specified. This way, users who already use local execution mode can switch to the dbt runner and see performance improvements.

The `dbtRunnerResult` makes it easy to know whether the dbt run was successful, so logs no longer need to be parsed, although they are still logged in the operator.
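As a rough illustration of that point, the sketch below decides the outcome from the result object rather than from log text. It reads only the `success`, `exception`, and `result` attributes that dbt-core documents for `dbtRunnerResult`. The helper name `summarize_result` and the `SimpleNamespace` stand-in are hypothetical and are not the code in this PR.

```python
from types import SimpleNamespace
from typing import Any


def summarize_result(res: Any) -> str:
    """Return a one-line status from a dbtRunnerResult-like object.

    Only the attributes `success` and `exception` are read here;
    no log output is parsed.
    """
    if res.success:
        return "success"
    if res.exception is not None:
        # An exception was captured on the result object.
        return f"error: {res.exception}"
    # The command finished without an exception but did not succeed.
    return "failed"


# Stand-in object so the sketch runs without dbt installed.
fake = SimpleNamespace(success=False, exception=None, result=[])
print(summarize_result(fake))  # prints "failed"
```

With dbt-core >= 1.5 installed, the same helper could presumably be given the return value of `dbtRunner().invoke(["run"])`, where `dbtRunner` is imported from `dbt.cli.main`.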

Performance Testing

After #827 was added, I modified it slightly to use the postgres adapter instead of sqlite, because the latest dbt-core version supported by the sqlite adapter is 1.4, while programmatic invocation requires dbt-core >= 1.5.0. Comparing subprocess to dbt runner for 10 models gave the following results:

InvocationMode.SUBPROCESS:

Ran 10 models in 23.77661895751953 seconds
NUM_MODELS=10
TIME=23.77661895751953

InvocationMode.DBT_RUNNER:

Ran 10 models in 8.390100002288818 seconds
NUM_MODELS=10
TIME=8.390100002288818

So using `InvocationMode.DBT_RUNNER` is almost 3x faster (about 2.8x in this run). It can noticeably speed up DAG runs when there are many models that each execute quickly, since there seems to be a 1-2s speedup per task.
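For reference, the arithmetic behind those two figures can be reproduced from the timings reported above. The variable names are only for illustration.

```python
# Timings copied from the two runs reported above (seconds, 10 models each).
NUM_MODELS = 10
subprocess_seconds = 23.77661895751953
dbt_runner_seconds = 8.390100002288818

# Ratio of total run times.
speedup = subprocess_seconds / dbt_runner_seconds
# Average time saved per model (one task per model in this test).
saved_per_model = (subprocess_seconds - dbt_runner_seconds) / NUM_MODELS

print(f"speedup: {speedup:.2f}x")                  # prints "speedup: 2.83x"
print(f"saved per model: {saved_per_model:.2f}s")  # prints "saved per model: 1.54s"
```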

One thing I found while working on this: when you parse a project with the runner, the manifest is stored in the result and can be reused in subsequent commands to avoid reparsing. This could be a useful way to cache the manifest if we use the dbt runner for dbt ls parsing, and could speed up the initial render as well.
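A minimal sketch of that reuse pattern follows, assuming the documented dbt-core behaviour that `invoke(["parse"])` returns the manifest in `result` and that `dbtRunner` accepts a `manifest` keyword argument. The helper `parse_once_then_run` and its `runner_factory` parameter are hypothetical names, introduced so the flow can be exercised without dbt installed. This is not how Cosmos implements it.

```python
from typing import Any, Callable, List, Sequence


def parse_once_then_run(
    runner_factory: Callable[..., Any],
    commands: Sequence[List[str]],
) -> List[Any]:
    """Parse the project once, then reuse the manifest for each command."""
    parse_result = runner_factory().invoke(["parse"])
    if not parse_result.success:
        raise RuntimeError(f"dbt parse failed: {parse_result.exception}")

    # After `parse`, the result payload is the manifest.
    manifest = parse_result.result

    # A runner built with the manifest can skip reparsing on later commands.
    runner = runner_factory(manifest=manifest)
    return [runner.invoke(list(cmd)) for cmd in commands]
```

With dbt-core >= 1.5 installed and a dbt project in the working directory, this could be tried as `parse_once_then_run(dbtRunner, [["run"], ["test"]])` after `from dbt.cli.main import dbtRunner`.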

At first I thought it would be easy to make this work for virtualenv execution as well, because I assumed the entire execute method ran inside the virtualenv. That is not the case: the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may be possible to make this work for virtualenv, but that would be better suited for a follow-up PR.

Related Issue(s)

closes #717

Breaking Change?

None

Checklist

I have made corresponding changes to the documentation (if required)
I have added tests that prove my fix is effective or that my feature works - added unit tests and integration tests.

netlify · 2024-02-06T02:57:22Z

✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name	Link
🔨 Latest commit	`f761a8a`
🔍 Latest deploy log	https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/65d00500ebc18600082acee7

codecov · 2024-02-06T21:39:55Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9af7067) 94.72% compared to head (f0e03be) 94.70%.

❗ Current head f0e03be differs from pull request most recent head f761a8a. Consider uploading reports for the commit f761a8a to get more accurate results

Files	Patch %	Lines
cosmos/dbt/parser/output.py	96.29%	1 Missing ⚠️
cosmos/operators/local.py	98.27%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #836      +/-   ##
==========================================
- Coverage   94.72%   94.70%   -0.02%     
==========================================
  Files          56       56              
  Lines        2520     2589      +69     
==========================================
+ Hits         2387     2452      +65     
- Misses        133      137       +4

jlaneve

This is looking great! Will try it out locally over the weekend.

One thing I haven't thought a ton about is what config we should pass to the performance tests. IMO the perf tests should cover the "best case" scenario (Cosmos appropriately tuned for performance) so that we're always pushing the boundary rather than measuring the default. Thoughts on including a change in this PR so the perf tests use this new method?

One other thought: do you think it's worth doing any auto-discovery to infer which invocation method is used? i.e. if you don't explicitly specify one, should we:

- try to import the dbt runner; if it works, great, we can use the more performant method
- if it doesn't work, no problem, we default to subprocess
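The two steps above could be sketched roughly as follows. The `InvocationMode` enum here is a hypothetical stand-in so the snippet is self-contained, and the function names are made up for illustration. Only the import path `dbt.cli.main.dbtRunner` comes from dbt-core (>= 1.5).

```python
from enum import Enum
from typing import Optional


class InvocationMode(Enum):
    """Hypothetical stand-in for the enum discussed in this PR."""

    SUBPROCESS = "subprocess"
    DBT_RUNNER = "dbt_runner"


def discover_invocation_mode() -> InvocationMode:
    """Prefer the dbt runner if it can be imported, else fall back."""
    try:
        from dbt.cli.main import dbtRunner  # noqa: F401
    except ImportError:
        # dbt is missing, or too old to provide dbtRunner.
        return InvocationMode.SUBPROCESS
    return InvocationMode.DBT_RUNNER


def resolve_invocation_mode(explicit: Optional[InvocationMode] = None) -> InvocationMode:
    """An explicitly configured mode always wins over discovery."""
    return explicit if explicit is not None else discover_invocation_mode()


print(resolve_invocation_mode())
```

One design consideration: discovery only happens when nothing is configured, so users who explicitly choose subprocess invocation would keep that behaviour.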

jbandoro · 2024-02-17T01:25:26Z

@jlaneve I'm closing this PR and opening up #850 because I couldn't update the GH action and have it run with the updates here on my forked branch.

jbandoro added 3 commits February 5, 2024 15:37

add InvocationMode to ExecutionConfig

9fe6336

pass invocation_mode in task_args only if not None since it's only valid for local/venv operators

fc60b91

allow DbtLocalBaseOperator to use dbtRunner for invocation

2f7d2ab

jbandoro had a problem deploying to external February 6, 2024 02:57 — with GitHub Actions Error

improve type hints for subprocess hooks

77cd3cb

jbandoro had a problem deploying to external February 6, 2024 18:46 — with GitHub Actions Error

add change_working_directory context manager and add integration tests

7247487

jbandoro had a problem deploying to external February 6, 2024 21:05 — with GitHub Actions Error

rm duplicate import

a1c5da0

jbandoro had a problem deploying to external February 6, 2024 21:07 — with GitHub Actions Error

add test coverage for env vars context

25e669e

jbandoro temporarily deployed to external February 6, 2024 21:15 — with GitHub Actions Inactive

jbandoro added 2 commits February 6, 2024 14:23

add invocation mode to docs

2adda9c

fix: test coverage

15da888

jbandoro had a problem deploying to external February 6, 2024 22:40 — with GitHub Actions Error

jbandoro changed the title ~~WIP - Add support for dbt runner~~ Add support for Invocation.DBT_RUNNER for local and virtualenv execution modes Feb 6, 2024

Merge branch 'main' into 717-add-dbtrunner-local-executor

fe5b1ab

jbandoro changed the title ~~Add support for Invocation.DBT_RUNNER for local and virtualenv execution modes~~ Add support for InvocationMode.DBT_RUNNER for local and virtualenv execution modes Feb 6, 2024

jbandoro temporarily deployed to external February 6, 2024 22:41 — with GitHub Actions Inactive

jbandoro marked this pull request as ready for review February 6, 2024 23:04

jbandoro requested a review from a team as a code owner February 6, 2024 23:04

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Feb 6, 2024

jbandoro added this to the 1.4.0 milestone Feb 6, 2024

dosubot bot added area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:run Primarily related to dbt run command or functionality execution:local Related to Local execution environment labels Feb 6, 2024

on_kill check for InvocationMode.SUBPROCESS

1a90a3f

jbandoro temporarily deployed to external February 13, 2024 01:41 — with GitHub Actions Inactive

Merge branch 'main' into 717-add-dbtrunner-local-executor

64d68e3

jbandoro had a problem deploying to external February 14, 2024 00:52 — with GitHub Actions Error

Merge branch 'main' into 717-add-dbtrunner-local-executor

af6059d

jbandoro temporarily deployed to external February 14, 2024 00:58 — with GitHub Actions Inactive

jbandoro mentioned this pull request Feb 14, 2024

Add performance integration tests #827

Merged

2 tasks

Merge branch 'main' into 717-add-dbtrunner-local-executor

6c6c913

jbandoro temporarily deployed to external February 15, 2024 17:28 — with GitHub Actions Inactive

add note of dbt >= v1.50 requirement

f0e03be

jbandoro temporarily deployed to external February 15, 2024 19:32 — with GitHub Actions Inactive

jlaneve reviewed Feb 16, 2024

jbandoro added 4 commits February 16, 2024 11:53

Merge branch 'main' into 717-add-dbtrunner-local-executor

9ca62b3

update perf dag to use postgres to allow latest dbt-core

7a3b182

add invocation mode discovery if none selected

fd92032

add branch to test performance dag updates

56c5e84

jbandoro had a problem deploying to external February 17, 2024 00:11 — with GitHub Actions Error

jbandoro changed the title ~~Add support for InvocationMode.DBT_RUNNER for local and virtualenv execution modes~~ Add support for InvocationMode.DBT_RUNNER for local execution mode Feb 17, 2024

update test env vars

42217c6

jbandoro had a problem deploying to external February 17, 2024 00:29 — with GitHub Actions Error

try add authorize

f761a8a

jbandoro had a problem deploying to external February 17, 2024 00:59 — with GitHub Actions Failure

jbandoro temporarily deployed to internal February 17, 2024 01:13 — with GitHub Actions Inactive

jbandoro closed this Feb 17, 2024

dosubot bot removed this from the 1.4.0 milestone Feb 17, 2024
