forked from astronomer/astronomer-cosmos
Add support for `InvocationMode.DBT_RUNNER` for local execution mode (astronomer#850)

## Description

This PR adds `dbtRunner` programmatic invocation for `ExecutionMode.LOCAL`. Rather than creating a new execution mode (e.g. `ExecutionMode.LOCAL_DBT_RUNNER`) together with all of its child operators, I added a config option, `ExecutionConfig.invocation_mode`, where `InvocationMode.DBT_RUNNER` can be specified. This way, users who already use local execution mode can switch to dbt runner and see performance improvements.

With `dbtRunnerResult`, it is easy to tell whether the dbt run was successful, so logs do not need to be parsed, although they are still logged in the operator:

![image](https://github.com/astronomer/astronomer-cosmos/assets/79104794/76a4cf82-f0f2-4133-8d68-a0a6a145b1d8)

## Performance Testing

After astronomer#827 was added, I modified it slightly to use the postgres adapter instead of sqlite, because the latest dbt-core version supported for sqlite is 1.4 while programmatic invocation requires >=1.5.0. I got the following results comparing subprocess to dbt runner for 10 models:

1. `InvocationMode.SUBPROCESS`:

```shell
Ran 10 models in 23.77661895751953 seconds
NUM_MODELS=10
TIME=23.77661895751953
```

2. `InvocationMode.DBT_RUNNER`:

```shell
Ran 10 models in 8.390100002288818 seconds
NUM_MODELS=10
TIME=8.390100002288818
```

So using `InvocationMode.DBT_RUNNER` is almost 3x faster (about 2.8x in this run), and it can speed up DAG runs that contain many models which execute relatively quickly, since there seems to be a 1-2s speedup per task.

One thing I found while working on this is that a [manifest](https://docs.getdbt.com/reference/programmatic-invocations#reusing-objects) is stored in the result if you parse a project with the runner, and it can be reused in subsequent commands to avoid reparsing. This could be a useful way to cache the manifest if we use dbt runner for dbt ls parsing, and it could speed up the initial render as well.
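To make the two points above concrete (checking success from the result object, and reusing a parsed manifest), here is a minimal sketch. It is not the code added in this PR: `invoke_dbt` and its `runner_factory` parameter are hypothetical names introduced for illustration. The sketch assumes dbt-core >= 1.5, where `dbt.cli.main.dbtRunner` accepts an optional `manifest` and its `invoke` method returns a result with `success`, `exception`, and `result` attributes.

```python
from typing import Any, Callable, List, Optional


def invoke_dbt(
    args: List[str],
    runner_factory: Optional[Callable[..., Any]] = None,
    manifest: Any = None,
) -> Any:
    """Invoke dbt in-process and raise if the invocation did not succeed.

    `invoke_dbt` and `runner_factory` are hypothetical names for this sketch;
    `runner_factory` only exists so the runner can be swapped out in tests.
    """
    if runner_factory is None:
        # Imported lazily so this module loads without dbt installed.
        # Requires dbt-core >= 1.5 at call time.
        from dbt.cli.main import dbtRunner

        runner_factory = dbtRunner

    # Passing a previously parsed manifest lets dbt skip reparsing the project.
    if manifest is not None:
        runner = runner_factory(manifest=manifest)
    else:
        runner = runner_factory()

    result = runner.invoke(args)

    # No log parsing needed: the result object reports success directly.
    if not result.success:
        detail = result.exception or "see dbt logs for details"
        raise RuntimeError(f"dbt {' '.join(args)} failed: {detail}")
    return result
```

Under these assumptions, calling `invoke_dbt(["parse"])` once and passing its `.result` as `manifest` to later calls such as `invoke_dbt(["run"], manifest=...)` would avoid reparsing the project on each command.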
I initially thought it would be easy to make this work for virtualenv execution as well, because I assumed the entire `execute` method ran inside the virtualenv. That is not the case: the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may be possible to make this work for virtualenv, but that would be better suited to a follow-up PR.

## Related Issue(s)

closes astronomer#717

## Breaking Change?

None

## Checklist

- [x] I have made corresponding changes to the documentation (if required)
- [x] I have added tests that prove my fix is effective or that my feature works - added unit tests and integration tests.
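As a quick check on the timings reported under Performance Testing, the arithmetic can be reproduced as below. This is only a sanity check on the two numbers from the commit message. The per-model figure assumes one task per model and an even split of the savings, which the message does not state explicitly.

```python
# Timings reported in the commit message for 10 models (seconds).
subprocess_seconds = 23.77661895751953
dbt_runner_seconds = 8.390100002288818
num_models = 10

# Ratio of total run time: how many times faster dbt runner was.
speedup = subprocess_seconds / dbt_runner_seconds

# Average time saved per model, assuming the savings are spread evenly.
saved_per_model = (subprocess_seconds - dbt_runner_seconds) / num_models

print(f"speedup: {speedup:.2f}x")
print(f"saved per model: {saved_per_model:.2f} s")
```

The ratio comes out at roughly 2.8x and the average saving at about 1.5 s per model, which is consistent with the "almost 3x" and "1-2s per task" statements.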
Showing 21 changed files with 679 additions and 106 deletions.
@@ -293,13 +293,25 @@ jobs:
          PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH

  Run-Performance-Tests:
    needs: Authorize
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11"]
        airflow-version: ["2.7"]
        num-models: [1, 10, 50, 100]

    services:
      postgres:
        image: postgres
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v3
        with:
@@ -335,8 +347,14 @@ jobs:
          AIRFLOW_CONN_AIRFLOW_DB: postgres://postgres:[email protected]:5432/postgres
          AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: 90.0
          PYTHONPATH: /home/runner/work/astronomer-cosmos/astronomer-cosmos/:$PYTHONPATH
          COSMOS_CONN_POSTGRES_PASSWORD: ${{ secrets.COSMOS_CONN_POSTGRES_PASSWORD }}
          POSTGRES_HOST: localhost
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: postgres
          POSTGRES_SCHEMA: public
          POSTGRES_PORT: 5432
          MODEL_COUNT: ${{ matrix.num-models }}

    env:
      AIRFLOW_HOME: /home/runner/work/astronomer-cosmos/astronomer-cosmos/
      AIRFLOW_CONN_AIRFLOW_DB: postgres://postgres:[email protected]:5432/postgres