
[Feature] Define separate remote for dbt artifact upload #90

Open
pixie79 opened this issue Jan 30, 2023 · 3 comments
Labels
enhancement New feature or request
Milestone

Comments


pixie79 commented Jan 30, 2023

Hi,

Thanks for this project, it looks great and I am looking to switch over to using it. One thing on the docs: to keep downloads fast, I will probably zip my project dir. I currently use S3 for my docs, but I would like to use a different bucket from the one used for pulling the Airflow and dbt resources. Is there an override for the operator below to change the upload bucket?

Thanks

    dbt_docs = DbtDocsGenerateOperator(
        task_id="dbt_docs",
        project_dir="s3://my-bucket/dbt/project/key/prefix/",
        profiles_dir="s3://my-bucket/dbt/profiles/key/prefix/",
    )

pixie79 commented Feb 21, 2023

If I do the above with a zip file for the project, it correctly generates the docs as far as I can tell, but it then attempts to overwrite my zip file on S3. That is not great, as the zip would then be overwritten again by my CI/CD process pushing from GitHub.

[2023-02-21, 15:22:13 UTC] {dbt.py:289} INFO - Pushing dbt project to: s3://XXXX-data-airflow/dbt-project.zip
[2023-02-21, 15:22:13 UTC] {base.py:88} INFO - Pushing dbt project files to: s3://XXXX-data-airflow/dbt-project.zip
[2023-02-21, 15:22:13 UTC] {s3.py:243} INFO - Loading file /tmp/airflowtmpmnxiz710/.temp.zip to S3: dbt-project.zip
[2023-02-21, 15:22:13 UTC] {base_aws.py:130} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2023-02-21, 15:22:14 UTC] {s3.py:256} WARNING - Failed to load dbt-project.zip: key already exists in S3.
[2023-02-21, 15:22:16 UTC] {taskinstance.py:1318} INFO - Marking task as SUCCESS. dag_id=dbt_docs_generate, task_id=gl_hourly, execution_date=20230221T151529, start_date=20230221T151531, end_date=20230221T152216
[2023-02-21, 15:22:16 UTC] {local_task_job.py:208} INFO - Task exited with return code 0

Either way, as you can see, the zip push failed but the task was still marked as SUCCESS, which is incorrect: it should fail. Ideally, I need to be able to set the upload to a different location or bucket so that the write can succeed.


tomasfarias commented Mar 5, 2023

There is currently no way to override the upload destination: we only support uploading to the same key from where we downloaded the project.

You could, in theory (I haven't tried this), push the documentation artifacts to XCom (via do_xcom_push_artifacts) and then have a follow-up task pick them up and send them to your other S3 bucket. But XCom (at least with the default backend) wasn't designed to store the heavy dbt documentation artifacts, so this is not ideal.
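For anyone who wants to try the XCom route in the meantime, here is a rough, untested sketch. It assumes the artifacts arrive from XCom as a mapping of file name to contents; the task IDs, bucket, and prefix are placeholders, not part of any existing API:

```python
# Sketch of a follow-up task that re-uploads dbt docs artifacts pulled
# from XCom to a separate S3 bucket. The pure helper below builds the
# (key, body) pairs; an Airflow task would feed them to S3Hook.load_string.

def build_upload_pairs(artifacts, key_prefix):
    """Map artifact file names to full S3 keys under key_prefix."""
    prefix = key_prefix.rstrip("/")
    return [(f"{prefix}/{name}", body) for name, body in artifacts.items()]

# Inside a DAG this could be wired up roughly as follows (the task id
# "dbt_docs", the bucket name, and the prefix are illustrative only):
#
# from airflow.operators.python import PythonOperator
# from airflow.providers.amazon.aws.hooks.s3 import S3Hook
#
# def upload_docs(ti):
#     # Pushed by DbtDocsGenerateOperator with do_xcom_push_artifacts
#     artifacts = ti.xcom_pull(task_ids="dbt_docs")
#     hook = S3Hook()
#     for key, body in build_upload_pairs(artifacts, "dbt-docs/"):
#         hook.load_string(body, key=key, bucket_name="my-docs-bucket", replace=True)
```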

From airflow-dbt-python's perspective, I don't see any reason not to support this: it's just a matter of having the time to implement the feature. I would make it generic enough that we can override the upload destination of all dbt artifacts, not just those generated by dbt docs, perhaps with a new argument like artifact_remote_url.

Or we could change do_xcom_push_artifacts to a more generic upload_artifacts and make XCom one of the options for remote uploads.
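As a sketch of what the artifact_remote_url semantics could look like (this argument does not exist yet; the helper and its name are purely illustrative):

```python
# Illustrative resolution logic for a hypothetical artifact_remote_url
# override: artifacts default to the project's own remote (the current
# behaviour), and an explicit override redirects them elsewhere.

def resolve_artifact_destination(project_url, artifact_name, artifact_remote_url=None):
    """Return the full URL an artifact should be written to."""
    base = artifact_remote_url if artifact_remote_url is not None else project_url
    return base.rstrip("/") + "/" + artifact_name

# Default: artifacts are pushed back next to the project files.
# With artifact_remote_url set, they land in the separate bucket instead.
```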

If you are up for taking a stab at this (or have already done so), I can review the PR. Otherwise I may find time to do it myself (but can't promise a timeline).

Thanks for reporting this issue!

@tomasfarias tomasfarias added the enhancement New feature or request label Mar 5, 2023
@tomasfarias tomasfarias changed the title DbtDocsGenerateOperator Upload bucket [Feature] Define separate remote for dbt artifact upload Mar 5, 2023
@tomasfarias tomasfarias added this to the v1.1.0 milestone Mar 12, 2023

pixie79 commented Mar 16, 2023

Thanks for that. I did take a look but can't really see where to do this correctly.

For now I will try the XCom route and hope you are able to find time at some point to implement this.

Thank you for your work :)
