Get optimizations from target-postgres dependency #3

Merged
nsmith22 merged 8 commits into master from nick/MW-375-get-postgres-optimizations
Jun 9, 2022


@nsmith22 nsmith22 commented Jun 9, 2022

Update the target-postgres dependency to the latest master branch commit, which is many commits ahead of the packaged 0.2.4 release. This brings in several performance optimizations, such as:

Additionally, bypass table insertion work when the record count is zero. The code still respects the persist_empty_tables setting to manage the table schema itself, but will not go through the expensive process to perform a zero-record insertion. That process takes ~6s per table, which in the GitHub tap means a few minutes of completely wasted time for an ETL process which has little-to-no new data.
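
As a rough illustration of the zero-record bypass described above, here is a minimal Python sketch. It is not the actual target-postgres code: `StreamBatch`, `write_batch`, and the `upsert_schema` / `insert_records` callables are hypothetical names invented for this example. Only the `persist_empty_tables` setting comes from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StreamBatch:
    """Hypothetical container for one stream's buffered records."""
    table_name: str
    records: List[Dict[str, Any]] = field(default_factory=list)


def write_batch(
    batch: StreamBatch,
    persist_empty_tables: bool,
    upsert_schema: Callable[[str], None],
    insert_records: Callable[[str, List[Dict[str, Any]]], None],
) -> str:
    """Write one batch, skipping the costly insert path when it is empty.

    Returns a short label describing which path was taken.
    """
    if not batch.records:
        # Zero records: never run the insertion path. Only touch the
        # table schema if the persist_empty_tables setting asks for it.
        if persist_empty_tables:
            upsert_schema(batch.table_name)
            return "schema_only"
        return "skipped"

    # Non-empty batch: ensure the schema exists, then insert as usual.
    upsert_schema(batch.table_name)
    insert_records(batch.table_name, batch.records)
    return "inserted"
```

The point of the early return is that an empty batch costs at most one schema call instead of the full insertion path, which is where the ~6s per table was being spent.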

Testing

Tested locally with tap-github running on minwareco/repotest, which brought runtime down from ~1m45s to ~40s. Most of the remaining run time will be eliminated once https://minware.atlassian.net/browse/MW-496 is completed, so that only one GitHub ingest pipeline per org processes the global data (e.g. collaborators, issues).

@nsmith22 nsmith22 requested a review from KBorders01 June 9, 2022 14:48

@KBorders01 KBorders01 left a comment

These are good optimizations. Let’s submit them back to data mill as well if that can be done quickly and easily.

@nsmith22 nsmith22 merged commit 7ca4964 into master Jun 9, 2022
@nsmith22 nsmith22 deleted the nick/MW-375-get-postgres-optimizations branch June 9, 2022 17:02
nsmith22 commented Jun 9, 2022

> These are good optimizations. Let’s submit them back to data mill as well if that can be done quickly and easily.

I opened a PR for that repo at https://github.com/datamill-co/target-snowflake/pull/35/files

However, there are a bunch of open PRs there which are quite old, and there hasn't been a merge since Jan 2021, so it seems that it isn't being maintained anymore.


KBorders01 commented Jun 10, 2022 via email
