Get optimizations from target-postgres dependency #3
Conversation
These are good optimizations. Let’s submit them back to data mill as well if that can be done quickly and easily.
I opened a PR for that repo at https://github.com/datamill-co/target-snowflake/pull/35/files. However, there are a bunch of open PRs there which are quite old, and there hasn't been a merge since Jan 2021, so it seems that it isn't being maintained anymore.
Ah, okay, that's unfortunate. Hopefully this will be good enough that we don't need to do too many more fixes in the future.
Update the target-postgres dependency to get the latest master branch commit, which is many commits ahead of the 0.2.4 package version. This pulls in several performance optimizations that have been added upstream.

Additionally, bypass table insertion work when the record count is zero. The code still respects the persist_empty_tables setting to manage the table schema itself, but will not go through the expensive process of performing a zero-record insertion. That process takes ~6s per table, which in the GitHub tap means a few minutes of completely wasted time for an ETL run that has little to no new data.

Testing
Tested locally with tap-github running on minwareco/repotest, which brought the runtime down from ~1m45s to ~40s. Also, most of that remaining run time will be eliminated when https://minware.atlassian.net/browse/MW-496 is completed, which will cause only one GitHub ingest pipeline per org to process the global data (e.g., collaborators, issues).
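For reference, pinning a Python dependency to a repository commit rather than a released package version can be done with a pip-style direct reference. This is an illustrative fragment, not the PR's actual change; the repository URL is assumed to be the datamill-co one and the commit SHA is a placeholder:

```text
# requirements.txt-style pin to a specific commit instead of the 0.2.4 release
target-postgres @ git+https://github.com/datamill-co/target-postgres@<commit-sha>
```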
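The zero-record bypass described above can be sketched roughly as follows. This is a hedged illustration, not the actual target-postgres code; the names `write_batch`, `ensure_table`, and `copy_records` are hypothetical stand-ins, and an in-memory dict plays the role of the database:

```python
def ensure_table(tables, table_name):
    # Create the (possibly empty) table if it does not exist yet,
    # so the schema is managed even when no records arrive.
    tables.setdefault(table_name, [])

def copy_records(tables, table_name, records):
    # Stand-in for the expensive bulk-insert path (~6s per table per the PR).
    tables.setdefault(table_name, []).extend(records)
    return len(records)

def write_batch(tables, table_name, records, persist_empty_tables=True):
    if persist_empty_tables:
        ensure_table(tables, table_name)  # still honor the schema setting
    if not records:
        return 0  # bypass the costly insertion machinery for empty batches
    return copy_records(tables, table_name, records)
```

The key design point is that the empty-batch check happens after schema management but before the insert path, so `persist_empty_tables` behavior is preserved while the per-table insertion cost is skipped entirely.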