Get optimizations from target-postgres dependency #3
Conversation
These are good optimizations. Let’s submit them back to data mill as well if that can be done quickly and easily.
I opened a PR for that repo at https://github.com/datamill-co/target-snowflake/pull/35/files. However, there are a bunch of open PRs there which are quite old, and there hasn't been a merge since Jan 2021, so it seems that it isn't being maintained anymore.
Ah, okay, that's unfortunate. Hopefully this will be good enough that we don't need to do too many more fixes in the future.
Update the target-postgres dependency to get the latest master branch commit, which is many commits ahead of the 0.2.4 package version. This pulls in several performance optimizations that have been added upstream.

Additionally, bypass table insertion work when the record count is zero. The code still respects the persist_empty_tables setting to manage the table schema itself, but will not go through the expensive process of performing a zero-record insertion. That process takes ~6s per table, which in the GitHub tap means a few minutes of completely wasted time for an ETL run that has little to no new data.

Testing
Tested locally with tap-github running on minwareco/repotest, which brought the runtime down from ~1m45s to ~40s. Also, most of that remaining run time will be eliminated when https://minware.atlassian.net/browse/MW-496 is completed, which will cause only one GitHub ingest pipeline per org to process the global data (e.g., collaborators, issues).
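For reference, pinning a Python dependency to a repository commit rather than a released package version can be done with a pip-style direct reference. This is an illustrative fragment, not the PR's actual change; the repository URL is assumed to be the datamill-co one and the commit SHA is a placeholder:

```text
# requirements.txt-style pin to a specific commit instead of the 0.2.4 release
target-postgres @ git+https://github.com/datamill-co/target-postgres@<commit-sha>
```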
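The zero-record bypass described above can be sketched roughly as follows. This is a hedged illustration, not the actual target-postgres code; the names `write_batch`, `ensure_table`, and `copy_records` are hypothetical stand-ins, and an in-memory dict plays the role of the database:

```python
def ensure_table(tables, table_name):
    # Create the (possibly empty) table if it does not exist yet,
    # so the schema is managed even when no records arrive.
    tables.setdefault(table_name, [])

def copy_records(tables, table_name, records):
    # Stand-in for the expensive bulk-insert path (~6s per table per the PR).
    tables.setdefault(table_name, []).extend(records)
    return len(records)

def write_batch(tables, table_name, records, persist_empty_tables=True):
    if persist_empty_tables:
        ensure_table(tables, table_name)  # still honor the schema setting
    if not records:
        return 0  # bypass the costly insertion machinery for empty batches
    return copy_records(tables, table_name, records)
```

The key design point is that the empty-batch check happens after schema management but before the insert path, so `persist_empty_tables` behavior is preserved while the per-table insertion cost is skipped entirely.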