This repository has been archived by the owner on May 30, 2022. It is now read-only.

Reduce memory footprint due to storing state messages #2

Open
wants to merge 7 commits into
base: master
from

Conversation

laurentS
Collaborator

@laurentS laurentS commented Feb 8, 2022

This PR substantially reduces the memory footprint of the target by storing Singer STATE messages in their raw string format instead of as deserialized objects.

The downside is that STATE messages are deserialized twice instead of once, which adds a slight performance penalty. However, this only applies to roughly 1 in N such messages, where N depends on the max__batch_rows config setting and typically appears to be >100.

In practice, we've seen the memory footprint drop by a factor of 15-20x with this change. This will of course depend on how your STATE messages are structured.
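As a rough illustration of the idea described above (not the code in this PR), the sketch below keeps the latest STATE message as its raw JSON line and only parses it again when the state is emitted. `RawStateBuffer`, `process_lines`, and keeping only the most recent STATE are assumptions made for the example; the only thing taken from the Singer format is that a STATE message is a JSON object with `type` and `value` keys.

```python
import json
from typing import Any, Dict, Iterable, Optional


class RawStateBuffer:
    """Hypothetical helper: holds the latest STATE message as a raw JSON string."""

    def __init__(self) -> None:
        self._raw_line: Optional[str] = None

    def store(self, line: str) -> None:
        # Keep only the string; the parsed dict is not retained.
        self._raw_line = line

    def emit(self) -> Optional[Dict[str, Any]]:
        # The second deserialization happens only here, when state is emitted.
        if self._raw_line is None:
            return None
        message = json.loads(self._raw_line)
        self._raw_line = None
        return message.get("value")


def process_lines(lines: Iterable[str], buffer: RawStateBuffer) -> None:
    for line in lines:
        # First deserialization: needed to find out the message type.
        message = json.loads(line)
        if message.get("type") == "STATE":
            buffer.store(line)
        # RECORD / SCHEMA handling is omitted from this sketch.
```

With this shape, every STATE line is parsed once to read its type, and only the one that is actually emitted is parsed a second time.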

doublethefish and others added 2 commits March 4, 2021 10:20
This only documents `tests/unit` as `tests/migrations` appear to be
designed to work inside a container rather than a local dev machine.
Member

@ericboucher ericboucher left a comment

Looks good, I'm just wondering if it might make sense to offer this as a config option? Or at least write the code with that in mind?

So for example, `handle_state_message` could take both `line` and `line_data` as arguments and then decide what to do with them based on the config?
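A minimal sketch of what that could look like, assuming a hypothetical boolean config key `store_raw_state` (not an existing setting) and a hypothetical `resolve_state` helper for reading the stored value back:

```python
import json
from typing import Any, Dict, Union

StoredState = Union[str, Dict[str, Any]]


def handle_state_message(
    line: str, line_data: Dict[str, Any], config: Dict[str, Any]
) -> StoredState:
    """Return what should be kept in memory for this STATE message."""
    # "store_raw_state" is an assumed config key, used for illustration only.
    if config.get("store_raw_state", True):
        return line  # raw JSON string, parsed again only when emitted
    return line_data["value"]  # already-deserialized state object


def resolve_state(stored: StoredState) -> Dict[str, Any]:
    """Turn whatever was stored back into the state value."""
    if isinstance(stored, str):
        return json.loads(stored)["value"]
    return stored
```

Passing both arguments keeps the caller unchanged whichever mode is configured, at the cost of a small type check when the state is read back.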

4 participants