
gzip: Use sync.pool #1080

Open
wants to merge 1 commit into master

Conversation

Jeremyyang920

@Jeremyyang920 Jeremyyang920 commented Nov 26, 2024

This commit uses a sync.Pool for both the gzip writer and reader so that we reduce the number of allocations and the time GC takes. Previously, every mutation that needed to be gzipped would call newWriter and allocate a new object. This in turn spent a lot of time and created extra, unneeded objects on the heap, which drove up GC time.


This change is Reviewable
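For reference, a minimal sketch of the pooled-writer pattern the commit describes; the names `gzipWriterPool` and `compress` are illustrative, not the PR's actual code:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// Pool of reusable *gzip.Writer values. New runs only when the pool is empty,
// so steady-state callers reuse writers instead of allocating per call.
var gzipWriterPool = sync.Pool{
	New: func() any { return gzip.NewWriter(nil) },
}

// compress gzips data using a pooled writer rather than gzip.NewWriter per call.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzipWriterPool.Get().(*gzip.Writer)
	defer gzipWriterPool.Put(zw)
	zw.Reset(&buf) // re-point the reused writer at a fresh destination
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flush remaining data and the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := compress([]byte("hello gzip"))
	fmt.Println(len(out) > 0, err)
}
```

`Writer.Reset` is what makes pooling safe here: it discards the writer's state as if it had just been created, but writing to the new destination.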

@Jeremyyang920
Author

[image: pprof flame graph]
pprof flame graph for reference over a 15s profile. The majority of time is spent in malloc and GC due to all the object allocations.

```go
// before:
r, err := gzip.NewReader(bytes.NewReader(data))

// after:
gzReader := gzipReaderPool.Get().(*gzip.Reader)
defer gzipReaderPool.Put(gzReader)
```


If Reset() failed below, are we sure we want to put this back in the pool?

Author


Hmm, that's a good question. I'm looking at the Reset code, and I think it just calls bufio.NewReader(r) to reset the underlying source. So the Put call is really just returning the base gzip.Reader{} object to the pool, and Reset internally resets z.r, which is the actual reader.

Does that check out with your understanding of Reset as well?
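If the concern is pooling a reader whose Reset failed, one defensive variant is to return the reader to the pool only after Reset succeeds; this is a sketch, not the PR's code, and `decompressSafe` is a hypothetical name:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

var gzipReaderPool = sync.Pool{
	New: func() any { return new(gzip.Reader) },
}

// decompressSafe defers the Put until after Reset succeeds, so a reader left
// in a questionable state by a failed Reset is simply dropped and a fresh one
// is allocated by the pool's New on the next Get.
func decompressSafe(data []byte) ([]byte, error) {
	zr := gzipReaderPool.Get().(*gzip.Reader)
	if err := zr.Reset(bytes.NewReader(data)); err != nil {
		// Bad gzip header: discard the reader instead of pooling it.
		return nil, err
	}
	defer gzipReaderPool.Put(zr)
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	_, err := decompressSafe([]byte("not gzip"))
	fmt.Println(err != nil)
}
```

The cost of dropping a reader on error is one extra allocation on the next Get, which only happens on the (presumably rare) failure path.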

@ryanluu12345
Contributor

Unrelated to the change itself, but I think @noelcrl mentioned in standup that the container images for v20.1 and v20.2 are no longer present. You can just rebase after this goes in:
#1081

Contributor

@ryanluu12345 ryanluu12345 left a comment


General implementation looks good to me. Pending Steven's question and light testing to see the profile changes before/after the change.

internal/staging/stage/gzip.go (resolved)
internal/staging/stage/gzip.go (resolved)
@cockroachdb cockroachdb deleted a comment from ryanluu12345 Nov 26, 2024
This commit uses a sync.Pool for both the gzip writer and reader
so that we reduce the number of allocations and the time GC takes. Previously,
every mutation that needed to be gzipped would call newWriter
and allocate a new object. This in turn spent a lot of time and created
extra, unneeded objects on the heap, which drove up GC time.