-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #162
base: master
Are you sure you want to change the base?
Conversation
On my simple, sample filesystem, this speeds up my benchmark (a series of git clones) by a factor of 5x-10x. The vast majority of the time savings is from reduced allocation and GC of giant buffers. Some representative runtime.MemStats fields after a run, before: TotalAlloc=302219702256 Mallocs=343942 Frees=342827 GCCPUFraction=0.199405 After: TotalAlloc=42925688 Mallocs=194497 Frees=193819 GCCPUFraction=0.000480 Note that ReadRequest is normally called from exactly one goroutine, as part of a loop in Server.serve, so switching rio to a sync.Mutex will actually help performance (sync.RWMutex has non-trivial additional overhead). It also makes the code simpler and easier to reason about. This does introduce expensive copies for large buffers. I experimented with having three tiers of buffer sizes instead. In that scenario, small buffers are pooled (as in this commit), medium buffers are allocated and the data copied over (as in this commit), but very large buffers take over c.readbuf instead of copying it, and allocate a new c.readbuf in its place. In my experiments, in order for it to be worth allocating a full sized buffer, the cutoff to count as "very large" has to be very large indeed, and I don't see any evidence that such messages are common enough to warrant the extra code (if indeed such messages exist in practice at all).
In addition to using the final long-term home of the context package, this helps performance. Surprisingly, context.WithCancel showed up in double-digits during cpu profiling. The golang.org/x/net context implementation just calls through to the stdlib context. Eliminating that extra hop halved the time spent in context.WithCancel.
The Request field was never used. This reduces memory allocations in my simple filesystem tests by about 20%.
Eliminates about 20% of allocations in my simple filesystem.
buffer.reset is unused.
Reduces allocations in my simple filesystem by about 25%. Increases the total size of allocations by about 3.5%. Reducing the size of the preallocated buffer moves those numbers in the obvious ways (allocs up, total size down). I picked 160 because it covered > 95% of the messages in my simple filesystem.
any news about this pr ? why it not merged and does not have comments from author ? |
@vtolstov This project appears to be dead. There have been no commits to any branch for over a year. It would be nice if the author updated the README to redirect to actively-maintained alternatives. |
<insert monty python sketch about bringing out your dead> More serious status update: lots of life changes, big refactor that is not quite ready and is proving very hard to break into smaller changes. Expect a phoenix resurrection moment with a new API that allows for more performant memory management. At some point.. |
@tv42 How's that going? If you don't mind me asking. The last commit to this repo was in April. Would you like help maintaining the repo? (I can't volunteer the time myself but perhaps I can help recruit?) |
This series of commits considerably improves performance on my machine (macOS), on a simple in-memory filesystem, using git clone to exercise the filesystem.
Before these commits, the git clone ran 5-x10x slower than on the OS filesystem. After these commits, the performance is comparable.
Some runtime.MemStats stats, before these commits:
TotalAlloc=302219702256 Mallocs=343942 Frees=342827 GCCPUFraction=0.199405
After the first (readbuf) commit:
TotalAlloc=43079832 Mallocs=198069 Frees=197442 GCCPUFraction=0.000560
At the end of the commits:
TotalAlloc=41499160 Mallocs=104432 Frees=103759 GCCPUFraction=0.000171
The commits are mostly independent of each other.