Optimize repeatedly serializing array bitmaps #364
base: master
Conversation
Effectively, this is an attempt at doing your own memory allocation. Such approaches can help in specific scenarios, but they can also cause harm and increase complexity. There is a no-free-lunch property: trying to beat the Go memory allocator from Go itself involves tradeoffs which, on average, are going to leave you in bad shape. I don't think we want to go there generally.

However, if you are interested in extending the roaring bitmaps in such a way that current users are totally unaffected by your changes, then that would be acceptable. So, for example, I don't think we want to fatten and complexify the type Bitmap. If we ever change our types, we will make them simpler and more lightweight, not the other way around. If you want to create a separate Bitmap factory that could recycle old bitmaps, and produce new ones by trying to recycle existing ones, that is fine.

So instead of doing...

```go
for i := 0; i < b.N; i++ {
	l.ClearRetainStructures()
	for j := 0; j < 8; j++ {
		l.Add(uint32(j))
	}
	buf.Reset()
	_, err := l.WriteTo(buf)
	if err != nil {
		panic(err)
	}
}
```

You could try something like this... (this is not meant to be actual Go code)

```go
bf := NewBitmapFactory()
for i := 0; i < b.N; i++ {
	content := [...]uint32{0, 1, 2, 3, 4, 5, 6, 7}
	l := bf.NewBitmapFrom(content)
	buf.Reset()
	_, err := l.WriteTo(buf)
	if err != nil {
		panic(err)
	}
	bf.Recycle(l)
}
```

So if you have to frequently create new small bitmaps from some content, then you could just store some containers in your factory and then produce a new bitmap from the available pool. If there are bugs or inefficiencies, or complexity in a … |
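One way the suggested factory could be sketched is with a `sync.Pool`. Note the `Bitmap` type below is a simplified stand-in rather than the real `roaring.Bitmap`, and `NewBitmapFrom`/`Recycle` are the hypothetical names from the comment above, not an existing API:

```go
package main

import (
	"fmt"
	"sync"
)

// Bitmap is a stand-in for roaring.Bitmap, used only to illustrate
// the recycling pattern; it just records added values.
type Bitmap struct{ vals []uint32 }

func (b *Bitmap) Add(v uint32) { b.vals = append(b.vals, v) }

// Clear empties the bitmap but keeps the backing array for reuse.
func (b *Bitmap) Clear() { b.vals = b.vals[:0] }

// BitmapFactory recycles bitmaps through a sync.Pool, so repeated
// create/serialize/discard cycles avoid fresh allocations.
type BitmapFactory struct{ pool sync.Pool }

func NewBitmapFactory() *BitmapFactory {
	return &BitmapFactory{pool: sync.Pool{New: func() any { return &Bitmap{} }}}
}

// NewBitmapFrom returns a (possibly recycled) bitmap holding content.
func (f *BitmapFactory) NewBitmapFrom(content []uint32) *Bitmap {
	b := f.pool.Get().(*Bitmap)
	b.Clear()
	for _, v := range content {
		b.Add(v)
	}
	return b
}

// Recycle returns a bitmap to the pool for later reuse.
func (f *BitmapFactory) Recycle(b *Bitmap) { f.pool.Put(b) }

func main() {
	bf := NewBitmapFactory()
	content := []uint32{0, 1, 2, 3, 4, 5, 6, 7}
	for i := 0; i < 3; i++ {
		l := bf.NewBitmapFrom(content)
		fmt.Println(len(l.vals)) // prints 8 on every iteration
		bf.Recycle(l)
	}
}
```

After the first iteration, the pool hands back the same `*Bitmap`, so the inner slice's backing array is reused instead of reallocated.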
I don't really want to get into a big philosophical discussion about high-performance Go software, but every major database that uses (or has used) this library (M3, Husky, InfluxDB, Pilosa, etc.) takes extreme care to minimize unnecessary allocations in the critical path. It's trivial to beat the Go allocator, in Go, simply by virtue of understanding your application and leveraging a tasteful amount of object pooling and reuse. Leaning on the Go GC for 95% of allocations is a great decision and a major reason why people are able to build large-scale data systems so quickly in Go, but most of those systems would be dead in the water, performance-wise, if they didn't bypass the default allocator (at least to some degree) in their critical paths.

That said, I am open to alternative implementations. The proposal you provided works for the narrow use case in my original issue, but it doesn't remove the allocation caused by the call to …

Alternatively, I think a solution that would work for a much wider variety of performance-critical use cases would be to allow the bitmap types to be constructed with an optional "allocator". So something like this:

```go
type Allocator interface {
	AllocateBytes(size, capacity int) []byte
	AllocateInt16s(size, capacity int) []int16
	...
}
```

The existing Bitmaps would have a …

```go
allocator := NewMyAllocator()
bf := roaring.NewBitmapWithAllocator(allocator)
for i := 0; i < b.N; i++ {
	for j := 0; j < 8; j++ {
		// Allocating the sparse container leverages the allocator.
		l.Add(uint32(j))
	}
	buf.Reset()
	// Allocating the temporary buffer in this method leverages the allocator also.
	_, err := l.WriteTo(buf)
	if err != nil {
		panic(err)
	}
}
```

The … I think that would strike a good balance between allowing callers to control allocations if they're willing to absorb the complexity in their application, while keeping additional complexity in the Bitmap implementation to a minimum (call a function to allocate instead of using make(), but that's it).

I went and looked through our codebase for the other areas where the allocations caused by this library are major sources of CPU and GC waste in our applications, and almost all of them would be optimizable to almost nothing with a custom allocator. For example, another use case we have is "compacting" together files that contain many of these bitmaps in an M:N fashion in a single thread (sometimes compacting millions of bitmaps in a very short period of time). We suffer a lot of allocations from reading the … |
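The `Allocator` interface above is only a proposal, but one possible implementation is a simple free-list allocator. The sketch below is self-contained and hypothetical (the `ReleaseBytes` method and the pooling strategy are illustration, not part of the proposal); it shows how `make()` calls collapse to slice reuse once the pool is warm:

```go
package main

import "fmt"

// PooledAllocator is one possible implementation of the proposed
// Allocator interface: it hands out slices carved from free lists
// so hot paths avoid calling make() repeatedly.
type PooledAllocator struct {
	bytes  [][]byte
	int16s [][]int16
}

func (a *PooledAllocator) AllocateBytes(size, capacity int) []byte {
	// Reuse a pooled slice when one with enough capacity is available.
	for i, s := range a.bytes {
		if cap(s) >= capacity {
			a.bytes = append(a.bytes[:i], a.bytes[i+1:]...)
			return s[:size]
		}
	}
	return make([]byte, size, capacity)
}

func (a *PooledAllocator) AllocateInt16s(size, capacity int) []int16 {
	for i, s := range a.int16s {
		if cap(s) >= capacity {
			a.int16s = append(a.int16s[:i], a.int16s[i+1:]...)
			return s[:size]
		}
	}
	return make([]int16, size, capacity)
}

// ReleaseBytes returns a slice to the pool for later reuse.
func (a *PooledAllocator) ReleaseBytes(s []byte) { a.bytes = append(a.bytes, s) }

func main() {
	a := &PooledAllocator{}
	b := a.AllocateBytes(0, 64)
	a.ReleaseBytes(b)
	b2 := a.AllocateBytes(8, 32) // recycled from the pool, no fresh allocation
	fmt.Println(len(b2), cap(b2) >= 32)
}
```

A real version would also need a policy for size classes and for trimming the free lists, but the core idea is just "check the pool before calling make()".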
@lemire To validate my suggestion, I made a quick draft: #366. It's not bad. It's still significantly better than not having a custom allocator, but it still suffers from allocating the …
Maybe it's good enough. Let me know what you think about this approach! |
Yes, the ability to provide a custom memory allocator is a reasonable path if it can be implemented in a reasonable manner. For that to be effective, you need to have both allocation and deallocation functions. One should be mindful that a container can belong to several bitmaps at once (due to 'copy-on-write'). Evidently, it would need to be safe in a multithreaded context (the Go garbage collector is thread-safe). If you can address these concerns and get a performance boost in some cases, it would be great. Note that there are related issues, e.g., #228 |
@lemire I replied in the PR draft, but I think my proposal works for copyOnWrite by default (any containers which must be copied will already be copied by the existing implementation). Thread safety is an implementation detail to be handled by users who choose to write their own allocators (for my use case, for example, thread safety is not a concern since I intend to spawn an allocator per thread, basically). I don't think deallocation functions are required for many use cases. I could add them to the interface if you like, but it will result in more changes, and the PR will end up looking more like my original PR, I think. I suspect that most use cases can be addressed without deallocator functions. Some things would require a deallocator, but I wonder if it would be easier to start with a simpler pattern and expand the interface as we go? |
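The "allocator per thread" idea above can be made concrete with a small self-contained sketch. `ByteAllocator` here is a hypothetical, deliberately non-thread-safe free-list allocator; safety comes entirely from usage discipline, since each goroutine owns its own instance and no locking is needed:

```go
package main

import (
	"fmt"
	"sync"
)

// ByteAllocator is a hypothetical single-owner allocator. It is NOT
// safe for concurrent use; each goroutine must create its own.
type ByteAllocator struct{ free [][]byte }

func (a *ByteAllocator) AllocateBytes(size, capacity int) []byte {
	if n := len(a.free); n > 0 {
		s := a.free[n-1]
		a.free = a.free[:n-1]
		if cap(s) >= capacity {
			return s[:size]
		}
		// Too small: drop it and fall through (a real allocator
		// would use size classes instead of discarding).
	}
	return make([]byte, size, capacity)
}

func (a *ByteAllocator) Release(s []byte) { a.free = append(a.free, s) }

func main() {
	var wg sync.WaitGroup
	results := make([]int, 4)
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			alloc := &ByteAllocator{} // one allocator per goroutine
			for i := 0; i < 100; i++ {
				buf := alloc.AllocateBytes(0, 1024)
				buf = append(buf, byte(i))
				alloc.Release(buf)
			}
			results[g] = len(alloc.free)
		}(g)
	}
	wg.Wait()
	fmt.Println(results) // each goroutine ends with a single pooled buffer
}
```

Only the first iteration in each goroutine calls `make()`; the other 99 reuse the pooled buffer, and there is no shared state between goroutines to synchronize.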
#228 is interesting, yeah, although a large change. Roughly the same thing could be accomplished by allowing the allocator to provide container wrappers, FWIW. |
@richardartoul I do not understand how you can safely reuse an array with a new bitmap without knowing that the array is now available. |
@lemire Let me give a few examples in pseudocode that I think are probably relevant for many different systems:

**Ingestion**

```go
allocator := NewAllocator()
roaring := roaring.NewWithAllocator(allocator)
for _, buffer := range buffers {
	// This will make Bitmap release any references internally, and so
	// when the allocator provides them again, it's fine.
	roaring.Reset()
	for _, v := range buffer {
		// Allocations caused by this method will grab from the allocator.
		roaring.Add(v)
	}
	// Once this method is complete, any datastructures taken from the
	// allocator are safe to be reused.
	roaring.WriteTo(stream)
}
```

**Compaction**

```go
inputStreams := openInputStreams()
bitmaps := make([]*roaring.Bitmap, 0, len(inputStreams))
for range inputStreams {
	bitmaps = append(bitmaps, roaring.NewWithAllocator(custom.NewMyCustomAllocator()))
}
output := roaring.NewWithAllocator(custom.NewMyCustomAllocator())
for streamsAreNotEmpty() {
	for i, stream := range inputStreams {
		bitmaps[i].Reset()
		// Any containers that need to be allocated by this method can pull from the allocator,
		// which basically never allocates, just returns objects from some "pool" it creates one time.
		bitmaps[i].ReadFrom(stream)
	}
	output.Reset()
	for _, bitmap := range bitmaps {
		// New containers allocated by this pull from the allocator.
		output.UnionInPlace(bitmap)
	}
	// After this method completes we "know" it's safe to reuse output and
	// its allocator (which always returns from the same pool of objects), which
	// allows us to eliminate allocations even though we never explicitly return
	// anything to the pool.
	//
	// After this method, all the input bitmaps (and their corresponding allocators)
	// are safe to reuse also.
	output.WriteTo(outputStream)
}
```

There is a corresponding equivalent for query as well, but this is already a long comment. |
Basically, I as the user/programmer know that at some point the datastructures that were provided to the Bitmap by the allocator will never be touched again, even if they are never explicitly returned with a … |
I guess one thing I left out in my pseudocode is that the allocator would probably need some kind of … |
Could other people chime in on this PR? |
I believe that internal pools may be necessary to maximize performance, and the bitmap should include a 'Reset' method to perform recycling tasks. |
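As a sketch of what an internal pool plus a `Reset` recycling hook might look like (all names here are hypothetical, and the `Bitmap` type is a stand-in rather than the library's real container structure):

```go
package main

import (
	"fmt"
	"sync"
)

// containerPool is a hypothetical internal pool of container storage.
// A library-level pool like this lets Reset recycle buffers across
// many bitmap instances.
var containerPool = sync.Pool{
	New: func() any { return make([]uint16, 0, 4096) },
}

// Bitmap is a stand-in type; the real roaring.Bitmap stores containers
// keyed by the high 16 bits of each value, but the recycling idea is the same.
type Bitmap struct {
	containers [][]uint16
}

// addContainer fetches container storage from the pool instead of make().
func (b *Bitmap) addContainer() []uint16 {
	c := containerPool.Get().([]uint16)[:0]
	b.containers = append(b.containers, c)
	return c
}

// Reset returns all container storage to the pool and empties the bitmap,
// so the next fill cycle allocates nothing new.
func (b *Bitmap) Reset() {
	for _, c := range b.containers {
		containerPool.Put(c)
	}
	b.containers = b.containers[:0]
}

func main() {
	var b Bitmap
	b.addContainer()
	b.addContainer()
	fmt.Println(len(b.containers)) // 2
	b.Reset()
	fmt.Println(len(b.containers)) // 0
}
```

The key property is that `Reset` is the single point where ownership of the buffers transfers back to the pool, which matches the "we know it is safe to reuse after WriteTo" invariant described earlier in the thread.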
Benchmark before
Benchmark after
So, a little less than 2x as fast, but with zero allocations now as well.