DO NOT MERGE - Experimental Java Native Cached Write Batch #13115
Closed
alanpaxton wants to merge 51 commits into facebook:main from evolvedbinary:address-api-javanativewritebatch
Conversation
not working yet - invalid link
Extend the original native write batch JMH tests to help us compare various write batch implementations. The so-far-unused WriteBatchJavaCMem is a placeholder for an alternative implementation of the native write batch.
- Finish making first prototype WriteBatchJavaNative work for put() - buffers put() requests on Java side - applies batched put() requests to C++ write_batch buffer on flush() - RocksDB.write(options, WriteBatchJavaNative) applies batch to DB - Clean up JMH performance tests for WriteBatchJavaNative vs existing WriteBatch
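The buffering idea described above can be sketched as follows. This is an illustrative model only, not the actual RocksJava prototype: the class and method names are hypothetical, and the simulated flush stands in for the single JNI call that would apply the buffered entries to the C++ write_batch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: put() calls are serialized into a Java-side buffer and
// only crossed over the (simulated) JNI boundary on flush().
public class BufferedWriteBatchSketch {
  private final ByteBuffer buffer;
  private int flushCount = 0; // how many times we crossed the simulated JNI boundary

  public BufferedWriteBatchSketch(int capacity) {
    this.buffer = ByteBuffer.allocate(capacity);
  }

  public void put(byte[] key, byte[] value) {
    int needed = 8 + key.length + value.length; // two int lengths plus payloads
    if (buffer.remaining() < needed) {
      flush();
    }
    buffer.putInt(key.length).put(key).putInt(value.length).put(value);
  }

  // In the real prototype this would be one JNI call appending the buffered
  // entries to the native write_batch; here we just count it and reset.
  public void flush() {
    if (buffer.position() == 0) return;
    flushCount++;
    buffer.clear();
  }

  public int flushCount() { return flushCount; }
}
```

The point of the sketch is the amortization: many put() calls map to one boundary crossing per full buffer, rather than one per operation.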
Give this prototype method UnsupportedOperationException on unimplemented methods so that we don’t waste time investigating why jmh tests that call no-ops are so fast :-(
Pass contents in direct ByteBuffer (WriteBatchJavaNativeDirect) Refactor out from byte[] version (WriteBatchJavaNativeArray) Fix to pre-expand buffers by size rather than re-run insertion on overflow
“flushToDB” option used for stripping out the cost of the DB write at the end.
Refactor array/direct native variants into a single class with different static factories. Add native C++ support methods for creating the write batch on a write() if it doesn’t already exist, and similar flush methods for creating the write batch on a flush() if it doesn’t already exist. Remove expansion on the Java side: the buffer is the requested size, and the behaviour should be to fill it and then flush, OR (TODO) to bypass it entirely (just make direct JNI calls to the write batch as before) for big writes.
Refactored previously to remove subclass. So protected methods can now be private.
Bypass the Java-side buffer when put()s are bigger than 50% of the buffer size, which is controlled at constructor time. Doing a big Put() to this kind of write batch causes the buffer to be flushed (if there is anything in it) and then the Put() is performed directly over JNI to C++. Note that the WriteBatch on the C++ side is created lazily; this may be on a flush of a batch (buffer is full), on the write of a batch to a DB, or on a bypass write such as the one implemented here.
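The bypass heuristic can be modelled in a few lines. This is a hypothetical sketch of the control flow only (entry sizes in bytes, JNI calls merely counted), not the real implementation:

```java
// Sketch of the bypass rule: an entry larger than half the buffer capacity
// skips buffering entirely, flushing any pending entries first.
public class BypassSketch {
  private int buffered = 0;   // bytes currently buffered on the Java side
  private final int capacity;
  public int jniCalls = 0;    // simulated crossings of the JNI boundary

  public BypassSketch(int capacity) { this.capacity = capacity; }

  public void put(int entrySize) {
    if (entrySize > capacity / 2) {
      // Big put: flush anything pending, then write this entry directly over JNI.
      if (buffered > 0) { jniCalls++; buffered = 0; }
      jniCalls++; // the direct JNI Put()
      return;
    }
    if (buffered + entrySize > capacity) { jniCalls++; buffered = 0; }
    buffered += entrySize;
  }
}
```

A large entry would otherwise force an immediate flush anyway, so writing it directly saves one copy through the Java buffer.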
A quick step before trying to replicate the layout to the level at which it can be directly memory copied.
Pick up an efficient varint encoder from the interwebs and rewrite the encoding using it. TODO: the C++ side, so everything is failing at present.
The Java-side write batch has the same format as the C++ one, so it can be memcpy-ed (in theory). In practice it can be assign()-ed to an empty write batch in a single copy. In the case of an existing write batch, the std::string representation does not allow us to append() in one copy (because we have to copy into a std::string first to call append()), so we adapt what we did before and step through the Java representation, calling Put() with Slices manufactured from the entries in the buffer. I think it is OK.
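A sketch of serializing one Put record on the Java side in what we understand to be the C++ WriteBatch rep layout: a type byte, a varint32 key length, the key, a varint32 value length, the value (the 12-byte header of sequence number and count lives separately at the front of the rep). This is an illustrative encoding only; names here are ours, not RocksJava API:

```java
import java.io.ByteArrayOutputStream;

public class WriteBatchLayoutSketch {
  static final int K_TYPE_VALUE = 0x1; // value-record tag in the C++ WriteBatch rep

  // Append one Put record: type byte, varint32 key length, key bytes,
  // varint32 value length, value bytes.
  static void appendPut(ByteArrayOutputStream out, byte[] key, byte[] value) {
    out.write(K_TYPE_VALUE);
    writeVarint32(out, key.length);
    out.write(key, 0, key.length);
    writeVarint32(out, value.length);
    out.write(value, 0, value.length);
  }

  // Little-endian base-128 varint, as used for lengths in the rep.
  static void writeVarint32(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
    out.write(v);
  }
}
```

Because both sides agree on this layout, a full Java buffer can be handed to C++ and appended to the native batch in bulk rather than replayed entry by entry.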
Last few changes to native write batch have modified the API slightly. Some of the test configuration is unnecessary or wrong, and the calls for creation are all part of a single class. Update these appropriately.
The Varint32 translation I had borrowed was bugged; probably my translation was the problem. Use instead the slightly less efficient but more readable and adequately fast version. Fix the Varint32 test(s).
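For reference, a straightforward, readable Varint32 codec of the kind described (little-endian base-128, as used by RocksDB and Protocol Buffers for lengths). This is our own illustrative version, not the code from the PR:

```java
public class Varint32 {
  // Encode v as a little-endian base-128 varint.
  // Returns the number of bytes written starting at pos.
  public static int encode(int v, byte[] out, int pos) {
    int p = pos;
    while ((v & ~0x7F) != 0) {
      out[p++] = (byte) ((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
      v >>>= 7;                              // unsigned shift: works for negative ints
    }
    out[p++] = (byte) v;                     // final byte, continuation bit clear
    return p - pos;
  }

  // Decode a varint starting at pos; assumes well-formed input.
  public static int decode(byte[] in, int pos) {
    int result = 0, shift = 0;
    while (true) {
      byte b = in[pos++];
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) return result;
      shift += 7;
    }
  }
}
```

Values below 128 take one byte, so typical small key/value lengths cost a single length byte each.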
We should have added this optimization as part of the last set of changes. When we are calling write(), and the batch has not been created on the native/C++ side, we can use the same optimization as in flush, to wit: `WriteBatchInternal::SetContents(this, slice)` directly (via Append())
Update the lists of files included in builds and tests to correctly list the ones for the native write batch changes.
WriteBatchInternal::AppendContents allows us to append extra batch entries in bulk in a single copy, without the complexity of iteration and individual elements. Hope for improved performance.
WriteBatchInternal::AppendContents is not noticeably better performing, but it is cleaner and we have saved ourselves some code.
Build an allocator into the write batch so that it can recycle ByteBuffers rather than continually re-allocating them. When passing a direct ByteBuffer, slice() it first before passing over JNI. This has a huge performance win; there is something (probing, or God forbid, copying) going on when we get the buffer from JNI. We still need to support byte arrays, which are by far the slowest.
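The recycling allocator mentioned above might look something like this minimal pool sketch (our own illustrative names; the real allocator is internal to the prototype):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Minimal sketch of a ByteBuffer recycling pool: released buffers go back on
// a free list instead of being re-allocated on the next batch.
public class BufferPoolSketch {
  private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
  private final int capacity;
  public int allocations = 0; // count of fresh allocations, for demonstration

  public BufferPoolSketch(int capacity) { this.capacity = capacity; }

  public ByteBuffer acquire() {
    ByteBuffer b = free.pollFirst();
    if (b == null) {
      allocations++;
      b = ByteBuffer.allocateDirect(capacity); // direct: off-heap, JNI-friendly
    }
    b.clear();
    return b;
  }

  public void release(ByteBuffer b) { free.addFirst(b); }
}
```

Direct buffer allocation is comparatively expensive, so reusing a flushed buffer for the next batch avoids repeated off-heap allocation on the hot path.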
Second instance of: when passing a direct ByteBuffer, slice() it first before passing over JNI. Use a critical section instead of env->GetByteArrayElements to avoid a copy of a potentially very large array, only part of which we may need. It has drawbacks, but for demonstrating performance and working round the byte[] version’s performance bug, it is invaluable.
1 operation now consists of writing a single batch of the required size. Performance is how many batches are written per second.
alanpaxton force-pushed the address-api-javanativewritebatch branch from e17a11c to 0d409b9 on November 15, 2024 11:59
Closing this PR as we have built something much simpler as a demo/PoC. Will start again as and when we come to productionize.
Write Batch Performance (Java Side Batching)
Performance of the WriteBatch APIs from Java has long been understood to be an issue. Typically users of a WriteBatch want to efficiently gather a possibly large number of fairly small updates to the database, and then apply them all atomically.
The RocksDB Java APIs all have a basic overhead caused by the non-trivial cost of a transition between JVM and C++ code. Our aim here is to replace (or enhance) the current mechanism, where every operation on the write batch involves a JNI transition, with one in which many operations can be buffered on the Java side, amortizing the transition cost over a larger number of operations. The code we produce at this time will only be a proof of concept; we are not implementing all the WriteBatch operations (and not yet those in WriteBatchWithIndex). We are only implementing Put() operations at this time, and measuring the performance of these.
Note that where we report performance, the numbers are collected on a stock Macbook Pro with 64GB of memory, 2TB of SSD and an M1 Max CPU.
Detailed performance is reported in java/jmh/WriteBatch.md
Obviously we need a baseline against which to measure our performance improvements. We implemented some new JMH (Java Microbenchmark Harness) tests in the existing JMH performance test subproject of RocksJava to measure current performance. We then implemented a trivial initial version of Java-side batching to compare against.
We had the choice of writing the Java side buffers in normal Java memory (byte[] arrays) or using off-heap memory (a direct ByteBuffer). We wrapped the byte-array version in an indirect ByteBuffer so that most code could be shared, and compared the performance.
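The shared-code trick is that both variants present the same ByteBuffer API; only construction differs. A tiny sketch (method names ours, for illustration):

```java
import java.nio.ByteBuffer;

// The heap variant wraps a byte[] in an indirect ByteBuffer; the off-heap
// variant allocates a direct ByteBuffer. All serialization code downstream
// works against the common ByteBuffer interface.
public class SharedBufferSketch {
  public static ByteBuffer heapBuffer(int capacity) {
    return ByteBuffer.wrap(new byte[capacity]); // indirect, backed by a Java array
  }

  public static ByteBuffer directBuffer(int capacity) {
    return ByteBuffer.allocateDirect(capacity); // off-heap, addressable from JNI
  }
}
```

The direct buffer can be read from C++ via GetDirectBufferAddress without a copy, while the heap variant's backing array must be pinned or copied across JNI, which is where the two variants' performance diverges.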
Having discussed the design of the batch approach, we extended our implementation to include some (possibly) important optimizations. Some details include:
Write operations on the Java side in the same format as on the C++ side, so that data can be bulk copied into C++ native write batches.
Regarding the details of the benchmark results: for comparison with the C++ baseline, we use 16-byte keys, 32-byte values and a 64-kilobyte Java-side buffer. We leave the C++ implementation to manage the size of the RocksDB native-side buffer(s). We operate in two modes. When writeToDB is true, we perform a DB.write() after numOpsPerBatch put operations have been performed, before closing the write batch. When writeToDB is false we just close the batch (and flush it to native/C++); in this mode we are ultimately measuring how efficiently we can fill a buffer on the native/C++ side using the Java WriteBatch API.
What we can see (java/jmh/WriteBatch.md) is that we have improved performance greatly by using the methods with native in their names; these are the tests which use our new implementation(s). Filling buffers is 3-4x faster. The problem lies in the cost of the subsequent write to the RocksDB core database. While our new implementation is still noticeably faster (by about 20%) when we include the final phase of writing to the database, it is somewhat surprising that the benefits are not greater. Since we are not approaching the baseline performance of the C++ benchmark, there may be issues still to be resolved in finding peak performance. We are reviewing our results.