DO NOT MERGE - Experimental Java Native Cached Write Batch #13115

Closed


alanpaxton
Contributor

Write Batch Performance (Java Side Batching)
Performance of the Write Batch APIs from Java has long been understood to be an issue. Typically, users of a WriteBatch want to gather a possibly large number of fairly small updates to the database efficiently, and then apply them all atomically.

The RocksDB Java APIs all have a basic overhead caused by the non-trivial cost of a transition between JVM and C++ code. Our aim here is to replace (or enhance) the current mechanism, where every operation on the write batch involves a JNI transition, with one in which many operations can be buffered on the Java side, amortizing the cost of transition over a more significant number of operations. The code we produce at this time is only a proof of concept; we are not implementing all the WriteBatch operations (and not yet those in WriteBatchWithIndex). We are implementing only Put() operations at this time, and measuring their performance.
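As a rough illustration, the intended usage pattern looks something like this (a minimal sketch; `WriteBatchJavaNative` and its `allocateDirect` factory follow the prototype classes named in the commits below, and `db`, `keys`, `values` and `numOpsPerBatch` are assumed to be in scope; exact signatures may differ):

```java
// Sketch: puts accumulate in a Java-side buffer; JNI is crossed only when
// the buffer fills or when the batch is written to the database.
try (final WriteBatchJavaNative batch = WriteBatchJavaNative.allocateDirect(64 * 1024);
     final WriteOptions writeOptions = new WriteOptions()) {
  for (int i = 0; i < numOpsPerBatch; i++) {
    batch.put(keys[i], values[i]); // appended to the Java-side buffer, no JNI call
  }
  db.write(writeOptions, batch); // remaining buffered entries cross JNI in bulk
}
```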

Note that where we report performance, the numbers were collected on a stock MacBook Pro with 64GB of memory, 2TB of SSD and an M1 Max CPU.

Detailed performance is reported in java/jmh/WriteBatch.md

Obviously we need a baseline against which to measure our performance improvements. We implemented some new JMH (Java Microbenchmark Harness) tests in the existing JMH performance test subproject of RocksJava to measure the current performance. We then implemented a first, trivial version of Java-side batching to compare against it.
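For context, a baseline JMH benchmark in this style might look like the following (a simplified sketch, not the actual benchmark in java/jmh; the class name and key/value generation are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import org.openjdk.jmh.annotations.*;
import org.rocksdb.*;

@State(Scope.Benchmark)
public class WriteBatchBaseline {
  @Param({"10000"}) int numOpsPerBatch;
  @Param({"true", "false"}) boolean writeToDB;

  RocksDB db;
  WriteOptions writeOptions;

  @Setup
  public void setup() throws Exception {
    RocksDB.loadLibrary();
    db = RocksDB.open(Files.createTempDirectory("jmh-writebatch").toString());
    writeOptions = new WriteOptions();
  }

  @TearDown
  public void tearDown() {
    writeOptions.close();
    db.close();
  }

  @Benchmark
  public void putWriteBatch() throws RocksDBException {
    try (final WriteBatch batch = new WriteBatch()) {
      for (int i = 0; i < numOpsPerBatch; i++) {
        // 16-byte keys and 32-byte values, matching the configuration described below
        final byte[] key = ByteBuffer.allocate(16).putLong(i).putLong(i).array();
        final byte[] value = ByteBuffer.allocate(32).putLong(i).array();
        batch.put(key, value); // today: one JNI transition per put()
      }
      if (writeToDB) {
        db.write(writeOptions, batch);
      }
    }
  }
}
```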

We had the choice of writing the Java-side buffers in normal Java heap memory (byte[] arrays) or in off-heap memory (a direct ByteBuffer). We wrapped the byte-array version in an indirect ByteBuffer so that most code could be shared, and compared the performance of the two.
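Concretely, the two variants differ only in how the backing buffer is allocated (sketch):

```java
import java.nio.ByteBuffer;

final int bufferSize = 64 * 1024;
// On-heap variant: a plain byte[] wrapped in an indirect ByteBuffer, so the
// fill/flush logic can be written once against the ByteBuffer API.
final ByteBuffer onHeap = ByteBuffer.wrap(new byte[bufferSize]);
// Off-heap variant: a direct ByteBuffer, whose memory JNI code can address
// without a copy via env->GetDirectBufferAddress.
final ByteBuffer offHeap = ByteBuffer.allocateDirect(bufferSize);
```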

Having discussed the design of the batch approach, we extended our implementation to include some (possibly) important optimizations; a sketch of the buffer format follows the list below. Some details include:
  • Write operations are stored on the Java side in the same format as on the C++ side, so that data can be bulk-copied into native C++ write batches.
  • Buffers are allocated with a fixed, known (parameterized) size on the Java side to allow tight control.
  • Buffers are flushed to C++ (a single JNI operation) when full.
  • Buffers are flushed to C++ as part of a single JNI operation when passed to DB.Write() to write the batch to the database.
  • Large Put() operations (> 50% of the buffer size) are not copied to the buffer; instead, the buffer is flushed and the new data is transferred directly over JNI as a single large record.
  • The native C++ write batch is not created until the first flush() or DB.write() crosses the JNI boundary, so creation on the Java API side is very lightweight.
  • The first flush/write over the JNI boundary uses WriteBatchInternal::SetContents to set the native batch to the batch supplied from Java in a single bulk copy. Subsequent flush/write operations incrementally Put() slices, composed from references to the contents of the Java-supplied buffer, into the native C++ batch; this appears to be the only way to extend the std::string-based write batch while copying each piece of data only once. Creating another write batch and then appending it to the existing one would require two bulk copies.
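The sketch referenced above: appending a single Put() record in (what we understand to be) the C++ WriteBatch wire format, i.e. a one-byte record type followed by a varint32-length-prefixed key and value, so that the whole buffer can later cross JNI in one bulk copy. Method and field names are illustrative, not the prototype's exact API (`entryCount` stands in for however the prototype tracks the batch count):

```java
// kTypeValue (0x1) is the record tag RocksDB uses for a plain Put.
private static final byte K_TYPE_VALUE = 0x1;

private void putRecord(final ByteBuffer buffer, final byte[] key, final byte[] value) {
  buffer.put(K_TYPE_VALUE);          // record type
  putVarint32(buffer, key.length);   // varint32 key length
  buffer.put(key);
  putVarint32(buffer, value.length); // varint32 value length
  buffer.put(value);
  entryCount++; // the batch header (sequence + count) is fixed up at flush time
}

// Standard LEB128 varint32 writer, as RocksDB uses for its length prefixes.
private static void putVarint32(final ByteBuffer buffer, int v) {
  while ((v & ~0x7f) != 0) {
    buffer.put((byte) ((v & 0x7f) | 0x80)); // low 7 bits, continuation bit set
    v >>>= 7;
  }
  buffer.put((byte) v);
}
```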

Regarding the details of the benchmark results: for comparison purposes with the C++ baseline, we use 16-byte keys, 32-byte values and a 64-kilobyte Java-side buffer. We leave the C++ implementation to manage the size of the RocksDB native-side buffer(s). We operate in two modes: when writeToDB is true, we perform a DB.write() after numOpsPerBatch put operations have been performed, before closing the write batch. When writeToDB is false, we just close the batch (and flush it to native/C++); in this mode we are ultimately measuring how efficiently we fill a buffer on the native/C++ side using the Java WriteBatch API.
What we can see (java/jmh/WriteBatch.md) is that performance improves greatly with the methods that include "native" in their names; these are the tests which use our new implementation(s). Filling buffers is 3-4x faster. The problem lies in the cost of the subsequent write to the RocksDB core database. While our new implementation is still noticeably faster (by about 20%) when we include the final phase of writing to the database, it is somewhat surprising that the benefits are not greater. Since we are not approaching the baseline performance of the C++ benchmark, there may be issues still to be resolved in finding peak performance. We are reviewing our results.

@alanpaxton marked this pull request as draft on November 5, 2024
alanpaxton and others added 28 commits November 15, 2024 11:59
not working yet - invalid link
comments there for what to pick up on
Extend original native write batch jmh tests to help us compare various write batch implementations.
So-far-unused WriteBatchJavaCMem as placeholder for alternative implementation of native write batch.
- Finish making first prototype WriteBatchJavaNative work for put()
  - buffers put() requests on Java side
  - applies batched put() requests to C++ write_batch buffer on flush()
  - RocksDB.write(options, WriteBatchJavaNative) applies batch to DB
- Clean up JMH performance tests for WriteBatchJavaNative vs existing WriteBatch
Give this prototype UnsupportedOperationException on unimplemented methods so that we don't waste time investigating why JMH tests that call no-ops are so fast :-(
Pass contents in direct ByteBuffer (WriteBatchJavaNativeDirect)
Refactor out from byte[] version (WriteBatchJavaNativeArray)

Fix to pre-expand buffers by size rather than re-run insertion on overflow
“flushToDB” option used for stripping out the cost of the DB write at the end.
Refactor array/direct native variants into a single class with different static factories.
Add native C++ support methods for creating the write batch on a write() if it doesn’t already exist.
Add similar flush methods for creating the write batch on a flush() if it doesn’t already exist.
Remove expansion on Java side - buffer is the requested size; behaviour should be to fill that and then flush, OR (TODO) to bypass it entirely (just make direct JNI calls to the write batch as before) for big writes.
Refactored previously to remove the subclass, so protected methods can now be private.
Bypass the Java-side buffer when put()s are bigger than 50% of the buffer size - which is controlled at constructor time.
Doing a big Put() to this kind of write batch causes the buffer to be flushed (if there is anything in it), and then the Put() is performed directly over JNI to C++.
Note that the WriteBatch on the C++ side is created lazily; this may be on a flush of a batch (buffer is full), or it may be on the write of a batch to a DB, or it may be on a bypass write such as the one implemented here.
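Putting those pieces together, the bypass logic might look like this (a sketch assuming fields `buffer` and `bufferSize`, the `putRecord()` helper sketched earlier, and hypothetical `flush()` and `putDirectOverJni()` methods; not the prototype's exact API):

```java
public void put(final byte[] key, final byte[] value) throws RocksDBException {
  // Worst case record size: 1 type byte plus two 5-byte varint length prefixes.
  final int recordSize = 1 + 5 + key.length + 5 + value.length;
  if (recordSize > bufferSize / 2) {
    flush();                      // push any buffered entries over JNI first
    putDirectOverJni(key, value); // one JNI call; creates the native batch lazily
  } else {
    if (buffer.remaining() < recordSize) {
      flush();                    // buffer full: one JNI transition for the lot
    }
    putRecord(buffer, key, value); // append in the C++ wire format, no JNI
  }
}
```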
A quick step before trying to replicate the layout to the level at which it can be directly memory copied.
Pick up an efficient varint encoder from the interwebs
Rewrite the encoding using that

TODO - the C++ side, so all failing at present
Java-side write batch has the same format as C++, so it can be memcpy-ed (in theory).
In practice it can be assign()-ed to an empty write batch in a single copy.

In the case of an existing write batch, the std::string representation does not allow us to append() in one copy (because we have to copy into a std::string first to call append()), so we adapt what we did before and step through the Java representation, calling Put() with Slices manufactured from the entries in the buffer. I think it is OK.
Last few changes to native write batch have modified the API slightly. Some of the test configuration is unnecessary or wrong, and the calls for creation are all part of a single class. Update these appropriately.
The Varint32 translation I had stolen was bugged; probably my translation is the problem.
Use instead the slightly less efficient but more readable and adequately fast version.
Fix the Varint32 test(s).
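For reference, a readable varint32 pair with a round-trip check over the 7-bit boundary values might look like this (a sketch in the spirit of the fixed tests, not their actual code):

```java
import static org.junit.Assert.assertEquals;

import java.nio.ByteBuffer;
import org.junit.Test;

public class Varint32RoundTripTest {
  static void putVarint32(final ByteBuffer buf, int v) {
    while ((v & ~0x7f) != 0) {
      buf.put((byte) ((v & 0x7f) | 0x80)); // low 7 bits, continuation bit set
      v >>>= 7;
    }
    buf.put((byte) v); // final byte, continuation bit clear
  }

  static int getVarint32(final ByteBuffer buf) {
    int result = 0;
    int shift = 0;
    byte b;
    do {
      b = buf.get();
      result |= (b & 0x7f) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return result;
  }

  @Test
  public void roundTrip() {
    final int[] samples = {0, 1, 127, 128, 16383, 16384, Integer.MAX_VALUE};
    final ByteBuffer buf = ByteBuffer.allocate(5); // varint32 is at most 5 bytes
    for (final int v : samples) {
      buf.clear();
      putVarint32(buf, v);
      buf.flip();
      assertEquals(v, getVarint32(buf));
    }
  }
}
```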
We should have added this optimization as part of the last set of changes.
When we are calling write(), and the batch has not been created on the native/C++ side,
we can use the same optimization as in flush, to wit:
`WriteBatchInternal::SetContents(this, slice)` directly (via Append())
Update the lists of files included in builds and tests to correctly list those for the native write batch changes.
WriteBatchInternal::AppendContents allows us to append extra batch entries in bulk in a single copy, without the complexity of iteration and individual elements.
Hope for improved performance.
WriteBatchInternal::AppendContents is not noticeably better-performing, but it is cleaner and we have saved ourselves some code.
Build an allocator into the write batch so that it can recycle ByteBuffers rather than continually re-allocating them.
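Such a recycling allocator might look like the following (a minimal sketch under our reading of this commit; the prototype's actual allocator may differ):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Recycles fixed-size direct ByteBuffers instead of re-allocating on each flush.
final class ByteBufferAllocator {
  private final Deque<ByteBuffer> free = new ArrayDeque<>();
  private final int bufferSize;

  ByteBufferAllocator(final int bufferSize) {
    this.bufferSize = bufferSize;
  }

  ByteBuffer allocate() {
    final ByteBuffer recycled = free.pollFirst();
    return recycled != null ? recycled : ByteBuffer.allocateDirect(bufferSize);
  }

  void recycle(final ByteBuffer buffer) {
    buffer.clear(); // reset position/limit before reuse
    free.addFirst(buffer);
  }
}
```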

When passing a direct ByteBuffer, slice() it first before passing it over JNI. This is a huge performance win; there is something (probing or, god forbid, copying) going on when we get the buffer from JNI.

We still need to address byte arrays, which are by far the slowest.
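The slice() trick, as we understand it (a sketch; `flushNative` stands in for the actual JNI entry point, which is hypothetical here):

```java
buffer.flip();                             // position = 0, limit = bytes written
final ByteBuffer toFlush = buffer.slice(); // zero-offset view of just the filled region
flushNative(nativeHandle, toFlush, toFlush.remaining()); // hypothetical JNI call
```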
Second instance of - When passing a direct bytebuffer, slice() it first before passing over JNI
Use a critical section instead of env->GetByteArrayElements to avoid a copy of a potentially very large array, only part of which we may need. It has drawbacks, but for demonstrating performance and working around the byte[] version's performance bug, it is invaluable.
One operation now consists of writing a single batch of the required size.
Performance is measured as the number of batches written per second.
@alanpaxton force-pushed the address-api-javanativewritebatch branch from e17a11c to 0d409b9 on November 15, 2024
@alanpaxton
Contributor Author

Closing this PR as we have built something much simpler as a demo/PoC. Will start again as and when we come to productionize.

@alanpaxton closed this on Nov 19, 2024