engineccl: Add benchmark for ctr_stream encryption #113999
Conversation
"Before" numbers on a gceworker:
On an M2 MacBook it's about twice as fast, 900 MB/s. I have a quick win for an 18% improvement (coming in a separate PR so we can get the before numbers into roachperf), but the real prize will be making the larger batch sizes useful. We can compare to
We're not far behind with the 16-byte block size, but openssl is able to take advantage of larger blocks and reach 10x the speed.
We should also benchmark fileCipherStream.Encrypt, or alternatively benchmark that instead of newCTRBlockCipherStream (though that won't cover the read path -- but reads often hit in the block cache, so maybe that is ok). It is a thin wrapper that does the iteration over chunks of size ctrBlockSize that you are doing here. Also, it includes some copying when the data is not completely aligned at the end with the ctrBlockSize, which is important to include in the cost when trying to increase the block size.
We are using 32KB blocks in Pebble, but that is uncompressed bytes, and fileCipherStream.Encrypt will see the post-compression size -- I am not sure what compression ratio we actually see (I vaguely remember ~3x in some sstables, but I could be misremembering).
Hmm, looks like we typically do two writes, one for the block and one for the trailer (5 bytes) in https://github.com/cockroachdb/pebble/blob/81d9a4fea01c84f2213cf68ccdc886dc2febc15f/sstable/writer.go#L1788-L1794.
That suggests we should benchmark the write path at an even higher level, by writing an sstable. We could set block size to be 16KB (to approximate 2x compression), turn off compression, and vary the key and value sizes so each block is not exactly 16KB.
Reviewable status: complete! 1 of 0 LGTMs obtained
Ah yes, fileCipherStream is a better benchmark target. I skipped over it because I didn't want to deal with files, but it's actually the level I want. I want to keep this benchmark pretty low-level for simplicity instead of benchmarking the entire sst write process. It's true that consolidating those two writes would help a bit, but an extra encryption call every 16KB won't make much of a difference.

One more benchmark result: the FIPS-ready build is very slow, only 68 MB/s:
FIPS-ready build is very slow, only 68 MB/s
That is crazy slow. Do you know why?
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)
Yes, it's because it crosses the cgo boundary every 16 bytes, for something that should be a single CPU instruction.
Rewriting the benchmark to use fileCipherStream.

I've also rebased onto #114709 so I can verify FIPS status. Only the last commit is for this PR.
Force-pushed from 41fa7a8 to f0f8bb1, and later from f0f8bb1 to d3077b0. Commit message:

Start measuring performance of this code in anticipation of improving it.

Epic: none
Release note: None
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)
bors r=sumeerbhola
Build succeeded.