Commit

Improve performance of streamvbyte_validate_stream
AppleClang 16 doesn't appear to be clever enough to transform the loop
into a branchless one, so help it out by adding a loop that it knows
how to unroll.

Also do the same for streamvbyte_validate_stream_0124, though I haven't
tested the performance of that change.
blawrence-ont committed Dec 6, 2024
1 parent 13315d5 commit 72d911f
Showing 2 changed files with 26 additions and 2 deletions.
14 changes: 13 additions & 1 deletion src/streamvbyte_0124_decode.c
@@ -200,9 +200,21 @@ bool streamvbyte_validate_stream_0124(const uint8_t *in, size_t inCount,

   // Accumulate the key sizes in a wider type to avoid overflow
   const uint8_t *keyPtr = in;
+  uint64_t encodedSize = 0;
+
+  // Give the compiler a hint that it can avoid branches in the inner loop
+  for (uint32_t c = 0; c < outCount / 4; c++) {
+    uint32_t key = *keyPtr++;
+    for (uint8_t shift = 0; shift < 8; shift += 2) {
+      const uint8_t code = (key >> shift) & 0x3;
+      encodedSize += (1 << code) >> 1;
+    }
+  }
+  outCount &= 3;
+
+  // Process the remainder one at a time
   uint8_t shift = 0;
   uint32_t key = *keyPtr++;
-  uint64_t encodedSize = 0;
   for (uint32_t c = 0; c < outCount; c++) {
     if (shift == 8) {
       shift = 0;
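
The expression `(1 << code) >> 1` in the hunk above maps a 2-bit code to a data length of 0, 1, 2 or 4 bytes without a lookup table or a branch. A minimal sketch of that mapping, assuming nothing beyond what the diff shows; the helper names `length_0124` and `key_bytes_0124` are hypothetical and not part of the library:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not library API): data bytes implied by one 2-bit
 * code in the 0124 variant. Codes 0, 1, 2, 3 map to 0, 1, 2, 4 bytes. */
static inline uint32_t length_0124(uint8_t code) {
    assert(code < 4);
    return (1u << code) >> 1;
}

/* Hypothetical helper: total data bytes described by one full key byte,
 * i.e. four 2-bit codes read from the least significant bits upward. */
static uint32_t key_bytes_0124(uint8_t key) {
    uint32_t total = 0;
    for (uint8_t shift = 0; shift < 8; shift += 2) {
        total += length_0124((uint8_t)((key >> shift) & 0x3));
    }
    return total;
}
```

For example, the key byte `0xE4` holds the codes 0, 1, 2, 3 (low bits first), so `key_bytes_0124(0xE4)` sums 0 + 1 + 2 + 4.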
14 changes: 13 additions & 1 deletion src/streamvbyte_decode.c
@@ -103,9 +103,21 @@ bool streamvbyte_validate_stream(const uint8_t *in, size_t inCount,

   // Accumulate the key sizes in a wider type to avoid overflow
   const uint8_t *keyPtr = in;
+  uint64_t encodedSize = 0;
+
+  // Give the compiler a hint that it can avoid branches in the inner loop
+  for (uint32_t c = 0; c < outCount / 4; c++) {
+    uint32_t key = *keyPtr++;
+    for (uint8_t shift = 0; shift < 8; shift += 2) {
+      const uint8_t code = (key >> shift) & 0x3;
+      encodedSize += code + 1;
+    }
+  }
+  outCount &= 3;
+
+  // Process the remainder one at a time
   uint8_t shift = 0;
   uint32_t key = *keyPtr++;
-  uint64_t encodedSize = 0;
   for (uint32_t c = 0; c < outCount; c++) {
     if (shift == 8) {
       shift = 0;
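
To illustrate the restructuring in this hunk outside the library, the sketch below puts the two loop shapes side by side for the standard encoding, where a code `c` means `c + 1` data bytes. It is an illustration under stated assumptions, not the library's implementation: the function names `encoded_size_blocked` and `encoded_size_one_at_a_time` are hypothetical, and the guard that skips the trailing key read when no remainder exists is a choice made for this sketch rather than something the diff shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the data lengths for `count` integers by
 * consuming whole key bytes in an inner loop with a fixed trip count of 4,
 * which a compiler can unroll without a data-dependent branch. */
static uint64_t encoded_size_blocked(const uint8_t *keys, uint32_t count) {
    assert(keys != NULL || count == 0);
    const uint8_t *keyPtr = keys;
    uint64_t size = 0;

    for (uint32_t c = 0; c < count / 4; c++) {
        uint32_t key = *keyPtr++;
        for (uint8_t shift = 0; shift < 8; shift += 2) {
            size += ((key >> shift) & 0x3) + 1;
        }
    }

    /* Remainder: at most 3 codes from one final key byte.
     * Assumption of this sketch: only read that byte when it is needed. */
    uint32_t rest = count & 3;
    if (rest != 0) {
        uint32_t key = *keyPtr;
        for (uint32_t i = 0; i < rest; i++) {
            size += ((key >> (2 * i)) & 0x3) + 1;
        }
    }
    return size;
}

/* Hypothetical reference: one code per iteration, reloading the key byte
 * behind an `if`, in the style of the loop the commit keeps for the tail. */
static uint64_t encoded_size_one_at_a_time(const uint8_t *keys, uint32_t count) {
    if (count == 0) {
        return 0;
    }
    const uint8_t *keyPtr = keys;
    uint8_t shift = 0;
    uint32_t key = *keyPtr++;
    uint64_t size = 0;
    for (uint32_t c = 0; c < count; c++) {
        if (shift == 8) {
            shift = 0;
            key = *keyPtr++;
        }
        size += ((key >> shift) & 0x3) + 1;
        shift += 2;
    }
    return size;
}
```

Both functions should return the same total for any `count` covered by the key bytes supplied; whether a given compiler actually emits branchless or unrolled code for the blocked form depends on the compiler and optimization level.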