Speed up searches by removing repeated memsets coming from vec.resize() #2327

RaphaelMarinier · 2024-03-12T16:54:46Z

Also, reserve exactly the size needed, which surprisingly turns out to be required to get the full speedup of ~5% on a good fraction of the queries.

The memset (accounting for 4% of the CPU cycles) in the range_numeric query is indeed gone with this PR, see flame graphs:

Overall, it makes things a few percent faster on many queries:

Note: I made sure to use the same version of tantivy for the baseline and for this proposed change. The only difference between the two columns is this PR.

It would be possible to have this speedup without using unsafe by never truncating the vector (as is done in filter_vec_in_place()) and instead passing the number of final elements in the vector back to VecCursor.
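A minimal sketch of that safe alternative, assuming a scratch buffer that only ever grows and a filter that reports how many elements it kept. `Scratch`, `decode_batch`, `filter_in_place` and `fetch_even` are hypothetical stand-ins for illustration, not tantivy's actual types or functions:

```rust
/// Hypothetical stand-in for the batch decoder: fills `out` with
/// consecutive values starting at `start`.
fn decode_batch(start: u32, out: &mut [u32]) {
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = start + i as u32;
    }
}

/// Compacts the values matching `keep` to the front of `buf` and returns
/// how many were kept, instead of truncating a Vec.
fn filter_in_place(buf: &mut [u32], keep: impl Fn(u32) -> bool) -> usize {
    let mut kept = 0;
    for i in 0..buf.len() {
        let v = buf[i];
        if keep(v) {
            buf[kept] = v;
            kept += 1;
        }
    }
    kept
}

/// Hypothetical scratch buffer whose length is never reduced.
struct Scratch {
    buf: Vec<u32>,
}

impl Scratch {
    fn new() -> Self {
        Scratch { buf: Vec::new() }
    }

    /// Decodes `count` values and returns the even ones.
    fn fetch_even(&mut self, start: u32, count: usize) -> &[u32] {
        if self.buf.len() < count {
            // Zero-fills only the newly grown tail, and only when the
            // buffer has never been this large before.
            self.buf.resize(count, 0);
        }
        let window = &mut self.buf[..count];
        decode_batch(start, window);
        let kept = filter_in_place(window, |v| v % 2 == 0);
        // The caller sees only the kept prefix; the Vec keeps its length.
        &self.buf[..kept]
    }
}
```

Because the vector is never shortened, later calls with the same or a smaller batch size skip the `resize` entirely, so no memset is issued on the hot path and no `unsafe` is required.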


adamreichold · 2024-03-12T18:59:28Z

bitpacker/src/bitpacker.rs

+        positions.reserve_exact(id_range.len());
+        #[allow(clippy::uninit_vec)]
+        unsafe {
+            positions.set_len(id_range.len());


I am pretty sure this is undefined behaviour due to the validity requirements on integers which newly allocated memory does not fulfil.

If you want to apply this optimization, get_batch_u32s needs to work with the return value of Vec::spare_capacity_mut to write into the MaybeUninit<u32> values and only call set_len after all of these values have been initialized.
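The suggestion above could look roughly like the following sketch. `decode_into_uninit` is a hypothetical stand-in for a `get_batch_u32s` variant that accepts `MaybeUninit<u32>` slots; it is not the actual bitpacker API:

```rust
use std::mem::MaybeUninit;

/// Hypothetical decoder: initializes every slot of `out` with `start + i`.
fn decode_into_uninit(start: u32, out: &mut [MaybeUninit<u32>]) {
    for (i, slot) in out.iter_mut().enumerate() {
        slot.write(start + i as u32);
    }
}

/// Refills `positions` with `count` values without zeroing the buffer first.
fn fill_positions(positions: &mut Vec<u32>, start: u32, count: usize) {
    positions.clear();
    positions.reserve_exact(count);
    // After `clear()` the length is 0, so the spare capacity holds at
    // least `count` slots.
    let spare = &mut positions.spare_capacity_mut()[..count];
    decode_into_uninit(start, spare);
    // SAFETY: the first `count` slots were all written just above and
    // `count <= capacity`.
    unsafe { positions.set_len(count) };
}
```

The only `unsafe` left is the final `set_len`, and it runs after every element it exposes has been initialized.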

Indeed, thanks for pointing this out, I see that in rust-lang/unsafe-code-guidelines#439 it was decided that the mere existence of an uninitialized integer is undefined behavior. There is also an interesting discussion on that here where it was controversially concluded that it was UB.

The trick is the same as in quickwit-oss/quickwit#4712, so we should probably decide whether we do neither or both. @fulmicoton

For that quickwit PR, I see that read_buf (see rfc, function doc) could be used, but it's not stable unfortunately.

For this PR, note that this vector ends up being filled by AVX2 instructions, so what you propose does not seem possible: we would either have to take a performance hit by storing the output of the AVX2 computation to a temporary variable and write() it into the uninitialized vector, or use MaybeUninit.as_mut_ptr, which would lead us back to UB. Or do you have an efficient suggestion to make it work with the AVX instructions?
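One possible answer, offered as a sketch rather than a settled conclusion: a SIMD store through a raw pointer into the spare capacity only writes to uninitialized memory and never reads or references an uninitialized `u32`, which is generally understood to be allowed. The example below uses SSE2 (`_mm_storeu_si128`) instead of AVX2 so that it runs on any x86_64 machine, with a scalar fallback elsewhere. `fill_positions_simd` is a hypothetical function, not the bitpacker's code:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, _mm_add_epi32, _mm_set1_epi32, _mm_setr_epi32, _mm_storeu_si128,
};

/// Fills `positions` with `start, start + 1, ...` (`count` values).
fn fill_positions_simd(positions: &mut Vec<u32>, start: u32, count: usize) {
    positions.clear();
    positions.reserve_exact(count);
    // Raw pointer to the uninitialized spare capacity. No `&mut u32` to
    // uninitialized memory is ever created.
    let dst: *mut u32 = positions.spare_capacity_mut().as_mut_ptr().cast();
    let mut written = 0usize;

    #[cfg(target_arch = "x86_64")]
    unsafe {
        // SSE2 is part of the x86_64 baseline, so no runtime detection.
        let mut lanes =
            _mm_add_epi32(_mm_set1_epi32(start as i32), _mm_setr_epi32(0, 1, 2, 3));
        let step = _mm_set1_epi32(4);
        while written + 4 <= count {
            // Unaligned 128-bit store of four u32 lanes, straight into
            // the spare capacity (no temporary buffer).
            _mm_storeu_si128(dst.add(written) as *mut __m128i, lanes);
            lanes = _mm_add_epi32(lanes, step);
            written += 4;
        }
    }

    // Scalar tail (and the whole loop on non-x86_64 targets).
    while written < count {
        // SAFETY: `written < count <= capacity`.
        unsafe { dst.add(written).write(start.wrapping_add(written as u32)) };
        written += 1;
    }

    // SAFETY: slots `0..count` were all written above.
    unsafe { positions.set_len(count) };
}
```

The same shape should carry over to AVX2 with `_mm256_storeu_si256`, provided the function is gated on the `avx2` target feature; that part is an assumption and is not shown here.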

so we should probably decide whether we do neither or both. @fulmicoton

I'd say we should do it for one and not for the other.
For the Read side, the one here is simply not needed. We also own the bitpacking crate. We can just add a method that takes a MaybeUninit.

For the Read trait, this is a well-known problem that has a workaround, but it is not stabilized yet.
The problem is that if a Read implementation were to read from the buffer, we would be in big trouble.

We put that in a place where the Read impl is well defined so I think we are ok.

fulmicoton · 2024-03-13T00:53:39Z

I think there is a problem in the benchmark. That code is not used in most of the queries showing an improvement.

Commit 0890503: Speed up searches by removing repeated memsets coming from vec.resize()

RaphaelMarinier requested a review from fulmicoton March 12, 2024 17:04

RaphaelMarinier marked this pull request as ready for review March 12, 2024 17:04

adamreichold reviewed Mar 12, 2024


RaphaelMarinier marked this pull request as draft April 22, 2024 08:04
