unique vectorization #5092

Open
wants to merge 5 commits into
base: main


AlexGuteniev
Contributor

Not really unique, modelled on #4987

⏬ Double load

To compare adjacent values, the same memory is loaded twice with an element shift.
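A minimal sketch of the double-load idea for 32-bit elements, not the code from this PR: it only finds the first pair of equal adjacent elements, which is the search step that `unique` needs before it starts compacting. The function name `adjacent_equal_u32` and the `SKETCH_HAS_SSE2` macro are made up for illustration. It assumes an x86 target with SSE2 and falls back to a scalar loop elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1 // hypothetical macro, used only by this sketch
#endif

// Hypothetical helper: returns the first index i with data[i] == data[i + 1],
// or n if there is no such pair.
std::size_t adjacent_equal_u32(const std::uint32_t* data, std::size_t n) {
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE2
    // Each iteration reads data[i .. i + 4], so it needs i + 5 <= n.
    for (; i + 5 <= n; i += 4) {
        // Double load: the same memory is read twice, shifted by one element.
        const __m128i cur  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        const __m128i next = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i + 1));
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(cur, next));
        if (mask != 0) {
            // Each 32-bit lane contributes 4 mask bits; find the lowest matching lane.
            std::size_t lane = 0;
            while (((mask >> (lane * 4)) & 1) == 0) {
                ++lane;
            }
            return i + lane;
        }
    }
#endif
    // Scalar tail (and the whole loop when SSE2 is unavailable).
    for (; i + 1 < n; ++i) {
        if (data[i] == data[i + 1]) {
            return i;
        }
    }
    return n;
}
```

The two loads are independent of the previous iteration, so nothing is carried across iterations except the index.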

Alternatively, the previous vector could be reused and mixed with the current one. This saves one load per iteration, but costs extra instructions to mix the values and introduces a loop-carried dependency. On the SSE path this is possible with _mm_alignr_epi8 (except for 8-bit elements). For AVX it would be far more complex because of AVX lanes.
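For comparison, the reuse variant on the SSE path might look roughly like the sketch below for 32-bit elements. It is an illustration under assumptions, not code from this PR: the name `adjacent_equal_u32_reuse` is hypothetical, the SIMD path is compiled only when the compiler defines `__SSSE3__` (for example with `-mssse3`), and otherwise only the scalar loop runs.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSSE3__)
#include <tmmintrin.h> // _mm_alignr_epi8 is an SSSE3 intrinsic
#endif

// Hypothetical helper: returns the first index i with data[i] == data[i + 1],
// or n if there is no such pair.
std::size_t adjacent_equal_u32_reuse(const std::uint32_t* data, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSSE3__)
    if (n >= 8) {
        // 'cur' is carried from one iteration to the next (loop-carried dependency).
        __m128i cur = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
        for (; i + 8 <= n; i += 4) {
            // Only one new load per iteration: the block after 'cur'.
            const __m128i ahead = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i + 4));
            // Concatenate ahead:cur and shift right by 4 bytes,
            // which yields data[i + 1 .. i + 4] without a second load.
            const __m128i next = _mm_alignr_epi8(ahead, cur, 4);
            const int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(cur, next));
            if (mask != 0) {
                // Each 32-bit lane contributes 4 mask bits; find the lowest matching lane.
                std::size_t lane = 0;
                while (((mask >> (lane * 4)) & 1) == 0) {
                    ++lane;
                }
                return i + lane;
            }
            cur = ahead;
        }
    }
#endif
    // Scalar tail (and the whole loop when SSSE3 is not enabled).
    for (; i + 1 < n; ++i) {
        if (data[i] == data[i + 1]) {
            return i;
        }
    }
    return n;
}
```

Compared with loading the same memory twice, each iteration here issues one load fewer but adds the `_mm_alignr_epi8` shuffle and depends on the vector carried over from the previous iteration.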

Benchmarking shows that the double load is faster than any reuse attempt. To some extent this result overlaps with #4958.

⏱️ Benchmark results

| Benchmark | main | this |
|---|---|---|
| u<alg_type::std_fn, std::uint8_t> | 1166 ns | 190 ns |
| u<alg_type::std_fn, std::uint16_t> | 1222 ns | 247 ns |
| u<alg_type::std_fn, std::uint32_t> | 1555 ns | 310 ns |
| u<alg_type::std_fn, std::uint64_t> | 1470 ns | 665 ns |
| u<alg_type::rng, std::uint8_t> | 1230 ns | 187 ns |
| u<alg_type::rng, std::uint16_t> | 1204 ns | 233 ns |
| u<alg_type::rng, std::uint32_t> | 1268 ns | 308 ns |
| u<alg_type::rng, std::uint64_t> | 1505 ns | 665 ns |

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner November 16, 2024 20:59
@StephanTLavavej StephanTLavavej added the performance Must go faster label Nov 16, 2024
@StephanTLavavej StephanTLavavej self-assigned this Nov 16, 2024
Less error prone, especially if implementing _copy someday
@CaseyCarter
Member

> Not really unique, modelled on #4987

🤦

Labels
performance Must go faster
Projects
Status: Initial Review

3 participants