SSE/AVX code optimization #2

Open: wants to merge 5 commits into base: boinc


troosh commented Jan 12, 2018

I tried to optimize the SSE/AVX version of MovePairSearch::MovePairSearch().
However, I am not sure the result is correct: the test WU does some work but does not find anything. Why?

@sirzooro
Owner

Thanks for your contribution!

The mask uses 9 bits, so after packing the vector elements to int8 you are losing one bit. You can fix this by calling _mm_packs_epi16 on the result of _mm_cmpeq_epi16.
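To illustrate the point, here is a minimal SSE2 sketch. It is not the code from this PR: the function names and the sample row are hypothetical, and it assumes each 16-bit lane holds a 9-bit mask. The first function compares in 16-bit lanes and only then narrows the result, which is lossless because each lane is either 0xFFFF or 0x0000. The second narrows the values first (here with _mm_packus_epi16, chosen just for the demonstration), so every value above 255 saturates to 255 and distinct masks become indistinguishable.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: bit i of the result is set when 16-bit lane i of
// `row` equals `wanted`. Compare first, then narrow the comparison result.
inline int match_lanes_compare_first(const uint16_t row[8], uint16_t wanted) {
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row));
    __m128i w  = _mm_set1_epi16(static_cast<short>(wanted));
    __m128i eq = _mm_cmpeq_epi16(v, w);  // 0xFFFF or 0x0000 per lane
    // Signed saturation maps -1 to 0xFF and 0 to 0x00, so nothing is lost.
    __m128i packed = _mm_packs_epi16(eq, _mm_setzero_si128());
    return _mm_movemask_epi8(packed) & 0xFF;
}

// Hypothetical lossy variant: narrow the 9-bit values to 8 bits first.
inline int match_lanes_pack_first(const uint16_t row[8], uint16_t wanted) {
    __m128i zero = _mm_setzero_si128();
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(row));
    __m128i w  = _mm_set1_epi16(static_cast<short>(wanted));
    __m128i v8 = _mm_packus_epi16(v, zero);  // values above 255 become 255
    __m128i w8 = _mm_packus_epi16(w, zero);
    __m128i eq = _mm_cmpeq_epi8(v8, w8);
    return _mm_movemask_epi8(eq) & 0xFF;     // keep only the 8 real lanes
}

// Self-check on one sample row: lanes 1, 4 and 7 hold the wanted mask.
inline void demo_nine_bit_mask() {
    const uint16_t row[8] = {0x100, 0x1FF, 0x0FF, 0x001,
                             0x1FF, 0x000, 0x180, 0x1FF};
    assert(match_lanes_compare_first(row, 0x1FF) == 0x92);  // lanes 1, 4, 7
    assert(match_lanes_pack_first(row, 0x1FF) == 0xD7);     // false matches
}
```

In the sample row, the pack-first variant also reports lanes 0, 2 and 6 as matches, because 0x100, 0x0FF and 0x180 all collapse to 255 after narrowing.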

BTW, I have a few other optimizations waiting on my PC which are not pushed here yet. One of them changes the element type of squareA_MaskT to uint16_t, so that one SSE vector can hold a whole row, as AVX2 does now. Looking at your changes, I realized that the code can be optimized further by using the packs instruction. Thanks again!

@troosh
Author

troosh commented Jan 12, 2018

It's a pity that you did not enable Issues in the repository, so I'll write here, sorry.

It would be useful to use the WUs from https://github.com/sirzooro/RakeSearch/releases/download/v1.0/test.tgz as the default workunits, together with a script to check the results. Or ask @CrystalFrost about this.

@sirzooro
Owner

I have enabled issues; they were disabled by default (probably inherited when I forked this repo).

I also pushed all my new changes to the optimizations2 branch. Please take a look to avoid duplicating work.
