A question for 小彭老师 (Teacher Xiaopeng): what is a good way to optimize this function with AVX2? #6

Open
lin0ww0nil opened this issue Sep 14, 2023 · 4 comments

Comments

@lin0ww0nil

lin0ww0nil commented Sep 14, 2023

saunlesuanle

@archibate
Contributor

for (int j = 0; j < height; j++) {
    dst[0] = col_0[height - 1 - j];
    if (2 == td) {
        dst[1] = col_1_td2[height - 1 - j];
    }

    if ((3 + 4 * j) < width) {
        memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (rem_rl + 4 * j) * sizeof(s16));
        memcpy(dst + 3 + 4 * j, ref_above, (width - (3 + 4 * j)) * sizeof(s16));
    }
    else {
        // w - 3
        memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (width - td) * sizeof(s16));
    }

    dst += i_dst;
}

The memcpy calls inside this loop all write to dst + rd. Are you sure that is correct?
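For reference, the source offset used by both `ref_left` copies reduces algebraically: `(4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1)` equals `4 * (height - 1 - j)`. The sketch below wraps the posted loop in a function next to a version that uses the reduced offset, so the two can be compared. The function names and the `typedef` of `s16` as `int16_t` are assumptions made for illustration; they are not taken from the original project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t s16; /* assumption: s16 is a signed 16-bit sample type */

/* The loop as posted, wrapped in a (hypothetically named) function. */
static void fill_rows_posted(s16 *dst, int i_dst, int width, int height,
                             int td, int rem_rl,
                             const s16 *col_0, const s16 *col_1_td2,
                             const s16 *ref_left, const s16 *ref_above)
{
    for (int j = 0; j < height; j++) {
        dst[0] = col_0[height - 1 - j];
        if (2 == td) {
            dst[1] = col_1_td2[height - 1 - j];
        }
        if ((3 + 4 * j) < width) {
            memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (rem_rl + 4 * j) * sizeof(s16));
            memcpy(dst + 3 + 4 * j, ref_above, (width - (3 + 4 * j)) * sizeof(s16));
        } else {
            memcpy(dst + td, ref_left + (4 * (height - 1) + rem_rl - 1) - (4 * j + rem_rl - 1), (width - td) * sizeof(s16));
        }
        dst += i_dst;
    }
}

/* Same behavior, with the ref_left offset reduced to 4 * (height - 1 - j). */
static void fill_rows_simplified(s16 *dst, int i_dst, int width, int height,
                                 int td, int rem_rl,
                                 const s16 *col_0, const s16 *col_1_td2,
                                 const s16 *ref_left, const s16 *ref_above)
{
    for (int j = 0; j < height; j++) {
        const int row = height - 1 - j;      /* rows are read bottom-up */
        const s16 *left = ref_left + 4 * row; /* reduced source offset */
        const int split = 3 + 4 * j;          /* column where ref_above starts */

        dst[0] = col_0[row];
        if (td == 2) {
            dst[1] = col_1_td2[row];
        }
        if (split < width) {
            memcpy(dst + td, left, (size_t)(rem_rl + 4 * j) * sizeof(s16));
            memcpy(dst + split, ref_above, (size_t)(width - split) * sizeof(s16));
        } else {
            memcpy(dst + td, left, (size_t)(width - td) * sizeof(s16));
        }
        dst += i_dst;
    }
}
```

Calling both functions on the same inputs and comparing the output buffers with `memcmp` is a cheap way to confirm that the reduced offset changes nothing before attempting any further optimization.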

@lin0ww0nil
Author

lin0ww0nil commented Sep 18, 2023

saunlesuanle

@archibate
Contributor

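I meant td. td is constant within this for loop, so why copy into the same dst + td over and over? My tests show that 60% of the time is spent in these three memcpy calls.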
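Since the copies dominate the profile, one thing to try is a hand-written AVX2 copy for `s16` runs. The following is only a sketch under stated assumptions: `copy_s16` and `copy_s16_avx2` are hypothetical names, `s16` is assumed to be `int16_t`, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions. Library `memcpy` implementations are often vectorized already, so any gain needs to be measured rather than assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define COPY_S16_HAVE_AVX2 1
#endif

typedef int16_t s16; /* assumption: s16 is a signed 16-bit sample type */

#ifdef COPY_S16_HAVE_AVX2
/* Copies n s16 values, 16 at a time (one 256-bit register per step).
 * Unaligned loads and stores are used because dst + td and the ref_left
 * offsets in the posted loop are not guaranteed to be 32-byte aligned. */
__attribute__((target("avx2")))
static void copy_s16_avx2(s16 *dst, const s16 *src, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }
    if (i < n) {
        /* Fewer than 16 values left: let memcpy handle the tail. */
        memcpy(dst + i, src + i, (n - i) * sizeof(s16));
    }
}
#endif

/* Dispatches to the AVX2 path when the CPU supports it, else to memcpy. */
static void copy_s16(s16 *dst, const s16 *src, size_t n)
{
#ifdef COPY_S16_HAVE_AVX2
    if (__builtin_cpu_supports("avx2")) {
        copy_s16_avx2(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n * sizeof(s16));
}
```

In the posted loop, a call such as `copy_s16(dst + td, src, count)` would stand in for each `memcpy(dst + td, src, count * sizeof(s16))`. Like `memcpy`, this sketch assumes the source and destination ranges do not overlap.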

@lin0ww0nil
Author

lin0ww0nil commented Sep 18, 2023

saunlesuanle
