clobber list #8
I've personally switched to modifying LLVM so I don't need the assembly at all (adding a calling convention, a syscall instruction to the ISD tablegen, and modifying the call lowering for it).

```cpp
#define CLOBBER_LIST "memory", "cc", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5"
```

Should™ suffice. AVX and AMX state is explicitly saved and restored by kernel code at use sites (https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/floating-point-support-for-64-bit-drivers), so in theory it shouldn't matter.
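For context, here's a minimal sketch (my own illustration, not this library's actual code) of how a define like that gets consumed by a direct-syscall stub. It assumes the Windows x64 syscall convention: index in eax, first argument in r10, and the syscall instruction itself trashing rcx and r11. The name `direct_syscall1` and the one-argument shape are hypothetical:

```cpp
#define CLOBBER_LIST "memory", "cc", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5"

static inline long direct_syscall1(unsigned int index, void* a1) {
    long status;
    register void* arg1 asm("r10") = a1;        // kernel reads the first argument from r10
    asm volatile("syscall"
                 : "=a"(status)                 // NTSTATUS comes back in eax
                 : "a"(index), "r"(arg1)
                 : "rcx", "r11", CLOBBER_LIST); // syscall clobbers rcx (RIP) and r11 (RFLAGS)
    return status;
}
```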
I understand that the x64 kernel-mode (driver) calling convention has stricter rules than the x64 user-mode calling convention. I think I'm going to stick with the x64 user-mode calling convention to be safe, since a syscall normally goes through ntdll/win32u, and those all use the x64 user-mode calling convention.

I did some testing to see which registers are volatile:

```cpp
#define ASDF(reg) extern "C" __declspec(dllexport) void test_ ## reg(){asm volatile("":::#reg);}
ASDF(rax)
ASDF(rbx)
ASDF(rcx)
ASDF(rdx)
ASDF(rsi)
ASDF(rdi)
ASDF(rbp)
ASDF(rsp)
ASDF(r8)
ASDF(r9)
ASDF(r10)
ASDF(r11)
ASDF(r12)
ASDF(r13)
ASDF(r14)
ASDF(r15)
ASDF(tmm0)
ASDF(tmm1)
ASDF(tmm2)
ASDF(tmm3)
ASDF(tmm4)
ASDF(tmm5)
ASDF(tmm6)
ASDF(tmm7)
ASDF(xmm0)
ASDF(xmm1)
ASDF(xmm2)
ASDF(xmm3)
ASDF(xmm4)
ASDF(xmm5)
ASDF(xmm6)
ASDF(xmm7)
ASDF(xmm8)
ASDF(xmm9)
ASDF(xmm10)
ASDF(xmm11)
ASDF(xmm12)
ASDF(xmm13)
ASDF(xmm14)
ASDF(xmm15)
ASDF(xmm16)
ASDF(xmm17)
ASDF(xmm18)
ASDF(xmm19)
ASDF(xmm20)
ASDF(xmm21)
ASDF(xmm22)
ASDF(xmm23)
ASDF(xmm24)
ASDF(xmm25)
ASDF(xmm26)
ASDF(xmm27)
ASDF(xmm28)
ASDF(xmm29)
ASDF(xmm30)
ASDF(xmm31)
ASDF(ymm0)
ASDF(ymm1)
ASDF(ymm2)
ASDF(ymm3)
ASDF(ymm4)
ASDF(ymm5)
ASDF(ymm6)
ASDF(ymm7)
ASDF(ymm8)
ASDF(ymm9)
ASDF(ymm10)
ASDF(ymm11)
ASDF(ymm12)
ASDF(ymm13)
ASDF(ymm14)
ASDF(ymm15)
ASDF(ymm16)
ASDF(ymm17)
ASDF(ymm18)
ASDF(ymm19)
ASDF(ymm20)
ASDF(ymm21)
ASDF(ymm22)
ASDF(ymm23)
ASDF(ymm24)
ASDF(ymm25)
ASDF(ymm26)
ASDF(ymm27)
ASDF(ymm28)
ASDF(ymm29)
ASDF(ymm30)
ASDF(ymm31)
ASDF(zmm0)
ASDF(zmm1)
ASDF(zmm2)
ASDF(zmm3)
ASDF(zmm4)
ASDF(zmm5)
ASDF(zmm6)
ASDF(zmm7)
ASDF(zmm8)
ASDF(zmm9)
ASDF(zmm10)
ASDF(zmm11)
ASDF(zmm12)
ASDF(zmm13)
ASDF(zmm14)
ASDF(zmm15)
ASDF(zmm16)
ASDF(zmm17)
ASDF(zmm18)
ASDF(zmm19)
ASDF(zmm20)
ASDF(zmm21)
ASDF(zmm22)
ASDF(zmm23)
ASDF(zmm24)
ASDF(zmm25)
ASDF(zmm26)
ASDF(zmm27)
ASDF(zmm28)
ASDF(zmm29)
ASDF(zmm30)
ASDF(zmm31)
```

I compiled this with -msse2, -mavx2, and -march=znver4, then looked at the compiled code in IDA Pro to check whether each function saves and restores its register inside the function body.
Is my reasoning correct here? Also, I don't know which compiler flag is needed to generate AMX code that uses the tmm registers. My conclusions:

- sse2: volatile xmm0-5; non-volatile xmm6-15
- avx2: volatile xmm0-5, ymm0-5; non-volatile xmm6-15, ymm6-15
- avx512: volatile xmm0-5, xmm16-31, ymm0-5, ymm16-31, zmm0-5, zmm16-31; non-volatile xmm6-15, ymm6-15, zmm6-15

So this is the clobber list (I'm putting the tmm registers in without testing them, since I don't know anything about AMX, but MSDN does say "When AMX support is present, the TMM tile registers are volatile", so no need to check those):

```cpp
#define CLOBBER_LIST "memory", "cc", "rcx", "r11", "tmm0", "tmm1", "tmm2", "tmm3", "tmm4", "tmm5", "tmm6", "tmm7", \
    "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5", "zmm16", "zmm17", "zmm18", "zmm19", "zmm20", "zmm21", "zmm22", "zmm23", "zmm24", "zmm25", "zmm26", "zmm27", "zmm28", "zmm29", "zmm30", "zmm31", \
    "ymm0", "ymm1", "ymm2", "ymm3", "ymm4", "ymm5", "ymm16", "ymm17", "ymm18", "ymm19", "ymm20", "ymm21", "ymm22", "ymm23", "ymm24", "ymm25", "ymm26", "ymm27", "ymm28", "ymm29", "ymm30", "ymm31", \
    "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm16", "xmm17", "xmm18", "xmm19", "xmm20", "xmm21", "xmm22", "xmm23", "xmm24", "xmm25", "xmm26", "xmm27", "xmm28", "xmm29", "xmm30", "xmm31"
```
This bug has been here since the beginning. I've used your library with Clang since you first posted it on UC many years ago. I'm guessing the reason you didn't notice it is probably the hashing you do next to the syscall instruction. I had to use this workaround:

```cpp
#define NO_OPTIMIZE(x) asm volatile("" : "+m"(const_cast<std::remove_const_t<std::remove_reference_t<decltype(x)>>&>(x)));
```

Now that I've fixed the clobber list (probably), I finally don't have to use those workarounds anymore.
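A hypothetical usage of that workaround, to show what it looked like in practice (the local variable and the surrounding call are illustrative, not from the library): the "+m" constraint forces the object out to memory and tells the compiler it may have been read and written, so a stale copy isn't kept in an xmm register across the syscall.

```cpp
LARGE_INTEGER timeout{};   // illustrative local passed to a syscall by address
NO_OPTIMIZE(timeout);      // compiler must assume `timeout` was touched here
// ...invoke the syscall wrapper that takes &timeout...
```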
Ah, so that's why. Is that still the case? I feel like having the volatile registers in the clobber list is much more readable than having them in the output list. I hope Clang has fixed it by now.
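For reference, a sketch of the output-list alternative being discussed, assuming AVX is enabled (the wrapper name and the choice of two registers are illustrative): pin a dummy variable to each volatile vector register and list it as an output, so the allocator treats its whole value as dead across the asm without naming it in the clobber list.

```cpp
#include <immintrin.h>

static long syscall_dummy_outputs(unsigned int index) {
    long status;
    register __m256 y0 asm("ymm0");   // pinned dummy outputs instead of clobbers
    register __m256 y1 asm("ymm1");
    asm volatile("syscall"
                 : "=a"(status), "=x"(y0), "=x"(y1)
                 : "a"(index)
                 : "rcx", "r11", "memory", "cc");
    (void)y0; (void)y1;               // the values are garbage; only the kill matters
    return status;
}
```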
Having a compiler intrinsic for syscall would be nice, but I don't think I'm knowledgeable enough to mess with LLVM yet.
- -mavx2: exe size increases by 2 KB. Every non-inlined function that contains a syscall instruction pushes xmm6-15 onto the stack at the start of the function and pops them at the end. No random silent bugs.
- -mavx2: no change to exe size. No random silent bugs.
- -mavx2: no change to exe size, but I get a bunch of random silent bugs in the code next to the syscall instruction. I don't have a minimal reproducible example; with slight changes to the code the bug is gone. A pain to debug.
According to MSDN:

- https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#callercallee-saved-registers
- https://learn.microsoft.com/en-us/cpp/build/x64-software-conventions?view=msvc-170#register-volatility-and-preservation
So my question is: how do I make the perfect, bug-free, future-proof clobber list with this? And how do I clobber only the upper half of a register? (I might also try -mavx512f if I someday get a new CPU that supports AVX512.)
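On the upper-half question: as far as I know, GCC/Clang extended asm has no syntax for clobbering part of a register; clobbers always name whole registers. A sketch of the closest expressible thing, which over-approximates "the upper 128 bits of ymm6 are dead" as "all of ymm6 is dead":

```cpp
// No half-register clobber syntax exists, so naming the ymm alias
// kills the full 256 bits. Safe, just conservative.
static inline void kill_ymm6(void) {
    asm volatile("" ::: "ymm6");
}
```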