gfx906 (AMD MI60) is failing on run_and_save_benchmarks.sh and llama.cpp #180
Comments
@lamikr, I am not sure how to fix the issue above. Please let me know if you have time to review this today.
Hi, unfortunately I do not have a gfx906 myself for debugging, so I only added some patches that were needed at least to get it to build and start testing, and added its support as experimental. About your error: I have never seen that kind of error, but it could be some kind of misconfiguration in rocBLAS related to src_projects/rocBLAS/library/src/blas3/Tensile/Logic/asm_full/vega10/vega10_Cijk_Alik_Bljk_HB_GB.yaml. But let's first check a couple of basic things step by step so I get the basic info.
/opt/rocm_sdk_612/docs/examples/hipcc/hello_world
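A minimal sketch of re-running that smoke test, assuming the path above points at the built hello_world binary inside the SDK prefix (if it is a directory in your install, run the binary inside it instead):

```shell
# Sketch: re-run the bundled HIP hello_world smoke test.
# Path taken from the thread; adjust if your SDK prefix differs.
HW=/opt/rocm_sdk_612/docs/examples/hipcc/hello_world
if [ -x "$HW" ]; then
    "$HW"                                    # should print its hello output
else
    echo "hello_world not found at $HW"      # SDK not installed at this prefix
fi
```

If this basic HIP example runs, the failure is more likely in a library layer (rocBLAS/Tensile) than in the base runtime.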
Hello @lamikr,
tests:
OpenCL test:
By the way, gfx906 is a 'Vega 20' GPU, not 'Vega 10'. I am not sure whether llama.cpp is calling some instruction that does not exist on gfx906.
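One way to confirm which gfx target the runtime actually reports is to grep the output of ROCm's rocminfo tool. The sketch below demonstrates the extraction on a captured sample line, since rocminfo itself only runs on a machine with an AMD GPU:

```shell
# Sketch: extract the gfx target from rocminfo-style output.
# 'rocminfo' is the real ROCm tool; the sample line here stands in for its
# output so the pipeline can be shown without GPU hardware.
sample="  Name:                    gfx906"
echo "$sample" | grep -o 'gfx[0-9a-f]*'
# → gfx906
# On a live system: rocminfo | grep -o 'gfx[0-9a-f]*' | head -1
```

If this reports gfx906 but the rocBLAS logic files shipped are only for vega10, that mismatch would fit the 'Cannot find Symbol with name' error.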
Here is the app crash log:
Based on the app crash log, I see that ROCm is not able to find the symbol table ('No symbol table info available.'). I am not sure what that means. Let me know. Thanks!
Thanks, good to see that the basic applications work. I will start my gfx906 build and try to figure out a fix for those build errors with llama.cpp.
Thank you! Looking forward to your updates. |
Hi @lamikr,
I built rocm_sdk_builder on a freshly installed Ubuntu 24.04.1. It took 5 hours, 120 GB of storage, and many hours of fixing small issues during the build (reference: #175).
I also selected gfx906 from
./babs.sh -c
When I ran
./run_and_save_benchmarks.sh
I got the message below. Note the error at the bottom, 'Cannot find Symbol with name'. I thought this would not be an issue for llama.cpp. However, llama.cpp fails with a similar error (I built it using
./babs.sh -b binfo/extra/ai_tools.blist
). Note that this llama.cpp build did work on the CPU when I did not set the ngl parameter (layer offloading). Please let me know if there is a fix.
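Since the CPU path works and only GPU offload fails, one way to narrow this down is to bisect with llama.cpp's real -ngl / --n-gpu-layers flag, which sets how many layers are offloaded to the GPU. The binary name (llama-cli in newer builds, main in older ones) and model path below are placeholders:

```shell
# Sketch: bisect GPU offload with llama.cpp's -ngl flag.
# model.gguf and the binary name are placeholders; adjust to your install.
# ./llama-cli -m model.gguf -p "test" -ngl 0    # CPU only: reported working
# ./llama-cli -m model.gguf -p "test" -ngl 1    # a single layer on the GPU
# ./llama-cli -m model.gguf -p "test" -ngl 99   # full offload: reported failing
echo "bisect with -ngl 0, 1, ... to find where the GPU path first fails"
```

If even -ngl 1 fails with 'Cannot find Symbol with name', the problem is in the rocBLAS/HIP path itself rather than in any particular layer count.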