Performance on Windows is worse than preprint #202
Comments
I just tried spral_ssids.exe on these two matrices. The console shows times of ~0.5 s and ~1.1 s for ND/nd3k and PARSEC/Si10H16, respectively. Below is the console log: D:\Desktop\111\bin> ./spral_ssids Si10H16.rb --scale=auction --nrhs 1 |
The precompiled SSIDS binaries are not optimised and are not intended to be; that would be impossible. If you would like optimised performance, please compile your own version of SPRAL; then you can use an optimised BLAS/LAPACK etc. |
Thanks for the reply! I will try building with Meson later. But why is the performance of the executable spral_ssids.exe so much better? I think that exe uses the same DLL as my MSVC program. |
Indeed, that is very strange. Is it possible that they are using different default SSIDS options/settings? |
I used the default settings for both MSVC and spral_ssids.exe. Attached is my code: |
Right, but are you sure the default options are the same for both? In the |
I tried again without passing --scale=auction; the time is almost the same. Below are two test runs on an R5-5600X. Factorizing with the MSVC build is still much slower than the exe on this computer (~4.5 s). PS E:\test> .\spral_ssids.exe Si10H16.rb |
Interesting, the |
Note that the library |
The BLAS provided by the precompiled library is OpenBLAS, which is also used by spral_ssids.exe.
I am using the latest precompiled binary of SSIDS. I tested the matrices ND/nd3k and PARSEC/Si10H16, and the factorize+solve times are around 1.5 s and 4.5 s respectively. But in the SSIDS preprint "A sparse symmetric indefinite direct solver for GPU architectures", these two matrices are solved in less than 0.5 seconds with two E5-2687W CPUs. I am using an i9-14900K and all cores are at full load while running. Considering this CPU is 10 years newer and has 24 cores, I think the time should be lower.
Can someone help test the same matrices on a similar platform? Thanks!
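For anyone who wants to reproduce the comparison, here is a minimal timing sketch in Python. It is not part of SPRAL: the helper names `time_command` and `summarize` are made up for illustration, and the idea that thread-related environment variables such as `OMP_NUM_THREADS` or `OPENBLAS_NUM_THREADS` affect the result is an assumption worth checking, not something established in this thread. The script only measures wall-clock time of the whole process, so it includes startup and reading the matrix, not just the factorization.

```python
import os
import statistics
import subprocess
import time


def time_command(cmd, env_overrides=None, repeats=3):
    """Run `cmd` `repeats` times; return the wall-clock time of each run in seconds.

    `env_overrides` is a dict of environment variables layered on top of the
    current environment (for example thread-count settings, an assumption here).
    """
    env = os.environ.copy()
    env.update(env_overrides or {})
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command exits with an error.
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return the minimum and median of a list of timings."""
    return {"min": min(timings), "median": statistics.median(timings)}
```

A possible use would be calling `time_command([r".\spral_ssids.exe", "Si10H16.rb"], {"OMP_NUM_THREADS": "8"})` and then the same call with the path of the MSVC-built program, so both are run on the same machine with the same environment. Reporting the minimum and median over a few runs makes it easier to compare numbers between machines than a single run.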