squarePacked GEMM. #586
base: master
Commits on Dec 13, 2021
-
1. In zgemm, mkernel outperforms nkernel for both m > n and n > m.
2. Irrespective of mu and nu sizes, mkernel is forced for zgemm based on analysis done.
Change-Id: Iafb7ddb2519c17cf2225da84d6cc74ed985cc21e
AMD-Internal: [CPUPL-1352]
Commit 231a464
-
gemm_sqp(gemm_squarePacked): 3m_sqp and dgemm_sqp
1. The SquarePacked algorithm focuses on an efficient zgemm/dgemm implementation for square matrix sizes (m = k = n).
2. A variation of the 3m algorithm (3m_sqp) is implemented to allow a single load and store of the C matrix in the kernel.
3. Currently the method supports only m that is a multiple of 8. Residue cases are to be implemented later.
4. The real dgemm kernel (dgemm_sqp) is implemented without alpha and beta multiplication, since real alpha and beta scaling are handled in the 3m_sqp framework.
5. gemm_sqp supports dgemm when alpha = +/-1.0 and beta = 1.0.
Change-Id: I49becaf6079da4be29be5b06057ff4e50770a7d8
AMD-Internal: [CPUPL-1352]
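The 3m idea mentioned in this commit message replaces the four real products of a complex multiply with three: P1 = Ar*Br, P2 = Ai*Bi, P3 = (Ar+Ai)*(Br+Bi), giving Cr = P1 - P2 and Ci = P3 - P1 - P2. The following is a minimal sketch of that arithmetic only, assuming square column-major matrices stored as separate real and imaginary planes. The names `real_gemm_acc` and `zgemm_3m_sketch` are hypothetical and are not BLIS APIs, and the naive loops stand in for the packed, vectorized kernels the commit describes.

```c
#include <assert.h>
#include <stdlib.h>

/* c += a * b for n x n column-major real matrices (naive triple loop). */
void real_gemm_acc(int n, const double *a, const double *b, double *c)
{
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < n; ++p) {
            double bpj = b[p + j * n];
            for (int i = 0; i < n; ++i)
                c[i + j * n] += a[i + p * n] * bpj;
        }
}

/* Complex C += A * B using exactly three real matrix products.
 * Returns 0 on success, -1 if the workspace allocation fails. */
int zgemm_3m_sketch(int n,
                    const double *ar, const double *ai,
                    const double *br, const double *bi,
                    double *cr, double *ci)
{
    assert(n > 0);
    size_t len = (size_t)n * (size_t)n;

    /* One zero-initialized workspace: As, Bs, P1, P2, P3. */
    double *ws = calloc(5 * len, sizeof *ws);
    if (ws == NULL)
        return -1;
    double *as = ws, *bs = ws + len;
    double *p1 = ws + 2 * len, *p2 = ws + 3 * len, *p3 = ws + 4 * len;

    /* Sum planes: As = Ar + Ai, Bs = Br + Bi. */
    for (size_t i = 0; i < len; ++i) {
        as[i] = ar[i] + ai[i];
        bs[i] = br[i] + bi[i];
    }

    real_gemm_acc(n, ar, br, p1);   /* P1 = Ar * Br */
    real_gemm_acc(n, ai, bi, p2);   /* P2 = Ai * Bi */
    real_gemm_acc(n, as, bs, p3);   /* P3 = (Ar + Ai) * (Br + Bi) */

    /* Combine, touching each element of C once. */
    for (size_t i = 0; i < len; ++i) {
        cr[i] += p1[i] - p2[i];
        ci[i] += p3[i] - p1[i] - p2[i];
    }

    free(ws);
    return 0;
}
```

The trade-off is one fewer real product at the cost of extra additions and workspace for the sum planes, which is why the later commits add dedicated pack-and-sum routines.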
Commit 0abc674
-
1. Added comments.
AMD-Internal: [CPUPL-1429]
Change-Id: Ie37e24e58cd8bf836038a2258ebd09c3912fab9e
Commit 5dc5ffa
-
1. bli_malloc replaced with plain malloc plus address alignment within 3m_sqp.
2. Function added to pack A real, imag and sum.
3. Function added to pack B real, imag and sum.
4. Function added to pack C real, imag, with beta handling.
5. Sum and sub vectorized.
AMD-Internal: [CPUPL-1352]
Change-Id: I514e9efb053d529caef2de413d74d0dac2ceca54
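Two of the items above, plain malloc with manual alignment and packing into real, imag and sum planes, can be sketched as follows. This is an illustration under stated assumptions, not the commit's code: the function names are hypothetical, the input is assumed to be interleaved complex data (re, im, re, im, ...), and the loop is scalar rather than vectorized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate `bytes` with plain malloc and return a pointer rounded up to
 * `align`, which must be a power of two. *raw receives the original
 * pointer, which is the one that must later be passed to free(). */
void *malloc_aligned_sketch(size_t bytes, size_t align, void **raw)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    *raw = malloc(bytes + align - 1);
    if (*raw == NULL)
        return NULL;
    uintptr_t p = ((uintptr_t)*raw + align - 1) & ~(uintptr_t)(align - 1);
    return (void *)p;
}

/* Split `len` interleaved complex values into three planes:
 * re[i], im[i], and sum[i] = re[i] + im[i]. */
void pack_real_imag_sum(size_t len, const double *a,
                        double *re, double *im, double *sum)
{
    for (size_t i = 0; i < len; ++i) {
        re[i]  = a[2 * i];
        im[i]  = a[2 * i + 1];
        sum[i] = re[i] + im[i];
    }
}
```

Over-allocating by `align - 1` bytes guarantees that an aligned address exists inside the block; keeping the raw pointer separately is what makes the later `free` valid.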
Commit 87c123f
-
disabled zgemm induced and gemm sqp temporarily.
1. mx1, mx4 kernel addition and framework modification.
2. 8mx6n kernel addition.
3. NULL check added in dgemm_sqp malloc.
4. Mem tracing added.
5. Restricted 3m_sqp to limited matrix sizes.
6. Induced methods disabled temporarily for debug.
AMD-Internal: [CPUPL-1352]
Change-Id: I31671859b32bfbb359687fb7c9056f9eb904c8b2
Commit 2bb4e87
-
Enabling 3m_sqp and 3m1 methods
1. Re-enabling 3m methods for zgemm.
2. Vectorization of pack_sum routines re-enabled with bug fix.
3. 8mx6n kernel added.
AMD-Internal: [CPUPL-1352]
Change-Id: Id9f010ba763afc52d268c2e68805f069919b8810
Commit acfec6a
-
squarePacked(sqp) framework and multi-instance handling
1. kx partitions added to k loop for dgemm and zgemm.
2. mx loop based threading model added for dgemm as prototype of zgemm.
3. nx loop added for 3m_sqp and dgemm_sqp.
4. Single 3m_sqp workspace allocation with smaller memory footprint.
5. sqp framework done from dgemm and zgemm.
6. sqp kernels moved to separate kernel file.
7. Residue kernel core added to handle mx < 8.
8. Multi-instance tuning for 3m_sqp done.
9. Users can set the environment variable BLIS_MULTI_INSTANCE to 1 for better multi-instance behavior of 3m_sqp.
AMD-Internal: [CPUPL-1521]
Change-Id: Ibef50a8a37fe99f164edb4621acb44fc0c86514c
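The last item names an environment variable but the commit message does not show how it is read. A minimal sketch of one way a library could check such a flag with the standard `getenv` call is given below; the helper names and the rule that only the exact string "1" enables the flag are assumptions for illustration, not the behavior of the BLIS code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Interpret an environment value: only the exact string "1" enables. */
int flag_value_enabled(const char *value)
{
    return (value != NULL && strcmp(value, "1") == 0) ? 1 : 0;
}

/* Returns 1 if the named environment variable is set to "1", else 0. */
int env_flag_enabled(const char *name)
{
    assert(name != NULL);
    return flag_value_enabled(getenv(name));
}
```

Under these assumptions, launching a program as `BLIS_MULTI_INSTANCE=1 ./my_app` (a hypothetical binary) would make `env_flag_enabled("BLIS_MULTI_INSTANCE")` return 1, while an unset variable or any other value would return 0.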
Commit 74800cf
-
3m_sqp conjugate support added
1. 3m_sqp support for A matrix with conjugate_no_transpose and conjugate_transpose added.
AMD-Internal: [CPUPL-1521]
Change-Id: Ie6e5c49cf86f7d3b95d78705cf445e57f20b3d1f
Commit 35ad5d8
-
Induced method turned off, fix for beta=0 & C = NAN
1. Induced method turned off until the path is fully tested for different alpha and beta conditions.
2. Fix for beta = 0 with C = NaN done.
Change-Id: I5a7bd1393ac245c2ebb72f9a634728af4c0d4000
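The beta = 0 case in item 2 is a common GEMM pitfall: under IEEE 754 arithmetic, 0 * NaN is NaN, so scaling an uninitialized or NaN-filled C by beta = 0 does not clear it. The sketch below shows the usual remedy of overwriting C when beta is zero. The function name is hypothetical and `ab` stands for an already computed alpha*A*B product; this is not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* c[i] = beta * c[i] + ab[i], except that beta == 0 means the old
 * contents of c are ignored entirely (never read, never multiplied). */
void scale_and_add(size_t len, double beta, const double *ab, double *c)
{
    assert(ab != NULL && c != NULL);
    if (beta == 0.0) {
        for (size_t i = 0; i < len; ++i)
            c[i] = ab[i];               /* overwrite: NaN in c cannot leak */
    } else {
        for (size_t i = 0; i < len; ++i)
            c[i] = beta * c[i] + ab[i];
    }
}
```

Branching once outside the loop keeps the hot path free of per-element checks.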
Commit 93e3d7a
Commits on Dec 15, 2021
-
1. New err_t param in bli_malloc_user added.
2. AOCL_DTL log removed.
Commit 7cd7968
-
Commit 59029ee
-
Commit b3e82ba
-
Commit 0f984c5