initial gfx1036 and gfx1103 support #111

Merged 2 commits on Jul 17, 2024

Commits on Jul 16, 2024

  1. initial gfx1036 and gfx1103 support

    - add initial support for gfx1036 and gfx1103 as build targets
    - also update the gfx1010 configuration settings so that they
      match more closely between composable kernel and MIOpen
    
    fixes: #101
    fixes: #103
    
    Signed-off-by: Mika Laitio <[email protected]>
    lamikr committed Jul 16, 2024
    Commit 989fdf5

Commits on Jul 17, 2024

  1. initial rocBLAS logic files for iGPUs

    - add initial rocBLAS logic files for the
      Rembrandt (gfx1035), Raphael (gfx1036)
      and Phoenix (gfx1103) iGPUs.
    - when tested with
      https://github.com/LeiWang1999/rocblas-benchmark
      using std::make_tuple(8192, 8192, 8192, false, false, enable_tune),
      the speedup was about 4-5x for f16-f16 and int8-int32 in most
      cases (3.6x to 8.5x in the runs listed here); fp32 and fp16-f32
      changed far less, and fp32 on gfx1103 was slower.
    - gfx1035 without logic files
    
    Device 0: AMD Radeon Graphics
    m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
    8192,8192,8192,n,n,0,912.287,814.502,854.257,865.103
    
    - gfx1035 with logic files
    
    Device 0: AMD Radeon Graphics
    m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
    8192,8192,8192,n,n,0,652.499,834.796,237.42,189.945
    
    - gfx1103 without logic files
    
    Device 0: AMD Radeon 780M
    m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
    8192,8192,8192,n,n,0,916.684,820.721,823.48,1018.46
    
    - gfx1103 with logic files
    
    ROCR_VISIBLE_DEVICES="1" ./rocblas_benchmark
    Device 0: AMD Radeon 780M
    m,n,k,a_t,b_t,enable_tune,fp32 time (msec),fp16-f32 time (msec), f16-f16 time (msec), int8-int32 time (msec)
    8192,8192,8192,n,n,0,1346.02,634.836,193.613,119.29
    
    Signed-off-by: Mika Laitio <[email protected]>
    lamikr committed Jul 17, 2024
    Commit d660777
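
The per-type speedups implied by the benchmark rows in the second commit can be recomputed with a short script. This is only a sketch and not part of the PR: the rows are copied from the commit message, while the shortened column names and the helper names `parse_row` and `speedups` are assumptions made for illustration.

```python
import csv
import io

# Column names shortened from the benchmark's CSV header (illustrative only);
# the four time columns are in milliseconds.
HEADER = "m,n,k,a_t,b_t,enable_tune,fp32,fp16-f32,f16-f16,int8-int32"
TIME_COLUMNS = ["fp32", "fp16-f32", "f16-f16", "int8-int32"]

# Result rows copied verbatim from the commit message (8192x8192x8192 GEMM).
RESULTS = {
    "gfx1035": {
        "without": "8192,8192,8192,n,n,0,912.287,814.502,854.257,865.103",
        "with": "8192,8192,8192,n,n,0,652.499,834.796,237.42,189.945",
    },
    "gfx1103": {
        "without": "8192,8192,8192,n,n,0,916.684,820.721,823.48,1018.46",
        "with": "8192,8192,8192,n,n,0,1346.02,634.836,193.613,119.29",
    },
}


def parse_row(line):
    """Parse one benchmark CSV line into {data type: time in msec}."""
    row = next(csv.DictReader(io.StringIO(HEADER + "\n" + line)))
    return {col: float(row[col]) for col in TIME_COLUMNS}


def speedups(without_line, with_line):
    """Return time_without / time_with per data type (>1 means faster)."""
    before = parse_row(without_line)
    after = parse_row(with_line)
    return {col: before[col] / after[col] for col in TIME_COLUMNS}


if __name__ == "__main__":
    for gpu, rows in RESULTS.items():
        ratios = speedups(rows["without"], rows["with"])
        summary = ", ".join(f"{col}: {r:.2f}x" for col, r in ratios.items())
        print(f"{gpu}: {summary}")
```

A ratio below 1 means the run with logic files was slower for that data type. Each configuration appears as a single run in the commit message, so the ratios should be read as rough indications rather than stable measurements.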