unable to open hip GPU device (gfx1030) on 4.0.x branch. #2238

Closed
powderluv opened this issue Feb 19, 2021 · 5 comments

@powderluv

powderluv commented Feb 19, 2021

See corresponding bug here: ROCm/aomp#187

Based on the guidance there I was able to verify rocr with rocm_bandwidth_test.

HIP fails its directed tests with:

5950x:~/github/hip-on-vdi/b$ gdb ./directed_tests/deviceLib/hipTestDevice

(gdb) r
Starting program: /home/foo/github/hip-on-vdi/b/directed_tests/deviceLib/hipTestDevice 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffef28e700 (LWP 2998742)]

Thread 1 "hipTestDevice" received signal SIGSEGV, Segmentation fault.
0x00007ffff7c5b508 in ?? () from /home/foo/rocm/aomp/lib/libamdhip64.so.4
(gdb) bt
#0  0x00007ffff7c5b508 in ?? () from /home/foo/rocm/aomp/lib/libamdhip64.so.4
#1  0x00007ffff7c69180 in hipMalloc () from /home/foo/rocm/aomp/lib/libamdhip64.so.4
#2  0x0000000000401f5f in run_sincosf() ()
#3  0x0000000000406326 in main ()

@powderluv

On further debugging, it looks like the Device context is not populated. hip::Device::asContext is called on a null Device pointer (this=0x0):

Thread 1 "hipTestDevice" received signal SIGSEGV, Segmentation fault.
hip::Device::asContext (this=0x0) at /home/foo/github/hip-on-vdi/rocclr/hip_internal.hpp:185
185 amd::Context* asContext() const { return context_; }
(gdb) bt
#0 hip::Device::asContext (this=0x0) at /home/foo/github/hip-on-vdi/rocclr/hip_internal.hpp:185
#1 0x00007ffff7c16744 in ihipMalloc (ptr=0x7fffffffda50, sizeBytes=2048, flags=0) at /home/foo/github/hip-on-vdi/rocclr/hip_memory.cpp:111
#2 0x00007ffff7c19832 in hipMalloc (ptr=0x7fffffffda50, sizeBytes=2048) at /home/foo/github/hip-on-vdi/rocclr/hip_memory.cpp:248
#3 0x0000000000401f5f in run_sincosf() ()
#4 0x0000000000406326 in main ()
(gdb) dis
disable disassemble disconnect display
(gdb) disassemble
Dump of assembler code for function hip::Device::asContext() const:
0x00007ffff7bc7050 <+0>: push %rbp
0x00007ffff7bc7051 <+1>: mov %rsp,%rbp
0x00007ffff7bc7054 <+4>: mov %rdi,-0x8(%rbp)
0x00007ffff7bc7058 <+8>: mov -0x8(%rbp),%rax
=> 0x00007ffff7bc705c <+12>: mov 0x68(%rax),%rax
0x00007ffff7bc7060 <+16>: pop %rbp
0x00007ffff7bc7061 <+17>: retq
End of assembler dump.
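
The faulting instruction reads offset 0x68 from %rax, and %rax holds this (0x0), so the crash is a null Device pointer rather than a bad context_ value. Below is a minimal sketch of that failure mode and one possible guard. Device, Context, Status and guardedMalloc are hypothetical stand-ins for illustration only. They are not the rocclr types or the actual ihipMalloc logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for amd::Context; NOT the real rocclr type.
struct Context {
  int id;
};

// Hypothetical stand-in for hip::Device; NOT the real rocclr type.
class Device {
 public:
  explicit Device(Context* ctx) : context_(ctx) {}
  // Same shape as the accessor in the backtrace: it reads a member through
  // `this`, so calling it on a null Device* dereferences a null pointer.
  Context* asContext() const { return context_; }

 private:
  Context* context_;
};

// Hypothetical status codes used only by this sketch.
enum class Status { Ok, NoDevice, OutOfMemory };

// Checks the current device pointer before touching it, so a missing device
// becomes an error code instead of a segmentation fault.
Status guardedMalloc(const Device* current, void** ptr, std::size_t bytes) {
  if (current == nullptr || current->asContext() == nullptr) {
    return Status::NoDevice;
  }
  *ptr = std::malloc(bytes);  // host allocation stands in for device memory
  return (*ptr != nullptr) ? Status::Ok : Status::OutOfMemory;
}
```

With a guard of this kind, the hipTestDevice failure would surface as an error return from the allocation call, which is consistent with the hipErrorNoDevice that Tensile reports from hipGetDeviceCount.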

@powderluv

This also happens with the Tensile (gfx10) branch:

5950x:~/github/Tensile/build$ /home/foo/github/Tensile/build/0_Build/client/tensile_client --config-file /home/foo/github/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/build/../source/ClientParameters.ini
loading config file /home/foo/github/Tensile/build/1_BenchmarkProblems/Cijk_Ailk_Bljk_SB_00/00_BF/build/../source/ClientParameters.ini
terminate called after throwing an instance of 'std::runtime_error'
what(): Error 100(hipErrorNoDevice) /home/foo/github/Tensile/Tensile/Source/client/main.cpp:243:
hipGetDeviceCount(&deviceCount)
hipErrorNoDevice

Aborted (core dumped)

It looks like rocclr is not initializing the context (?)

@xuhuisheng

hipErrorNoDevice usually means that /opt/rocm/bin/rocminfo returns no devices.
You could run dmesg | grep kfd and dmesg | grep amdgpu to find out whether there is any error info.

@powderluv

powderluv commented Feb 20, 2021

Thanks for the quick response. The device is found OK.

5950x:~/github/Tensile/build$ sudo dmesg | grep kfd
[ 8420.665823] kfd kfd: Allocated 3969056 bytes on gart
[ 8420.666409] kfd kfd: added device 1002:73bf

5950x:~/github/Tensile/build$ sudo dmesg | grep amdgpu
[ 8417.808502] [drm] amdgpu kernel modesetting enabled.
[ 8417.808651] amdgpu: Ignoring ACPI CRAT on non-APU system
[ 8417.808658] amdgpu: Topology: Add CPU node
[ 8417.808741] amdgpu 0000:31:00.0: enabling device (0006 -> 0007)
[ 8417.808773] amdgpu 0000:31:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 8417.810521] amdgpu 0000:31:00.0: amdgpu: Fetched VBIOS from VFCT
[ 8417.810522] amdgpu: ATOM BIOS: 113-E438XTX-UO2
[ 8417.810549] amdgpu 0000:31:00.0: amdgpu: HBM ECC is not presented.
[ 8417.810550] amdgpu 0000:31:00.0: amdgpu: SRAM ECC is not presented.
[ 8417.810558] amdgpu 0000:31:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 8417.810560] amdgpu 0000:31:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 8417.810561] amdgpu 0000:31:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 8417.810623] [drm] amdgpu: 16368M of VRAM memory ready
[ 8417.810625] [drm] amdgpu: 16368M of GTT memory ready.
[ 8420.558322] amdgpu 0000:31:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x0000003b, smu fw version = 0x003a3100 (58.49.0)
[ 8420.558330] amdgpu 0000:31:00.0: amdgpu: SMU driver if version not matched
[ 8420.558340] amdgpu 0000:31:00.0: amdgpu: use vbios provided pptable
[ 8420.630798] amdgpu 0000:31:00.0: amdgpu: SMU is initialized successfully!
[ 8420.635610] snd_hda_intel 0000:31:00.1: bound 0000:31:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 8420.666407] amdgpu: Topology: Add dGPU node [0x73bf:0x1002]
[ 8420.666411] amdgpu 0000:31:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, active_cu_number 80
[ 8420.666575] amdgpu 0000:31:00.0: [drm] Cannot find any crtc or sizes
[ 8420.666669] amdgpu 0000:31:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 8420.666671] amdgpu 0000:31:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 8420.666672] amdgpu 0000:31:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 8420.666673] amdgpu 0000:31:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 8420.666673] amdgpu 0000:31:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 8420.666674] amdgpu 0000:31:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 8420.666675] amdgpu 0000:31:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 8420.666676] amdgpu 0000:31:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 8420.666676] amdgpu 0000:31:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 8420.666677] amdgpu 0000:31:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 8420.666678] amdgpu 0000:31:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 8420.666679] amdgpu 0000:31:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 8420.666680] amdgpu 0000:31:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[ 8420.666680] amdgpu 0000:31:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[ 8420.666681] amdgpu 0000:31:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 8420.666682] amdgpu 0000:31:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 8420.666683] amdgpu 0000:31:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 8420.666684] amdgpu 0000:31:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[ 8420.666685] amdgpu 0000:31:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[ 8420.666685] amdgpu 0000:31:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[ 8420.666686] amdgpu 0000:31:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[ 8420.675438] [drm] Initialized amdgpu 3.40.0 20150101 for 0000:31:00.0 on minor 1

I am also able to run the rocr benchmark tests (posted here: ROCm/aomp#187). Something is broken with rocclr in the hip-on-vdi path.

(pyenv_3.8)@5950x:~/github/Tensile/build$  /opt/rocm/bin/rocminfo 
ROCk module is loaded
Able to open /dev/kfd read-write
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 5950X 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 5950X 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3400                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               

      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    131896948(0x7dc9674) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-XX                             
  Marketing Name:          Device 73bf                        
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 29631(0x73bf)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2660                               
  BDFID:                   12544                              
  Internal Node ID:        1                                  
  Compute Unit:            80                                 
  SIMDs per CU:            4                                  
  Shader Engines:          8                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        64(0x40)                           
  Max Work-item Per CU:    2048(0x800)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
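
To check mechanically that rocminfo reports a GPU agent, the output above can be scanned for agents whose Device Type is GPU. The following is a rough sketch that assumes the text layout shown in this thread. The helper names trim and gpuAgentNames are hypothetical and are not part of any ROCm tool.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Strips leading and trailing whitespace from one line of rocminfo output.
static std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const auto begin = s.find_first_not_of(ws);
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(ws);
  return s.substr(begin, end - begin + 1);
}

// Returns the "Name:" value of every agent whose "Device Type:" is GPU.
// Assumes each agent prints "Name:" before "Device Type:", as in the
// output pasted in this thread.
std::vector<std::string> gpuAgentNames(const std::string& rocminfoText) {
  std::vector<std::string> gpus;
  std::string currentName;
  std::istringstream in(rocminfoText);
  std::string line;
  while (std::getline(in, line)) {
    const std::string t = trim(line);
    if (t.rfind("Name:", 0) == 0) {
      // Lines such as "Marketing Name:" do not start with "Name:".
      currentName = trim(t.substr(5));
    } else if (t.rfind("Device Type:", 0) == 0) {
      if (trim(t.substr(12)) == "GPU") gpus.push_back(currentName);
    }
  }
  return gpus;
}
```

A non-empty result for the output above while hipGetDeviceCount still fails would point at the HIP/rocclr layer rather than at the kernel driver or rocr.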

@ppanchad-amd

@powderluv, sorry for the lack of response. Please try the latest ROCm 6.0.2 (HIP 6.0.32831) to see whether your issue still exists. If it is resolved, please close the ticket. Thanks.
