framework laptop 16 hybrid gpu support #101

Open
lamikr opened this issue Jul 5, 2024 · 4 comments · Fixed by #111

Comments

@lamikr
Owner

lamikr commented Jul 5, 2024

I received a Framework 16 laptop for testing and development with AMD's CPUs and GPUs:

  • 7840HS CPU
  • 780M iGPU (gfx1103) with 12 CUs (rocm-smi device id 0x7480)
  • 7700S GPU (gfx1102) with 32 CUs (rocm-smi device id 0x15bf)
  • 32 GB SO-DIMM RAM (need to check whether I can upgrade it to 64 or 96 GB later)

So far tested:

  • gfx1102 (7700S) passes the basic tests. I have not had time to do any benchmarks with it yet.
  • gfx1103 will need more work; I will start debugging it now.

This is the first time I am able to test with hybrid GPUs, and I would like to find ways to test all three scenarios:

  • The 7700S alone or the 780M alone (should be doable by masking the other GPU away from ROCm)
  • Tasks where it would make sense to share the work between both GPUs
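Masking one GPU away from ROCm can be sketched as below: the ROCm runtime honors the `HIP_VISIBLE_DEVICES` (and lower-level `ROCR_VISIBLE_DEVICES`) environment variable, which must be set before the HIP runtime initializes. The device index used in the example is an assumption about this machine; enumeration order varies per system, so check it with `rocminfo` or `rocm-smi` first.

```python
import os

def mask_gpus(visible: str) -> None:
    """Expose only the listed device indices to HIP applications
    launched (or HIP runtimes initialized) after this point."""
    os.environ["HIP_VISIBLE_DEVICES"] = visible

# Hypothetical example: keep only device 0 visible, assuming the
# 7700S dGPU enumerates as device 0 on this machine.
mask_gpus("0")
```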
@lamikr
Owner Author

lamikr commented Jul 5, 2024

The Framework laptop has 2 M.2 SSD slots; I plan to install different Linux distros on one of them and use the second slot as storage for builds. Not sure whether I could also get some distros installed on USB keys and booted from there.

So far I have tested the 7700S functionality with Fedora 40.

@jrl290

jrl290 commented Jul 11, 2024

I will be very interested to see what you find. My 7840U (780M, gfx1103) will operate properly with PyTorch on any gfx11xx build, but it randomly halts. Right now I work around it by restarting the Python script whenever it exits without the expected return value. Not ideal, but it gets me by.
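The restart-on-unexpected-exit workaround described above can be sketched as a small supervisor loop; `run_until_ok` is a hypothetical helper, not part of any library:

```python
import subprocess
import sys
import time

def run_until_ok(cmd, expected_rc=0, max_retries=5, delay=1.0):
    """Re-run a command until it exits with the expected return code,
    as a workaround for scripts that randomly halt mid-run."""
    for attempt in range(1, max_retries + 1):
        rc = subprocess.call(cmd)
        if rc == expected_rc:
            return attempt
        time.sleep(delay)  # brief pause before restarting
    raise RuntimeError(f"{cmd!r} failed {max_retries} times")

# Usage sketch: replace the -c stub with the real training script.
run_until_ok([sys.executable, "-c", "pass"])
```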

lamikr added a commit that referenced this issue Jul 14, 2024
- initial support for gfx1036 and gfx1103 as a build target
- updated also the gfx1010 configuration settings to be
  more similar in composable kernel and miopen

fixes: #101
fixes: #103

Signed-off-by: Mika Laitio <[email protected]>
@lamikr lamikr closed this as completed in 750fe4c Jul 15, 2024
@lamikr
Owner Author

lamikr commented Jul 17, 2024

Initial work is now done: both the integrated 780M (gfx1103) and the discrete 7700S (gfx1102) are
selectable as build targets and can be used. Memory and GPU usage for both of them also show up in nvtop.

More testing with the distro and new Linux 6.10 kernel is however still needed.
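For anyone verifying the result: ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` namespace, so both devices should appear there once the build works. Below is a minimal sketch of picking one of the two GPUs; the `pick_device` helper is hypothetical, and on a real system the device count would come from `torch.cuda.device_count()`.

```python
def pick_device(n_gpus: int, prefer: int = 0) -> str:
    """Return a torch-style device string for the preferred GPU index,
    falling back to the CPU when that index is not available."""
    # ROCm builds of PyTorch reuse the "cuda" device namespace for AMD GPUs.
    if n_gpus > prefer:
        return f"cuda:{prefer}"
    return "cpu"

# With both the 780M and the 7700S visible (2 devices), prefer index 0 --
# which physical GPU that index maps to is an assumption about this machine.
device = pick_device(2, prefer=0)
```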

@lamikr lamikr reopened this Jul 17, 2024
@jrl290

jrl290 commented Jul 17, 2024

That's great! I have downloaded and installed it and am testing now. It seems I am unable to install the official Linux 6.10 kernel, but I am able to use the Linux 6.10-rc4 kernel. That matters, since automatic allocation of shared memory is supported there.

I am getting this warning when loading the pytorch_lightning module, but it doesn't seem to actually affect the processing:
hip_fatbin.cpp: COMGR API could not find the CO for this GPU device/ISA: amdgcn-amd-amdhsa--gfx1103

I am still randomly coming across a fatal error:
HW Exception by GPU node-1 (Agent handle: 0x5d70d5ca5b90) reason :GPU Hang

Interestingly enough, this is only occurring in one section of PyTorch code and not in another, so I'll have to investigate exactly which differences are triggering the error.
