
Method 2 not working, but Method 3 does #11

Open
Trollwut opened this issue Nov 8, 2023 · 13 comments


Trollwut commented Nov 8, 2023

Hey there!

Glad that I found your script, thanks for sharing!

Trying it all on KDE Wayland, Arch Linux, script via AUR.
So I did the setup and chose my eGPU (nvidia) as primary and my two integrated (Intel + nvidia) as the integrated ones.

On the three questions after the guided setup, when I enable Method 2 (regardless of whether Method 3 is enabled too), it won't work.

I have full-disk encryption, and on boot it asks for the password. It decrypts, but I still see the "enter your password" prompt on TTY1. I can switch TTYs and fumble around there, but returning to TTY1 still shows the password prompt.

If I deactivate Method 2 and ONLY use Method 3, it kinda works.
The SDDM greeter is on my integrated screen, but after login, my eGPU monitors turn on and have a very nice wayland setup. Really nice, everything is running on the eGPU (checked via nvidia-smi).
The problem is that when I open something else (even glxgears), it renders on GPU0, which is the integrated nvidia, not the eGPU.
This results in one window that only has display artifacts in it.

Is there something I'm missing to get Method 2 (or 3) to work? Or is this a known problem?

Would an AMD GPU be less hassle?

Please tell me which information you need for troubleshooting, I'll gladly submit it. :)


Trollwut commented Nov 8, 2023

I might have found something.

This is my status output of the script:

Enter Choice [0-9]: 
9
Method 1 setup with following Bus IDs
0000:00:02.0 i915
0000:01:00.0 nvidia
Method 2, 3 setup with following Bus IDs
0000:06:00.0 nvidia
0000:06:00.0 eGPU connected, not set as primary with Method 2
0000:06:00.0 eGPU currently set as primary with Method 3
Method 1 auto switch at startup service
○ all-ways-egpu.service - Configure eGPU as primary under Wayland desktops
     Loaded: loaded (/etc/systemd/system/all-ways-egpu.service; disabled; preset: disabled)
     Active: inactive (dead)
Method 2 auto switch at startup service
○ all-ways-egpu-boot-vga.service - Configure eGPU as primary using boot_vga under Wayland desktops
     Loaded: loaded (/etc/systemd/system/all-ways-egpu-boot-vga.service; disabled; preset: disabled)
     Active: inactive (dead)
Method 3 auto switch at startup service
○ all-ways-egpu-set-compositor.service - Configure eGPU as primary using compositor variables under Wayland desktops
     Loaded: loaded (/etc/systemd/system/all-ways-egpu-set-compositor.service; enabled; preset: disabled)
     Active: inactive (dead) since Wed 2023-11-08 22:31:35 CET; 2min 55s ago
   Duration: 242ms
    Process: 2027 ExecStart=all-ways-egpu set-compositor-primary egpu (code=exited, status=0/SUCCESS)
   Main PID: 2027 (code=exited, status=0/SUCCESS)
        CPU: 199ms

Nov 08 22:31:35 trollwut systemd[1]: Started Configure eGPU as primary using compositor variables under Wayland desktops.
Nov 08 22:31:35 trollwut all-ways-egpu[2030]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2030]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2032]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2032]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2035]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2035]: grep: warning: stray \ before :
Nov 08 22:31:35 trollwut all-ways-egpu[2027]: Compositor variables set. Restart Display Manager for changes to take effect.
Nov 08 22:31:35 trollwut systemd[1]: all-ways-egpu-set-compositor.service: Deactivated successfully.
Press [Enter] to return to menu.

Everything seems quite cool, except that grep has a little problem.
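For reference, GNU grep 3.8 and later warns about unnecessary backslashes before characters that are not regex metacharacters, so a likely trigger is a pattern with escaped colons built from the bus IDs (the exact pattern the script uses is an assumption here):

```shell
# Colons are not regex metacharacters, so escaping them is unnecessary;
# GNU grep >= 3.8 prints "stray \ before :" for patterns like this one:
printf '0000:06:00.0 nvidia\n' | grep '0000\:06\:00.0'
# The unescaped form matches the same lines without any warning:
printf '0000:06:00.0 nvidia\n' | grep '0000:06:00.0'
```

The warnings are harmless to the match itself, which fits the log above: the service still exits with status 0.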

So I checked the files in /etc/environment.d/ and they seem a little off:

$ bat /etc/environment.d/10kwin.conf 
───────┬────────────────────────────────────────────────────────────────────────
       │ File: /etc/environment.d/10kwin.conf
───────┼────────────────────────────────────────────────────────────────────────
   1   │ KWIN_DRM_DEVICES=/dev/dri/card1::/dev/dri/card0/dev/dri/card2
───────┴────────────────────────────────────────────────────────────────────────

Note the double colon, and the missing colon between card0 and card2.
Same in the sway file.
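For what it's worth, the value KWin expects is a plain single-colon-separated list of device paths; a minimal sketch of the intended join (card numbers taken from the file above, which cards they map to on a given machine is an assumption):

```shell
# Build KWIN_DRM_DEVICES as a single-colon-separated list: no empty
# entries (which would produce "::") and no missing separators.
devices="/dev/dri/card1 /dev/dri/card0 /dev/dri/card2"
echo "KWIN_DRM_DEVICES=$(printf '%s' "$devices" | tr ' ' ':')"
# -> KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0:/dev/dri/card2
```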

So I tried to correct the file to KWIN_DRM_DEVICES=/dev/dri/card1:/dev/dri/card0:/dev/dri/card2 and restarted, but then the same thing happens as with Method 2: it just doesn't work.

When I restart the machine, the file is back in the broken double-colon state.

Any idea on how to investigate further, or even solve this?


NL-TCH commented Nov 8, 2023

The output of my 10kwin.conf with the eGPU disconnected, on Nobara 38 GNOME:
KWIN_DRM_DEVICES=/dev/dri/card0:/dev/dri/card1
Does this help you?

ewagner12 (Owner) commented:

@Trollwut
For the SDDM issue I think that is down to SDDM still using X on the backend rather than Wayland.

It does look like the script is having an issue parsing your 3 GPUs. I don't have a system with 3 GPUs to test so I've only tested it with 2 GPU entries. I'll look into the parsing issue.
In the meantime, could you try running the setup again, setting only the iGPU as integrated and not setting the dGPU as either? With Method 2, this setup should still work to set the eGPU as primary, and you should get an output file similar to the one @NL-TCH posted.


Trollwut commented Nov 8, 2023

> @Trollwut For the SDDM issue I think that is down to SDDM still using X on the backend rather than Wayland.

Most likely :) I don't mind having it on any screen though. But would you have an alternative that runs on Wayland? Maybe GDM, but does it integrate well with KDE on Wayland?
(Sorry, this is my first Wayland setup, so this tech is quite new to me.)

> It does look like the script is having an issue parsing your 3 GPUs. I don't have a system with 3 GPUs to test so I've only tested it with 2 GPU entries. I'll look into the parsing issue. In the meantime, could you try running the setup again and only setting the iGPU as integrated and not setting the dGPU as either? With Method 2 this setup should still work to set the eGPU as primary and you should get an output file similar to what @NL-TCH posted

So I did this. I even uninstalled the script first to start from scratch. Fun fact: It even hates parsing two GPUs on my setup :D

So: guided setup, eGPU as primary, iGPU (not dGPU) as integrated.
Activated Method 2: doesn't work.

Deactivated Method 2 and activated Method 3: works the same as with 3 GPUs selected. SDDM on the integrated screen, boots into a Wayland session, but any 3D-accelerated program opens on the wrong GPU and results in a window full of graphical artifacts.

This is the KWin environment file with only 2 GPUs selected:

$ cat /etc/environment.d/10kwin.conf 
KWIN_DRM_DEVICES=/dev/dri/card1::/dev/dri/card0/dev/dri/card2

I will be going to sleep now, so I can only troubleshoot further tomorrow evening.

Here is some lspci output:

$ lspci -k | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics] (rev 0c)
01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070 Ti Laptop GPU] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080] (rev a1)

And when using Method 2, which doesn't work for me right now and shows just a blank screen, this is the SDDM status output (which confirms it's using X as backend):

● sddm.service - Simple Desktop Display Manager
     Loaded: loaded (/usr/lib/systemd/system/sddm.service; enabled; preset: disabled)
     Active: active (running) since Wed 2023-11-08 23:10:51 CET; 56s ago
       Docs: man:sddm(1)
             man:sddm.conf(5)
   Main PID: 2120 (sddm)
      Tasks: 2 (limit: 37955)
     Memory: 286.1M
        CPU: 1.576s
     CGroup: /system.slice/sddm.service
             └─2120 /usr/bin/sddm

Nov 08 23:10:55 trollwut sddm[2120]: Failed to read display number from pipe
Nov 08 23:10:55 trollwut sddm[2120]: Display server stopping...
Nov 08 23:10:55 trollwut sddm[2120]: Attempt 2 starting the Display server on vt 2 failed
Nov 08 23:10:57 trollwut sddm[2120]: Display server starting...
Nov 08 23:10:57 trollwut sddm[2120]: Writing cookie to "/run/sddm/xauth_OghlRV"
Nov 08 23:10:57 trollwut sddm[2120]: Running: /usr/bin/X -nolisten tcp -background none -seat seat0 vt2 -auth /run/sddm/xauth_OghlRV -noreset -displayfd 16
Nov 08 23:10:57 trollwut sddm[2120]: Failed to read display number from pipe
Nov 08 23:10:57 trollwut sddm[2120]: Display server stopping...
Nov 08 23:10:57 trollwut sddm[2120]: Attempt 3 starting the Display server on vt 2 failed
Nov 08 23:10:57 trollwut sddm[2120]: Could not start Display server on vt 2

I hope this is some info that might help you investigate the problem with me. I will reply tomorrow. :) Thanks in advance!

ewagner12 (Owner) commented:

@Trollwut
SDDM has an experimental wayland backend so if you're on the KDE side I'd just keep on SDDM and wait for the wayland work to be done.

In the logs from using Method 2, it looks like SDDM is trying to start but can't find a display. I think nvidia has a specific Xorg option that disables the display output on mobile GPUs, and that might be applying to the eGPU as well because it shares the same driver. You could try the experimental Wayland SDDM backend to get around Xorg, or look into changing the xorg.conf "UseDisplayDevice" option described here: https://download.nvidia.com/XFree86/Linux-x86/390.157/README/xconfigoptions.html
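For anyone trying the xorg.conf route, a hypothetical drop-in fragment might look like the one below. This is an untested sketch: the BusID matches the eGPU's 06:00.0 address from this thread, and the option value depends on which outputs exist on your card (see the linked nvidia README for the valid values):

```
# /etc/X11/xorg.conf.d/20-egpu.conf (hypothetical)
Section "Device"
    Identifier "egpu-nvidia"
    Driver     "nvidia"
    BusID      "PCI:6:0:0"
    Option     "UseDisplayDevice" "DFP-0"
EndSection
```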

I made a change to the git code to fix the parsing issue. Could you try installing the latest git version according to https://github.com/ewagner12/all-ways-egpu#git and see if that changes anything when using Method 3 only?

I've also heard of people having better luck on eGPUs with the nvidia-open driver rather than the proprietary nvidia driver but can't confirm that personally.


Trollwut commented Nov 9, 2023

Hey there! So I tested it and it seems like it works now. I still have problems rendering 3D apps, but we'll come to that. :)

Ok, so here is what I did and how it went:

At first, I activated SDDM's Wayland support as described in the [Arch Wiki](https://wiki.archlinux.org/title/SDDM#Running_under_Wayland).
No changes with the script, I just rebooted to test it and it worked. Nice.
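For anyone else following along, enabling the experimental Wayland greeter is typically a drop-in file like the one below. This is a sketch based on the Arch Wiki page linked above; the exact compositor command and greeter environment may differ by SDDM and Plasma version:

```
# /etc/sddm.conf.d/10-wayland.conf (sketch)
[General]
DisplayServer=wayland
GreeterEnvironment=QT_WAYLAND_SHELL_INTEGRATION=layer-shell

[Wayland]
CompositorCommand=kwin_wayland --drm --no-lockscreen --no-global-shortcuts --locale1
```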

Then I used the script to uninstall itself and revert its changes.

After that, I downloaded and installed your script via Git method, as you wished.

I ran the guided setup, chose my eGPU and both internal GPUs.

I only used Method 2 (no 3 involved) and rebooted -> This worked!!! WOOHOOO!!!

I then ran the guided setup again and activated Method 2 AND Method 3 and rebooted -> It works!! And I also can use my internal display as well! :D

Ok, so this just works now. Thanks for your quick support!

But there still is one thing left:

When opening 3D-accelerated apps, like glxgears, I still just get a window with graphical artifacts (here is an image).

With nvidia-smi I can see that it runs on the internal dGPU and not the eGPU:

$ nvidia-smi
Thu Nov  9 20:33:05 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8              14W / 125W |     20MiB /  8192MiB |      8%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off | 00000000:06:00.0  On |                  N/A |
|  0%   47C    P0             124W / 340W |   1065MiB / 10240MiB |     48%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3145      G   /usr/bin/ksmserver                            3MiB |
|    0   N/A  N/A      3406      G   /usr/bin/kaccess                              3MiB |
|    0   N/A  N/A     11520      G   glxgears                                      5MiB |
|    1   N/A  N/A      3005      G   /usr/bin/kwalletd5                            3MiB |
|    1   N/A  N/A      3028      G   /usr/bin/kwin_wayland                       433MiB |
|    1   N/A  N/A      3114      G   /usr/bin/Xwayland                             4MiB |
|    1   N/A  N/A      3151      G   /usr/bin/kded5                                3MiB |
|    1   N/A  N/A      3182      G   /usr/bin/plasmashell                        248MiB |
|    1   N/A  N/A      3228      G   ...b/polkit-kde-authentication-agent-1        3MiB |
|    1   N/A  N/A      3230      G   /usr/lib/xdg-desktop-portal-kde               3MiB |
|    1   N/A  N/A      3382      G   /usr/bin/barrier                              3MiB |
|    1   N/A  N/A      3383      G   /usr/bin/nextcloud                            3MiB |
|    1   N/A  N/A      3408      G   /usr/lib/DiscoverNotifier                     3MiB |
|    1   N/A  N/A      3704      G   /usr/bin/konsole                              3MiB |
|    1   N/A  N/A      3770      G   /usr/bin/konsole                              3MiB |
|    1   N/A  N/A     10223      G   /usr/bin/krunner                             16MiB |
|    1   N/A  N/A     10290      G   /usr/lib/baloorunner                          3MiB |
|    1   N/A  N/A     10325      G   /usr/lib/firefox/firefox                    234MiB |
|    1   N/A  N/A     10644      G   ...bin/plasma-browser-integration-host        3MiB |
|    1   N/A  N/A     10945      G   /usr/bin/systemsettings5                     40MiB |
+---------------------------------------------------------------------------------------+

Is there a way to use only my eGPU for rendering? I can't seem to get it to work.

DRI_PRIME doesn't work either; it always uses my dGPU, not the eGPU:

$ DRI_PRIME=0 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
$ DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
$ DRI_PRIME=2 glxinfo | grep "OpenGL renderer"
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2

Any other idea on how to make it work?

If you need any further logs, please ask! :)

ewagner12 (Owner) commented:

@Trollwut
Glad to hear that the script is working as it should now!

For the 3D apps running on the dGPU I believe that's down to the nvidia driver side. Could you try using __NV_PRIME_RENDER_OFFLOAD_PROVIDER as noted here. I think you'll want to set it equal to NVIDIA-G1. DRI_PRIME only works with the open drivers (intel, amdgpu, and nouveau).
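A sketch of the full offload invocation, since the provider variable usually works together with the other PRIME render offload variables (the NVIDIA-G1 value is taken from the suggestion above and depends on how the driver enumerates the GPUs on a given system):

```shell
# NVIDIA PRIME render offload: the provider variable is typically combined
# with these two. The provider name (NVIDIA-G0, NVIDIA-G1, ...) depends on
# driver enumeration; requires a running X/XWayland session and the
# proprietary nvidia driver.
__NV_PRIME_RENDER_OFFLOAD=1 \
__NV_PRIME_RENDER_OFFLOAD_PROVIDER=NVIDIA-G1 \
__GLX_VENDOR_LIBRARY_NAME=nvidia \
glxinfo | grep "OpenGL renderer"
```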


Trollwut commented Nov 10, 2023

Well, it works almost as it should :D

So I tested your env, but it doesn't do anything for me:

$ __NV_PRIME_RENDER_OFFLOAD_PROVIDER=0 glxinfo | grep renderer
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
$ __NV_PRIME_RENDER_OFFLOAD_PROVIDER=1 glxinfo | grep renderer
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
$ __NV_PRIME_RENDER_OFFLOAD_PROVIDER=2 glxinfo | grep renderer
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2

It also seems that there is simply no provider to choose from:

$ xrandr --listproviders
Providers: number : 0

I don't have an X config (since I thought I'm running Wayland anyway?). So maybe that's the problem: not having any X config?
If I do need one, what would be the best approach to create it? Especially so that it works both with and without the eGPU attached.

Trollwut (Author) commented:

I'm now a little confused... for the sake of troubleshooting, I tried Method 1.

So yes, it switched internals off, both eGPU monitors are now running bla bla bla.

But after login, when I run some 3D stuff, it still wants to grab my dGPU. I thought it was switched off?

$ glxinfo | grep renderer
OpenGL renderer string: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2

Trollwut (Author) commented:

So I fiddled around every evening for a few hours and I'm out of ideas...

Things I tried:

  • adding my user to the video group (it's unnecessary since systemd, but I'm desperate)
  • using nvidia-open-dkms
  • disabling the dGPU via the ASUS software (I can't turn it off via BIOS/UEFI...)

It's always the same outcome: if I open a graphics-accelerated window through XWayland, I just get a window filled with graphical artifacts. nvidia-smi shows that these programs run on the wrong GPU ID, not the eGPU.

If someone has another idea, I'm open!

ewagner12 (Owner) commented:

@Trollwut
The only other thing I'd recommend is trying an Xorg-based desktop like Plasma (X11) along with one of the scripts like egpu-switcher. If it works in a regular Xorg desktop, that could be a workaround and would narrow this down to an XWayland-specific issue. If it doesn't work there, this might be a more general issue. Beyond that, I don't think I have any other ideas.

Trollwut (Author) commented:

Xorg was the first thing I tried (even though I wanted to end up on Wayland), just because that's the platform I know, and I wanted my eGPU setup working.

Yes, there it works. With all disadvantages. :D But it works.

The problem is that I have two monitors on it with different refresh rates and aspect ratios, and one of them is rotated 90 degrees. This results in a very laggy desktop experience (as long as anything moves on both monitors). So yes, Xorg would work, but definitely not for me.

I just ordered an AMD 7900 XT, so that I can try it with that (read: Intel iGPU, nvidia dGPU, AMD eGPU). If this works as intended, I must say that it's some kind of nvidia fuckery.

This would also explain why I can't find any setup like mine. There are many iGPU+eGPU setups, but never with a dGPU (at least not nvidia+nvidia; mixed setups seem possible).

I guess I can test it this weekend, so I will reply when I have results. :)

Maybe I'll find something else to test for nvidia+nvidia, but for now I don't have any ideas left...

ewagner12 (Owner) commented:

I don't know if you figured this out on your end, but for reference, or in case anyone else comes across this, I'm linking a similar issue and workaround reported for an nvidia+nvidia eGPU setup:
hertg/egpu-switcher#117
It seems that the nvidia proprietary driver has a quirk where it needs to be reloaded for the script to work properly.
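A sketch of such a reload, under the assumption that nothing is still holding the modules (the exact module set varies by driver version, and this should be run from a TTY with the display manager stopped):

```shell
# Unload the proprietary nvidia stack, then reload it so the driver
# re-enumerates GPUs with the eGPU attached. Order matters: dependent
# modules (uvm, drm, modeset) must come out before the core module.
sudo systemctl stop sddm
sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia
sudo modprobe nvidia_drm
sudo systemctl start sddm
```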


3 participants