Trying it out on Arch Linux #8

Open
cadr10 opened this issue Mar 28, 2020 · 20 comments

Comments

@cadr10

cadr10 commented Mar 28, 2020

I've successfully got QEMU working on Arch and set up KVM. I came across your program and set everything up as described in the README. When I run ./start-vm.sh, both my monitors go black (this happens after unbinding the PCI device). Any idea why this is happening? I use optimus-manager instead of bumblebee, and I tried setting the mode to intel, hybrid and nvidia, all with the same result.

Device: Aspire VX5-591G (muxed, according to compatibility-check).

I'll try to fix it myself in the coming days, but I'm posting this in case someone already knows the cause.

@T-vK
Owner

T-vK commented Mar 28, 2020

At the moment only Fedora is supported. If you want to make it work on Arch, you have to create utils/Arch/{version} and translate the files from https://github.com/T-vK/MobilePassThrough/tree/master/utils/Fedora/30 to work on Arch.

@cadr10
Author

cadr10 commented Apr 17, 2020

What are the dependencies for generating the helper iso? I ran the script to generate it and it just downloads virtio-drivers and a couple of exes.
I got all of the dependencies from the official repositories and the AUR. When I reinstall my OS, I'll make a list of what I installed to get it working.
I know this question is a different topic, but I'm not opening a new issue for it.

@T-vK
Owner

T-vK commented Apr 17, 2020

For the helper iso you need a package providing genisoimage. Not sure which version.
I haven't actually tested the iso generator in quite some time. It might be broken.

I think the last line should probably be replaced with

genisoimage -J -joliet-long -r -allow-lowercase -allow-multidot -o "${HELPER_ISO}" "${PROJECT_DIR}/helper-iso-files"

@Rabcor

Rabcor commented Dec 4, 2020


Yeah I'm trying to set this up on manjaro over here, the original files ended up giving me these errors

> Generating new iso...
Warning: creating filesystem that does not conform to ISO-9660.
Setting input-charset to 'UTF-8' from locale.
genisoimage: No such file or directory. Invalid node - '/home/rabcor/Windoze/vm-files/mobile-passthrough-helper.iso'.

That was when running the original genisoimage command; replacing it with the version you suggested above solved it.

Now I'm finally at the start vm stage, crossing my fingers. I didn't actually fix the dependency download scripts though, I just installed them all manually, so unfortunately I have no such files to share, but any arch user should easily be able to figure out what to download from the files.

Also depends on cdrtools for aforementioned genisoimage command.
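
Since the dependency scripts only target Fedora, a quick check for the commands the scripts call can save a failed run on other distros. A minimal sketch, assuming a POSIX shell; the function name `missing_commands` is made up for illustration, and the command list in the usage comment is only an example drawn from this thread:

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_commands is a hypothetical helper, not part of MobilePassThrough.)
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Example usage (command names taken from the output in this thread):
# missing_commands genisoimage qemu-system-x86_64 optirun
```

Anything it prints still has to be mapped to a package by hand, since package names differ between distros.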

Edit: Sadly got the same black-screen issue, although this is further than I ever got without the help of these scripts, too bad it was still an undesirable result, time for some troubleshooting :P

Edit2: Running it again after the initial crash yielded these results:

> Loading config from /home/rabcor/Windoze/user.conf
> Using a virtual OS drive...
> Warning: Bumblebee is not available or doesn't work properly. Continuing anyway...
> Retrieving and parsing DGPU IDs...
./start-vm.sh: line 64: optirun: command not found
> Loading vfio-pci kernel module...
> Using Looking Glass...
> Calculating required buffer size for 1920x1080 for Looking Glass...
> Looking Glass buffer size set to: 32M
> Starting IVSHMEM server...
sudo: unknown user: qemu
sudo: error initializing audit plugin sudoers_audit
> Adjusting permissons for the IVSHMEM server socket...
chmod: cannot access '/tmp/ivshmem_socket': No such file or directory
> Not using DGPU vBIOS override...
> Using SMB share...
> Using dGPU passthrough...
> Unbinding dGPU from nvidia driver...
0000:01:00.0 /sys/bus/pci/devices/0000:01:00.0/driver/unbind
> Binding dGPU to VFIO driver...
> Not using mediated iGPU passthrough...
> Using spice on port 5900...
> Using QXL...
> Not using USB passthrough...
> Using virtual input method 'usb-tablet' for keyboard/mouse input...
> Starting the Virtual Machine...
qemu-system-x86_64: -drive file=/run/media/rabcor/New Volume/Windoze Ameliorated/AME_2004_(2020-10-31).iso,index=1,media=cdrom: Could not open '/run/media/rabcor/New Volume/Windoze Ameliorated/AME_2004_(2020-10-31).iso': No such file or directory
> Binding dGPU back to nvidia driver...
bash: /sys/bus/pci/drivers/vfio-pci/0000:01:00.0/driver/unbind: No such file or directory
bash: /proc/acpi/bbswitch: No such file or directory

Fortunately not a crash, but it highlights some issues, namely the requirement for bumblebee to be installed (which I don't have; it's something I put off, since manjaro is a bit iffy when it comes to installing bumblebee these days, it's semi-not-supported) and some issues with sudo & users (presumably from skipping some commands in the looking-glass-setup script)

@Rabcor

Rabcor commented Dec 4, 2020

Ok so I had to create a user (useradd -m qemu) then run those looking-glass-setup script commands to give it permission to use looking glass to get rid of the missing user error; additionally I had to edit where start-vm.sh tries to use optirun and make it use primusrun instead, because optirun was not working correctly.
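
Both workarounds can be checked up front instead of failing halfway through a run. A rough sketch, assuming a POSIX shell; `pick_gpu_runner` and `user_exists` are names invented here and are not functions from start-vm.sh:

```shell
# Print the first wrapper from the argument list that is installed.
# Returns non-zero if none of them is available.
# (pick_gpu_runner is a hypothetical helper, not part of start-vm.sh.)
pick_gpu_runner() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Succeed only if the given user account exists on this system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Example usage:
# runner="$(pick_gpu_runner primusrun optirun)" || echo "no Bumblebee wrapper found" >&2
# user_exists qemu || echo "user 'qemu' is missing (see useradd above)" >&2
```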

And yea after fixing everything I ran start-vm.sh > output.txt to try and see what happens when I get the black screen, and this is it:

> Loading config from /home/rabcor/Windoze/user.conf
> Using a virtual OS drive...
> Bumblebee works fine on this system. Using optirun when necessary...
> Retrieving and parsing DGPU IDs...
> Loading vfio-pci kernel module...
> Using Looking Glass...
> Calculating required buffer size for 1920x1080 for Looking Glass...
> Looking Glass buffer size set to: 32M
> Starting IVSHMEM server...
> Adjusting permissons for the IVSHMEM server socket...
> Not using DGPU vBIOS override...
> Using SMB share...
> Using dGPU passthrough...
> Unbinding dGPU from nvidia driver...
0000:01:00.0 /sys/bus/pci/devices/0000:01:00.0/driver/unbind
> Binding dGPU to VFIO driver...
> Using mediated iGPU passthrough...
> Create vGPU for mediated iGPU passthrough...
> Using dma-buf...
> Using spice on port 5900...
> Using QXL...
> Not using USB passthrough...
> Using virtual input method 'usb-tablet' for keyboard/mouse input...
> Starting the Virtual Machine...
QEMU 5.1.0 monitor - type 'help' for more information
(qemu) DNSMASQ terminated
ip_forward disabled
> Binding dGPU back to nvidia driver...
> Remove Intel vGPU...

not entirely sure what to make of it.

Update: Digging deeper I discovered that iGPU sharing was what caused the black screen, disabling it solved that issue.

And the reason Qemu keeps crashing is #2

Update2: After working around issue #2 I am now encountering this error instead:

(qemu) qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,x-pci-sub-device-id=0x1272,x-pci-sub-vendor-id=0x1462,multifunction=on: vfio 0000:01:00.0: failed to open /dev/vfio/1: No such file or directory

This is where I'm truly stuck, I imagine the cause for this is that the unbinding of the GPUs is failing and that it cannot bind the devices as a result.... I have no idea how to solve that though.

Update3: The aforementioned "no such file or directory" issue only happened because I didn't have bumblebee enabled at the time. But now that I have up-to-date bumblebee installed, qemu finally didn't crash! (yay!) So I got it kinda sorta maybe but not really working

Qemu may have kept running, but my host machine went haywire, random stuff like conky just crashing, I couldn't open my web browser or any applications really they'd just hang, it was a real mess, I couldn't even restart my machine normally and had to forcefully power it down, so that would be my next issue, making it work without destroying my host. Now I need to find out however if it is really working, since I couldn't manage to actually display it last time I ran it.

Update 4: Yeah that did not go according to plan, running the qemu VM fucks up my host and when it is running if I try to run spicy, although the command seems to go through just fine it does not display any window and just hangs. Upon further troubleshooting however I discovered that iommu is not properly activating, and in dmesg I'm getting this really weird string of "AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)" error messages. I know iommu was working on this laptop earlier too so this is kinda odd, guess that's where I look to next.

Update 5: So, I changed my nvidia driver to an outdated (430) version, one that doesn't actually work properly (but the card loads it, just that when I try to use opti/primusrun it gives errors, and my intel drivers also seem broken on this setup); this actually (weirdly) seems to solve the iommu issue, but now I'm getting a new error.

(qemu) qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,x-pci-sub-device-id=0x1272,x-pci-sub-vendor-id=0x1462,multifunction=on: vfio 0000:01:00.0: group 1 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
DNSMASQ terminated
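
The "group 1 is not viable" message means some device in that IOMMU group is still bound to a non-vfio driver. A small sketch for listing each device in a group together with its current driver, assuming the usual sysfs layout under /sys/kernel/iommu_groups and /sys/bus/pci/devices; `list_group_drivers` and its optional sysfs-root argument are names invented for this example:

```shell
# Print "<pci address> <driver>" for every device in an IOMMU group.
# Second argument is the sysfs root (defaults to /sys); it only exists so the
# function can be pointed at a test directory. Hypothetical helper.
list_group_drivers() {
    group="$1"
    sysfs="${2:-/sys}"
    for dev in "$sysfs/kernel/iommu_groups/$group/devices"/*; do
        [ -e "$dev" ] || continue
        addr="$(basename "$dev")"
        driver_link="$sysfs/bus/pci/devices/$addr/driver"
        if [ -L "$driver_link" ]; then
            # The driver entry is a symlink whose last component is the driver name.
            driver="$(basename "$(readlink "$driver_link")")"
        else
            driver="(none)"
        fi
        printf '%s %s\n' "$addr" "$driver"
    done
}

# Example usage for the group named in the error above:
# list_group_drivers 1
```

Every line should show either vfio-pci or (none) before QEMU is started.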

Update 6: Finally limited success! It turns out it was the stupid fucking kernel. I was on the 5.4 kernel, and it was causing that PCIe error spam, which was messing up my laptop's graphics and bogging down my system pointlessly. Whatever happened to quality control? That was supposed to be a freaking LTS kernel, god damn it...

Anyhow, I switched back to 4.19, that AER PCIe Bus error spam was gone, I tried running the VM again and bam, just like that, it worked.

Reason I say limited success though is because there are still a few problems. (this is an everything that can go wrong, will go wrong, kind of day for me...) the internet connection didn't work, I'm pretty sure that's just because I set up Ameliorated Windows instead of the normal one, so I'm downloading a normal windows 10 pro now for that one....

But it's also that every time I turn on the VM once, after it turns off I will not be able to turn it back on again unless I restart the PC. After turning off the VM I'm also experiencing some weird audio glitches in pulse, where the audio sometimes freezes for like a second. It's actually kinda creepy, like a ghost in the machine :/

I also get this spammed at me in dmesg:

NVRM: The NVIDIA probe routine was not called for 1 device(s).
[  671.443997] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[  671.443998] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[  671.443998] NVRM: No NVIDIA devices probed.
[  671.444177] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
[  671.930680] nvidia-nvlink: Nvlink Core is being initialized, major device number 235

Which could be a hint what's going on (and no, the Nvidia card is not loading nouveau, it's using the nvidia driver correctly, optirun still works and everything!)

Anyhow, the error I get when trying to turn on the VM a second time is the same as a bit earlier: `failed to open /dev/vfio/1: No such file or directory`

I believe it's got something to do with 'prime' most likely, but I'm not entirely sure what to do about it.

It also seems to make my battery monitor glitch the hell out, it thinks I have 30 hours of battery life :'D (I only have 1)

Yeah, there's a lot of weird shit that's going on after I turn the VM on and off, although I think whilst the VM is still turned on I'm not getting these issues, but yeah...

Still more troubleshooting ahead of me :(

Update 7: So I installed a normal Windows 10 installation just in case, turns out most likely the reason why I couldn't access the internet on my ameliorated install was because I forgot to install the virtio drivers.

I found that the batch file that ships in the helper iso is entirely broken, I had to rewrite some parts of it to get it to work (it wasn't anything complicated, just a wrong path here, wrong command there)

As for my system going to shit after turning off a VM, I traced it to the command that binds the gpu back to the nvidia driver. If I let it stay bound to the vfio driver, then besides the fact that my nvidia card will not work on my host, everything works normally and I can restart the VM as much as I want.

The nvidia probe routine call thing seems to be a normal message that I get because the nvidia driver is running but the gpu is unbound from it so it had no relation to any of my issues.

I'm now downloading the nvidia driver and unigine heaven to see if this stuff is truly going to work or not before I try to fix the remaining issues.

Speaking of which, new issues cropped up as I tried to switch from spicy to looking glass, starting with Remmina, Remmina fails to connect, it tries to connect but just appears to time out, here's my output

(org.remmina.Remmina:3686089): Gdk-CRITICAL **: 21:55:15.671: gdk_window_thaw_toplevel_updates: assertion 'window->update_and_descendants_freeze_count > 0' failed
[21:55:30:511] [3686089:3686143] [ERROR][com.freerdp.core] - freerdp_tcp_connect:freerdp_set_last_error_ex ERRCONNECT_CONNECT_FAILED [0x00020006]
[21:55:30:511] [3686089:3686143] [ERROR][com.freerdp.core] - failed to connect to 192.168.99.2

Update 8: Still no idea how to connect to RDP... It seems like there's some stupid thing like a firewall config or something getting in my way

I did get code 43, but installing the nvidia driver from my manufacturer's site solved it, I get code 43 again on each restart, however I can solve it again by reinstalling the nvidia driver again, however I have had no way to test if the card actually works or not because I can't connect to the stupid RDP to test it!

Argh! Also I noticed that the shared folder does not seem to be found on the guest machine, weirdly...

I do have some good news though: as it turns out, the step to unbind the vfio driver/rebind to the nvidia driver is entirely unnecessary and can just be skipped. Even when the card is bound to the vfio-pci driver, if I use optirun, it'll hook it into the nvidia driver on its own, and getting the nvidia card working again this way, instead of the way the original script tries to do it, does not break my PC anymore. Still no word on how to unbind the card from the nvidia drivers a second time though. I think it's because the card is going into 'Prime' mode whenever I actually run it, which means the card is always active, unlike with bumblebee. There has to be a solution to that, but right now RDP is the biggest worry for me.

@T-vK
Owner

T-vK commented Dec 5, 2020

Sorry for all the bugs. I haven't done any work on this project in a long time. It's great to see though that you figured out most of the issues.
I remember having a lot of issues with looking glass as well the last time I was working on this. RDP did work pretty reliably for me though. The only issue with it that I can remember was that I had to set up a password on the Windows user account, otherwise it wouldn't let me connect.

@Rabcor

Rabcor commented Dec 5, 2020

@T-vK well I knew what to expect for a project that hasn't been updated for almost 2 years ;)

I'm surprised it worked as well as it did in fact. I probably never could have figured out the initial setup for qemu without your script automating and guiding me through most of the process (I've tried before, so I know I wouldn't even get halfway through it in fact)

I'm still stumped by the RDP issues but I'm too burnt out from working all day yesterday to get to the point I'm currently at to go troubleshoot that too (besides I don't even know where to start, I tried it, got some errors, googled the errors, and got essentially nothing, I also tried xfreerdp only to get the exact same results, failure to connect)

It sucks to be so close, and then get caught on some stupid technicality like this that was never even supposed to be a possible issue ☹️ it looks like everything should work, but it just doesn't, I'm so close I can taste it, I even managed to get rid of that legendary code 43 that trips up most people (I was actually expecting to get stuck there, not... not here...)

I also have problems with ACPI on the host, namely that bbswitch can't seem to properly turn off the dgpu, I also tried this acpi_call method that's supposed to do the same but I always get a permission denied error there; unless I solve this issue I won't be able to turn the VM on at any point after using the dGPU on the host, if I solve it however, it might open a lot of doors.

As it stands it looks like these are the only two issues left, but I'm just too burnt out to deal with them today, it's a cold winter night, so I'm gonna make hot chocolate and watch the next episode of mandalorian instead 👍

Hopefully tomorrow's me will be able to figure something out.

Anyway thanks for all your work man! I really appreciate it, even if you're not still maintaining it.

@Rabcor

Rabcor commented Dec 6, 2020

I've done some fixes to the start-vm.sh, fixed a few important typos/broken commands. (Like unbinding the dgpu from the nvidia driver, and the subsystem id collection)

This allows me to successfully run the VM again after using the dGPU on the host machine, however unbinding the dGPU from vfio-pci is no longer as easy as optirun doing it automagically, and the commands in the script to do just that (the ones which formerly broke my PC) are not doing it right either.

Additionally since bumblebee is kinda obsolete and I'm having some problems with it I was wondering if I could do this with Prime instead, I can actually switch between prime and bumblebee as simply as running 1 command and restarting the machine, but when I tried with prime the script hangs on unbinding the dGPU from the driver.

So I have 3 goals now.

1: Get RDP working 😿
2: Reconnect dGPU to host nvidia driver after shutting down VM successfully
3: Get everything working with Prime instead of Bumblebee (it seems there's only something minor missing for this; I know with the older method of using prime to run X from nvidia or intel instead of allowing both to co-exist, it was possible to swap between the two without a restart, although it did require logging out and back in, I might be able to find clues there)

...

Reconnecting the dGPU was quite a simple matter, all I had to do was do it manually, I added this line to the script:
sudo bash -c "echo '${DGPU_VENDOR_ID} ${DGPU_DEVICE_ID}' > '/sys/bus/pci/drivers/nvidia/new_id'"

And that does it, I can now freely unbind/rebind the dGPU to the nvidia/vfio-pci drivers, and start/shut down the VM as many times as I want without any issues besides the aforementioned RDP ones.
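
The manual rebind above can be wrapped in a small function. This is only a sketch: `rebind_to_driver` and its sysfs-root argument (there so it can be pointed at a test directory) are names invented here, the IDs in the usage comment are placeholders, and it has to run as root on a real system. It writes to the same `unbind` and `new_id` files under /sys/bus/pci that the script lines in this thread use.

```shell
# Unbind a PCI device from its current driver (if any) and announce its
# vendor/device ID to the target driver via new_id.
# Arguments: <pci address> <vendor id> <device id> <driver> [sysfs root]
# Hypothetical helper, not part of start-vm.sh.
rebind_to_driver() {
    addr="$1"
    vendor="$2"
    device="$3"
    driver="$4"
    sysfs="${5:-/sys}"
    dev_dir="$sysfs/bus/pci/devices/$addr"
    # Only unbind when the device is currently attached to a driver.
    if [ -e "$dev_dir/driver/unbind" ]; then
        printf '%s\n' "$addr" > "$dev_dir/driver/unbind"
    fi
    printf '%s %s\n' "$vendor" "$device" > "$sysfs/bus/pci/drivers/$driver/new_id"
}

# Example usage as root (the device ID is a placeholder; use your own values):
# rebind_to_driver 0000:01:00.0 "${DGPU_VENDOR_ID}" "${DGPU_DEVICE_ID}" nvidia
```

If the kernel rejects the new_id write, writing the PCI address to the driver's bind file (/sys/bus/pci/drivers/<driver>/bind) is the other standard sysfs route; which one a given setup needs is untested here.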

The fact that I'm this close to success but am still failing to get RDP working and thus cannot test if the dGPU is actually working inside the VM or not is maddening. It appears that everything except RDP is now working near-flawlessly.

The biggest issue actually is on intel's end. I keep getting this error when bumblebee is enabled:
[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A

It seems entirely unrelated to the VM though, but what it does is cause occasional audio glitches/artifacts. (This is my primary reason for wanting to get this shit done without bumblebee)

@T-vK
Owner

T-vK commented Dec 7, 2020

I haven't really looked into prime, but from what I recall it didn't allow for switching GPUs without logging out. Having to log out makes for a really bad user experience imo. I was also hoping to support the nouveau driver at some point.
Maybe it would make sense to add a config option to let the user decide if he wants to use bumblebee, nvidia-prime or nothing at all.

It's really great to see that you found and fixed so many bugs. Maybe you can make a pull request so that others can benefit in the future.

Can you go a bit more into detail on your RDP issue?
Have you checked if you can ping the VM at all from the host?
Did you check if the RDP port is actually exposed properly (e.g. using nmap)?
If you have a USB network card maybe you can try passing that through directly. I think that's how I ended up using my VM most of the time.

I'm not sure about audio issues. Last time I was working on this my focus was on getting the video part working more stable. I haven't spent any time focusing on audio or performance optimizations.
Pulseaudio on my Fedora machines has never really been stable to begin with so I probably wouldn't have noticed any additional issues. In regards to audio within the VM I always used USB passthrough to pass a USB sound card through to the VM.

@Rabcor

Rabcor commented Dec 7, 2020

I finally got freerdp to work, apparently the problem was with the Guest machine's firewall settings, turning off the firewall worked as a hotfix, gonna have to figure out what ports it's blocking, hopefully it's just the default RDP port.
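
To tell a blocked port apart from a client problem, the port can be probed from the host before starting the RDP client. A minimal sketch, assuming bash and coreutils `timeout` are installed; `port_open` is a made-up name, and 3389 is the default RDP port:

```shell
# Succeed if a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
# (port_open is a hypothetical helper.)
port_open() {
    host="$1"
    port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage with the guest address from the Remmina log above:
# if port_open 192.168.99.2 3389; then echo "RDP port reachable"; else echo "RDP port blocked or closed"; fi
```

If this fails with the guest firewall on and succeeds with it off, the RDP rule in the guest is the likely culprit.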

Looking glass however is not working:

$ looking-glass-client
1134210911 [I] main.c:1675 | main | Looking Glass (B2-0-g76710ef201)
1134210927 [I] main.c:1676 | main | Locking Method: Atomic
1134270798 [I] ivshmem.c:180 | ivshmemOpenDev | KVMFR Device : /dev/shm/looking-glass
1134270813 [E] ivshmem.c:227 | ivshmemOpenDev | Failed to open: /dev/shm/looking-glass
1134270816 [E] main.c:1249 | lg_run | Failed to map memory

In other news though I did sorta figure out a way to make this work with prime offloading mode.

Installing bumblebee and running systemctl start bumblebeed allows me to successfully unbind the dGPU from the nvidia drivers again. However, I have not found a way to make prime offload start working again. When the VM exits I just run systemctl stop bumblebeed and then rebind the card to the nvidia driver as described before, and although nvidia-smi will work again after doing this, I can't actually seem to run any graphical applications off the card, which is kinda weird, but I bet there's an easy fix for it somewhere.

I'm almost there! While looking glass is down I've been using xfreerdp instead of remmina as described in this guide: https://gist.github.com/Misairu-G/616f7b2756c488148b7309addc940b28#update-attention-for-muxless-laptop

xfreerdp /v:192.168.99.2:3389 /w:1600 /h:900 /bpp:32 +clipboard +fonts /gdi:hw /rfx /rfx-mode:video /sound:sys:pulse +menu-anims +window-drag

However I finally hit an actually serious snag, the nvidia gpu does not seem to be working, yet it appears as if it absolutely should be working:

https://imgur.com/YOa4XzH.png

It is actually not, if I try to run unigine it hangs on the loading screen and doesn't actually run, trying to run with opengl instead of directx makes it not even run at all just giving me an error: GLAppWindow::create_context():wglGetProcAddress(): failed

Indicating that despite appearances, it is not actually working.

Perhaps I need to unbind some more devices before launching the VM or something, I know there are 3 more subdevices (01:00.1, 01:00.2 and 01:00.3) which I am not unbinding from the host.

It seemed unnecessary since they're not part of the same iommu group (on boot all these devices + the gpu itself get loaded to iommu group one, but then these 3 devices immediately get removed from said iommu group)
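
Which group each function ends up in can be read straight from sysfs, where every PCI device that has an IOMMU group carries an `iommu_group` symlink. A small sketch; `iommu_group_of` and its optional sysfs-root argument are names invented for this example:

```shell
# Print the IOMMU group number of a PCI device, or "(none)" if it has none.
# Second argument is the sysfs root (defaults to /sys), kept only so the
# function can be tested against a fake directory. Hypothetical helper.
iommu_group_of() {
    addr="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$addr/iommu_group"
    if [ -L "$link" ]; then
        # The symlink's last path component is the group number.
        basename "$(readlink "$link")"
    else
        printf '(none)\n'
    fi
}

# Example usage for the four functions mentioned above:
# for fn in 0 1 2 3; do
#     printf '0000:01:00.%s -> group %s\n' "$fn" "$(iommu_group_of "0000:01:00.$fn")"
# done
```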

@T-vK
Owner

T-vK commented Dec 8, 2020

Regarding looking glass did you set up the IVSHMEM server and socket like I did in the start-vm script before starting the looking glass client? Does the looking glass client have all permissions required to run properly? Have you installed the driver for the shared memory device within the VM?
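
The permission question can be answered quickly for the file named in the client log. A rough sketch, assuming the client really does open /dev/shm/looking-glass as the log says; `check_shm_access` is a hypothetical helper and does not cover the IVSHMEM server socket:

```shell
# Report whether the shared-memory file exists and is readable and writable
# by the current user. Prints one of: missing, ok, no-access.
# (check_shm_access is a hypothetical helper.)
check_shm_access() {
    shm_file="${1:-/dev/shm/looking-glass}"
    if [ ! -e "$shm_file" ]; then
        printf 'missing\n'
    elif [ -r "$shm_file" ] && [ -w "$shm_file" ]; then
        printf 'ok\n'
    else
        printf 'no-access\n'
    fi
}

# Example usage, run as the same user that starts looking-glass-client:
# check_shm_access
```

A result of "missing" points at the host-side setup not having created the file, while "no-access" points at ownership or mode bits.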

3D applications have been a major issue for many people including me. But I remember there was at least one guy who got it to work: jscinoz/optimus-vfio-docs#2 (comment)

Somewhere in this issue it's also mentioned how you can add a fake battery to the VM which fixed some nvidia driver issues within the VM. I can't remember if I ever added that to this project.

What notebook are you using btw? I should add it to the list of compatible notebooks on https://gpu-passthrough.com

I always assumed that iommu groups are kind of hardwired. It's interesting to hear that they are dynamic in a way. But I guess it makes sense, thinking about it there was a thing called ACS override patch that I always wanted to look into. From what I recall it allows you to split up iommu groups.

@mrkvn

mrkvn commented Jan 6, 2021

@Rabcor Do you have a working script or guide that I can follow to work with Arch/Manjaro? I have Dell Inspiron 7567, Nvidia 1050 GPU. I just follow guides. I don't do much debugging. So far, nothing has worked for me. I saw you in this gist too. Looks like you've been doing this for a long time now. For now, I just wanted the bare minimum instruction for it to work in order for me to know whether my laptop is able to actually do it. From the gist I linked, do you know what distro and what version the OP used? I just want to try the instructions as is, 'cause I'm not that good in changing the instruction that relates to the distro I'm using. For example, I tried to follow the instruction using Pop OS, which is not really different from the instruction 'cause he's using apt as package manager in the guide. But for example, he installed bumblebee, which I'm not sure is needed in Pop OS 'cause it already has GPU switching and I think Ubuntu has too, that's why I'm confused why bumblebee is needed. And also he's using Qemu v2 but I think it's v5 now so I'm not sure if I should use 2 or 5. Sorry for posting it here. I really need help :) . Of course, first thing I need to know is whether my laptop is supported. I'm pretty much sure it is but I can't know for sure if I'm not able to make it to work, which then might be the result of it "not being supported", so yeah, I'm in this loop of "unsureness". I consider myself a noob with these things, but I want to make it to work, at least until I confirm that it cannot work on my laptop.

BenjiPugh added a commit to BenjiPugh/MobilePassThrough that referenced this issue Jan 19, 2021
@Rabcor

Rabcor commented Aug 23, 2021


Sorry for the late response, I'm not very active on here. Sadly the answer is no, and if you "don't do much debugging" then this mostly experimental tech probably isn't going to be for you until you start 'doing much debugging'. I spent weeks on and off over the span of months on this. I mean look, I started trying to get this to work in June 2019, and the closest I ever got was "error code 43" in December 2020, and I felt like that was a huge achievement :D (even if it basically just proved that it probably couldn't be done with my laptop. I hear nvidia has done something to make their Windows drivers on recent Windows 10 versions more compatible with this, so it might be worth trying again, but that's not what I'm here for today)

Just saying, keep your expectations low, like real damn low. And be patient if you wanna try this stuff. I probably wouldn't ever have figured it out without T-vK's script 👍

Anyhow I am doing this again now on a fresh install of manjaro. I decided to properly ditch windows on my main machine and just set up a VM for running things I need but can't run on Linux (like possibly photoshop); for this I only intend to use the iGPU passthrough component of T-vK's Mobile Passthrough and skip over the entire dGPU passthrough part.

Anyhow, here is my adjusted dependency install line from manjaro (which will also work on arch)

sudo pacman -S unzip wget curl vim screen git lshw msr-tools sysfsutils remmina samba docker spice-gtk acpica cdrkit qemu virt-viewer edk2-ovmf
(do we even need remmina tho? I never even used it)

Things u can only get from AUR (I use yay to download from AUR):

yay -S virtio-win crudini uml_utilities uuid looking-glass
Note: Also, these aur packages might get moved to the official repos, be renamed or become outdated at any moment so install at your own risk.

intel specific:
pacman -S intel-gpu-tools
Note: could not find command intel-virtual-output, not sure what it's for or if it matters...

I did not find any suitable packages to replace qemu-utils, qemu-efi or qemu-kvm, only a qemu package, I am pretty sure they are all included in that one package though.

I do not intend to use bumblebee this time around, but if you're looking for ways to do that this is how I succeeded in setting it up last time, I'm sure this method is at least slightly outdated now though, but it might give you some idea of how to do it at least. I wait with bated breath for T-vK to figure out how to make this script compatible with Prime tho 😆

For Vbios Finder:

pacman -S ruby ruby-bundler rubygems p7zip innoextract upx

Could not find: ruby-dev (maybe already included in ruby package?)

Follow this to add the required path for rubygems to your PATH variable: https://wiki.archlinux.org/title/ruby#Setup

For Kernel Params:

I do not feel like it is a very good idea to allow a script (especially one pulled from git and meant for another distro) to edit kernel params, solely because IF it fails for whatever reason, then it might break your system and you wouldn't be able to fix it again if you didn't know what the script was doing in the first place.

So I think manually editing kernel params is always gonna be the way to go.

These are the recommended kernel params from the script:

iommu=1 amd_iommu=on intel_iommu=on i915.enable_gvt=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci
Note: 1 and on are not documented options for iommu and amd_iommu. I am as confused as T-vK about this...

Other kernel params not added by the script but recommended by this guide:

nogpumanager acpi_osi=! acpi_osi=Linux acpi_osi=\"Windows 2015\" pcie_port_pm=off
(acpi_osi=\"Windows 2009\" if 2015 is disabling your trackpad)

also:

default_hugepagesz=1G hugepagesz=1G hugepages=8 transparent_hugepage=never

Warning: I had some issues with these parameters before, namely that 8 non-transparent 1G hugepages will eat up 8 GB of RAM just by existing. A workaround is to keep transparent hugepages disabled but boot with no hugepages reserved (hugepages=0 instead of 8), then enable hugepages via a command before launching your VM, like so:
sudo sysctl -w vm.nr_hugepages=8

and to disable again after turning off the vm:
sudo sysctl -w vm.nr_hugepages=0

Enabling non-transparent hugepages should help performance in your VM considerably, but it is essentially permanently allocating a set amount of RAM to VM use. (not sure if this info is outdated or not)

I will be using the scheme I suggested with hugepages (in the sense that I will use 1 GB hugepages and turn them on and off on demand whenever I start my VM).
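
The on-demand scheme above can be wrapped in a small helper so the pages are always released again. This is only a minimal sketch, assuming bash and sudo; the names with_hugepages and SYSCTL_CMD are made up for illustration and are not part of the project's scripts. Note that the kernel may hand out fewer pages than requested if memory is fragmented, so checking HugePages_Total in /proc/meminfo after the first call is probably worthwhile.

```shell
#!/usr/bin/env bash
# Command used to change kernel settings (hypothetical variable).
# Override it, e.g. SYSCTL_CMD="echo", for a dry run without root.
SYSCTL_CMD="${SYSCTL_CMD:-sudo sysctl}"

# Reserve COUNT hugepages, run the given command, then release the pages.
# Usage: with_hugepages COUNT command [args...]
with_hugepages() {
    local count="$1"
    local status=0
    shift
    # Give up early if the reservation command itself fails.
    $SYSCTL_CMD -w "vm.nr_hugepages=$count" || return 1
    # Run the VM (or any other command) and remember its exit status.
    "$@" || status=$?
    # Always give the memory back to the host, even if the command failed.
    $SYSCTL_CMD -w vm.nr_hugepages=0
    return "$status"
}

# Example (not executed here):
#   with_hugepages 8 ./start-vm.sh
```

The helper returns the exit status of the wrapped command, so a failed VM launch is still visible to whatever called it.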

Every user should mix and match these kernel params as they see fit; it's part of the process. For instance, I'm pretty sure I don't need amd_iommu on my Intel laptop, but the acpi_osi params are likely to have some effect for me (possibly in ways entirely unrelated to the passthrough thing too).

I am not sure if I need nogpumanager and pcie_port_pm=off, but nogpumanager tells the system not to edit xorg config files for gpus I think (at least according to nvidia), and pcie_port_pm=off seems to be necessary for some of the more recent kernels for bumblebee to work, or at least it was at some point, I think it disables some power management thing.
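
Since the parameters are edited by hand, it can help to check which of them the running kernel actually booted with. A minimal sketch, assuming bash; the function name missing_kernel_params is hypothetical and the parameter list in the example is only an illustration, not a recommendation:

```shell
#!/usr/bin/env bash
# Print every wanted kernel parameter that is absent from a command line string.
# Usage: missing_kernel_params "<cmdline>" param1 param2 ...
missing_kernel_params() {
    # Pad with spaces so that only whole parameters match
    # (otherwise "iommu=on" would match inside "amd_iommu=on").
    local cmdline=" $1 "
    local param
    shift
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                   # present, nothing to report
            *) printf '%s\n' "$param" ;;       # absent, print it
        esac
    done
}

# Example (not executed here): compare against the running kernel.
#   missing_kernel_params "$(cat /proc/cmdline)" intel_iommu=on kvm.ignore_msrs=1
```

An empty output means every listed parameter is present on the command line. It does not tell you whether the kernel accepted or acted on it.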

Now all the scripts in the utils directory except ones reliant on bumblebee should be working and I am currently testing that.


compatibility-check.sh -> All green
ovmf-vbios-patch-setup -> Needs adjustments

  • service docker start needs to be changed to systemctl start docker
  • docker and vpns seem to conflict, if you try to start docker with vpn service enabled, docker will not run.
  • I am skipping this step until I actually need it to try dgpu passthrough again.

build-fake-battery-ssdt -> Successful, 0 issues.
vbios-finder-installer -> Worked (after editing out the apt-get line) but with warnings (Warnings are ok tho, errors are problems)

  • warning: Pathname#untaint is deprecated and will be removed in Ruby 3.2. May be a problem with script.

generate-vm-config.sh -> Successful

  • No dgpu or igpu rom present yet tho
  • OVMF_VARS.fd location: /usr/share/edk2-ovmf/x64/OVMF_VARS.fd (assuming edk2-ovmf is installed and that it works)
  • virtio-win.iso location: /var/lib/libvirt/images/virtio-win.iso

generate-helper-iso.sh -> Successful

  • Warning: creating filesystem that does not conform to ISO-9660.
  • had to install unzip, it was an unlisted dependency (already added into above dependency install command tho)

vbiosfinder -> Untested

  • I don't need it but it should work

ovmf-with-vbios-patch -> Untested

start-vm.sh -> Success (after some more hoops)

  • Initial attempt failed due to a nasty situation where GVT-G, DMA_BUF and Spice all being activated at once would make the VM crash, disabling any one of them allowed me to run the machine. (discussed in comments below)
  • In order to get the best out of the situation I disabled DMA_BUF, logged into the machine with spicy -h localhost -p 5900 , installed windows.
  • added a firewall rule to allow all traffic from host to guest.
  • enabled remote desktop in advanced windows settings, set a static IP address (192.168.99.2; gateway is 192.168.99.1).
  • shut down the VM and exited spicy
  • re-enable DMA_BUF and disable Spice, then start VM again
  • used freerdp to access vm xfreerdp /v:192.168.99.2:3389 /w:1600 /h:900 /bpp:32 +clipboard +fonts /gdi:hw /rfx /rfx-mode:video /sound:sys:pulse +menu-anims +window-drag
  • did some tweaks to make freerdp run better

And that is currently the best I can do, at least with just GVT-G. FreeRDP is a lot better than spicy/remmina/remote-viewer, a lot faster, more responsive, the cursor doesn't lag (just everything else).

To describe the quality of freerdp: it feels like 30fps with an insane amount of screen tearing. It is narrowly just good enough to watch youtube videos (but the experience will not be as great as doing it on your native machine). You would also probably be able to play Visual Novels and RPG maker games, but any games that have any sort of panning, like say RTS games, might be playable with this setup, yet every time you move the camera you will have shitloads of tearing (I actually tried with Factorio; it was playable if you could live with the tearing, but still felt like 30fps even if on the VM it was probably running faster, rdp is the problem). First person games are definitely a no-go (I tried with krunker.io; I also tried system shock 2 but it complained about my hardware and wouldn't run).

I do not know how to further improve the experience without an external monitor & looking glass. On the surface, using freerdp feels about like you'd expect a VM to feel, but at its core it's much faster than that, and you can tell just by how fast it is when you, say, try to open a program (or actually successfully play a game on it even).

I tried using in-home streaming but the default screen for qemu is an 800x600 screen with an ambiguous refresh rate and that's what you get on in-home streaming, if I were to be able to hack that to 1920x1080@60hz then launch notepad it might actually be possible to get a good experience but I don't know if that is actually possible.

@T-vK
Owner

T-vK commented Aug 23, 2021

I'm currently working towards making this project distribution-agnostic. You might wanna check out the unattended-win-install branch.
Dependencies aren't listed as package names anymore in that branch (see requirements.sh), but instead as binary names and files, which are then used to automatically scan the available package manager for the right packages to install.
But as I said, it's a work-in-progress.
So far I have only added support for dnf and apt (although apt support hasn't even been tested yet). But adding more package managers should be relatively straightforward once I'm done.
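
To illustrate the idea of listing requirements as binary names rather than package names, here is a rough sketch in bash. It is not the code from requirements.sh; the function names missing_binaries and provider_query are hypothetical. The lookup commands are only printed, not run, and they have prerequisites of their own (apt-file has to be installed, and pacman needs its file database synced with pacman -Fy first).

```shell
#!/usr/bin/env bash
# Print each required executable that is not found on PATH.
# Usage: missing_binaries name1 name2 ...
missing_binaries() {
    local bin
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}

# Print the command one could run to find the package that ships a binary.
# Usage: provider_query MANAGER BINARY   (MANAGER is dnf, apt or pacman)
provider_query() {
    local manager="$1" bin="$2"
    case "$manager" in
        dnf)    printf "dnf provides '*/bin/%s'\n" "$bin" ;;
        apt)    printf 'apt-file search bin/%s\n' "$bin" ;;
        pacman) printf 'pacman -F %s\n' "$bin" ;;
        *)      return 1 ;;   # unknown package manager
    esac
}

# Example (not executed here):
#   for b in $(missing_binaries genisoimage unzip); do provider_query pacman "$b"; done
```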

@Rabcor

Rabcor commented Aug 23, 2021

@T-vK that is cool, can't wait to see it.

Also, I updated my other post now. I reached the end and tried running a VM, but it failed and I don't know why; I'd appreciate your input if you have any. I tested building qemu from git but got the same result. I also tried disabling my VPN on the off chance it's an issue like the one with docker, but it had no effect.

Update: Disabling spice worked, found an official related bug report

I still get the same errors except for:

qemu-system-x86_64: The console requires display DMABUF support.
DNSMASQ terminated

but qemu at least does not stop running.

which means I may be in need of an alternative to spice, or maybe a new version of spice, or a different version of qemu, what a mess.

@T-vK
Owner

T-vK commented Aug 24, 2021

Have you tried disabling dmabuf and just using qxl?

@Rabcor

Rabcor commented Aug 24, 2021

Have you tried disabling dmabuf and just using qxl?

I did, and it runs. I can now get in with spice, but I just figured I might not necessarily need spice; what I need is RDP, and RDP is not working (e.g. the xfreerdp command mentioned earlier in this thread). At least if I recall correctly, RDP is the only way to get decent performance without using looking glass, which I can't use cuz I don't have an external display. Unlike last time around it's not a firewall issue; I'm pretty sure it's a bug tho.

Btw I found a related wiki entry for it now that I knew what to look for, so thanks :) (Using the suggested romfile did not solve the issue of launching with dma-buf tho, did not try other solutions)

Edit: I got it all working now. I don't really know why freerdp wasn't working earlier, since I had it all set up the same, but this time, after I installed an older version of spice to try it (according to that bug report I linked to), it worked. Then I installed the new version again and it still worked.

I also got it working with DMA Buf again, I just disabled spice, not really sure what I would need spice for anyways if I have freerdp working.

Been a weird little puzzle but now everything works just about as I wanted, except, of course, the damn rdp performance (sure is a huge step up from spicy/remmina/remote-viewer tho)

@T-vK
Owner

T-vK commented Aug 25, 2021

@Rabcor Out of curiosity, you said that service docker start needs to be changed to systemctl start docker. Is that true? I'm just asking because many distributions allow both (if the system uses systemd, it will internally translate that to a systemctl call). It seemed like the most portable way of starting a service, but if arch doesn't support it, I need to add my own abstraction layer.

@Rabcor

Rabcor commented Aug 26, 2021

@T-vK yes, at least on manjaro 'service' just returns 'command not found'.
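
For what it's worth, the abstraction layer could be as small as picking whichever tool exists. A minimal sketch, assuming bash; service_start_cmd is a made-up name, and preferring systemctl whenever it is on PATH is my assumption (it may be wrong inside containers where systemd is installed but not running):

```shell
#!/usr/bin/env bash
# Print the command that would start a service on this system.
# Prefers systemctl, falls back to the older "service" wrapper.
# Returns 1 if neither tool is available.
# Usage: service_start_cmd NAME
service_start_cmd() {
    local name="$1"
    if command -v systemctl >/dev/null 2>&1; then
        printf 'systemctl start %s\n' "$name"
    elif command -v service >/dev/null 2>&1; then
        printf 'service %s start\n' "$name"
    else
        return 1
    fi
}

# Example (not executed here):
#   cmd="$(service_start_cmd docker)" && sudo $cmd
```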

Also while we're at it, I did a little adjustment to the script, I don't think the hugepage thing I was talking about is such a big deal that it needs to be implemented (potential 2% performance increase)

But in the current script, the CPU core count command is missing data. When I booted into the machine, which I told the script to use 8 cores for, the VM showed it as 2 cores, 2 sockets, 2 cores, so it was like I had a server board with 2 unicore CPUs. It was weird. Solving this turned out to be really simple, just add this line:
CPU_CORES=$(($CPU_CORE_COUNT / 2)) #Socket must always be 1, Threads must always be 2 since CPU_CORES * threads = CPU_CORE_COUNT.

and amend the cpu command for qemu like so:
-smp ${CPU_CORE_COUNT},sockets=1,cores=${CPU_CORES},threads=2 \

(As for the reason socket must always be 1: technically it doesn't have to be, but if I tried playing with that setting at all, my VM would just crash on launch and complain about my number of sockets, even if it was just 2, which was proven to work if I only put 1 core in there. I also can't think of any reason why we would want the VM to think we have more sockets in the first place.)

With this addition the VM now gives the correct core count. (e.g. type in 8 in core count setting and it will give you 4 cores and 8 threads; I suppose it might also be possible to skip the threads setting entirely and just set the cores directly to the core count instead, but I haven't tested it and the hyperthreading style method is what I have seen recommended elsewhere which is why that's what I went with)
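
The same calculation as a standalone function, with a guard for counts that can't be split into 2-thread cores. This is just a sketch in bash; smp_args is a hypothetical helper name, and the single-socket, two-threads-per-core layout is the one I described above, not something the script enforces:

```shell
#!/usr/bin/env bash
# Build QEMU's -smp argument for one socket with two threads per core.
# Usage: smp_args TOTAL_THREADS
smp_args() {
    local total="$1"
    local threads=2
    # An odd or too-small count cannot be split into whole 2-thread cores.
    if [ "$total" -lt 2 ] || [ $((total % threads)) -ne 0 ]; then
        echo "CPU_CORE_COUNT must be a positive even number, got: $total" >&2
        return 1
    fi
    printf -- '-smp %s,sockets=1,cores=%s,threads=%s\n' \
        "$total" "$((total / threads))" "$threads"
}

# Example (not executed here):
#   qemu-system-x86_64 $(smp_args "$CPU_CORE_COUNT") ...
```

Calling smp_args 8 prints -smp 8,sockets=1,cores=4,threads=2, which matches the 4 cores and 8 threads the VM reports.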

I took a little bit of a break from this stuff (kinda had a bit of a burnout after working to set all this up, going through the troubleshooting and only to get results that weren't good enough for me in the end, so I needed one) but my next step's gonna be to look into ways to get higher performance out of spice, or maybe try something like vnc, or sdl or gtk display implementations, and maybe see if I can figure out a way to get looking glass to work for me.

And after that (regardless of results) it'll be trying to see if I can get dgpu passthrough working.

I also need to actually finish setting up my host machine lol, I mean I'm mostly on stock manjaro still, haven't played around with pipewire yet like I intend to, haven't set up a system-wide equalizer, haven't even installed wine or lutris. (Then again this is a sign that the defaults on manjaro are pretty good, although I did need to install kinit because certain functions in manjaro kde were broken out of the box :/ )

@T-vK
Owner

T-vK commented Aug 26, 2021

Thank you, I'll apply the suggested CPU core change and probably also the huge pages. Improving performance was on my list anyway, I just didn't look into it too much yet because there are still so many other things to do.
