Parameter 'x-pci-sub-device-id' expects an int64 value or range #2

Open
SamKG opened this issue Mar 12, 2019 · 13 comments

@SamKG

SamKG commented Mar 12, 2019

After running sudo ./start-vm.sh and exiting once, starting the VM again with sudo ./start-vm.sh results in the following error:

qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,x-pci-sub-device-id=0x,x-pci-sub-vendor-id=0x,multifunction=on,romfile=/home/samkg/Documents/MobilePassThrough/vm-files/vbios-roms/vbios.rom: Parameter 'x-pci-sub-device-id' expects an int64 value or range

A sudo reboot seems to fix it, but it is annoying not to be able to start the VM several times in succession.

Is there any known fix for this?

@T-vK
Owner

T-vK commented Mar 13, 2019

What is your output of sudo lspci -vvv after you get the error?
Do you have Bumblebee installed and if so, what is the output of sudo optirun echo "Hello"?

@SamKG
Author

SamKG commented Mar 13, 2019

Output before error:
lspci_1.txt

After error:
lspci_2.txt

One odd thing: the LnkSta speed is downgraded from 8GT/s to 2.5GT/s. Is this normal?

sudo optirun echo "Hello" prints Hello as expected.

@T-vK
Owner

T-vK commented Mar 13, 2019

It says !!! Unknown header type 7f for your Nvidia GPU in both files. Something is wrong there.
I'm not sure if the LnkSta is a problem or if it's normal. Maybe I can check on my device when I have some more time.

Can you show me the output of the following:

GPU_PCI_ADDRESS=01:00.0
GPU_IDS=$(optirun lspci -n -s "${GPU_PCI_ADDRESS}" | grep -oP "\w+:\w+" | tail -1)
GPU_VENDOR_ID=$(echo "${GPU_IDS}" | cut -d ":" -f1)
GPU_DEVICE_ID=$(echo "${GPU_IDS}" | cut -d ":" -f2)
GPU_SS_IDS=$(optirun lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")
GPU_SS_VENDOR_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f1)
GPU_SS_DEVICE_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f2)

echo "GPU_PCI_ADDRESS: ${GPU_PCI_ADDRESS}"
echo "GPU_IDS: $GPU_IDS"
echo "GPU_VENDOR_ID: $GPU_VENDOR_ID"
echo "GPU_DEVICE_ID: $GPU_DEVICE_ID"
echo "GPU_SS_IDS: $GPU_SS_IDS"
echo "GPU_SS_VENDOR_ID: $GPU_SS_VENDOR_ID"
echo "GPU_SS_DEVICE_ID: $GPU_SS_DEVICE_ID"

and also of:

GPU_PCI_ADDRESS=01:00.0

if sudo which optirun &> /dev/null && sudo optirun echo>/dev/null ; then
    USE_BUMBLEBEE=true
    OPTIRUN_PREFIX="optirun "
else
    USE_BUMBLEBEE=false
    OPTIRUN_PREFIX=""
fi

GPU_IDS=$(sudo ${OPTIRUN_PREFIX}lspci -n -s "${GPU_PCI_ADDRESS}" | grep -oP "\w+:\w+" | tail -1)
GPU_VENDOR_ID=$(echo "${GPU_IDS}" | cut -d ":" -f1)
GPU_DEVICE_ID=$(echo "${GPU_IDS}" | cut -d ":" -f2)
GPU_SS_IDS=$(sudo ${OPTIRUN_PREFIX}lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")
GPU_SS_VENDOR_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f1)
GPU_SS_DEVICE_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f2)

echo "GPU_PCI_ADDRESS: ${GPU_PCI_ADDRESS}"
echo "GPU_IDS: $GPU_IDS"
echo "GPU_VENDOR_ID: $GPU_VENDOR_ID"
echo "GPU_DEVICE_ID: $GPU_DEVICE_ID"
echo "GPU_SS_IDS: $GPU_SS_IDS"
echo "GPU_SS_VENDOR_ID: $GPU_SS_VENDOR_ID"
echo "GPU_SS_DEVICE_ID: $GPU_SS_DEVICE_ID"
echo "OPTIRUN_PREFIX: $OPTIRUN_PREFIX"
echo "LSPCI_OUTPUT: $(sudo ${OPTIRUN_PREFIX}lspci -vnn -d ${GPU_IDS})"

@SamKG
Author

SamKG commented Mar 14, 2019

out1.txt
out2.txt

@T-vK
Owner

T-vK commented Mar 14, 2019

Did you run these before getting the error? Because the output looks perfectly fine. Can you run these after getting the error?

@SamKG
Author

SamKG commented Mar 18, 2019

For first script:
out3.txt

For second script:
out4.txt

@T-vK
Owner

T-vK commented Mar 19, 2019

The problem is that this line is missing:

Subsystem: Lenovo Device [17aa:39f5]

or at least that is the symptom...

Because of that, the script can't extract the subsystem vendor ID and the subsystem device ID, which are both required in this line.

I am not sure why the Subsystem line is missing. Maybe there are deeper issues with your system? Have you checked dmesg for GPU related errors?

I have only tested the script on a fresh installation of Fedora 29 btw. Maybe you made some changes to the system that my scripts can't compensate for yet.

Edit:

As a dirty workaround you could try to set the subsystem IDs manually by replacing

GPU_SS_IDS=$(optirun lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")

with

GPU_SS_IDS="17aa:39f5"

in this line.
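
The failing command line at the top of this thread contains x-pci-sub-device-id=0x, which is what an empty subsystem ID expands to. A minimal sketch of a guard that refuses to build the option string when the IDs are missing might look like the following. The function names is_hex4 and build_subsystem_args are hypothetical and not part of start-vm.sh; only the two option names are taken from the error message reported here.

```sh
#!/bin/sh
# Hypothetical helpers, not part of start-vm.sh.

# Succeed only if $1 is exactly four hexadecimal digits (e.g. 17aa).
is_hex4() {
    case "$1" in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the subsystem options for "-device vfio-pci", or fail with a
# message on stderr if either ID is empty or malformed.
# $1 = subsystem vendor ID, $2 = subsystem device ID (without "0x").
build_subsystem_args() {
    if ! is_hex4 "$1" || ! is_hex4 "$2"; then
        echo "Subsystem IDs missing or malformed: '$1:$2'" >&2
        return 1
    fi
    echo "x-pci-sub-vendor-id=0x$1,x-pci-sub-device-id=0x$2"
}
```

Called as build_subsystem_args "$GPU_SS_VENDOR_ID" "$GPU_SS_DEVICE_ID" || exit 1, this would stop the script with a readable message before QEMU is launched, instead of passing 0x to QEMU.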

@T-vK
Owner

T-vK commented Jul 8, 2019

I have just pushed a major update, adding support for Fedora 30 and some other changes. Maybe you can give it another shot now.

@SamKG
Author

SamKG commented Jul 8, 2019

Thanks!
Unfortunately, I no longer have this laptop (ran into some issues), and instead have another one without an iGPU.

I don't think it would be possible for me to test.

@midi1996

midi1996 commented Oct 7, 2021

Hello,

I ran into the same issue on my Lenovo device (P50). What I saw is that if the card is reset, turned off and on, or passed through and then released, the Subsystem line disappears until you reboot. One possible solution is to check whether the Subsystem line exists when running lspci and, if it does, to save those values for that computer in a file (you could match those IDs with the UUID of the device in case the hardware changes in the future).

Edit: turning off/on the card with bbswitch might bring back that value, but not always.
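
The caching idea described in this comment could be sketched roughly as follows. This is only an illustration under assumptions: the cache path in SS_CACHE_FILE and the function names parse_subsystem_ids and resolve_subsystem_ids are hypothetical, the lspci -vnn text is passed in as an argument rather than collected by the function, and the UUID matching mentioned above is left out.

```sh
#!/bin/sh
# Hypothetical cache location; override by setting SS_CACHE_FILE.
SS_CACHE_FILE="${SS_CACHE_FILE:-./gpu-subsystem-ids.cache}"

# Read `lspci -vnn` text on stdin and print the "vvvv:dddd" pair from the
# first "Subsystem:" line. Prints nothing if there is no such line.
parse_subsystem_ids() {
    sed -n 's/.*Subsystem:.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p' | head -n 1
}

# $1 = lspci -vnn output for the GPU.
# If a Subsystem line is present, print its IDs and refresh the cache.
# Otherwise fall back to the cached IDs, or fail if there is no cache yet.
resolve_subsystem_ids() {
    local ids
    ids=$(printf '%s\n' "$1" | parse_subsystem_ids)
    if [ -n "$ids" ]; then
        printf '%s\n' "$ids" > "$SS_CACHE_FILE"
    elif [ -r "$SS_CACHE_FILE" ]; then
        ids=$(cat "$SS_CACHE_FILE")
    else
        echo "No Subsystem line found and no cached IDs available" >&2
        return 1
    fi
    printf '%s\n' "$ids"
}
```

With this, the first start after a reboot would record the IDs, and later starts in the same session could reuse them even if lspci no longer prints the Subsystem line.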

@T-vK
Owner

T-vK commented Oct 7, 2021

What GPU does your laptop have?

@midi1996

midi1996 commented Oct 8, 2021

A Quadro M2000M. I found that passing the dGPU through and then releasing it makes the Subsystem line disappear.

GPU_PCI_ADDRESS: 01:00.0
GPU_IDS: 10de:13b0
GPU_VENDOR_ID: 10de
GPU_DEVICE_ID: 13b0
GPU_SS_IDS:
GPU_SS_VENDOR_ID:
GPU_SS_DEVICE_ID:
OPTIRUN_PREFIX:
LSPCI_OUTPUT: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M2000M] [10de:13b0] (rev a2) (prog-if 00 [VGA controller])
Flags: fast devsel, IRQ 16
Memory at d3000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at d4080000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel modules: nvidiafb, nouveau

That is how it looks after passing it through and releasing it (this is on Ubuntu).

However, in my tests with bbswitch, if I turn the card off and then on again, I do get the subsystem IDs back, but I somehow cannot pass it to the VM again. This also breaks the HDMI audio device, which stays disabled until I reboot or reset the PCIe device, and resetting loses the subsystem IDs again.

So, in my experience, this is what makes the subsystem IDs stop showing:

  • passing the dGPU through and releasing it (after the VM is off)
  • resetting the PCIe device (/sys/bus/pci/devices/{ID}/reset, or remove then rescan)

As I only have this laptop, I do not know whether any other laptops have this issue. So far in this thread, two Lenovo laptops show the same symptoms.
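
For anyone who wants to check whether their own machine shows the same symptom, the two reset methods listed above can be spelled out as below. This is a sketch that assumes the standard Linux sysfs layout (/sys/bus/pci/devices/<address>/reset, .../remove and /sys/bus/pci/rescan) and PCI domain 0000 when the address has none. The function names are hypothetical, and the commands are only printed, not executed.

```sh
#!/bin/sh
# Turn a short address such as "01:00.0" into the domain-qualified form
# "0000:01:00.0" used under /sys/bus/pci/devices. Assumes domain 0000
# when the address does not already include one.
normalize_pci_address() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;
        *)     printf '0000:%s\n' "$1" ;;
    esac
}

# Print (do not run) the commands for the two reset methods:
# a function-level reset, and a remove followed by a bus rescan.
print_reset_commands() {
    local addr dev
    addr=$(normalize_pci_address "$1")
    dev="/sys/bus/pci/devices/${addr}"
    echo "echo 1 > ${dev}/reset"
    echo "echo 1 > ${dev}/remove && echo 1 > /sys/bus/pci/rescan"
}
```

print_reset_commands 01:00.0 lists the commands for the GPU address used in this thread. Running them requires root, and according to the report above either one is expected to reproduce the problem rather than fix it, so comparing lspci -vnn output before and after is the point of the exercise.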

@T-vK
Owner

T-vK commented Oct 8, 2021

Okay, in that case I'm not sure. If it were an AMD GPU, I would have said that it might be the reset bug, in which case the vendor-reset project may have helped. But I suppose it doesn't apply to Nvidia GPUs.

This issue is hard for me to debug as I don't have a laptop with a Quadro. But if we can somehow pinpoint it further, we could go to the relevant mailing list and ask the developers themselves.
