DXVK causes random system halt with input/output error #3933

Open
only-su opened this issue Apr 2, 2024 · 7 comments

Comments

@only-su

only-su commented Apr 2, 2024

For some reason, running games with DXVK causes my system to suddenly lose its reference to the system disk. I have tried to rule out every other possibility, and to the best of my knowledge this is what is happening. I would be glad for any help on how to investigate this further so I can provide more info. I will be as thorough as possible with my system info in case it helps anyone understand. I'm not sure; I'm not a specialist and I'm really lost.

So, some time ago my computer started behaving strangely. Some apps would start to freeze a little, but not completely; games would keep running, though a little oddly, and then suddenly everything would freeze while any audio kept playing on repeat, and I could not switch to a TTY. It gave me a hell of a headache, but I eventually thought of leaving a terminal window open on my second screen, and when the problem happened I found that trying to run any command gave me an "input/output error".

So obviously I thought I had a faulty disk. I booted a live ISO and ran a bunch of tests on my NVMe. No problems were reported. I repaired the filesystem, thinking maybe that was it. The problem kept happening. Okay, something is strange, so I bought a new NVMe, a really good one, and clean-installed my system; the problem kept happening. I would be playing a game, a game running on Proton. "Oh! Maybe it's Proton?" I thought, and then I stumbled on this post: Mysterious crash from Proton games.

I did what the OP suggested: set "PROTON_USE_WINED3D=1", which disables the default DXVK and runs games through WineD3D. The problem disappeared; I can play games for hours on end with no input/output error to haunt me.
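
For readers who want to try the same workaround, here is a minimal sketch of how the variable is usually set: in Steam, open the game's Properties and put the line below in Launch Options. `%command%` is Steam's placeholder for the game's normal command line. This is a per-game setting, so it has to be repeated for each affected title.

```text
PROTON_USE_WINED3D=1 %command%
```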

So this is a cry for help. How can a GPU library yeet my disk driver? I'm scared and utterly confused. I am willing to give any more help I can if someone cares to explore this or to explain what is happening.

Software information

Any game, default Steam settings.

System information

Computer Information:
Manufacturer: ASRock
Model: B450M Steel Legend
Form Factor: Desktop
No Touch Input Detected
Processor Information:
CPU Vendor: AuthenticAMD
CPU Brand: AMD Ryzen 5 3600X 6-Core Processor
CPU Family: 0x17
CPU Model: 0x71
CPU Stepping: 0x0
CPU Type: 0x0
Speed: 3902 MHz
12 logical processors
6 physical processors
Hyper-threading: Supported
FCMOV: Supported
SSE2: Supported
SSE3: Supported
SSSE3: Supported
SSE4a: Supported
SSE41: Supported
SSE42: Supported
AES: Supported
AVX: Supported
AVX2: Supported
AVX512F: Unsupported
AVX512PF: Unsupported
AVX512ER: Unsupported
AVX512CD: Unsupported
AVX512VNNI: Unsupported
SHA: Supported
CMPXCHG16B: Supported
LAHF/SAHF: Supported
PrefetchW: Unsupported
Operating System Version:
"BigLinux" (64 bit)
Kernel Name: Linux
Kernel Version: 6.1.80-1-MANJARO
X Server Vendor: The X.Org Foundation
X Server Release: 12101011
X Window Manager: KWin
Steam Runtime Version: steam-runtime_0.20240304.79797
Video Card:
Driver: AMD AMD Radeon RX 580 Series (radeonsi, polaris10, LLVM 16.0.6, DRM 3.49, 6.1.80-1-MANJARO)
Driver Version: 4.6 (Compatibility Profile) Mesa 24.0.2-manjaro1.1
OpenGL Version: 4.6
Desktop Color Depth: 24 bits per pixel
Monitor Refresh Rate: 60 Hz
VendorID: 0x1002
DeviceID: 0x67df
Revision Not Detected
Number of Monitors: 2
Number of Logical Video Cards: 1
Primary Display Resolution: 1920 x 1080
Desktop Resolution: 3360 x 1169
Primary Display Size: 18.78" x 10.55" (21.54" diag), 47.7cm x 26.8cm (54.7cm diag)
Primary VRAM: 8192 MB
Sound card:
Audio device: ATI R6xx HDMI
Memory:
RAM: 32023 Mb
VR Hardware:
VR Headset: None detected
Miscellaneous:
UI Language: English
LANG: pt_BR.UTF-8
Total Hard Disk Space Available: 953868 MB
Largest Free Hard Disk Block: 398638 MB
Storage:
Number of SSDs: 2
SSD sizes: 1000G,500G
Number of HDDs: 0
Number of removable drives: 0

  • Driver: amdgpu
  • Wine version: wine-8.21 (Staging)
  • DXVK version: 1b31aa5 dxvk (v2.3-21-g1b31aa5d)

Apitrace file(s)

  • If this is deemed necessary for anyone who comes here trying to understand the problem (in case it isn't already known), I can do it, but I'm kind of scared to keep fiddling with this error in case it messes up my disk. I don't know, so I will wait.

Log files

  • Same as before
@WinterSnowfall
Contributor

WinterSnowfall commented Apr 2, 2024

I hope it's clear that dxvk can't crash your computer - what it can do is use your hardware resources more efficiently and that could in turn expose problems that otherwise stay dormant.

To be honest your issue sounds like something I'm glad I'm not stuck with, because the root cause could be a large number of things and it's probably going to take a lot of patience and time to narrow it down. Here are some potential causes I can think of:

  • PCIe bus issues - panicking GPUs can sometimes cause PCIe bus resets. Those may reset your NVMe drive as well and wreak havoc with other PCIe-attached things, including the PCH, sound card etc.
  • C-states / powersaving madness - either your CPU or one of your PCIe devices doesn't like the C-states or powersaving options your drivers or motherboard are using. I'd recommend disabling all power saving options in the BIOS to see if that helps.
  • Unstable overclock - tune it down a little if you are overclocking.
  • Underspecced power supply - try running GPU + CPU benchmarks on your system to see how it behaves in a high-power draw scenario. It could just be that your PSU can't supply the correct voltage under high load and your MB panics and partially locks up.
  • Kernel bug - you are blessed with hardware (or a hardware combination) that exposes a kernel bug. Look at dmesg and see if there's anything relevant before or during crashes. Oh, and pray you are not the first to have this bug, that someone has already gone through hell and there's a nice forum post somewhere about what kernel boot options to use to circumvent the problem (if you're lucky).
  • Hardware fault - bit of a bummer, but it's not unheard of. Should be investigated as any potential hardware fault once all other options have been ruled out.
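
For the power-saving item above, a read-only sketch that prints two PCIe/NVMe power-management settings the kernel exposes through sysfs may help before changing anything. It assumes a kernel built with PCIe ASPM support and the `nvme_core` module; the function name and its optional root-directory argument are hypothetical helpers, not part of any tool mentioned in this thread.

```shell
# Print PCIe ASPM and NVMe power-state settings, if the kernel exposes them.
# $1 is an optional sysfs root (defaults to /sys); it exists only so the
# function can be pointed at a scratch directory.
show_power_settings() {
    local root="${1:-/sys}"
    local f
    for f in \
        "$root/module/pcie_aspm/parameters/policy" \
        "$root/module/nvme_core/parameters/default_ps_max_latency_us"
    do
        if [ -r "$f" ]; then
            # Strip the root prefix so the output shows the relative path.
            printf '%s: %s\n' "${f#"$root"/}" "$(cat "$f")"
        else
            printf '%s: not available\n' "${f#"$root"/}"
        fi
    done
}
```

Calling `show_power_settings` with no arguments reads the live values. It only reports what is currently configured and does not show whether either setting is related to the freezes.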

In any case, I doubt this is a (user space) software problem, the symptoms you've mentioned are a bit too harsh for that.

@Blisto91
Contributor

Blisto91 commented Apr 2, 2024

We'd probably at minimum need a dmesg or journal log to be able to get an idea.

@only-su
Author

only-su commented Apr 2, 2024

Thank you all for taking the time to explore my problem.
Judging by Mysterious crash from Proton games, I'm not the only one with this exact problem. I will try benchmarking to see if I can make it show up in another situation, as suggested.
PCIe bus issues - If this is the case, how do I make sure? Is there a good channel for me to post this information and make the problem available to interested people?
C-states / powersaving madness - I have already had problems with this in the past. All power-saving options are already disabled as of now. Maybe it is related.
Unstable overclock - I do not overclock.
Kernel bug - If it is the case, how do I communicate it? Where should I post for making this problem available to interested people? What should I look for on dmesg?
Hardware fault - It was my first thought, which is what made me replace the NVMe. I ran a bunch of tests on this and related possibilities and all came back clean. But maybe it is motherboard or PCIe related, as you pointed out.

Dmesg or journal log - How does one obtain it? What is relevant for you to see? How do I provide it to you? I already tried to take a peek at dmesg, but nothing made sense to me. The messages just before the crash look no different to me from the earlier ones. I think, as the disk is not accessible anymore, even if it was the case of a loggable error no log would be written anyway as the computer can't access the log file anymore.

Thank you to everyone who commented for your time.

@turol

turol commented Apr 2, 2024

For dmesg, ssh in from another computer and run dmesg -w in that shell. This also works for journalctl -f. Assuming the network interface doesn't disappear immediately when the error occurs, you can then save the messages on the other computer.
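
A minimal sketch of that idea, meant to be run on the second computer: it streams the remote kernel log over ssh and keeps a local copy with `tee`. The function name and the example target are hypothetical, and depending on the distribution `dmesg` may need root on the remote side. The session has to be opened before the freeze, since a new login is unlikely to succeed once the disk is gone.

```shell
# Follow the kernel log of a remote machine and save it locally.
# $1 = ssh target (for example user@gaming-pc, a made-up name)
# $2 = local file that receives a copy of every line
capture_remote_kernel_log() {
    # dmesg -w keeps printing new kernel messages as they arrive;
    # tee shows them on screen and writes them to the local file.
    ssh "$1" 'dmesg -w' | tee "$2"
}

# Usage (start this before launching the game):
#   capture_remote_kernel_log user@gaming-pc dmesg-live.log
```

Because the copy is written on the second machine, whatever was received before the connection dropped survives even if the affected system can no longer write to its own disk.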

@WinterSnowfall
Contributor

WinterSnowfall commented Apr 2, 2024

PCIe bus issues - If this is the case, how do I make sure? Is there a good channel for me to post this information and make the problem available to interested people?

I don't think there's a specific channel for this sort of thing, but any hardware enthusiasts' forum is potentially a good place to start.

Kernel bug - If it is the case, how do I communicate it? Where should I post for making this problem available to interested people?

You should probably rule out other root causes first, before you start bothering any of the kernel folk. A lot of this is, unfortunately, scouring the internet for users with similar hardware configurations and seeing if they've had issues. In short, long and arduous detective work.

What should I look for on dmesg?

Anything that might point to a cause, as dmesg will capture kernel module responses to a whole range of situations, including hardware faults.
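
As a rough starting point, the sketch below narrows a kernel log down to lines that mention the subsystems discussed in this thread. The keyword list is an assumption, not a complete or authoritative set, and the function name is hypothetical. A line that matches is not necessarily an error, so the surrounding messages are still worth reading.

```shell
# Keep only kernel log lines that mention NVMe, PCIe, the amdgpu driver,
# generic I/O errors, or Btrfs. Reads from stdin, case-insensitive.
filter_kernel_log() {
    grep -E -i 'nvme|pcie|amdgpu|i/o error|btrfs'
}

# Usage:
#   dmesg | filter_kernel_log
#   journalctl -k -b -1 --no-pager | filter_kernel_log
```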

I think, as the disk is not accessible anymore, even if it was the case of a loggable error no log would be written anyway as the computer can't access the log file anymore.

Don't quote me on this, but I think dmesg will write to memory first, so assuming your system/CPU is still alive, it should still dish out information. That being said there are other things that can make it choke. If dmesg doesn't register anything, that's quite the pickle. Perhaps run it in a separate terminal before the crash happens, so you can still see the output once it does?

ssh in from another computer and run dmesg -w in that shell

Good advice in theory, but ssh logins won't work if it can't read auth keys or passwords from your drive. Or if you meant doing that before the crash, then sure, but I doubt it will continue logging anything.

@Blisto91
Contributor

Blisto91 commented Apr 2, 2024

You can show the dmesg output from the previous boot with journalctl, so there is no need to ssh in while it is happening:
journalctl -k -b -1
To write it to a file, you can run
journalctl -k -b -1 > dmesg.txt (or give a path to the file if you want it in a specific location)

Edit: Though if it can't actually save this to disk after the crash happens, doing it the above way might not yield anything useful.
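
A small wrapper around the same command, as a sketch: it writes the previous boot's kernel log to a file and warns when nothing came back, which is the case the edit above worries about. The function name is hypothetical. It also assumes the journal is stored persistently; if it is kept only in memory, there is no previous boot to read.

```shell
# Save the kernel messages of the previous boot to a file.
# $1 = output file (defaults to dmesg.txt in the current directory)
save_previous_kernel_log() {
    local out="${1:-dmesg.txt}"
    # -k: kernel messages only, -b -1: the boot before the current one
    journalctl -k -b -1 --no-pager > "$out" || return 1
    if [ ! -s "$out" ]; then
        echo "warning: $out is empty; the previous boot's log may not have reached the disk" >&2
        return 1
    fi
}

# Usage:
#   journalctl --list-boots      # check that a boot -1 exists at all
#   save_previous_kernel_log dmesg.txt
```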

@only-su
Author

only-su commented Apr 2, 2024

More info on the hardware theory: I checked the post that saved me (Mysterious crash from Proton games) and noticed that the OP had posted their inxi info. I compared it with mine, and we do not share any CPU, motherboard or GPU hardware. We both have a WD NVMe, but I had the problem before I changed NVMe, when it was an XrayDisk. Their filesystem is ext4 and mine is Btrfs.

For now, our only similarities are running the system from an NVMe drive and using a Manjaro-based distro.

5 participants