Investigate Windows 10 1607-1703 subpar d3d9 performance #164

Open
mirh opened this issue Nov 22, 2022 · 4 comments

@mirh

mirh commented Nov 22, 2022

Follows #111.
It turns out that newer Windows did far more than just gimping VP.
I couldn't get 1607 to experience more than a ~15% handicap (and boy wasn't it hard to find the right scenes), but with 1703 you'd have to be blind not to see a difference any time you are CPU limited.

I tested (in a 640x480 window at minimum detail for games, default size and settings for the rest):

  • Mass Effect at my benchmark place (otherwise on 1703 even the title screen already hits you hard)
  • Nvidia's directx samples (both hardware and software vertex processing devices, require VS2003 redist)
  • Old OGRE demos (in particular: crowd, dot3bump, grass, lighting and water)
  • CSGO (current build in the fps_benchmark locker room, or demo_viewer just spawned in the training course)

I understand performance isn't exactly the kind of "qualitative" issue that the project usually addresses, but the effect is consistent and just as annoying if your framerates weren't ludicrous to begin with (in CSGO the performance uplift of multicore rendering is practically nullified, while the worst case microbenchmark scenario that I could scavenge was not even ONE THIRD of the W7/1511 speed).
There are of course other caveats, but none that should lessen the main point.

Yes, I did all of this testing inside VMware Workstation, but while the virtual SVGA device isn't exactly comparable to native hardware, the provided d3d9 driver should be pretty legit to compare with itself (unlike normal GPUs, I believe it even has the same codepaths on every Windows).
Secondly, this was done on my old i7-6500U + 950M laptop, which isn't exactly a workhorse. But either through downclocking or core limiting (which should come especially easy if you use a VM), I see no reason anybody couldn't get down to the same level if it turned out that was actually required to reproduce the problem.
Last but not least, I also took care of excluding any possible spectre and meltdown consideration (<2018 Windows should know nothing about them, and my host has mitigations disabled anyway).

@mirh
Author

mirh commented Apr 28, 2023

Ok, so... a few other oddities.
First of all, VMW 17 sucks and I couldn't even start some of the samples without the whole mksSandbox crashing down.

Secondly, at least when testing my favourite HDR_FP16x2.. I figured even W7 seemed to be somewhat suboptimal (220fps, vs 1511 doing 330fps and 1703 barely touching 170fps).
That also correlated with pretty different relationships wrt cpu usage. On W7 I got these numbers while hitting 25% utilization (i.e. exactly one full thread). With 1511, I was barely registering cpu activity at all. While on 1703 it was averaging 40%.
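To put those numbers side by side, here is a rough sketch that converts the quoted fps figures into frame times and the quoted utilization into "busy threads". The 4 logical processors are an assumption inferred from "25% = exactly one full thread", and the utilization is system-wide rather than per-process, so treat the CPU-time-per-frame figure as an estimate only. All function and variable names are made up for illustration.

```python
# fps figures quoted above for the HDR_FP16x2 sample (measured inside the VM)
FPS = {"W7": 220, "1511": 330, "1703": 170}

# Overall CPU utilization quoted above; 1511 is left out because it was
# described only as "barely registering", with no number given.
CPU_UTILIZATION = {"W7": 0.25, "1703": 0.40}

# Assumption: 4 logical processors, inferred from "25% = one full thread".
LOGICAL_CPUS = 4


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def busy_threads(utilization, logical_cpus=LOGICAL_CPUS):
    """Translate overall utilization into an equivalent number of saturated threads."""
    return utilization * logical_cpus


def cpu_ms_per_frame(fps, utilization):
    """Rough CPU time burned per frame, summed over all threads."""
    return frame_time_ms(fps) * busy_threads(utilization)


if __name__ == "__main__":
    for name, fps in FPS.items():
        line = f"{name}: {fps} fps, {frame_time_ms(fps):.2f} ms/frame"
        if name in CPU_UTILIZATION:
            util = CPU_UTILIZATION[name]
            line += (f", ~{busy_threads(util):.1f} busy threads"
                     f", ~{cpu_ms_per_frame(fps, util):.2f} CPU ms/frame")
        print(line)
```

Under these assumptions 1703 is not only slower per frame, it also burns roughly twice the CPU time per frame compared to W7.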

Last but not least, I tried swapping 1703's d3d9.dll with the 1511 one (in SysWOW64 too, both for good measure and because some applications are very finicky). Performance didn't budge in the slightest (at least in the aforementioned sample).

@mirh
Author

mirh commented May 14, 2023

Ok never mind, the mystery of W7 falling short of expectations is solved.
The vm3dmp/vm3dum driver is the same across all Windows versions (8.17.3.5, at least on VMW 16).
But it enables/supports/uses different capabilities depending on the OS (if nothing else, you can notice it uses WDDM 1.0 on W7, as opposed to 1.1 anywhere else). Even just vanilla W8 was enough to reproduce the 1511 numbers (if not a pinch better).

FWIW I also tested this natively on my 9600k+2080S desktop (with the legacy 473 branch versus 531, but still) and I could report 900fps in W7 vs 550-600 in W10 22H2. Funnily enough, not even dxvk (640) could compete.

@Trass3r
Contributor

Trass3r commented Jul 11, 2023

You could try to record an ETW trace if it's really a problem on the latest version of the OS.
https://github.com/google/UIforETW/releases
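If a scripted capture is preferred over the UIforETW GUI, a minimal sketch could wrap Windows Performance Recorder (`wpr`) instead. This swaps in a different tool from the one linked above. It assumes `wpr.exe` is on the PATH (it ships with Windows 10, while W7 would need the Windows Performance Toolkit installed) and that the script runs from an elevated prompt. The helper names and the output file name are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List


def wpr_start_command(profile: str = "CPU") -> List[str]:
    """Command line that starts recording with a built-in WPR profile, logging to file."""
    return ["wpr", "-start", profile, "-filemode"]


def wpr_stop_command(output_path: str) -> List[str]:
    """Command line that stops the recording and writes the .etl trace."""
    return ["wpr", "-stop", output_path]


def record_trace(output_path: str, run_workload: Callable[[], None]) -> None:
    """Record an ETW trace around run_workload(). Windows only, needs elevation."""
    if sys.platform != "win32":
        raise RuntimeError("wpr is only available on Windows")
    subprocess.run(wpr_start_command(), check=True)
    try:
        run_workload()  # e.g. launch the d3d9 sample and let it render for a while
    finally:
        # Always stop the session so the recorder is not left running.
        subprocess.run(wpr_stop_command(output_path), check=True)


if __name__ == "__main__":
    # Hypothetical usage: record while the sample is run by hand.
    record_trace("d3d9_sample.etl", lambda: input("Run the sample, then press Enter..."))
```

The resulting `.etl` file can then be opened in Windows Performance Analyzer to see which modules the CPU samples land in.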

@mirh
Author

mirh commented Jul 21, 2023

Ok so, uh.. There's a lot to unpack here.
First of all, I spent some time over these two months finally resurrecting the damn old nvidia sample (binaries included).

I couldn't find (or at least couldn't be bothered to get working) the exact original build environment of the day, but the newer one works just as well. In fact.. it seems even too good? Take the fps numbers from my last post (from whatever ~2004 DX SDK and VS .NET 2003 exe they give you), and now multiply them by 2.5.
On my desktop you get 1550 for new W10, almost 1800 for dxvk and 2100 for W7. And then on Linux (both with wined3d and dxvk) you get about the latter result.

But I also gave a run to the old build while I was there, and wtf? Linux can pull off 1700fps even with that.
And yes, it might be argued that's down to optimization, but then I also tried your absolutely delightful tool in Windows.
And whereas the new build seems to spend the majority of its cpu time inside nvd3dum.dll (I couldn't really spot anything else, can you give it a go too?), the old one has like half of its cpu usage wasted by d3dx9_29. And not in any "proper" function of its own, but rather in gdi32full.dll.

And given how fairly quirky the virtual 3d device can be, I wonder if that couldn't also be responsible for the biggest ass imbalances that I have measured inside my vm.
Conversely, there's still at least a tangible 20% handicap between the best performing (real!) conditions and W10 that I'm very confident about.
EDIT: come on, W7 with dxvk scores 1000 on the old version and 2500 on the new one.
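For reference, the native desktop numbers from this thread can be cross-checked with a quick sketch like the one below. The figures are the approximate ones quoted in the comments. The W10 value for the old build is the midpoint of the reported 550-600 range, which is an assumption, and the handicap is computed against W7 with native d3d9 on the new build, which is only one possible reading of "best performing conditions".

```python
# Approximate fps on the 9600k + 2080S desktop, as quoted in this thread.
OLD_BUILD = {
    "W7": 900,
    "W10 22H2": 575,   # assumption: midpoint of the reported 550-600 range
    "dxvk": 640,
    "W7 + dxvk": 1000,
}
NEW_BUILD = {
    "W7": 2100,
    "W10 22H2": 1550,
    "dxvk": 1800,      # reported as "almost 1800"
    "W7 + dxvk": 2500,
}


def speedups(old, new):
    """Per-configuration ratio of new-build fps over old-build fps."""
    return {name: new[name] / old[name] for name in old}


def handicap(slow_fps, fast_fps):
    """Fraction of performance lost relative to the faster configuration."""
    return 1.0 - slow_fps / fast_fps


if __name__ == "__main__":
    for name, ratio in speedups(OLD_BUILD, NEW_BUILD).items():
        print(f"{name}: new build is {ratio:.2f}x the old build")
    loss = handicap(NEW_BUILD["W10 22H2"], NEW_BUILD["W7"])
    print(f"W10 22H2 vs W7 (new build, native d3d9): {loss:.1%} slower")
```

With these inputs the per-configuration speedups scatter around the 2.5x rule of thumb, and the W10 handicap on the new build comes out above the 20% mentioned above.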
