TLB invalidation bug in Windows 2000 with PAE #4

Open
DotNetTester opened this issue Jun 28, 2022 · 11 comments

Comments

@DotNetTester

DotNetTester commented Jun 28, 2022

Windows 2000 Advanced Server with PAE enabled and nested paging enabled goes blank and loads nothing after I get past the login screen in VirtualBox on my AMD systems. If nested paging is disabled, it works OK with PAE. I also have absolutely no problems on my older systems with nested paging and PAE enabled.

I'm thinking this is possibly a TLB-invalidation-related bug, because everything works with nested paging disabled. I have been trying to isolate where this bug is in Windows 2000 with PAE, without much luck so far. I suspect it could be in the PAE kernel or in one of the important processes or system files. After seeing this project I was curious about a possible patch. It is a bit of a niche case, though, and only mildly related to Windows 9x.

PAE is the CPU feature that lets 32-bit OSes use more than 4 GB of RAM, and Windows 2000 Advanced Server supports it. PAE is enabled by adding "/pae" to the end of the operating system's entry line in the "boot.ini" file.
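
For anyone scripting the boot.ini change described above, here is a minimal sketch. It assumes the usual boot.ini layout with an `[operating systems]` section; the sample text and the function name `add_pae_switch` are made up for illustration and are not taken from a real install. It only transforms text and does not touch any file.

```python
# Illustrative sample only; the ARC path and description are assumptions.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINNT
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Microsoft Windows 2000 Advanced Server" /fastdetect"""


def add_pae_switch(boot_ini_text):
    """Append /pae to each entry under [operating systems] that lacks it."""
    out = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which section we are in; only OS entries take switches.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "=" in stripped:
            # Switches are whitespace-separated tokens after the description.
            if "/pae" not in stripped.lower().split():
                line = line.rstrip() + " /pae"
        out.append(line)
    return "\n".join(out)


patched = add_pae_switch(SAMPLE_BOOT_INI)
print(patched.splitlines()[-1])
```

Running it twice leaves the text unchanged, so it should be safe to apply to an entry that already has the switch. A real boot.ini is normally read-only and hidden, so the file attributes would need handling separately.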

@DotNetTester
Author

I'm curious about whether the same bug exists anywhere in Windows NT 4.0.

@anx95

anx95 commented Jul 1, 2022

I tested Win2k with PAE (8 GB RAM, 8 CPUs) on a Ryzen 7 5800X3D under VMware. The OS boots and works perfectly with no errors. It's possible the issue is related to VirtualBox. I have also tested the OS on a Ryzen 3700X and a 1600 using VMware. Again, no issues.

@DotNetTester
Author

DotNetTester commented Jul 2, 2022

> I tested Win2k with PAE (8G RAM, 8CPU) on a Ryzen 7 5800X3D under VMWare. The OS boots and works perfectly with no errors. It's possible the issue is related to VirtualBox. I have also tested the OS under a Ryzen 3700x and 1600 using VMWare. Again, no issues.

Is nested paging enabled on both of your virtual machines? Does VMware support nested paging? I don't have much experience with VMware. :)

@JHRobotics
Owner

Hello @DotNetTester,

I ran some tests and yes, this is a TLB-flushing-related bug. I'm surprised: I thought the NT family was free of them, but I now see that in some setups it is not.

I'm sorry, but I can't pinpoint this bug. It is probably somewhere in ntoskrnl.exe. Most of the BSODs come from KfReleaseSpinLock (hal.dll), but that is probably only being called from ntoskrnl.exe: some function creates a spinlock, changes a page mapping, and when it then tries to release the lock, it accesses the old memory. I tried injecting a TLB flush into this function, and the system is a little more stable, but only a little (BSOD about 1 minute after logon instead of a few seconds). If I have some time I'll look at it again, but I'm out of luck today :-(
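
To make the suspected failure mode concrete, here is a toy model of a stale TLB entry. It is only a sketch of the general bug class (a mapping changes, no invalidation follows, and the next access uses the cached translation). The `ToyMMU` class and its methods are invented for illustration and say nothing about how ntoskrnl.exe, hal.dll, or VirtualBox are actually implemented.

```python
class ToyMMU:
    """Toy address translation with a cache that is never flushed implicitly."""

    def __init__(self):
        self.page_table = {}  # virtual page -> physical frame (authoritative)
        self.tlb = {}         # cached translations (may go stale)

    def map_page(self, vpage, frame, invalidate=False):
        """Change a mapping; optionally drop the cached entry for that page."""
        self.page_table[vpage] = frame
        if invalidate:
            self.tlb.pop(vpage, None)

    def translate(self, vpage):
        """Return the frame for vpage, preferring the cached translation."""
        if vpage not in self.tlb:
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]


mmu = ToyMMU()
mmu.map_page(0x10, frame=1)
first = mmu.translate(0x10)                  # caches 0x10 -> frame 1

mmu.map_page(0x10, frame=2)                  # remap without invalidation
stale = mmu.translate(0x10)                  # still sees the old frame

mmu.map_page(0x10, frame=2, invalidate=True)
fresh = mmu.translate(0x10)                  # sees the new frame

print(first, stale, fresh)
```

In the model, the second lookup returns the old frame because nothing told the cache that the mapping changed. That matches the "accesses the old memory" symptom described above, under the assumption that the real bug is a missing invalidation.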

@DotNetTester
Author

DotNetTester commented Jul 10, 2022

I've done a bunch of testing, and so far this bug doesn't appear to affect any configuration of Windows Server 2003 32-bit. Windows Server 2003 32-bit RTM, Service Pack 2, and newer update levels run without issue in VirtualBox with nested paging and PAE on my AMD systems. Windows Server 2003 Service Pack 2 enables PAE by default, and it runs great with 8 GB of RAM.

Windows 2000 runs great with PAE as long as nested paging is disabled on my AMD systems. Windows 2000 also runs great with nested paging if PAE is disabled on my AMD systems. It's the combination of PAE and nested paging that results in Windows 2000 failing to load in VirtualBox.
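
For repeating these on/off comparisons, the nested paging toggle can be scripted. The sketch below only builds the command line and does not run it. It assumes the VirtualBox 6.1 spelling `VBoxManage modifyvm <name> --nestedpaging on|off` (other releases may spell the option differently), and the VM name `Win2000AS` is hypothetical. The guest-side "/pae" switch in boot.ini is a separate setting and is not touched here.

```python
def nested_paging_cmd(vm_name, enabled):
    """Build (but do not run) a VBoxManage command toggling nested paging.

    Assumes the VirtualBox 6.1 option name; check `VBoxManage modifyvm`
    help on your version before relying on it.
    """
    state = "on" if enabled else "off"
    return ["VBoxManage", "modifyvm", vm_name, "--nestedpaging", state]


# Hypothetical VM name, for illustration only.
for enabled in (True, False):
    print(" ".join(nested_paging_cmd("Win2000AS", enabled)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` while the VM is powered off, since `modifyvm` changes settings of a VM that is not running.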

@DotNetTester
Author

DotNetTester commented Jul 10, 2022

The old documentation says that Windows 2000 and Windows Server 2003 use different kernels for PAE and non-PAE. I suspect that the PAE kernel has a TLB invalidation bug in it, especially after reading your post. If that is the case and the bug really only occurs with PAE enabled, a possible fix could be a tool that loads into memory at boot time and uses DLL injection to fix the bug. That would be quite practical, since PAE is disabled by default in Windows 2000: you could install the tool, add "/pae" to the boot.ini file, and reboot.

I am only posting this suggestion because all the various update levels, service pack levels, system file protection, system file signing, etc. could make directly patching the kernel for this bug messy in Windows 2000. Of course, you may find a way around all of that by patching the kernel "on the fly". :)

It might be wise to double check that it doesn't affect Windows 2000 with PAE disabled in any way once the source of the bug is identified.

@anx95

anx95 commented Jul 25, 2022

I decided to launch my VMware VM image inside VirtualBox. As far as this machine is concerned, the OS boots normally. Nested paging is active.
image

It could be a conflict with, or otherwise related to, the VirtualBox Guest Additions.

@DotNetTester
Author

DotNetTester commented Jul 26, 2022

My virtual machines freeze even without the VirtualBox Guest Additions.

It's possible that this is a TLB invalidation bug that is only exposed by VirtualBox and somehow doesn't affect VMware because of its design. I'd imagine that each brand of virtualization software is designed differently "under the hood", possibly vastly differently. It's also possible that VMware includes a built-in workaround for TLB invalidation bugs when running old operating systems. Does Windows 98 run without issue on VMware with modern AMD CPUs?

It is interesting that it works on VMware. That could offer clues to the source of the problem, and it's possible that a fix could be made in VirtualBox. I'm unsure whether the VirtualBox developers would be willing to fix a TLB bug affecting Windows 2000 in VirtualBox, if that is really what it is. The developers haven't made fixes for Windows 9x either. It would be wiser to fix Windows 2000 directly.

@DotNetTester
Author

DotNetTester commented Jul 26, 2022

I updated to VirtualBox 6.1.36, and Windows 2000 Advanced Server with PAE and nested paging enabled now behaves even more like Windows 98 with the TLB invalidation bug. It loads successfully now, but it crashes frequently during use, and many installers fail to start and display errors.

DotNetTester changed the title from "An odd possible TLB invalidation related bug on Windows 2000" to "TLB invalidation bug in Windows 2000 with PAE" on Jul 26, 2022
@DotNetTester
Author

DotNetTester commented Jul 27, 2022

I found some potentially useful code here that could give ideas for patching the Windows 2000 kernel: https://github.com/evgen-b/PatchPAE3

@DotNetTester
Author

I tried enabling PAE on Windows 2000 SP4 without any additional patches, and it's affected in exactly the same way. The TLB invalidation bug affects a wide range of Windows 2000 Advanced Server installs with PAE enabled on VirtualBox.
