New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG, SIMD: Fix detecting AVX512 features on Darwin #19365
Conversation
On Darwin, machines with AVX512 support, by default, threads are created with AVX512 masked off in XCR0 and an AVX-sized savearea is used. However, AVX512 capabilities are advertised in the commpage and via sysctl. For more information, check: - https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L175-L201 - golang/go#43089 - numpy#19319
@charris FYI, Darwin kernel bug clobbers AVX512 opmask state on external signal. |
@Developer-Ecosystem-Engineering Thoughts? |
And also @seiko2plus Ping. |
we have many kernels counts on mask operations. actually, I don't think it's possible to use AVX512 without involving k0-7 registers. I don't have access to macOS with SKX at the current moment to reproduce the bug and since this patch, no one has reported such a bug. We should wait till Apple verifies it, and meanwhile, affected users can disable on runtime |
@seiko2plus This opmask corruption issue only occurs in processes that register signal handlers with the kernel; and then, only when signals are received. It is acutely felt by golang because signals are widely used by its runtime for profiling and goroutine (green-thread) preemption. Even with that, in real code the signal needs to be received when there's a live non-zero opmask dependency in flight. So it's a sneaky one that took me forever to isolate from what started as just a small handful of unexplained unit-test failures. But it's definitely real, as this simple Gcc reproduction clearly demonstrates: https://gist.github.com/vsivsi/8511aca1bac528f49fbb45a636afa4b5 |
We see the report. Reading it and golang/go#49233 (have not tested/confirmed), sounds as though its been around for awhile. Was there something new/recent thats made its impact larger? |
One factor is the range of Mac hardware affected:
I don't think the iMac Pro was a big seller. The Mac Pro is an expensive workstation. But I bet Apple sold a ton of those 13" Macbook Pros, because they are arguably the best Intel based laptop Apple will ever sell. |
Probably the tests. If it is a rare occurrence that occasionally produces wrong numbers, who would notice? |
@charris That is correct. I discovered this issue when a unit test of a deterministic AVX-512 function randomly failed with no explanation. Rerunning it succeeded, but nothing had changed. So I added a large number of iterations over the test and got it to fail a handful of times out of many thousands of runs. In this code, the failure manifested as a truncated result relative to the AVX2 and pure golang codepaths. But in other cases it could easily corrupt values or cause a buffer overrun or leak private data in the output. The opmasks are so critical to the way AVX-512 works, there's really no end to the number and types of insidiously rare failures this could cause. |
@vsivsi, thank you for the clarification & reporting, it seems we have no other option except erase @Developer-Ecosystem-Engineering,
I can name several projects that don't correctly detect AVX512 features on macOS, most of them just count on |
@seiko2plus FYI, the kernel bug in Darwin has been spotted, clarifying the scope of this issue considerably. See here for details: golang/go#49233 (comment) |
FYI, it appears Apple fixed this issue in the kernel update included with the latest MacOS 12.2 Monterey release. It remains uncorrected in Big Sur and Catalina. |
Backport of #19362.
closes #19319
On Darwin, machines with AVX512 support, by default, threads are created with
AVX512 masked off in XCR0 and an AVX-sized savearea is used.
However, AVX512 capabilities are advertised in the commpage and via sysctl.
For more information, check: