Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG, SIMD: Fix detecting AVX512 features on Darwin #19365

Merged
merged 1 commit into from Jun 27, 2021

Conversation

charris
Copy link
Member

@charris charris commented Jun 27, 2021

Backport of #19362.

closes #19319

On Darwin, machines with AVX512 support, by default, threads are created with
AVX512 masked off in XCR0 and an AVX-sized savearea is used.
However, AVX512 capabilities are advertised in the commpage and via sysctl.

For more information, check:

  On Darwin, machines with AVX512 support, by default, threads are created with
  AVX512 masked off in XCR0 and an AVX-sized savearea is used.
  However, AVX512 capabilities are advertised in the commpage and via sysctl.

  For more information, check:
   - https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L175-L201
   - golang/go#43089
   - numpy#19319
@charris charris added 00 - Bug 08 - Backport Used to tag backport PRs component: SIMD Issues in SIMD (fast instruction sets) code or machinery labels Jun 27, 2021
@charris charris added this to the 1.21.1 release milestone Jun 27, 2021
@charris charris merged commit 4f6a15c into numpy:maintenance/1.21.x Jun 27, 2021
@charris charris deleted the backport-19362 branch June 27, 2021 23:31
@vsivsi
Copy link

vsivsi commented Nov 3, 2021

@charris FYI, Darwin kernel bug clobbers AVX512 opmask state on external signal.

golang/go#49233

@charris
Copy link
Member Author

charris commented Nov 3, 2021

Darwin kernel bug clobbers AVX512 opmask state on external signal.

@Developer-Ecosystem-Engineering Thoughts?

@charris
Copy link
Member Author

charris commented Nov 3, 2021

And also @seiko2plus Ping.

@seiko2plus
Copy link
Member

seiko2plus commented Nov 3, 2021

we have many kernels counts on mask operations. actually, I don't think it's possible to use AVX512 without involving k0-7 registers. I don't have access to macOS with SKX at the current moment to reproduce the bug and since this patch, no one has reported such a bug. We should wait till Apple verifies it, and meanwhile, affected users can disable on runtime AVX512/SKX features within the environment variable NPY_DISABLE_CPU_FEATURES. i.e. export NPY_DISABLE_CPU_FEATURES="AVX512F AVX512CD AVX512_SKX".

@vsivsi
Copy link

vsivsi commented Nov 3, 2021

@seiko2plus This opmask corruption issue only occurs in processes that register signal handlers with the kernel; and then, only when signals are received. It is acutely felt by golang because signals are widely used by its runtime for profiling and goroutine (green-thread) preemption. Even with that, in real code the signal needs to be received when there's a live non-zero opmask dependency in flight. So it's a sneaky one that took me forever to isolate from what started as just a small handful of unexplained unit-test failures.

But it's definitely real, as this simple Gcc reproduction clearly demonstrates: https://gist.github.com/vsivsi/8511aca1bac528f49fbb45a636afa4b5

@Developer-Ecosystem-Engineering
Copy link
Contributor

Darwin kernel bug clobbers AVX512 opmask state on external signal.

@Developer-Ecosystem-Engineering Thoughts?

We see the report. Reading it and golang/go#49233 (have not tested/confirmed), sounds as though its been around for awhile. Was there something new/recent thats made its impact larger?

@vsivsi
Copy link

vsivsi commented Nov 3, 2021

One factor is the range of Mac hardware affected:

  • iMac Pro (late 2017)
  • Mac Pro (late 2019)
  • 13" MacBook Pro (4 Port, Ice Lake) (late 2020)

I don't think the iMac Pro was a big seller. The Mac Pro is an expensive workstation. But I bet Apple sold a ton of those 13" Macbook Pros, because they are arguably the best Intel based laptop Apple will ever sell.

@charris
Copy link
Member Author

charris commented Nov 3, 2021

Was there something new/recent thats made its impact larger?

Probably the tests. If it is a rare occurrence that occasionally produces wrong numbers, who would notice?

@vsivsi
Copy link

vsivsi commented Nov 4, 2021

@charris That is correct. I discovered this issue when a unit test of a deterministic AVX-512 function randomly failed with no explanation. Rerunning it succeeded, but nothing had changed. So I added a large number of iterations over the test and got it to fail a handful of times out of many thousands of runs. In this code, the failure manifested as a truncated result relative to the AVX2 and pure golang codepaths. But in other cases it could easily corrupt values or cause a buffer overrun or leak private data in the output. The opmasks are so critical to the way AVX-512 works, there's really no end to the number and types of insidiously rare failures this could cause.

@seiko2plus
Copy link
Member

@vsivsi, thank you for the clarification & reporting, it seems we have no other option except erase AVX512 features from the default dispatched features on macOS.

@Developer-Ecosystem-Engineering,

Was there something new/recent thats made its impact larger?

I can name several projects that don't correctly detect AVX512 features on macOS, most of them just count on
xgetbv/xcr0 to detect the os support.

@vsivsi
Copy link

vsivsi commented Nov 9, 2021

@seiko2plus FYI, the kernel bug in Darwin has been spotted, clarifying the scope of this issue considerably. See here for details: golang/go#49233 (comment)

@vsivsi
Copy link

vsivsi commented Feb 17, 2022

FYI, it appears Apple fixed this issue in the kernel update included with the latest MacOS 12.2 Monterey release. It remains uncorrected in Big Sur and Catalina.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 08 - Backport Used to tag backport PRs component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants