Remove pre-C99 emulated float functions and require hardware FMA support on Windows #12519
Nothing looks wrong to me, but I am not an expert, so this should be reviewed. For my part, the only remark I have is about the Makefile change done in the |
As a sidenote: in
I tend to rewrite those like this:
because this looks cleaner and is more concise. However, the price to pay for If it was only me I wouldn't mind paying the price of a fork for |
Not a proper review at this time, just some questions and remarks based on a quick reading.
Thanks for the feedback! Answering each:
The emulated version of `caml_fma` is retired - OCaml either assumes hardware support or an ISO C99-compliant software implementation (e.g. glibc).
caml_log1p wasn't guarded with CAML_INTERNALS in misc.h but this was almost certainly just an oversight (technically this is an API-breaking change, therefore).
Now that the C99 functions are required, use them directly instead of via one-line intermediaries. caml_fma remains, to allow for adding error trapping on Windows.
fma returns nan for pre-Haswell or pre-Piledriver CPUs.
Intel has been releasing new CPUs without AVX2 or AVX as late as 2021, under the Pentium Silver/Gold and Celeron brands. So it isn't just 2012 hardware, even if these are budget and/or low-power CPUs. |
@devvydeebug - those chips presumably support the FMA instruction, though? Is there any easy way to get a list of them? The reference to "quite old CPUs here" specifically means CPUs without hardware FMA support, rather than CPUs without AVX2 (hence the rest of the sentence "affecting just |
I don't think so. These CPUs can't support the full FMA3 set because they lack AVX(1) as well, so they can't handle ymm operands. Maybe they support the xmm subset of it, but I doubt it considering there isn't a separate feature flag for it like there is for AVX-512 fma. I guess I could actually test executing vfmadd132sd on one, just to be sure. I think I could borrow a PC with such a CPU for a few minutes.
For some reason CPU manufacturers insist on not providing actual databases, just clunky web listings with too much javascript yet very bad search forms. But checking https://ark.intel.com I see that most Intel chips released between 2017 and 2021 under the Pentium Silver/Gold and Celeron brands lack AVX, with some exceptions. (FMA is not listed on its own.) I'm not necessarily saying this shouldn't go ahead, just wanted to make it clear it affects recent low budget hardware as well. Maybe attempting a vfmadd132sd during initialization and falling back to C99's fma if it fails is worth considering. |
I can confirm that a Pentium Gold G6400 (launched Q2 2020) can't execute vfmadd132sd. I just assembled a |
Thanks, @devvydeebug - that's very useful info. There being newer chips without |
This is a proposal for #12513 for `trunk`. There are two intertwined facets to this PR, which mostly deletes (or moves) code.
Require hardware FMA on Windows
Cygwin has an intentionally "broken" implementation of `fma` (it is simply `return x * y + z`!). mingw-w64 is known to be broken as, unsurprisingly, is Visual Studio. Since Visual Studio 2019, where hardware support is available, hardware support is always used when available (in previous versions, the optimiser was capable of emitting FMA instructions, but never for the actual C99 function itself).
The hardware support in Visual Studio interacts badly with VirtualBox bug #15471, which for reasons known only to Oracle, masks out the FMA bit from the cpuid. However, it is only the masking which is incorrect - code using FMA instructions correctly executes. In terms of cpuid, it is just about possible to conflate AVX2 (which is reported in VirtualBox) with FMA, because they were released in the same architecture (indeed, the `cl` flag for forcing FMA is `-arch:AVX2`), but while no Intel or AMD chips exist which support AVX2 but not FMA, there are VIA chips which have AVX2 but not FMA.
However, we're talking about quite old CPUs here (pre-2013), affecting just `Float.fma`. Certified Windows hardware now requires Broadwell (2015) CPUs for both Windows 10 and Windows 11 (NB that's certification - Windows will physically run on anything from 2008's Nehalem). This PR therefore proposes a big simplification for Windows, by always synthesising `caml_fma` using `vfmadd132sd`. Since the `cl` option to force this also enables AVX2, it's necessary to put `caml_fma` in a C file of its own, or the entire runtime becomes unusable on pre-Haswell CPUs.
There then remains the question of what happens if you try to use `Float.fma` on Windows on an old CPU. @kit-ty-kate has been using an old laptop for Windows testing, and I took a trip to my loft and fished out an old 2012 laptop with an Intel i7-2760QM (AVX, no AVX2 or FMA, therefore) to have a look. If `Float.fma` is called, the program will by default abort with `EXCEPTION_ILLEGAL_INSTRUCTION` (0xc000001d). Cygwin should translate that to a more Unix-ly familiar `SIGILL`.
The VirtualBox bug means there's not much value in us doing anything with the cpuid (@jonahbeckford suggests this is one of the most common bug reports installing Windows OCaml for Diskuv).
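For reference, the two cpuid bits discussed above can be read as in the sketch below. It is illustrative only and is not what this PR does (the PR deliberately avoids cpuid). The function names are hypothetical, and the code assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid` and `__get_cpuid_count`; on other targets both functions simply report `false`. Under a hypervisor that masks bits, such as the VirtualBox case, the answer reflects the masked value rather than what the CPU can actually execute.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header; not available with cl */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Hypothetical helper: CPUID leaf 1, ECX bit 12 is the FMA feature flag. */
bool cpu_reports_fma(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;               /* leaf 1 not supported */
    return ((ecx >> 12) & 1u) != 0;
#else
    return false;                   /* not an x86 target */
#endif
}

/* Hypothetical helper: CPUID leaf 7, subleaf 0, EBX bit 5 is AVX2. */
bool cpu_reports_avx2(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;               /* leaf 7 not supported */
    return ((ebx >> 5) & 1u) != 0;
#else
    return false;                   /* not an x86 target */
#endif
}
```

On an affected VirtualBox guest one would expect `cpu_reports_avx2()` to be true and `cpu_reports_fma()` to be false even though `vfmadd132sd` executes, which is why neither bit alone is a trustworthy gate.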
I went down a little rabbit-hole to see if we could do something better in that situation. For the MSVC port, it's pretty straightforward to catch asynchronous exceptions via Structured Exception Handling (SEH) (using cl's `__try` and `__except` extensions), although entertainingly there seems to be a mis-compilation bug in Visual Studio 2019 (the code compiles correctly in Visual Studio 2022).
GCC doesn't implement the Microsoft extensions for SEH, but it's possible to use Vectored Exception Handling (VEH) instead. It's not pretty, though - it's a very similar trick to stack overflow detection in Windows in OCaml 4.x.
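The SEH and VEH code is Windows-specific, but the same trap-and-recover idea can be sketched with POSIX signals, which is roughly what a Cygwin-style `SIGILL` would look like from user code. This is not the code in the PR: every name below is hypothetical, `raise(SIGILL)` stands in for executing an unsupported instruction, and the single global jump buffer makes the sketch unsuitable for multi-threaded use.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used by the handler to abandon the faulting call. */
static sigjmp_buf trap_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(trap_env, 1);
}

/* Hypothetical helper: runs fn with a temporary SIGILL handler installed.
   Returns 1 if fn ran to completion, 0 if it trapped with SIGILL (or if
   the handler could not be installed). */
int runs_without_sigill(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    /* savemask = 1 so that SIGILL is unblocked again after the jump. */
    if (sigsetjmp(trap_env, 1) == 0) {
        fn();
        ok = 1;
    }

    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return ok;
}

/* Stand-ins for "instruction missing" and "instruction present". */
void fake_unsupported_op(void) { raise(SIGILL); }
void fake_supported_op(void) { }
```

A wrapper built on this could return NaN when the probe fails instead of letting the process abort, at the cost of installing and removing a handler around the call.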
Since `Float.fma` is declared `noalloc`, we can't raise an exception so instead the function returns `Float.nan` on all inputs when the support is missing.
I've left the two commits in place for now, but I'm not convinced - especially as it's tricky to test. A possibility might be to include a helper function (similar to `Unix.has_symlink`) which could do the test much less invasively (that would also be much easier to test, because such a function could be easily tested with `__ud2()` so that the logic could be verified in CI).
Emulated C99 float operations; retiring `--enable-imprecise-c99-float-ops`
#944 added various C99 float operations. To build with old (but supported) versions of Visual Studio, `configure` has a `--enable-imprecise-c99-float-ops` option which turns on various emulations (this emulation is automatically enabled for Windows, but can be manually enabled on any platform). When the MSVC port of OCaml returns to trunk, it will necessarily require at least Visual Studio 2022, which has all of these functions. This PR therefore removes the emulations.
The mingw-w64 port has a long-known bug in its implementation of `round`. The work-around for this remains, but the detection of it is only used for mingw-w64 (pedantically, configuring with `--disable-imprecise-c99-float-ops` continues to cause a configuration error for mingw-w64, therefore). The enormous emulated version of `caml_fma` is gone, though.
The unboxed versions are now truly unboxed, which results in a visible renaming of the primitives - i.e. `caml_cbrt` is removed and `Float` now directly calls `cbrt`.
#8684 added `caml_log1p` to `caml/misc.h` outside of `CAML_INTERNALS`. This function no longer exists, so is removed - technically, that's a breaking API change, but I think that the addition of `caml_log1p` without the `CAML_INTERNALS` guard was a minor oversight in a much more complicated change!