
NaN canonicalization #134

Open
sunfishcode opened this issue May 24, 2021 · 39 comments

Comments

@sunfishcode
Member

Some popular source languages don't have the ability to preserve NaN bit patterns. For example, JS has only a single NaN value, and makes no guarantees about preserving specific bit patterns. NaN bit patterns could become an API feature that some implementation source languages support that others don't. Consequently, interface types should consider canonicalizing NaNs in f32 and f64 values at interface boundaries.

@lukewagner
Member

Good observation! I agree.

@tlively
Member

tlively commented May 25, 2021

I can think of two design principles that would motivate this:

  1. Determinism/predictability: Languages that only deal in single, canonical NaNs should not have to deal with receiving non-canonical NaNs across interface boundaries. (Although what problems could this cause? Could those languages just treat canonical and non-canonical NaNs uniformly?)

  2. Declaration of intent: Semantically, floats are not supposed to express arbitrary bits, so interface declarations are simpler and clearer if NaN boxing cannot possibly be part of the interface and if documented interfaces cannot possibly try to promise that the NaN payloads will be preserved.

Are those the principles motivating this suggestion, or am I off base? Regardless, are the underlying design principles documented anywhere?

@sunfishcode
Member Author

I'm thinking about this in the context of this part of the interface-types motivation:

As part of this maximum-reuse profile, the language and toolchain that a module author uses should be kept an encapsulated module implementation detail allowing module clients to independently choose their own language and toolchain.

To promote maximal reuse, we should discourage APIs that rely on NaN bits to convey meaningful information, because such APIs wouldn't be accessible from JS or other languages with a single NaN.

@RossTate

The reasoning seems to be that interface types should restrict communication between components to information that all languages can retain in their obvious representation without loss. That seems to be adding yet another scope creep to interface types, and one that's fairly fuzzy.

Also, there's nothing that says that an interface-types f64 has to lower to a double in JS—a JS program could specify a non-canonical lowerer that lowers it to a BigInt via its bits. Sure that causes boxing, but so does lowering i32 to Int32 in OCaml; I imagine we aren't planning to limit i32 to 31-bit integers, but such a limitation seems to be advocated by the same reasoning. For OCaml, I would expect the tooling for generating a converter from an interface type to OCaml data will, by default, lower i32 to int, but will also have an option for lowering to Int32 as well.

There also might be a cost to this. I imagine some day we will have bulk operations for lifting and lowering list f64. Fusing a bulk lift with a bulk lower could result in simply a memcpy. But the restriction here would require canonicalization as well.

@sunfishcode
Member Author

To be sure, I'm still exploring the space here.

One option would be to say that it's nondeterministic whether NaNs are canonicalized at boundaries. That would let implementations skip the overhead, but still signal the intent of f32 and f64 and allow JS and others to participate with their main number types.

Another option would be to observe that NaN canonicalization is SIMD-optimizable, so we may be able to make it fairly fast. It still wouldn't be free though, especially for small-to-medium arrays.
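
For concreteness, here's a minimal scalar sketch of such boundary canonicalization, assuming the common quiet-NaN pattern 0x7ff8000000000000 as the canonical value (the actual pattern would need to be standardized); a SIMD version would do the same compare-and-select across lanes:

```ts
// Construct the assumed canonical NaN from its bit pattern.
const view = new DataView(new ArrayBuffer(8));
view.setBigUint64(0, 0x7ff8000000000000n);
const CANONICAL_NAN: number = view.getFloat64(0);

// Replace any NaN (the only value for which x !== x) with the canonical one;
// every other value passes through bit-for-bit unchanged.
function canonicalizeF64(x: number): number {
  return x !== x ? CANONICAL_NAN : x;
}
```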

Does OCaml have a way of representing a value which is unboxed if it's in the 31-bit integer range, and boxed otherwise? Could it use its tagged integer representation for this? If so, it wouldn't have to box in the common case, and it wouldn't have to fail on values that other languages accept.

@lukewagner
Member

That seems to be adding yet another scope creep to interface types, and one that's fairly fuzzy.

Regardless of what the answer to this technical question is, as of the vote on the scope yesterday, this sort of question seems squarely within the scope of interface types (and the component model), specifically under promoting cross-language interoperability. This isn't the first time this sort of question has popped up and it won't be the last. In all these sorts of questions, there is an inherent tension between expressing every last thing a particular language might want and defining language-agnostic interfaces that are widely implementable.

By way of comparison, I'd be surprised if COM or protobufs makes any guarantees about preserving NaN payloads across interfaces. That would mean that they've implicitly chosen the "non-deterministically canonicalize NaNs" route @sunfishcode suggested is another option above. Given wasm's emphasis on determinism, it makes sense for us to ask if we should choose differently.

I think there's a third motivation in addition to the two @tlively gave: if f64 carries non-canonical NaN payloads, it may force a toolchain (which knows nothing of the specific workloads it is being used to compile) to conservatively decide that it must preserve non-canonical payloads (b/c some other toolchain does, thus parity) incurring extra cost (not just at the boundary, but potentially within the body of computation as well) -- concretely I'm thinking about this harming NaN-boxing engines. If the cost of NaN-canonicalization was significant, then this would be a tough tradeoff, but as @sunfishcode says, it's quite cheap these days, and I expect insignificant in the context of a cross-component call.

@RossTate

This isn't the first time this sort of question has popped up and it won't be the last. In all these sorts of questions, there is an inherent tension between expressing every last thing a particular language might want and defining language-agnostic interfaces that are widely implementable.

Yes, these sorts of questions will come up regularly. One way to resolve them is to have everyone fight over an answer, pick a winner, and then apply that solution to everyone. That strategy results in lots of fights and comes out with winners and losers.

Another way to resolve them is to find a way to accommodate everyone's needs. Adapters (even simple ones) and fusion provide a great way to make this possible. For example, a producer of an API for which NaNs are supposed to be insignificant can use an f64.lift_canonicalize_nan operation. This ensures consumers do not depend on insignificant NaN information, and it also informs tooling that NaNs are not significant and can be optimized appropriately. Similarly, a consumer of such an API can use an f64.lower_canonicalize_nan operation. The fuser, when it matches lift_canonicalize_nan with lower_canonicalize_nan, can easily avoid canonicalizing twice.

Meanwhile, an API intended for efficient numeric programs (and which has little interest in JS programs) can still use interface types as a means of efficient shared-nothing data transfer. Their needs are not bottlenecked by others' irrelevant needs.

As a bonus, if the numeric program using the API happens to rely on NaN canonicalization for its own purposes, it can use lower_canonicalize_nan to make the data conform to the program's internal invariants as it is transferred. If it has a specific canonicalization it relies on, it can specify the bit pattern as an argument to lower_canonicalize_nan.

I would rather interface types offer options to people rather than impose choices on people (of course, so long as it also provides sufficient isolation so that others' choice of the available options does not interfere with one's own choice).

@fgmccabe
Contributor

This is perfectly reasonable: to have more than one coercion operation. That is one of the fundamental merits of the adapter fusion approach.
There is, of course, a counter-argument: the existence of lower_canonicalize_nan embodies the possibility of additional semantics that is not adequately modeled in the type signature. To maximize interoperability, especially across ownership boundaries, we should have as 'universal' an interpretation as possible.

@RossTate

I'm not sure I follow the counter-argument. The consumer is generally free to do whatever it wants with the data. If you didn't provide lower_canonicalize_nan, it could still canonicalize the NaN itself. By providing a specialized adapter for it, you enable optimizations that the consumer cannot employ on its own: eliminating the canonicalization if the lifter already did so, or using hardware acceleration to perform the canonicalization during the transfer. So, semantically speaking, we're not adding anything that wasn't already possible—it's just a performance optimization.

@sunfishcode
Member Author

Should it be possible to declare an interface with an f64 argument where the NaN bits are a significant part of the interface contract?

Choices so far include:

  • Yes, so some source languages can't map f64 to their main number types.
  • Yes, some source languages just can't be used to implement some interfaces.
  • No, NaNs are always canonicalized at interfaces.
  • No, it's nondeterministic whether NaNs are canonicalized at interfaces.

@RossTate

The choice I suggested was:

  • Yes, but some programs in some languages will need to not use the default mapping for f64 and instead map to some more complete representation (e.g. BigInt) in order to access/provide the full range of functionality permitted/required by the interface.

@RossTate

In fact, the (non-default) lowerer for JS could lower to the number type for non-NaNs and lower to BigInt for NaNs.
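
As a rough sketch of that idea (the helper names are hypothetical; nothing here is specified anywhere):

```ts
// Lower an f64 (given as its raw bits) to a JS value: ordinary values become
// numbers, while NaNs are kept as their raw 64 bits in a BigInt so that
// payloads survive on the JS side.
function lowerF64(bits: bigint): number | bigint {
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, bits);
  const x = view.getFloat64(0);
  return x !== x ? bits : x;
}

// Lift back to raw bits: a BigInt already is the original f64 bit pattern.
function liftF64(v: number | bigint): bigint {
  if (typeof v === "bigint") return v;
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, v);
  return view.getBigUint64(0);
}
```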

@fitzgen
Contributor

fitzgen commented May 28, 2021

It seems like a reasonable balance that also allows for incremental progress would be

  • no canonicalization in interface types itself
  • for the canonical ABI and JS, interface type floats turn into Numbers and NaNs get canonicalized just like they do with the regular Wasm -> JS value semantics
  • when we eventually have adapter functions, JS programs can do whatever they want, such as turn NaNs into BigInts

@fgmccabe
Contributor

Depends on what the goal is. If maximum interoperability is the goal, then canonicalization seems essential.

@RossTate

When I was helping with the early days of Kotlin, they wanted to interop with Java but also to be null-safe, which posed an obvious problem. The solution strategy I developed for them, at a high level, was to have an ergonomic default—the type-checker would treat Java values as non-null but automatically insert checks wherever such assumptions were made—but also a more manual fallback—the type-checker would also recognize that Java values were potentially null and still permit programmers to test them for nullness before checks were automatically inserted. That interop strategy was quite successful and is analogous to what I am suggesting here.

@lukewagner
Member

The central question is whether the abstract set of values admitted by an f64 includes NaNs with payloads; that is independent of the lifting/lowering instructions used to produce or consume those values. If non-canonical NaNs are included in the set of valid f64 values, then every constituent language/toolchain in the ecosystem is forced to deal with them (one way or another); I don't think we can sidestep this fact by trying to make this a canonical-ABI or lifting/lowering-instruction option.

The concrete experience we have from years of IEEE754 usage in the wild is that non-canonical NaNs aren't useful in practice (other than for NaN-boxing, which wouldn't be a cross-component thing) and mostly only serve to cause language/toolchain vendors to waste time worrying about them, so if indeed the runtime cost is negligible, then I don't see why we wouldn't take the opportunity to (slightly) improve the robustness and simplicity of the ecosystem.

@RossTate

That is no longer an interoperability argument. That's fine; I'm just pointing out that the argument has moved to cleaning up the past (in a non-opt-in fashion).

Regardless of what we decide here, languages and tooling will have to worry about NaNs. I don't see how canonicalizing over the boundary will help with that. Even for languages that rely on NaN-canonicalization (e.g. for branch-free NaN-insensitive equality comparisons or hashing), if here you choose a different canonical NaN than the one they chose for their runtime, then they'll have to recanonicalize everything anyways.

I worry that extending the scope of interface types beyond efficient transfer/communication/interaction to the point that we have to try to anticipate/review all programs' needs to come to an answer makes for an infeasible and contentious goal.

@lukewagner
Member

e.g. for branch-free NaN-insensitive equality comparisons or hashing

That's not the goal; see NaN-boxing

if here you choose a different canonical NaN than the one they chose for their runtime, then they'll have to recanonicalize everything anyways

In practice (e.g., in JS engines today), the canonical NaN bit pattern is an arbitrary, globally configurable constant, so as long as it is standardized, it can be #ifdef'd.

I worry that extending the scope of interface types beyond efficient transfer/communication/interaction to the point that we have to try to anticipate/review all programs' needs to come to an answer makes for an infeasible and contentious goal.

The context here is the Component Model, and the goals and use cases (recently confirmed by the CG) now definitely extend past simply "transfer/communication/interaction" to robust composition of components implemented in different languages. Maybe it's a bad idea and we'll fail -- but I believe that's in scope for this proposal.

@RossTate

RossTate commented Jun 3, 2021

The context here is the Component Model

So a component I would expect to be supported by such a model is floating-point libraries (e.g. ones providing functions like acos). Some such libraries are specifically used to favor cross-platform consistency over optimal performance. Common among the specs for these is the requirement that, given a NaN input, the output result is that exact NaN. Of course, it is impossible to satisfy such a spec if NaNs are canonicalized. So canonicalizing NaNs conflicts with existing expectations for cross-platform systems.

If a goal of the component model is to be able to make libraries like libc into components

rather, robust composition of components implemented in different languages

There are existing multi-language systems, some of which have formal guarantees about how components from different languages compose and interact. The norm in these systems is, when composing components from different languages, to insert coercions that are necessary for the two specific languages at hand at the boundary point. (Typically these coercions are auto-generated from the types, though sometimes they can be explicitly specified by the person doing the composing.) So, if you're composing two components of the same language, then you insert identity coercions. But if you're converting between different languages with different representations of the same concept, then you insert a coercion that maps between those representations as faithfully as possible (e.g. mapping arbitrary 64-bit doubles to [non-NaN doubles + NaN] in the obvious fashion, and then selecting a specific 64-bit NaN representation in the reverse direction). The relevant composition theorems still hold with such coercions.

One common composition theorem multi-language systems strive for is that composing two components of language L within language L results in the same behavior as composing two components of language L each as part of the multi-language system. That makes it possible for a program to, say, #include <foo.h> and not worry about whether the (C) component providing foo.h was linked using C semantics or using interface-types semantics. But, clearly, if you're canonicalizing NaNs then you're changing the semantics of linking to the above cross-platform libraries. To satisfy this theorem, it's necessary that the coercions inserted when linking programs from the same language are simply the identity functions.

On the other hand, having a "global" bottleneck representation hinders, rather than aids, multi-language systems. It limits what languages you can add to the system because you've required that their natural coercions be at least as expressive as the bottleneck (more formally speaking, that they have a semi-invertible surjection to the bottleneck representation). Or it means that, as you consider more languages, you'll have to narrow your bottleneck further. This is the issue I raised with 31-bit integers being the "natural/ergonomic" counterpart in OCaml.

So my understanding of multi-language systems suggests that f64 in interface types should preserve NaNs, rather than canonicalize them, because the languages that can observe the difference will care whereas the languages that can't observe the difference won't.

@lukewagner
Member

If a goal of the component model is to be able to make libraries like libc into components

Libraries like libc are specific examples given of what would stay as shared-everything modules (possibly imported and reused inside a component). Components are not meant to subsume intra-language modules and linking -- even when components exist, you'll naturally want to use language-native modules and ABIs for rich, deep language integration.

The norm in these systems is [...]

Which systems are you talking about, concretely? Because, if you look at COM or microservices-composed-with-gRPC (which you could roughly summarize as the two large industrial endeavors in this multi-language-composition space in the past), none of the things you're saying hold. It's possible you're thinking about systems that compose, say, within the .NET universe, or within the JVM universe, and those systems will have a natural inclination to unify the world on the runtime's native object/string/etc concepts, but with both wasm and the component model we're both talking about a very different setting.

Fundamentally, the problem in having canonicalization be a pairwise detail is that you lose any abstract semantics for a component in isolation. That is, if I give you a component A, you can't tell me, semantically, what it precisely does without knowing which component B uses it and which languages the two components were implemented in. That's the opposite of "black box reuse", which is one of the fundamental goals of components.

@tlively
Member

tlively commented Jun 3, 2021

Hmm, one side effect of rebasing interface types on top of the component model that I hadn't thought of before is that we no longer have a roadmap for intra-component FFI. Previously we had been punting all issues of inter-language communication to IT, even for boundaries between modules in the same trust domain. For that fully trusted FFI use case, preserving NaN bits is a perfectly reasonable thing to want to do, and I think that's what @RossTate is getting at above. Do we have a story for creating polyglot components?

One way to sidestep the whole problem would be to have both canonicalized and non-canonicalized float interface types, but that seems like a big hammer. Would it also be possible to lift an f64 with NaN boxing into a sum type and lower it back to a NaN boxing f64 on the other side as a runtime no-op?

@lukewagner
Member

My operating assumption is: regardless of same-language vs. different-language, all the code inside a single component is either generated from the same toolchain or adhering to the same ABI (like this one). Because of that, there's no strong need for interface types or shared-nothing linking: everyone just follows the same ABI using the same linear memory and trusts that everyone else does as well. (Interface Types are for when you don't want to assume those things.)

Do we have a story for creating polyglot components?

Yes: use core wasm and use the module-linking features to link those core modules together (noting that module-linking features don't require interface types and are happy to link up core modules as shown).

@RossTate

RossTate commented Jun 3, 2021

Thanks, @tlively, for effectively rephrasing one of my concerns.

My connection to libc was bad; clearly importing things like malloc only makes sense in a shared-everything model. Sorry for getting my systems crossed.

However, the floating-point libraries I mentioned still seem like perfect examples of what a (multi-language) component model should support. They are easily shared-nothing: they provide solely "pure" input-output functions that have no state and don't even need a shadow stack. And they are regularly used in multi-language settings: Java programs using java.lang.StrictMath link with fdlibm, and many Julia programs link with OpenLibm. Due to the pure nature of these libraries, multiple (separate) Java programs should be able to link to the same fdlibm component, and likewise for Julia and OpenLibm, rather than needlessly duplicating this code. In other words, these shared-nothing libraries are effectively a service that programs needing consistent cross-platform behavior can link to.


But I think the higher-level issue here is agreeing upon what a Multi-Language Component Model is. From your presentation, I understood a component to be a piece of software that implements some API and is self-contained/isolated (i.e. shared-nothing) except through explicit data exchanges (via interface types). To me, the above floating-point libraries match that description. I suspect that we roughly agree on that high-level description of components—where we disagree is on what multi-language means.

From the discussion above, my sense is that the interpretation of multi-language y'all are arguing for is that all expressible component APIs can be conveniently implemented and utilized to their full extent by "any" language. But to me, multi-language means that a component implementing an API can be implemented by "any" language capable of implementing that API. So if the API is to maintain a stateful counter, then "pure" languages like Haskell (or, even more constraining, Coq) are probably not what you're going to implement your component with. And if the API requires preserving NaNs, then JavaScript is probably not what you're going to implement your component with. And if that API offers more precision than what some other components (or languages) need, then those other components simply won't utilize the full benefits of the services your component offers (and no one is hurt by that).

I consider my interpretation to be additive—the more languages you include in "any", the more APIs you can (conveniently) support—whereas the other interpretation seems to be subtractive—the more languages you include in "any", the fewer APIs you're allowed to support. I don't see the value of the subtractive interpretation (are we going to restrict all components to be pure functions so that Haskell/Coq can call them conveniently?), but I do see value in the additive interpretation.

An example that comes to mind is decimal values. C# and .NET offer direct support for 128-bit decimal floating-point values, including syntactic integration into the C# language and hardware acceleration in the runtime. This is extremely valuable to financial programs.

With the subtractive interpretation, we wouldn't add something like d128 to interface types. Until every language has built-in support for decimal floating-point values, no API can use them.

With the additive interpretation, we would add something like d128 to interface types. Sure, many languages won't have built-in support for them, but they can choose how best to fit the concept into their system. One way would be to simply approximate them as binary floating-point values, i.e. f64. But another way would be to make a new library available within that language (say Java) that provides a Decimal class that simply stores the bits of a d128 as two i64s and implements the various operations/methods of Decimal by simply calling out to the C#-implemented library and lifting those two i64 values as a d128. Or, in the case of Python and Julia, you could simply lower d128 into the existing decimal or Decimals libraries.

I would like interface types to provide a system where people can deliberately write different components of a program in different languages according to the strengths of those languages and then conveniently compose those components in order to collectively construct programs that no one programming language could write well. That's what a multi-language component model means to me, and to me that means that interface types should broaden rather than restrict interactions.

@lukewagner
Member

I don't think the general question you're asking can be answered definitively in the abstract with either of the extreme positions you're suggesting ("only if all languages" vs. "only if any language"). It's easy to think of examples where either extreme position will lead to bad outcomes, and thus I don't think we can simply argue for one or the other in the abstract.

Rather, as I think is usual with standards design, we have to consider the practical reality of use cases and weigh pros vs. cons, case by case. There are real practical downsides (listed above) with allowing non-canonical NaNs to cross boundaries and I think all the use cases for supporting non-canonical NaNs are hypothetical. Moreover, in line with what @tlively suggested, if real use cases did emerge, the right way to support them would be to add a second type; this would be a clear and useful signal to all implementations involved that they should take the extra pains to preserve the NaN payload. (E.g., a JS binding could then produce something other than a Number for this new type while still getting to use Number for the majority-case f64.)

@RossTate

RossTate commented Jun 4, 2021

Okay, so we're back to not treating this as a problem about multi-language interop, but rather specifically about floating-point.

There are real practical downsides (listed above) with allowing non-canonical NaNs to cross boundaries and I think all the use cases for supporting non-canonical NaNs are hypothetical.

I gave real existing libraries that real existing languages currently link to in a cross-language shared-nothing manner, and the specifications of those APIs explicitly state requirements (in line with IEEE 754-2019 recommendations) that cannot be supported with NaN canonicalization. As many language runtimes link against a foreign-implemented library for these floating-point functions (and expect them to preserve NaN payloads per IEEE 754 recommendations), such a component would provide a service that could be shared by many programs implemented across many languages. Could you articulate why you believe this is not a viable use case for interface types?

While you've listed some hypothetical downsides, they did not seem to me to be articulated in sufficient depth to assert that they were real and practical. It would help me understand your perspective better if you were to elaborate on (one of) them further. For example, you mention tooling, but I don't see how NaN canonicalization would affect tools like LLVM/Binaryen—you have to know how the rest of the program handles (or ignores) NaNs, which only the programmer knows and hence there already exist various compiler flags to indicate how much flexibility to grant the compiler with respect to floating-point values. Maybe you have something else in mind, but without an elaboration on what that something is I have a hard time seeing how NaN canonicalization would have a real practical benefit for tooling.

@aardappel

Late to this party, and I am not an expert in this area so feel free to ignore, but my gut reaction is that defaulting to canonicalization would not be desirable. My thinking:

  • An f64 is just a bucket of 64 bits, each combination of which is assigned meaning by the IEEE standard. Saying that we only support a subset of combinations seems counter to the low-level nature of Wasm.
  • The existing f64 is already specced to support all combinations, so we'd need the interface type to not be named f64 if we wanted to be very strict about what kind of data-type this is.
  • A language like JS can read serialized floats from all sorts of sources and already has to deal with canonicalization itself. If you're writing data processing in JS that must roundtrip, then you have to avoid this somehow. To me, this wouldn't be any different if JS code deals with data coming from IT.
  • Generally IT has the philosophy that if two languages agree on data format there should be no additional overhead, so C sending data to Rust should not be subject to unnecessary conversions.

So my solution would be for f64 to continue to mean that all bits have meaning in all combinations, but that adapters for individual languages can decide to either canonicalize or use raw-bit storage. It would be up to programmers to know whether a language is suitable for implementing a particular interface, much like they already must know that they can't roundtrip serialized data using certain data types.

That, or introduce a c64 type :P

@sunfishcode
Member Author

I gave real existing libraries that real existing languages currently link to in a cross-language shared-nothing manner, and the specifications of those APIs explicitly state requirements (in line with IEEE 754-2019 recommendations) that cannot be supported with NaN canonicalization.

If we were to compile that OpenLibm acos code to WebAssembly, today, we'd get a function which can return nondeterministic NaN bits. This is because core wasm's own arithmetic operators don't guarantee to propagate NaN payloads. Consequently, that OpenLibm acos example already doesn't satisfy use cases that require NaN propagation.

@RossTate

RossTate commented Jun 4, 2021

It would be easy to modify the code to do what .NET does to ensure IEEE compliance. Right now, the functions in those libraries typically have just one line that relies on the fact that either x+x or (x-x)/(x-x) preserves NaN payloads on IEEE-754-2019-compliant hardware, so you could easily make the change in the C source code without having to do anything specifically in WebAssembly (and without meaningfully changing performance, since these lines are off the hot path). This would be in line with changes these libraries have made to accommodate buggy compilers and buggy hardware.
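
To illustrate the pattern being described, here is a sketch of the shape of such code (not the actual library source; the payload-propagation behavior of x + x is a property of compliant hardware, not something JS engines promise):

```ts
// fdlibm/OpenLibm-style guard, sketched: handle NaN up front by returning
// x + x, which on IEEE-754-2019-compliant hardware returns the input NaN
// (quieted) with its payload intact; only non-NaN inputs reach the real
// computation.
function acosLike(x: number): number {
  if (x !== x) return x + x; // NaN in, same-payload NaN out
  return Math.acos(x);       // stand-in for the library's real algorithm
}
```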

@sunfishcode
Member Author

More generally, on a platform where add, sub, mul and sqrt don't propagate NaN payloads, how important is it for acos to propagate NaN payloads?

@RossTate

RossTate commented Jun 5, 2021

Two thoughts:

  1. While those instructions are not guaranteed to propagate NaN payloads, they are generally implemented with common instructions/libraries, and so based on these libraries' experiences I wouldn't be surprised if in practice wasm propagates NaN payloads fairly reliably (though I do know there have been some weird cases identified in SIMD). You could probably compile the above libraries to wasm without change and see them still preserve NaNs despite the lack of guarantees.
  2. That aside, it's easy to have a compiler generate NaN-guards around these instructions, as in the sketch after this list. I suspect any program really wanting to ensure cross-platform determinism will do so.
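
A sketch of such a guard, written as scalar TypeScript for readability (a wasm compiler would emit the equivalent compare-and-branch or select around the core instruction; plain JS numbers don't reliably expose payloads, so this only illustrates the control flow):

```ts
// NaN-guard around an operation that doesn't guarantee payload propagation:
// if an input is NaN, return it verbatim so its payload survives; otherwise
// perform the real operation, which then has no NaN payload to lose.
function guardedAdd(x: number, y: number): number {
  if (x !== x) return x; // x is NaN: propagate it, payload intact
  if (y !== y) return y; // y is NaN: propagate it, payload intact
  return x + y;
}
```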

I get why add and such are not guaranteed to propagate NaN payloads in wasm (though it might be reasonable to require that they do when all inputs have the same payload): due to the variation of hardware, we'd have to go out of our way to make this deterministic and take a performance hit for it. But NaN canonicalization in interface types would be going out of our way to remove existing functionality while adding additional complexity (because now lowering has to specify a canonical NaN—I've found existing systems with different choices of canonical NaNs), and I don't (yet) see the benefit of that.

@sunfishcode
Member Author

I've looked around for a source for .NET needing NaN payload propagation functionality and haven't been able to find one yet.
IEEE 754 says "should" and not "shall". ECMA-335, the spec for .NET, says "this standard does not specify how to access the exact bit pattern of NaNs that are created". If specs or real-world use cases need NaN payloads to be propagated, I'm genuinely curious to learn more.

@RossTate

@sunfishcode There are some links I can fish up on this that you might like, but it looks like I won't have time to re-find them until Friday. Sorry for the delay.

@RossTate

First, for some context on IEEE 754, this document provides some useful background for the most recent changes:

All new operations are “recommended” even when we really meant “required,” to retain upward compatibility from 754-2008.

Among these new operations are getPayload, setPayload, and setPayloadSignaling, and among the changes 2019 made to 2008 is more explicit specification of binary representation of NaNs (such as how quiet vs. signaling should be represented). This suggests to me that there's pressure to make NaN payloads more standardized and accessible, but there are obstacles with backwards compatibility for existing hardware, and this seems to be what they found to be the compromise.


As for .NET, unfortunately the language spec significantly underspecifies many things. So you have to dig through the code comments and repo history for the specification "in practice". That link above is one example I found indicating they want IEEE conformance. This comment suggests they believe it makes the platform more appealing to numerical libraries:

Being able to say that we are IEEE compliant for required operations is something that can be quite important for various numerics libraries, for porting code from native, and for determinism across platforms.

Interestingly, there happens to be a recent issue about a problem caused by (unintentional) NaN canonicalization, filed by a customer here, expressing this concern:

This can be an issue if the application works with binary serialization formats that store double or float values in the form of the bits that make them up, because converting those bits to double or float with these BitConverter methods may make round-tripping that double or float not always possible.

That seems like a concern that would apply to components for (de)serializing numerical data.


Digging around, it seems that some statistics software does indeed make use of NaN payloads. One concrete example I found is that R uses payloads to distinguish between NA_real_ (missing real-valued data) and NaN (numerical error). This distinction has semantic significance. For example, if you run a survey, you're supposed to use NA to indicate when a participant did not supply an answer, and R functions handle NA values accordingly. Many of these functions rely on the fact that payloads are preserved (so that if you make a new column based on some function of other columns, a row's entry will be NA, rather than NaN, whenever that row's value depended on data that was not available), though this documentation notes the non-determinism when NaNs are in the mix:

Numerical computations using NA will normally result in NA: a possible exception is where NaN is also involved, in which case either might result (which may depend on the R platform).
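
For concreteness, R's internals documentation describes NA_real_ as a quiet NaN carrying the payload 1954 in its low word, so the NA/NaN distinction is precisely a NaN-payload distinction. A sketch of telling them apart by inspecting bits, assuming that encoding and a little-endian layout (whether a given engine preserves the payload through this round-trip is itself implementation-defined, which is rather the point of this thread):

```ts
// Distinguish R's NA_real_ (a NaN whose low 32 bits are 1954) from an
// ordinary NaN by inspecting the double's bit representation.
function isRNa(x: number): boolean {
  if (x === x) return false;               // not a NaN at all
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x, true);             // little-endian write
  return view.getUint32(0, true) === 1954; // NA_real_'s payload word
}
```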


Hope that helped satiate your curiosity!

@lukewagner
Member

The evidence I would like to see is users (of floating point libraries) actually caring about NaN payloads. From what I've heard of the history of this feature, the use case that motivated the whole concept of NaN payloads (stuffing error codes into NaNs that implicitly propagate along with the float values) never actually materialized. Or, said differently: if IT canonicalized NaN payloads, it sounds like no end user would actually notice, much less complain.

And, to reiterate what I said above: if someone did actually notice, due again to the high divergence in NaN-payload propagation, the right solution would be to add a separate type that explicitly called out "I care about the NaN payload", so that the interface contract is explicit and extra steps could be taken on languages that didn't propagate NaN payloads by default. This isn't hypothetical: this is what SpiderMonkey actually had to do in the past to implement the .wast test suite when it was compiled to JS and what SpiderMonkey-on-wasm would want to do if such NaN-payload use cases ever materialized.

@RossTate

The evidence I would like to see is users (of floating point libraries) actually caring about NaN payloads.

R programs are linked against a component (not written in R) providing IEEE-compliant implementations of common floating-point functions. These programs rely on the fact that such a component preserves payloads and thereby maintains the distinction between NA in R and NaN in R, which is semantically significant.

@lukewagner
Member

Googling how R uses NaN payloads, I only see mention of the special "NA" value, with links to R documentation explaining that the propagation of NA is unreliable and varies by platform and compiler, which reinforces the overall point made by this issue that non-canonical NaNs are a source of non-portability and should not be relied upon. Moreover, due to this loose R spec language, coercing NA to a canonical NaN is allowed, so it's not clear whether, to my final question, any end user would actually notice NA vs. NaN.

More generally, my assumption here is that the cross-language interop in the particular case of R is only between the usual C-family set of ABI-compatible languages, which would naturally depend on shared-everything linking due to the prevalent use of shared linear memory in these math-intensive libraries (e.g., when passing a matrix or large vector). Thus, I don't think this concrete example would be an instance of cross-component NaN payload propagation, but rather another instance of shared-everything linking. Another example would be Rust using C-implemented libc or libm.

Most generally, due to the R-documented portability problems, if R wanted to provide deterministic NA semantics and support fully cross-language shared-nothing-linking, then the only robust option would be to keep NA as an implementation detail of the R component and reflect NA in public interfaces as explicit types that will show up well in all languages, like variants with error cases that have explicit payloads.

@saona-raimundo

I know we are talking about IEEE 754 here, where NaN bit patterns were designed to provide error handling for the different scenarios in which NaN arises... but such error handling is not really used.

A modern and serious revision of the numerical representations used for computation (which is what f32 and f64 are) is being developed under the umbrella of unums. Posits (unum III) are a drop-in replacement for IEEE 754.

In this format, there is no such thing as multiple types of NaNs, under the premise that it is a waste of bit patterns that can be used to improve accuracy.

@lukewagner
Member

Ah, very good point: if there is some new variation on IEEE754 floats on the horizon that is a drop-in replacement for IEEE754 floats except for the removal of non-canonical NaNs, that seems like an additional argument in favor of not including them in component interface float semantics.

Incidentally, I'm actively working on a rebase of this proposal onto the recently-merged Module Linking rebase (itself rebased onto the Component Model, as proposed earlier in the year), and I'm planning to include the fix to this issue. In particular, the current idea is for intertype to include new float32/float64 types which are distinct from core-wasm f32/f64 in that they only include a single semantic NaN value. Thus intertype and core:valtype will be disjoint sets of types instead of slightly-intersecting.

@bathos

bathos commented Dec 20, 2021

It seems potentially noteworthy that even though NaN representation distinguishability is itself left implementation-defined by ECMA-262, behavior is specified to be deterministic in the operation that makes it potentially observable:

An implementation must always choose the same encoding for each implementation distinguishable NaN value.

It seems like all major engines do permit observation with the same (maybe?) semantics in practice. I'm not 100% sure of that, but e.g. the following behaves the same in both V8 and Spidermonkey:

[Screenshot: Chromium console showing assignment from one Float64Array to another]

Edit: lol I meant to set b[0], not a[0] there, oops. In V8 the representation propagates, in Spidermonkey it canonicalizes, but in both it does so reliably.

This seems to make a case for at least determinacy, especially given the role TypedArray plays in bridging JS & Wasm. I'm totally out of my depth, so this hardly counts as a meaningful opinion - I just figured it might be worth describing the specifics of how JS handles it in the spec and in practice, since JS has been mentioned as a factor a few times and determinacy is one of the questions, but the specified determinacy of Float32Array and Float64Array did not appear to have been previously noted.
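
For reference, the corrected experiment from the screenshot (per the edit above) can be reconstructed roughly like this, assuming a little-endian platform:

```ts
// Build a non-canonical quiet NaN (an extra payload bit set) in one
// Float64Array, copy the value into another by plain assignment, and check
// whether the payload survives the copy.
const aBytes = new Uint8Array(8);
aBytes[7] = 0x7f; // sign 0 + top of exponent (high byte on little-endian)
aBytes[6] = 0xf8; // rest of exponent plus the quiet bit
aBytes[0] = 0x01; // non-canonical payload bit
const a = new Float64Array(aBytes.buffer);

const b = new Float64Array(1);
b[0] = a[0]; // V8 preserves the payload; SpiderMonkey canonicalizes;
             // each engine behaves deterministically, as noted above.
console.log(new Uint8Array(b.buffer));
```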
