Potential Performance gain by not using (min/max) functions on floating point scalars #444

unfinishedprogram · 2023-12-03T14:46:42Z

Currently, the implementations of min_element and max_element for floating-point vec3 use the built-in min/max functions. These functions explicitly handle NaN values and do not propagate them, as is explained here.
This may be the desired behavior, but it is semantically different from min in vectorized contexts, where NaN values would be propagated.
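To make the NaN handling concrete, here is a minimal sketch of what the standard `f32::min` does when one operand is NaN. The helper name `std_min_ignores_nan` is made up for illustration and is not part of any library.

```rust
/// Returns true if `f32::min` drops a NaN operand and returns the other
/// value, regardless of which side the NaN is on.
/// (`std_min_ignores_nan` is a hypothetical helper for illustration.)
fn std_min_ignores_nan() -> bool {
    let nan_on_right = 1.0_f32.min(f32::NAN);
    let nan_on_left = f32::NAN.min(1.0_f32);
    // Neither result is NaN: the numeric operand wins in both cases.
    nan_on_right == 1.0 && nan_on_left == 1.0
}
```

The result is NaN only when both operands are NaN, which is the extra case analysis the built-in function has to perform.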

The extra handling of NaNs in these functions incurs significant performance overhead.
In a benchmark on my computer, a custom f32::min implementation that does not check for NaN was ~25% faster.

// Current implementation
pub fn min_element(self) -> f32 {
    self.x.min(self.y.min(self.z))
}

// Faster implementation
pub fn min_element(self) -> f32 {
    let min = |a, b| if a < b { a } else { b };
    min(self.x, min(self.y, self.z))
}
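The two variants can be compared side by side in a self-contained sketch. The `Vec3` struct below is a hypothetical stand-in for the library's type, and the method names `min_element_std` and `min_element_cmp` are invented here so both versions can coexist.

```rust
// Hypothetical stand-in for the library's vector type, for illustration only.
#[derive(Clone, Copy)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl Vec3 {
    /// Uses `f32::min`, which returns the non-NaN operand when one is NaN.
    pub fn min_element_std(self) -> f32 {
        self.x.min(self.y.min(self.z))
    }

    /// Comparison-based variant from the proposal. Any comparison with NaN
    /// is false, so the closure returns `b` whenever either operand is NaN.
    pub fn min_element_cmp(self) -> f32 {
        let min = |a: f32, b: f32| if a < b { a } else { b };
        min(self.x, min(self.y, self.z))
    }
}
```

For inputs without NaN the two methods agree. With NaN present, the comparison-based variant returns NaN only when the NaN sits in `z`; a NaN in `x` or `y` is silently dropped, so its result depends on which component holds the NaN.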

We can see why this is so much faster by comparing the generated assembly: https://godbolt.org/z/3Ph1hWx8e

If I'm missing something please let me know, but otherwise this seems like an easy change which could make a significant difference in performance for some use-cases.
I didn't have the time to look into other uses of floating-point min throughout the library, but I'd guess its replacement could improve performance elsewhere as well.


bitshifter · 2023-12-11T04:58:25Z

Yeah I think that makes sense, especially if it means behaviour is the same with and without SIMD.

bitshifter · 2024-03-14T10:02:42Z

Turns out there are unstable https://doc.rust-lang.org/core/primitive.f32.html#method.maximum and https://doc.rust-lang.org/core/primitive.f32.html#method.minimum functions that are similar to what you suggest, although there are issues with stabilizing them (see rust-lang/rust#91079).
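Since those functions are unstable, a NaN-propagating minimum can be approximated on stable Rust. This is a minimal sketch, not the standard library's implementation: `min_propagate_nan` is a hypothetical name, and unlike `f32::minimum` it makes no attempt to order -0.0 before +0.0.

```rust
/// NaN-propagating minimum on stable Rust: returns NaN if either operand
/// is NaN, otherwise the smaller value.
/// Hypothetical helper; signed-zero ordering is intentionally not handled.
fn min_propagate_nan(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        f32::NAN
    } else if a < b {
        a
    } else {
        b
    }
}
```

The explicit `is_nan` checks make the semantics independent of operand order, at the cost of extra branches that the plain comparison avoids.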

unfinishedprogram · 2024-03-14T13:40:51Z

Interesting, I guess some digging on godbolt is required to determine if this manual "ternary style" replacement compiles to the expected instruction on all architectures.

Another thing I didn't consider with this proposal is non-optimized code-gen. This method could be a substantial performance degradation for unoptimized builds. However, I'm not sure how important this is for the goals of this library, or if this is even the case. It would need to be tested.
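One rough way to test this is a small timing harness built once without and once with optimizations. The sketch below is only an assumption about how such a test could look: `min3_std`, `min3_cmp`, `time_min3`, and `sample_data` are all hypothetical names, and wall-clock timing like this is noisy compared to a proper benchmarking tool.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Three-way minimum using the standard `f32::min`.
fn min3_std(x: f32, y: f32, z: f32) -> f32 {
    x.min(y.min(z))
}

/// Three-way minimum using plain comparisons (no NaN handling).
fn min3_cmp(x: f32, y: f32, z: f32) -> f32 {
    let min = |a: f32, b: f32| if a < b { a } else { b };
    min(x, min(y, z))
}

/// Runs `f` over every triple in `data`, returning the elapsed time and the
/// sum of the results (the sum keeps the work from being optimized away).
fn time_min3(f: impl Fn(f32, f32, f32) -> f32, data: &[[f32; 3]]) -> (Duration, f32) {
    let start = Instant::now();
    let mut sum = 0.0_f32;
    for v in data {
        let [x, y, z] = black_box(*v);
        sum += f(x, y, z);
    }
    (start.elapsed(), black_box(sum))
}

/// Deterministic NaN-free input, so both variants must produce the same sum.
fn sample_data(n: usize) -> Vec<[f32; 3]> {
    (0..n)
        .map(|i| {
            let t = i as f32;
            [t, t * 0.5 + 1.0, 100.0 - t]
        })
        .collect()
}
```

Calling `time_min3(min3_std, &data)` and `time_min3(min3_cmp, &data)` from `main` and printing the durations, first with `cargo run` and then with `cargo run --release`, would give a first indication of whether the comparison-based version is slower in debug builds.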
