Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Performance gain by not using (min/max) functions on floating point scalars #444

Open
unfinishedprogram opened this issue Dec 3, 2023 · 3 comments

Comments

@unfinishedprogram
Copy link

unfinishedprogram commented Dec 3, 2023

Currently, the implementation of min_element and max_element for floating point vec3 uses the built in min/max functions. These functions explicitly handle NaN values, and do not propagate them, as is explained here.
This is potentially desired behavior, however, this is semantically different from the meaning of min in vectorized contexts where NaN values would be propagated.

The extra handling of NaNs in these functions incurs significant performance overhead.
Benchmarking on my computer using a custom f32::min implementation, which does not check for NaN, was ~25% faster.

// Current implementation
pub fn min_element(self) -> f32 {
    self.x.min(self.y.min(self.z))
}

// Faster implementation
pub fn min_element(self) -> f32 {
    let min = |a, b| if a < b { a } else { b };
    min(self.x, min(self.y, self.z))
}

We can see why this is so much faster by comparing the generated assembly: https://godbolt.org/z/3Ph1hWx8e

If I'm missing something please let me know, but otherwise this seems like an easy change which could make a significant difference in performance for some use-cases.
I didn't have the time to look into other uses of floating-point min throughout the library but I'd guess it's replacement could improve performance elsewhere as well.

@bitshifter
Copy link
Owner

Yeah I think that makes sense, especially if it means behaviour is the same with and without SIMD.

@bitshifter
Copy link
Owner

Turns out there are unstable https://doc.rust-lang.org/core/primitive.f32.html#method.maximum and https://doc.rust-lang.org/core/primitive.f32.html#method.minimum functions that are similar to what you suggest, although there are issues with stabilizing them rust-lang/rust#91079.

@unfinishedprogram
Copy link
Author

Interesting, I guess some digging on godbolt is required to determine if this manual "ternary style" replacement compiles to the expected instruction on all architectures.

Another thing I didn't consider with this proposal, is non-optimized code-gen. This method could be a substantial performance degradation for optimized builds. However, I'm not sure how important this is for the goals of this library or if this is even the case. It would need to be tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants