Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparing numbers can take a surprising amount of time #125

Open
BenWiederhake opened this issue Mar 29, 2024 · 10 comments
Open

Comparing numbers can take a surprising amount of time #125

BenWiederhake opened this issue Mar 29, 2024 · 10 comments

Comments

@BenWiederhake
Copy link

I understand that comparing numbers that are very similar might take a bit of time, as it might need to read all stored digits in the worst case. But certainly there is no expensive computation involved, right?

However, calling cmp always has a large overhead in CPU and memory. Example:

#[test]
fn test_compare_extreme() {
    let teenytiny = BigDecimal::from_str("1e-9223372036854775807").unwrap();
    let one = BigDecimal::from_str("1").unwrap();
    assert!(one > teenytiny);
    assert!(teenytiny < one);
}

Note that creating teenytiny and one is very quick: The internal BigInt and wrapping BigDecimal require barely any memory at all. However, the comparisons are excruciatingly slow, because:

impl Ord for BigDecimal {
    fn cmp(&self, other: &BigDecimal) -> Ordering {
        // [… other code …]
        let tmp = self - other;
        // [… other code …]
    }
}

This subtraction effectively runs an infinite loop, even though cmp could be determined (for this input) very easily.
Implementing some kind of early-return path would be really nice.

Related: uutils/coreutils#3318

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

Yeah that can be done.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

PartialEqual has a "scale difference" issue, too:

  let scaled_int_val = &self.int_val * ten_to_the((rhs.scale - self.scale) as u64);

Where that subtraction could cause an out-of-memory error allocating 10^(terabyte-sized-number)

(and by terabyte I mean exabyte.....)

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

Oh no, that was the old version. It was fixed in September. I probably meant to fix impl Ord too but slipped by me.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

Implementation is ready, and I included your example test. I'd like to write a few more tests to improve the coverage.

trunk...f/ImprovedOrd

It now tries to cast the big ints to u64, then u128, then creates Vec of the digits, so the amount of memory required is only dependent on number of digits, not the scale. And no allocations if the value is under 40 digits long.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

I'm calling it for the night, but leaving myself a message:

example : cmp(1e-9223372036854775807, 1e+9223372036854775807) panics as the difference of exponents is out of i64 domain.

I'm not sure what to do in that case. I would guess that it's safe to assume they're not equal, as that would mean exabytes of numbers to reach equality, and go with less Less/Greater base on which scale is larger.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

That was fixed with a new checked_diff(i64,i64) -> (Ordering, Option<u64>) function. If the difference between the two numbers is greater than a u64, we return None in the tuple. Since we're comparing scales (i.e. number of digits) then we're safe to assume if that number doesn't fit in a u64, we can know the ordering/equality of the numbers without needing to inspect them digit-by-digit.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

Testing at the boundaries here lead to finding a similar panic in implementation of eq. That too has been fixed and I think should be faster with a non-allocating algorithm if the scale is <20 digits (max number of decimal-digits which fits in a u64).

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

Now the hard part is making contrived test cases which avoid the optimizations.

@akubera
Copy link
Owner

akubera commented Mar 29, 2024

I'm happy with result. It's now merged into trunk in commit b34622d.

Let me know if that works for you.

@BenWiederhake
Copy link
Author

Fantastic! It works great :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants