Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::fmt precision in format!: regression since 0.3? #126

Open
rubdos opened this issue Apr 1, 2024 · 18 comments
Open

std::fmt precision in format!: regression since 0.3? #126

rubdos opened this issue Apr 1, 2024 · 18 comments

Comments

@rubdos
Copy link
Contributor

rubdos commented Apr 1, 2024

As far as I remember, BigDecimal 0.3 rendered the below as 68.21 (considering the {:.2} as a precision), whereas BigDecimal 0.4 seems to interpret {:.2} as a maximum width.

#[test]
fn test_float_format() {
    let foo = BigDecimal::from_str("68.21003").unwrap();
    assert_eq!(format!("{foo:.2}"), "68.21");
}

0.4/trunk:

---- bigdecimal_tests::test_float_format stdout ----
thread 'bigdecimal_tests::test_float_format' panicked at src/lib.rs:1332:9:
assertion `left == right` failed
  left: "68"
 right: "68.21"

tag v0.3.1:

running 1 test
test bigdecimal_tests::test_float_format ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 50 filtered out; finished in 0.00s

I would have preferred the "old" behaviour, because I was using BigDecimal to work with monetary units. I'm not sure whether this is considered a regression or expected behaviour?

@rubdos
Copy link
Contributor Author

rubdos commented Apr 1, 2024

Update: the above actually worked as I expected (precision, instead of width) in tag v0.4.2! This sounds like a regression to me.

Potentially since the introduction of exponential notation in #121, since that touches a lot of formatting code and appeared in 0.4.3?

@akubera
Copy link
Owner

akubera commented Apr 1, 2024

Yeah I went with copying the behavior of Python's Decimal, as I discussed (with myself) in that thread you link to.

Python 3.12.1 (main, Feb  6 2024, 01:56:00) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import *
>>> foo = Decimal("68.21003")
>>> f"{foo:0.2}"
'68'

That being said, there's a discrepancy between the languages for formatting regular floating points

>>> f = 68.21003
>>> f"{f:.2}"
'6.8e+01'
fn main() {
    let foo = 68.21003;
    println!("{foo:.2}")  // 68.21
}

Investigating further it looks like java matches the Rust behavior:

import java.math.*;
class BigDecimalTest {
    public static void main(String[] args) {
        BigDecimal foo = new BigDecimal("68.21003");
        System.out.printf("%.2f", foo);  // -> 68.21
    }
}

So I think you're right, we should switch it back to "scale" (after-decimal-point count) not "full-precision".

@akubera
Copy link
Owner

akubera commented Apr 1, 2024

because I was using BigDecimal to work with monetary units

❗ Hope the change wasn't discovered after it was too late 😬

@akubera
Copy link
Owner

akubera commented Apr 1, 2024

In the mean time, there's the method BigDecimal::with_scale_round which will trim to exactly two digits and you have control over rounding-direction (and can collect those leftover fractions of cents)

let rounded = foo.with_scale_round(2, RoundingMode::Down);
println!("{}", rounded);

leftovers += &foo - rounded;

@rubdos
Copy link
Contributor Author

rubdos commented Apr 2, 2024

because I was using BigDecimal to work with monetary units

❗ Hope the change wasn't discovered after it was too late 😬

I'm not using BigDecimal formatting to carry out persistent computations, so there's no rounding creep happening luckily. I think the worst that happened is that someone at the office paid 50ct too much or too little for the coffee contribution, but that difference is tracked anyway, so there's no real problem :'-)

But yes, I found out a bit too late... woops.

Thanks for commenting and acknowledging!

@akubera
Copy link
Owner

akubera commented Apr 14, 2024

Most of the re-write is done. But I would like advice on how to format large exponents.

Precision in {:.prec} formatters now set the number of digits after the decimal.

Example tests:

            fn test_input() -> BigDecimal {
                "123456".parse().unwrap()
            }

            impl_case!(fmt_default:      "{}" => "123456");
            impl_case!(fmt_d1:        "{:.1}" => "123456.0");
            impl_case!(fmt_d4:        "{:.4}" => "123456.0000");
            impl_case!(fmt_4d1:      "{:4.1}" => "123456.0");
            impl_case!(fmt_15d2:    "{:15.2}" => "      123456.00");
            impl_case!(fmt_r15d2:  "{:>15.2}" => "      123456.00");
            impl_case!(fmt_l15d2:  "{:<15.2}" => "123456.00      ");
            impl_case!(fmt_p05d1:  "{:+05.7}" => "+123456.0000000");

///////////

            fn test_input() -> BigDecimal {
                "0.00003102564500".parse().unwrap()
            }
            impl_case!(fmt_default: "{}" => "0.00003102564500");
            impl_case!(fmt_d0:   "{:.0}" => "0");
            impl_case!(fmt_d4:   "{:.4}" => "0.0000");
            impl_case!(fmt_d5:   "{:.5}" => "0.00003");
            impl_case!(fmt_d10:  "{:.10}" => "0.0000310256");
            impl_case!(fmt_d14:  "{:.14}" => "0.00003102564500");
            impl_case!(fmt_d17:  "{:.17}" => "0.00003102564500000");

            impl_case!(fmt_e:      "{:e}" => "3.102564500e-5");
            impl_case!(fmt_de:    "{:.e}" => "3.102564500e-5");
            impl_case!(fmt_d0e:  "{:.0e}" => "3e-5");
            impl_case!(fmt_d1e:  "{:.1e}" => "3.1e-5");
            impl_case!(fmt_d4e:  "{:.4e}" => "3.1026e-5");
        }

These now match the behavior of Rust's formatting of standard floating point values:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=29714bd2fe9f84e9bbf3d5b3fcc3e45e

Also Java formatting (except the first %f doesn't keep full precision).

import java.math.*;
class BigDecimalTest {
    public static void main(String[] args) {
        BigDecimal n = new BigDecimal("0.00003102564500");
        System.out.printf("%s\n", n);    // 0.00003102564500
        System.out.printf("%f\n", n);    // 0.000031
        System.out.printf("%.0f\n", n);  // 0
        System.out.printf("%.1f\n", n);  // 0.0
        System.out.printf("%.4f\n", n);  // 0.0000
        System.out.printf("%.5f\n", n);  // 0.00003
        System.out.printf("%.9f\n", n);  // 0.000031026
        System.out.printf("%.14f\n", n); // 0.00003102564500
        System.out.printf("%.15f\n", n); // 0.000031025645000
        System.out.printf("%.15e\n", n); // 3.102564500000000e-05
    }
}

Large exponents

The problem is with large exponents.

fn main()
{
    let n = 1e300;
    println!("{}", n);  // 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    println!("{.1}", n);  // 1000000000000000052504760255204420248704468581108159154915854115511802457988908195786371375080447864043704443832883878176942523235360430575644792184786706982848387200926575803737830233794788090059368953234970799945081119038967640880074652742780142494579258788820056842838115669472196386865459400540160.0
    // (there's a .0 at the end of this one)
}

I think it's clear I have to break from Rust's formatting behavior in this case and just print in exponential form if there's a positive exponent (i.e. a negative "scale"). That, or do so if the exponent/total-digits is greater than some threshold.

fn main()
{
    let n: BigDecimal = "1e300".parse().unwrap();
    println!("      {{}} = '{}'", n);  // 1e300
    println!("      {{}} = '{.1}'", n);  // 1.0e300
}

This also has implications for #127 : what to do with the simple case of "1e1"? Should the default (precisionless) formatter detect there's one trailing zero and print 10, or should the two values be distinguished (10 with 1 or 2 significant figures).

Also, what should precision mean in this case?

let n: BigDecimal = "1e1".parse().unwrap();  // int-value=1, scale=-1
println!("{n:0.3}"); // 10.000 or 1.000e1 ?

// different than?
let n: BigDecimal = "10".parse().unwrap();   // int-value=10, scale=0
println!("{n:0.3}"); // clearly 10.000

@rubdos
Copy link
Contributor Author

rubdos commented Apr 14, 2024

First, I am personally not invested in exponential notation, so please feel free to disregard whatever I say about this.

I would take inspiration from how phonenumber-rs does formatting, with some tweaks. phonenumber-rs allows to wrap a PhoneNumber in some formatting struct to alter the default behaviour. E.g., for bigdecimal, we could:

let n: BigDecimal = "1e1".parse().unwrap();  // int-value=1, scale=-1
println!("{n:0.3}"); // 10.000
println!("{:0.3}", n.format_exp()); // 1.000e1
println!("{:0.3}", n.format_exp_threshold(2)); // 1.000e1
println!("{:0.3}", n.format_exp_threshold(1)); // 10.000

You could consider some default threshold of e.g. 4 (or probably a more logical higher threshold), and allow disabling it altogether.

println!("{:0.3}", n.format_without_exp()); // 1.000e1

Although my personal vote would go to always disable exponential notation by default.

I would suggest you Cc the major discussion contributors from #121 in this issue, to have them chip in too.

Thanks for tackling this!

@akubera
Copy link
Owner

akubera commented Apr 14, 2024

I'd really like some kind of proper formatter object that could support advanced things like locale (specifying digit separators and the like). There's a lot of work to do before getting to that point.

There is the to_scientific_notation & write_scientific_notation methods for always writing it one way.
Those paired with a n.take_and_scale(3) might be enough disambiguate formatting options?

@akubera
Copy link
Owner

akubera commented Apr 14, 2024

Branch has been pushed: trunk...bugfix/formating

Along with updating and adding a lot of new tests, I reverted the test_fmt to the original (that you wrote in 2017), except for the 1e2 case which will now use scientific notation rather than the full "100", "100.0", etc.

Try using that branch and let me know if something needs fixed, or you notice something looks wrong. I think I'm done for the day so I'll probably merge in next weekend.

@rubdos
Copy link
Contributor Author

rubdos commented Apr 14, 2024

I'd really like some kind of proper formatter object that could support advanced things like locale (specifying digit separators and the like). There's a lot of work to do before getting to that point.

Is there, though? I think that's the kind of object you can build out gradually; that's the whole point of such a construction in my opinion.

that you wrote in 2017

That is hilarious, honestly.

Try using that branch and let me know if something needs fixed, or you notice something looks wrong. I think I'm done for the day so I'll probably merge in next weekend.

I'll deploy it right now, thanks! Just a thing: you wrote bugfix/formating instead of bugfix/formatTing. Not that it matters in a branch name.

@akubera
Copy link
Owner

akubera commented Apr 17, 2024

By "a lot of work to do" I mean in relation to the redesign I've been slowly implementing for a couple years, now. I'm trying to stay focused on improving performance and getting some more math functions implemented. Though with that said, formatting is probably the one thing everybody has to use with this library; it should be a high priority too.

you wrote bugfix/formating instead of bugfix/formatTing

Of course I did. 🤦‍♂️

@akubera
Copy link
Owner

akubera commented Apr 17, 2024

At least the commit messages had the correct number of 't's.

I like the Formatter in numfmt. I'll look around at other solutions. Let me know if you find anything you particularly like.

@rubdos
Copy link
Contributor Author

rubdos commented Apr 17, 2024

FYI, our coffee contribution bot seems to behave fine for now with your patch :)

@akubera
Copy link
Owner

akubera commented Apr 17, 2024

It's important to keep the bots happy.

@akubera
Copy link
Owner

akubera commented Apr 18, 2024

I wrote a little program that prints various bigdecimals with various formats and compare it with the same number parse as f64:

        | BigDecimal     | f64
        +-----------     +-----
1234
     {} : 1234            1234
  {:.0} : 1234            1234
  {:.1} : 1234.0          1234.0
  {:.2} : 1234.00         1234.00
  {:.3} : 1234.000        1234.000
  {:.4} : 1234.0000       1234.0000

1.234
     {} : 1.234           1.234
  {:.0} : 1               1
  {:.1} : 1.2             1.2
  {:.2} : 1.23            1.23
  {:.3} : 1.234           1.234
  {:.4} : 1.2340          1.2340

.1234
     {} : 0.1234          0.1234
  {:.0} : 0               0
  {:.1} : 0.1             0.1
  {:.2} : 0.12            0.12
  {:.3} : 0.123           0.123
  {:.4} : 0.1234          0.1234

.00001234
     {} : 0.00001234      0.00001234
  {:.0} : 0               0
  {:.1} : 0.0             0.0
  {:.2} : 0.00            0.00
  {:.3} : 0.000           0.000
  {:.4} : 0.0000          0.0000

.00000001234
     {} : 1.234E-8        0.00000001234
  {:.0} : 1E-8            0
  {:.1} : 1.2E-8          0.0
  {:.2} : 1.23E-8         0.00
  {:.3} : 1.234E-8        0.000
  {:.4} : 1.2340E-8       0.0000

12340
     {} : 12340           12340
  {:.0} : 12340           12340
  {:.1} : 12340.0         12340.0
  {:.2} : 12340.00        12340.00
  {:.3} : 12340.000       12340.000
  {:.4} : 12340.0000      12340.0000

1234e1
     {} : 1.234E+4        12340
  {:.0} : 1E+4            12340
  {:.1} : 1.2E+4          12340.0
  {:.2} : 1.23E+4         12340.00
  {:.3} : 1.234E+4        12340.000
  {:.4} : 1.2340E+4       12340.0000

1234e10
     {} : 1.234E+13       12340000000000
  {:.0} : 1E+13           12340000000000
  {:.1} : 1.2E+13         12340000000000.0
  {:.2} : 1.23E+13        12340000000000.00
  {:.3} : 1.234E+13       12340000000000.000
  {:.4} : 1.2340E+13      12340000000000.0000

Rust is consistent that {:.N} f64 will be printed with N digits after the decimal point. The last one illustrates a problem when it comes to bigdecimals: if not careful it's easy to exhaust memory if printing a large number with any precision. So there should be a limit where we switch from decimal place to exponential form (right now that limit is simply '1').

No questions, I'm just thinking out loud here.

The script will be added to examples/ dir.

@rubdos
Copy link
Contributor Author

rubdos commented Apr 18, 2024

if not careful it's easy to exhaust memory if printing a large number with any precision. So there should be a limit where we switch from decimal place to exponential form (right now that limit is simply '1').

Part of the question here, imho, is whether memory exhaustion would be BigDecimal's fault, or whether it'd be the caller's fault.

@akubera
Copy link
Owner

akubera commented Apr 19, 2024

At the end of the day, I think it is the users' responsibility to validate inputs and check their assumptions, but that doesn't mean that the library shouldn't do everything it can to steer the user in the right direction and minimize surprises.

I had a small discussion about this on hackernews a couple weeks ago https://news.ycombinator.com/item?id=39900871 where some people urged me to play it safe, lest BigDecimal becomes the source of a CVE.

Making "the easy way" also "the safe way" is tough to argue against.

I should have time this weekend to finish this up.

@rubdos
Copy link
Contributor Author

rubdos commented Apr 19, 2024

So then there's the question at what length you would switch to exponential notation, and what the actual behaviour should be. You could even opt for a panic (although that'd also be worthy of CVEs).

Either way, formatting structs will eventually "correctly" resolve this discussion, I would say. At that point, there'll be a new discussion around the bahaviour of a float-only formatter that might go OOM; I suppose in that case it should panic at some boundary? Probably warrants a separate discussion/issue/thread.

Thanks again for taking this up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants