Decimal precision mismatches against Python Decimal #471

Open
apalepu23 opened this issue Jan 28, 2022 · 3 comments

@apalepu23

Hey all,

I know there have been some conversations around precision (re: #414), but I wanted to post a different question on the topic for guidance/support:

CONTEXT:
We have a Rust codebase that performs computations (using this rust-decimal module) and outputs a record of the inputs to these computations. We have a Python sister codebase that receives this log and attempts to perform the exact same computations (using the Decimal module) to arrive at the same results.

We noticed that on occasion, there are ever so slight discrepancies.

REPRO:

FYI: For Python, we set the Decimal module's context to be the following:

from decimal import getcontext

getcontext().prec = 29
getcontext().Emin = 0
getcontext().Emax = 28
getcontext().clamp = 1

We chose these values after studying the Decimal module and trying to match how rust-decimal handles arithmetic. In particular, we believe these Emin and Emax values (together with clamp = 1) give us an exponent e in the range [-28, 0]. We also believe a prec of 29 (instead of the default of 28) lets us represent 29 significant digits, since rust-decimal can support numbers as large as 2 ** 96 - 1 (which has 29 digits).
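That reasoning about the exponent range can be checked directly. Python's `Context.Etiny()` returns `Emin - prec + 1`, which is the smallest allowed exponent. `Context.Etop()` returns `Emax - prec + 1`, which is the largest exponent when `clamp=1`. The following is a minimal sketch. It uses a standalone `Context` so the global context is left untouched.

```python
from decimal import Context

# Same settings as above, on a standalone context instead of getcontext().
ctx = Context(prec=29, Emin=0, Emax=28, clamp=1)

print(ctx.Etiny())          # -28: Emin - prec + 1, the smallest exponent
print(ctx.Etop())           # 0: Emax - prec + 1, the largest exponent with clamp=1
print(len(str(2**96 - 1)))  # 29: digits in the largest 96-bit mantissa
```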

Case 1 (Matches) ==> 5631989.747461568422879160 + 12354.867325583148587639999525:

// RUST
Decimal::from_str("5631989.747461568422879160").unwrap() + Decimal::from_str("12354.867325583148587639999525").unwrap() == Decimal::from_str("5644344.6147871515714667999995").unwrap()

# PYTHON
Decimal("5631989.747461568422879160") + Decimal("12354.867325583148587639999525") == Decimal("5644344.6147871515714667999995")

Case 2 (Matches) ==> 6631989.747461568422879160 + 12354.867325583148587639999525:

// RUST
Decimal::from_str("6631989.747461568422879160").unwrap() + Decimal::from_str("12354.867325583148587639999525").unwrap() == Decimal::from_str("6644344.6147871515714667999995").unwrap()

# PYTHON
Decimal("6631989.747461568422879160") + Decimal("12354.867325583148587639999525") == Decimal("6644344.6147871515714667999995")

Case 3 (Matches) ==> 7631989.747461568422879160 + 12354.867325583148587639999525:

// RUST
Decimal::from_str("7631989.747461568422879160").unwrap() + Decimal::from_str("12354.867325583148587639999525").unwrap() == Decimal::from_str("7644344.6147871515714667999995").unwrap()

# PYTHON
Decimal("7631989.747461568422879160") + Decimal("12354.867325583148587639999525") == Decimal("7644344.6147871515714667999995")

Case 4 (Does not match) ==> 8631989.747461568422879160 + 12354.867325583148587639999525:

// RUST
Decimal::from_str("8631989.747461568422879160").unwrap() + Decimal::from_str("12354.867325583148587639999525").unwrap() == Decimal::from_str("8644344.614787151571466800000").unwrap()

# PYTHON
Decimal("8631989.747461568422879160") + Decimal("12354.867325583148587639999525") == Decimal("8644344.6147871515714667999995")
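The Python side of all four cases can be rerun without touching the global context. The following is a minimal sketch that applies the same settings through `Context.add`. The operands and expected strings are transcribed from the cases above.

```python
from decimal import Context, Decimal

# The context settings from this issue, applied locally rather than via getcontext().
ctx = Context(prec=29, Emin=0, Emax=28, clamp=1)

addend = Decimal("12354.867325583148587639999525")
# (left operand, Python result quoted above)
cases = [
    ("5631989.747461568422879160", "5644344.6147871515714667999995"),
    ("6631989.747461568422879160", "6644344.6147871515714667999995"),
    ("7631989.747461568422879160", "7644344.6147871515714667999995"),
    ("8631989.747461568422879160", "8644344.6147871515714667999995"),
]

for left, expected in cases:
    result = ctx.add(Decimal(left), addend)
    print(result, result == Decimal(expected))
```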

DISCUSSION:

Given how widely used rust-decimal and Python's Decimal module are (they appear to be the dominant decimal packages in their respective languages), it seems there should be a way to synchronize the two implementations. Is there a way to make rust-decimal's output match Python's? Alternatively, could the Python Decimal module's context be set up (using the prec, Emax, Emin fields, etc.), or its output transformed, so that it identically mimics the behavior of this module?

@paupino
Owner

paupino commented Jan 28, 2022

Great write-up, thank you. I'll need to look into this more. However, at first glance it is triggered by how Rust Decimal currently handles underflow. I suspect Python doesn't round here and possibly has an extra bit or two of precision, whereas Rust Decimal currently does round (i.e. the same as .NET).

Calculating by hand:

8631989.747461568422879160000000
  12354.867325583148587639999525
8644344.614787151571466799999525
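The hand calculation can be checked in Python by using a context wide enough to hold the sum exactly. Rounding that exact value to 29 and to 28 significant digits then reproduces the Python and Rust results from the fourth case. The following is a minimal sketch. It assumes round-half-even (Python's default) for both roundings. For this particular sum, half-up gives the same digits, because the dropped digits are "25" (below half) and "525" (above half).

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

a = Decimal("8631989.747461568422879160")
b = Decimal("12354.867325583148587639999525")

# 50 digits of precision is plenty, so this addition is exact.
exact = Context(prec=50).add(a, b)
print(exact, len(exact.as_tuple().digits))  # the exact sum and its digit count

# Round the exact sum to 29 and to 28 significant digits.
for digits in (29, 28):
    print(digits, Context(prec=digits, rounding=ROUND_HALF_EVEN).plus(exact))
```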

For Rust Decimal, the issue is that hanging 29th significant figure, which it can sometimes represent and sometimes not (see here for a good explanation). It can only guarantee 28 digits with its current layout, which is why it rounds in this case.
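The "sometimes" can be made concrete. A 29-digit coefficient fits only if it is at most 2 ** 96 - 1 = 79228162514264337593543950335. Checking the four 29-digit Python results against that bound shows why only the fourth case diverges. The following is a minimal sketch. `fits_in_96_bits` is a hypothetical helper, not part of either library. It looks only at the unscaled coefficient and ignores the sign and the scale limit of 28.

```python
from decimal import Decimal

MAX_96 = 2**96 - 1  # largest unsigned 96-bit mantissa

def fits_in_96_bits(value: Decimal) -> bool:
    """Hypothetical helper: does value's unscaled coefficient fit in 96 bits?"""
    coefficient = int("".join(str(d) for d in value.as_tuple().digits))
    return coefficient <= MAX_96

results = [
    "5644344.6147871515714667999995",
    "6644344.6147871515714667999995",
    "7644344.6147871515714667999995",
    "8644344.6147871515714667999995",
]
for r in results:
    print(r, fits_in_96_bits(Decimal(r)))  # only the last one does not fit
```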

I'll look into it some more to see how feasible it is, though. I think it's a reasonable request, and I'd love to see how we could modify the library to cater for this behavior.

@paupino changed the title from "Decimal precision mismatches" to "Decimal precision mismatches against Python Decimal" on Feb 8, 2022
@paupino
Owner

paupino commented Feb 18, 2022

I've done a little investigating. For the fourth example, the Rust library rounds so that everything fits into a 96-bit mantissa. Here it rounds up from the trailing 5, which is why we see the value 8644344.614787151571466800000.

The decimal behind the scenes uses the following:

  • Mantissa: 8644344614787151571466800000 (hex: 1b ee 6f 2c 8d a3 c2 a5 ee 83 4b 80)
  • Scale: 21
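Those two numbers can be cross-checked. The hex bytes decode to the stated mantissa, and mantissa × 10^-21 gives back the Rust result. The following is a minimal sketch. The value is built from a string so that no context rounding is applied.

```python
from decimal import Decimal

mantissa = int("1bee6f2c8da3c2a5ee834b80", 16)  # the 12 bytes listed above
scale = 21

print(mantissa)               # 8644344614787151571466800000
print(mantissa.bit_length())  # 93, within the 96-bit limit
# Exact construction from a string: coefficient plus a negative exponent.
print(Decimal(f"{mantissa}E-{scale}"))  # 8644344.614787151571466800000
```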

Now, if we look at the Python side, we can see that it is likely storing the following:

  • Mantissa: 86443446147871515714667999995
  • Scale: 22

More interesting is the binary representation of that coefficient (in hex): 1 17 50 57 bd 88 65 9a 7b 51 20 f2 fb. It needs one extra bit (97 in total), which is effectively what allows it to store the extra precision.
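The same decoding for the Python side confirms the extra bit. The coefficient needs 97 bits, one more than rust-decimal's 96. It is also exactly 10 × the Rust mantissa − 5, so the Rust value is this coefficient with the final 5 rounded away. The following is a minimal sketch. Note that it only treats Python's coefficient as an integer; it does not claim anything about how Python's decimal stores it internally.

```python
from decimal import Decimal

python_coefficient = int("1175057bd88659a7b5120f2fb", 16)  # the bytes listed above
rust_mantissa = 8644344614787151571466800000

print(python_coefficient)                          # 86443446147871515714667999995
print(python_coefficient.bit_length())             # 97 bits, one more than 96
print(python_coefficient == rust_mantissa * 10 - 5)
# Python's own view of the result: exponent -22, i.e. scale 22.
print(Decimal("8644344.6147871515714667999995").as_tuple().exponent)
```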

Unfortunately, as the library currently stands, this is not possible in Rust Decimal, since we cap the mantissa at the 96-bit boundary. This lets us make various assumptions in our operations, which in turn lets us squeeze out extra speed. That being said, as part of v2 I think two modifications would be useful:

  1. flags currently has unused bits. At least 15 of them (possibly more) could be used to store additional precision. Supporting this requires some additional logic to handle word overflow, but it is certainly possible.
  2. Allowing different storage types for the decimal type (e.g. u64, u128, etc.).
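To put rough numbers on both ideas: an n-bit unsigned mantissa can hold every d-digit coefficient only when 10^d ≤ 2^n. The following is a sketch that computes that guaranteed digit count. `guaranteed_digits` is a hypothetical helper. The 97-, 111- and 128-bit widths are illustrative only: they stand for one borrowed bit, 15 borrowed flag bits, and a u128 store. This is plain arithmetic, not the actual v2 design.

```python
def guaranteed_digits(bits: int) -> int:
    """Hypothetical helper: largest d such that every d-digit integer fits in `bits` unsigned bits."""
    d = 0
    # 10**(d + 1) - 1 <= 2**bits - 1 is equivalent to 10**(d + 1) <= 2**bits.
    while 10 ** (d + 1) <= 2 ** bits:
        d += 1
    return d

for bits in (96, 97, 111, 128):
    print(bits, guaranteed_digits(bits))
```

With 96 bits only 28 digits are guaranteed. A single extra bit already covers every 29-digit coefficient, including the one from the fourth case.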

I'm currently experimenting with this. Unfortunately, it's not a quick fix, so there's no known timeline. Allowing overflow into flags is a good first step, but it requires a fair amount of change to support.

I'll keep this example in mind as I investigate solutions. That said, the best way to guarantee equality is to use the same library on both sides, e.g. use pyo3 to bind Rust Decimal in Python, or vice versa.

@cardoe
Contributor

cardoe commented Mar 8, 2023

Since you mentioned pyo3: I'm trying to get support for rust_decimal into PyO3 via PyO3/pyo3#3016.
