
Rule proposal: Disallow number literals that lose precision #11279

Closed
not-an-aardvark opened this issue Jan 16, 2019 · 18 comments · Fixed by #12747
Assignees
Labels
accepted: There is consensus among the team that this change meets the criteria for inclusion
archived due to age: This issue has been archived; please open a new issue for any further discussion
feature: This change adds a new feature to ESLint
rule: Relates to ESLint's core rules

Comments

@not-an-aardvark
Member

not-an-aardvark commented Jan 16, 2019

Please describe what the rule should do:

This rule would report number literals that immediately lose precision at runtime when converted to a JS Number (due to 64-bit floating-point rounding).

What category of rule is this? (place an "X" next to just one item)

[x] Warns about a potential error (problem)

Provide 2-3 code examples that this rule will warn about:

// invalid
const largeConstant1 = 1234567890123456789;
const largeConstant2 = 1.234567890123456789e19;

largeConstant1 will actually end up with a Number value of 1234567890123456768 (and largeConstant2 with 12345678901234567168), because the mathematical values 1234567890123456789 and 1.234567890123456789e19 are not precisely representable as JS Numbers. The digits 89 at the end of each literal are an attempt by the programmer to provide extra precision, but this extra precision doesn't appear at runtime, so the code probably contains a bug.

// invalid
const longDecimal1 = 1.234567890123456789;

Similarly, longDecimal1 will end up with a Number value of 1.2345678901234567, effectively dropping the digits 89. The use of the trailing 89 is potentially confusing and probably a bug.
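Both behaviors can be confirmed at a Node REPL (a quick demonstration, not part of the rule itself; the comparison constants below are the rounded values the literals actually produce):

```javascript
// Each literal silently rounds to the nearest representable double:
const largeConstant1 = 1234567890123456789;
console.log(largeConstant1 === 1234567890123456768); // true: the trailing "89" became "68"

const longDecimal1 = 1.234567890123456789;
console.log(longDecimal1 === 1.2345678901234567); // true: the trailing "89" was dropped
```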

The rule uses significant digits in order to determine the "precision" of a literal. As a result, there are a few cases where the rule would consider a literal to be valid despite the fact that the literal is not precisely representable as a Number. For example, the following examples would be considered valid:

// valid
const shortDecimal1 = 0.11;
const largeConstant3 = 1e23;
const largeConstant4 = 100000000000000000000000;

Although the literal 0.11 gets converted to a mathematical value of 0.11000000000000000056..., the literal 0.11 is a correct approximation of that mathematical value to two significant digits. As a result, the use of the literal 0.11 is probably not a bug.

Similarly, although largeConstant3 and largeConstant4 get converted to a mathematical value of 99999999999999991611392, 1e23 and 100000000000000000000000 are both correct approximations of that mathematical value to one significant digit. As a result, the usage of these literals is probably not a bug.
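For instance (illustrative REPL check):

```javascript
// Both spellings denote the mathematical value 10^23, so they parse to the
// same double, whose exact value is 99999999999999991611392:
console.log(1e23 === 100000000000000000000000); // true
console.log(1e23 === 99999999999999991611392);  // true
console.log((1e23).toString());                 // "1e+23"
```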

The choice to not report an error for these cases is the main difference between this proposal and #8447.

Note that the rule would report an error for a literal like 100000000000000000000000.0 (with a decimal point and a zero at the end), because that literal contains 25 significant digits rather than one significant digit, and the literal is not a correct approximation of that mathematical value to 25 significant digits.

The above rules also apply to other representations of number literals (binary, hex, and octal), although those literal representations only support integers. However, the rule would not report an error for BigInt literals, because BigInt literals don't have precision loss.

I considered always reporting an error for imprecise binary/hex/octal literals (without looking at significant digits) because large binary/hex/octal literals are more often used to represent exact values (e.g. memory addresses) rather than approximate measurements. However, I decided to keep the behavior consistent to minimize false positives (e.g. to allow it to be added to eslint:recommended in the future).

Why should this rule be included in ESLint (instead of a plugin)?

This rule would identify bugs resulting from a potentially-confusing aspect of the language, with very few false positives.

For an example of a bug like this, see tc39/proposal-bigint#170. (The code in that example uses the experimental BigInt proposal, but fundamentally the footgun exists independent of the BigInt proposal.)

Are you willing to submit a pull request to implement this rule?

Yes

@not-an-aardvark not-an-aardvark added rule Relates to ESLint's core rules feature This change adds a new feature to ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Jan 16, 2019
@not-an-aardvark not-an-aardvark self-assigned this Jan 16, 2019
@not-an-aardvark
Member Author

Open question: If someone writes 1208925819614629174706176 (i.e. 2 ** 80) in a literal, should we report an error? Internally, the floating point representation of the resulting Number would be exactly 2 ** 80, but the number could also be represented by literal 1208925819614629200000000. In a way, the floating-point representation would be accurate but not precise. We could choose to only allow the version with trailing zeros.

(If we choose to allow the exact literal, it could be kind of annoying to figure out what the exact mathematical value actually is without BigInt support, given that Number#toString starts truncating digits after a certain point.)

How about 11111111111111112 and 11111111111111113, which will end up with a Number representation of 11111111111111112? In this case, there are no trailing zeros that the 3 could be replaced with. I think it would make most sense to report 11111111111111113 while allowing 11111111111111112, but this seems slightly inconsistent with the first example.

Maybe the general rule here is to report an error if a significant digit can be replaced with a zero to create the same Number value, or if a significant digit can be replaced with another digit that would result in a closer approximation of the represented mathematical value to the interpreted floating-point value.
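For reference, the runtime behavior behind the 11111111111111112/11111111111111113 example (quick REPL check):

```javascript
// Both literals are above 2**53, where consecutive doubles are 2 apart,
// so the odd value 11111111111111113 rounds to its even neighbor:
console.log(11111111111111112 === 11111111111111113); // true
console.log((11111111111111113).toString());          // "11111111111111112"
```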

@not-an-aardvark not-an-aardvark added accepted There is consensus among the team that this change meets the criteria for inclusion and removed evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion labels Feb 21, 2019
@kaicataldo kaicataldo added help wanted The team would welcome a contribution from the community for this issue Hacktoberfest Recommended issue for those participating in Hacktoberfest https://hacktoberfest.digitalocean.com/ labels Sep 29, 2019
@kaicataldo kaicataldo removed the Hacktoberfest Recommended issue for those participating in Hacktoberfest https://hacktoberfest.digitalocean.com/ label Nov 19, 2019
@jmoore914
Contributor

If @not-an-aardvark isn't already working on this, I can take a stab at it!

@platinumazure
Member

@jmoore914 Please go right ahead! We look forward to seeing what you come up with. Please reach out here or in our Gitter chat if you need help. Thanks!

@not-an-aardvark
Member Author

Thanks for working on this. FWIW I ended up creating a small parser for this rule to try to handle weird cases for counting significant figures (e.g. disambiguating octal integers). Feel free to use it (or not) if you think it would be useful.

@jmoore914
Contributor

Thanks @not-an-aardvark, I'll take a look!

This issue got more interesting than I thought it would be. You actually lose one digit of precision if the number contains any non-significant zeros, and the same thing happens if the number contains a decimal point.

So, for instance

// 16 significant digits
const num1 = 8888888888888884
const num2 = 8888888888888883
num1===num2
// output: false

// 16 significant digits and 1 zero
const num3 = 88888888888888840
const num4 = 88888888888888830
num3===num4
// output: true

// 16 significant digits and a decimal point 
const num5 = 8.888888888888884
const num6 = 8.888888888888883
num5===num6
// output: true

// 15 significant digits and 11 zeros
const num7 = 88888888888888400000000000
const num8 = 88888888888888300000000000
num7===num8
// output: false

// 16 significant digits and 1 decimal point which is removed in expansion
const num9 = 888888888888888.3e1
const num10 = 888888888888888.2e1
num9===num10
// output: false

// 16 significant digits and 1 zero which is removed in expansion
const num11 = 88888888888888830e-1
const num12 = 88888888888888820e-1
num11===num12
// output: false

So what that seems to indicate is that if there is a non-significant zero or decimal point in the numeric expansion, then you only have precision up to 15 digits.

@jmoore914
Contributor

And the behavior I find even odder is that if any of the above conditions applies, you have a lower max safe integer. If you have precision up to 16 sig figs, then the max safe integer is 9007199254740991 but if you only have precision up to 15, the max safe integer is somewhere between 700000000000000 and 800000000000000.

@not-an-aardvark
Member Author

The internal representation of a Number doesn't have any notion of significant digits in the traditional sense -- it's just a floating-point value. So it's unsurprising to me that the "16 significant digits" case behaves differently from the "16 significant digits and 1 zero" case, because the latter numbers are 10 times larger.

I think significant digits are only relevant here in the sense that they can help determine user intent based on how a user chooses to represent their number. If someone writes 0.11, they're probably trying to get the closest floating-point number to 0.11 that they can get, even though they (hopefully) realize that it won't be exact. If someone writes 0.11000000000000000057321, they're asking for more precision than the format has the capability to provide.

@not-an-aardvark
Member Author

not-an-aardvark commented Dec 30, 2019

Basically, I think the behavior of the rule should be equivalent to the following pseudocode:

  • For every number literal in the program:
    1. let intendedValue = the mathematical value represented by the literal using traditional mathematical notation, (e.g. 0.1 is interpreted as one tenth)
    2. let actualValue = the mathematical value represented by the literal after it's interpreted/rounded as a floating-point Number (e.g. 0.1 is interpreted as 3602879701896397 / 2^55 ≈ 0.1000000000000000055511151231...)
    3. let numDigits = the number of significant figures used to represent the literal in the source code
    4. Round actualValue to numDigits significant digits, and raise a linting error unless the result is equivalent to intendedValue.

This seems like it would have the desired effect from #11279 (comment) of allowing things like 11111111111111112 but not 11111111111111113.

I'm not sure what the best way to actually implement this is, given that it involves operating on mathematical values that might be repeating decimals or be very large, making them hard to represent with either Number or BigInt. (edit: Actually, the values can't be repeating decimals, which might simplify things.) It seems like it should be possible to use a combination of clever string manipulation, BigInt, and Number#toFixed.

Other implementation notes:

  • Depending on how you represent the mathematical values, checking the "equivalent" condition in step 4 could be tricky due to the possibility of scientific notation, stray decimal points, etc. It might be useful to normalize "equivalent" values to a canonical representation.
  • "Rounding something to a specific number of significant digits" has a different meaning depending on whether the original literal is binary, hex, or decimal.
  • The implementation might want to handle base-ten numbers and everything else as two different cases. (Number#toFixed can only produce a base-ten output, but conveniently Number#toFixed should only be necessary for base-ten literals anyway since other literals are always integers.) In principle it would be possible to make a fully general implementation that doesn't need separate cases (e.g. by converting to BigInt and calling toString(base)), but this might be more trouble than it's worth since it seems hard to avoid precision losses.

@jmoore914
Contributor

OK, here's what I came up with: https://github.com/jmoore914/eslint/tree/no-loss-of-precision

It's working perfectly for decimal values and values with scientific notation. Not sure how to approach the other cases. I'm mainly unsure how to determine the level of precision requested but I'll give it some more thought.

@not-an-aardvark
Member Author

Thanks. From taking a look at your implementation it seems like it works by getting rid of any exponential part, adding a small power of ten based on how many significant figures are used, and seeing if the resulting value changes.

This is an interesting approach, but I'm not sure it's going to work well. I don't think we want to report an error for literals like 9007199254740992, because the result would be that there is no way to express that literal regardless of the number of significant figures used. Instead, it would be better to allow 9007199254740992 but disallow 9007199254740993.

Using Number#toPrecision is a very good idea though (it seems like it avoids a lot of the pitfalls of toFixed for this use case).

What are your thoughts on the implementation strategy from #11279 (comment)? To me it seems like that would be less likely to run into weird edge cases, and it might end up being a lot simpler anyway when using toPrecision. As an example, when evaluating 9007199254740993 with that strategy:

  • "9007199254740993" has 16 significant digits.
  • Number("9007199254740993").toPrecision(16) is "9007199254740992".
  • "9007199254740993" and "9007199254740992" don't represent the same mathematical value, so report an error.

At that point the only difficulty for base-ten numbers would be to check whether two different (possibly-scientific-notation) number strings represent the same mathematical value, which seems like a solvable problem by normalizing them both to scientific notation manually.
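A minimal sketch of that strategy for base-ten literals (the helper names are hypothetical, not from the eventual implementation):

```javascript
// Count significant digits in a raw base-ten literal. Trailing zeros in an
// integer literal (no decimal point) are treated as non-significant.
function significantDigits(raw) {
    let digits = raw.toLowerCase().split("e")[0].replace(".", "").replace(/^0+/, "");
    if (!raw.includes(".")) {
        digits = digits.replace(/0+$/, "");
    }
    return Math.max(digits.length, 1);
}

// Normalize a (possibly-scientific-notation) number string to a canonical
// "d.ddd...e<exp>" form so that mathematically equal strings compare equal.
function normalize(numStr) {
    const [rawMantissa, rawExp = "0"] = numStr.toLowerCase().split("e");
    let exponent = Number(rawExp);
    const dotIndex = rawMantissa.includes(".") ? rawMantissa.indexOf(".") : rawMantissa.length;
    let digits = rawMantissa.replace(".", "");
    exponent += dotIndex - 1;

    const first = digits.search(/[1-9]/);
    if (first === -1) {
        return "0e0"; // the literal is zero
    }
    exponent -= first;
    digits = digits.slice(first).replace(/0+$/, "");
    return digits[0] + (digits.length > 1 ? "." + digits.slice(1) : "") + "e" + exponent;
}

// Report a literal when rounding its runtime value back to the literal's own
// number of significant digits does not reproduce the intended value.
function losesPrecision(raw) {
    const precision = Math.min(significantDigits(raw), 100); // toPrecision caps at 100
    return normalize(raw) !== normalize(Number(raw).toPrecision(precision));
}

console.log(losesPrecision("9007199254740993")); // true
console.log(losesPrecision("9007199254740992")); // false
console.log(losesPrecision("0.11"));             // false
console.log(losesPrecision("1e23"));             // false
```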

@jmoore914
Contributor

Thanks for taking a look.

If I remove the adding/subtracting of the small power of ten then, if I understand correctly, it would be implementing the strategy in #11279.

So if we look at the function:

function checkRawMatchesValue(node) {
    const requestedPrecision = getRequestedLevelOfPrecision(node);
    // getRequestedLevelOfPrecision("9007199254740993") === 16

    return stripToSignificantDigits(node.raw) === stripToSignificantDigits(node.value.toPrecision(requestedPrecision));
    // stripToSignificantDigits("9007199254740993") === "9007199254740993"
    // stripToSignificantDigits((9007199254740993).toPrecision(16)) === "9007199254740992"
    // "9007199254740993" !== "9007199254740992", so this would report an error
}

If I remove the +/- check then the only deviation from that strategy is the extra step of stripping the numbers down to sig figs which is how I'm normalizing the numbers and avoiding the need to manually convert to scientific notation.

@not-an-aardvark
Member Author

Got it, that makes sense. In that case I think the overall approach seems fine if we remove the +/- check.

The fact that stripToSignificantDigits removes all place value information (including the decimal point and the scientific-notation exponent) seems a bit weird to me. It's not technically normalizing numbers, since e.g. "1e7" and "1e8" will get "normalized" to the same value. Also, "1.00" and "1.0" will get normalized to different values, which is odd if the desired behavior is to check for equivalent mathematical values (it seems like we actually want to remove all "mathematically-irrelevant" digits with this function, rather than removing significant digits). Admittedly, I can't think of a case where this would cause the rule to return the wrong result, since floating-point precision won't cause numbers to be off by a factor of ten.

Some other potentially useful test cases/edge cases to consider:

  • The literal 0, which should be considered to have one significant digit (it might break things if it ends up being counted as zero significant digits by mistake). Note that 0.0 also has one significant digit (not two).
  • Literals that are greater than Number.MAX_VALUE. Number("1e999").toPrecision(1) outputs the string "Infinity" -- we should double-check that this doesn't break anything.
  • Literals with a very large number of significant figures, such as 17498005798264095394980017816940970922825355447145699491406164851279623993595007385788105416184430592 (2^333, which has 101 digits). Number#toPrecision throws an error when asked to provide more than 100 digits of precision. I don't think it's too important for the rule to be completely correct in that case, but we should make sure it doesn't crash. One solution could be to pretend the user supplied 100 significant digits, even if they actually supplied more (this could theoretically result in some false negatives if the user was trying to use e.g. 2^333 - 1 in some math calculation, but hopefully that's pretty rare). Alternatively, we could just always report an error when there are more than 100 significant digits in a literal.
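The toPrecision edge cases above can be checked directly (quick REPL sketch):

```javascript
// Values above Number.MAX_VALUE overflow to Infinity, and toPrecision
// stringifies non-finite values rather than throwing:
console.log(Number("1e999").toPrecision(1)); // "Infinity"

// toPrecision only accepts 1..100 significant digits for finite values:
try {
    (2 ** 333).toPrecision(101);
} catch (error) {
    console.log(error instanceof RangeError); // true
}
```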

For non-base-10 literals: Number#toPrecision doesn't allow us to produce non-base-10 strings, but we could shim something like toPrecision with:

function toPrecisionWithBase(number, significantDigits, base) {
    const lastSignificantPlaceValue = base ** (Math.floor(Math.log(number) / Math.log(base)) - significantDigits + 1);
    const roundedValue = Math.round(number / lastSignificantPlaceValue) * lastSignificantPlaceValue;

    // Note: Unlike Number#toPrecision, this does not add any trailing zeros after decimal points.
    return roundedValue.toString(base);
}

Aside from that I think the procedure for non-base-10 numbers should work the same as the procedure for base 10 numbers, except a bit simpler because everything is an integer and there's no scientific notation.

@jmoore914
Contributor

Thanks, those are some interesting edge cases! I ended up implementing a function to convert to proper scientific notation since I also felt weird about numbers with different orders of magnitude being converted to the same number.

Aside from that, wouldn't it be even simpler than that? Since there is no way to specify the level of significance in non-base-10 notation, I would assume that the user would want the exact integer. For instance,
0x20000000000000 = 9007199254740992, in which case I would assume that the user wants that exact value, so every digit would be significant. On the other hand, 0xBEBC200 = 200000000. Since there is no way to signify 200000000.0 or 2.00000000e8 in hex notation, I would again assume that the user wants the exact value. If we then assume that every digit is significant, it would be as simple as node.raw !== node.value.toString(base).
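That check can also be sketched by comparing the literal's exact mathematical value with its rounded runtime value, using BigInt since hex literals are always integers (the function name is hypothetical):

```javascript
// BigInt(raw) parses the hex string exactly; BigInt(Number(raw)) goes
// through double rounding first, so any difference means precision loss.
function hexLosesPrecision(raw) {
    return BigInt(raw) !== BigInt(Number(raw));
}

console.log(hexLosesPrecision("0x20000000000000")); // false: exactly 2**53
console.log(hexLosesPrecision("0x20000000000001")); // true: 2**53 + 1 rounds down
console.log(hexLosesPrecision("0xBEBC200"));        // false: 200000000 fits exactly
```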

@jmoore914
Contributor

Better example:

1230000000000000000000000 doesn't report an error but 1230000000000000000000000.0 does.
How would the user specify, when they input 0x104766F00ECB3A4C00000, which representation they mean?

@not-an-aardvark
Member Author

That's a good point. After thinking about it some more, though, does that issue ever change the behavior in practice?

We only care about whether zeros are significant because there are cases where the floating-point approximation of the base-b literal "(something) followed by n zeros" is different from the mathematical value of (something) × b^n. In these cases, rounding the floating-point value to a certain number of significant figures might result in a value with fewer than n trailing zeros, in which case we might need to report the literal if the trailing zeros were considered significant.

But I think that case is only possible for base 10 literals, because all the other available bases (2, 8, and 16) are powers of two. So adding a trailing zero just has the effect of multiplying by a power of two, which won't create any additional floating-point errors (it just increments the exponent without changing the mantissa). As a result, a binary/octal/hex literal with n trailing zeros will have a floating-point value that is also a multiple of b^n. So it doesn't matter whether the trailing zeros are considered significant, because they'll be correct either way.

tl;dr: Yes, I think we can just assume users want the exact integer for non-base-10 literals, although the behavior of the rule won't change regardless of whether we make that assumption.
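A quick illustration of why trailing zeros are harmless in power-of-two bases (REPL check, not part of the rule):

```javascript
// 0x1FFFFFFFFFFFFF is 2**53 - 1, the largest odd integer a double can hold
// exactly. Appending a hex zero multiplies by 16, which only shifts the
// binary exponent, so no new rounding error is introduced:
console.log(Number("0x1FFFFFFFFFFFFF0") === (2 ** 53 - 1) * 16); // true
```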

@kaicataldo kaicataldo removed the help wanted The team would welcome a contribution from the community for this issue label Jan 9, 2020
@kaicataldo
Member

Seems like #12889 would also be useful and is related to this rule (in that they both enforce safer usages of Numbers). They might be different enough that they could be separate rules, but I wanted to at least explore the possibility of combining them, since we already have so many core rules.

@not-an-aardvark
Member Author

IMO, this rule already contains most of the useful functionality from #12889, so that rule wouldn't be necessary when this one is added.

@mdjermanovic mdjermanovic linked a pull request Feb 9, 2020 that will close this issue
@jdoubleu

According to the Contribution Guidelines a rule should be simple and easy to understand.

Checking which "Number" loses precision and which doesn't makes things complicated. Furthermore, it does not really solve the problem of wrong usage of large (e.g. 64-bit) integers in JavaScript.

@jdoubleu jdoubleu mentioned this issue Feb 11, 2020
nzakas pushed a commit that referenced this issue May 20, 2020
* Created rule and test files

* Working rules and tests

* Working for decimals and scientific notation

* Check all digits match instead of just most precise


* Expanded rule to non-base ten numbers

* Added docs and updated config files

* Update docs/rules/no-loss-of-precision.md

Co-Authored-By: Teddy Katz <teddy.katz@gmail.com>

* Update lib/rules/no-loss-of-precision.js

Co-Authored-By: Teddy Katz <teddy.katz@gmail.com>

* Removed rule from recommended

* Renamed functions; fixed function description

* Update lib/rules/no-loss-of-precision.js

Co-Authored-By: Teddy Katz <teddy.katz@gmail.com>

* Removed always-true conditional

* Removed rule from recommended

* Fixing octal cases

* Working with octals

* Changed isNotBaseTen to isBaseTen

* Simplify isBaseTen test

* Added regression tests

* Additional regression test

Co-authored-by: Teddy Katz <teddy.katz@gmail.com>
@eslint-deprecated eslint-deprecated bot locked and limited conversation to collaborators Nov 17, 2020
@eslint-deprecated eslint-deprecated bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Nov 17, 2020