Error in Floating Point Representation tool about subnormal numbers? #201

pacalet · 2023-10-10T14:02:49Z

According to IEEE 754-2019:

- $emin$ shall be $1 - emax$ for all formats (section 3.3, page 17);
- when the biased exponent $E = 0$ and the trailing significand $T \neq 0$, the number is subnormal and its value is $v = (-1)^S \times 2^{emin} \times (0 + 2^{1-p} \times T)$ (section 3.4, page 19);
- for the 32-bit (binary32) format, $p = 24$ and $emax = 127$ (table 3.5, page 23).
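As a quick check of the standard's formula, here is a minimal Python sketch that evaluates it exactly with `fractions.Fraction` for any format described by $p$ and $emax$. The function name `subnormal_value` is illustrative only (not from the tool), and the binary64 check assumes Python 3.9+ for `math.ulp`.

```python
from fractions import Fraction
import math

def subnormal_value(S: int, T: int, p: int, emax: int) -> Fraction:
    """Exact value of a subnormal (E = 0, T != 0), following IEEE 754-2019 section 3.4.

    S: sign bit, T: trailing significand field, p: precision, emax: maximum exponent.
    """
    if not 0 < T < 2 ** (p - 1):
        raise ValueError("T must be a non-zero (p-1)-bit trailing significand")
    emin = 1 - emax  # section 3.3: emin = 1 - emax
    # v = (-1)^S * 2^emin * (0 + 2^(1-p) * T), with no rounding
    return (-1) ** S * Fraction(2) ** emin * (0 + Fraction(2) ** (1 - p) * T)

# binary32 (p = 24, emax = 127): smallest subnormal is 2^-149
print(f"{float(subnormal_value(0, 1, 24, 127)):.1e}")  # 1.4e-45
# binary64 (p = 53, emax = 1023): smallest subnormal is 2^-1074, i.e. math.ulp(0.0)
print(subnormal_value(0, 1, 53, 1023) == math.ulp(0.0))  # True
```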

As a consequence, for the 32-bit format the value of a subnormal number shall be $v = (-1)^S \times 2^{-126} \times (0 + 2^{-23} \times T)$. The Floating Point Representation tool apparently interprets this differently and displays the equation $v = (-1)^S \times 2^{-127} \times 2^{-23} \times T$. The first screenshot below shows the tool for $S = 0, E = 0, T = 1$. Still, it displays the correct scientific notation, 1.4E-45 ($= 2^{-149}$), instead of the value given by its own equation, $v = 2^{-127} \times 2^{-23} = 2^{-150} \approx 7 \times 10^{-46}$.
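The discrepancy can be reproduced outside the tool with a small Python sketch: it reinterprets the bit pattern for $S = 0, E = 0, T = 1$ as a binary32 value using the standard `struct` module, then compares it with the standard's formula and with the formula the tool displays. The helper name `decode_binary32` is illustrative only.

```python
import struct

def decode_binary32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

S, E, T = 0, 0, 1
bits = (S << 31) | (E << 23) | T  # 0x00000001
actual = decode_binary32(bits)

correct = (-1) ** S * 2.0 ** -126 * (0 + 2.0 ** -23 * T)  # emin = 1 - 127 = -126
tool = (-1) ** S * 2.0 ** -127 * 2.0 ** -23 * T           # equation displayed by the tool

print(f"{actual:.1e} {correct:.1e} {tool:.1e}")  # 1.4e-45 1.4e-45 7.0e-46
print(actual == correct, actual == tool)         # True False
```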

I suggest replacing $0 - 127$ with $-126$ and $2^{-127}$ with $2^{-126}$ and, while we are at it, replacing the term "denormalized" with "subnormal", the term used by the current standard. I have submitted a pull request and added a new screenshot showing the corrected display.
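To illustrate why $-126$ is the right exponent for $E = 0$, here is a hypothetical binary32 decoder (not the tool's actual code; `BIAS`, `unbiased_exponent` and `binary32_value` are names chosen for this sketch). With $emin = 1 - emax$ used for subnormals, the largest subnormal and the smallest normal number are exactly $2^{-149}$ apart, the same spacing as between consecutive subnormals, so there is no jump at the boundary.

```python
BIAS = 127  # binary32 exponent bias, equal to emax

def unbiased_exponent(E: int) -> int:
    """Exponent applied to the significand of a finite binary32 encoding.

    Subnormals (E == 0) use emin = 1 - emax = -126, the same exponent as the
    smallest normal numbers (E == 1), rather than E - BIAS = -127.
    """
    return 1 - BIAS if E == 0 else E - BIAS

def binary32_value(S: int, E: int, T: int) -> float:
    """Value of a finite binary32 encoding with 0 <= E <= 254 and 0 <= T < 2**23."""
    lead = 0 if E == 0 else 1  # implicit leading significand bit
    return (-1) ** S * 2.0 ** unbiased_exponent(E) * (lead + 2.0 ** -23 * T)

largest_subnormal = binary32_value(0, 0, 2 ** 23 - 1)
smallest_normal = binary32_value(0, 1, 0)
# Same gap as between consecutive subnormals: the two ranges join seamlessly.
print(smallest_normal - largest_subnormal == 2.0 ** -149)  # True
```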

pacalet mentioned this issue on Oct 10, 2023 in pull request #202, "Fix minor bugs in the way FloatRepresentation tool displays subnormal numbers" (Open).