Tiny but above subnormal numbers not handled correctly #209

Open
lindstro opened this issue Aug 13, 2023 · 0 comments

As mentioned in #119, blocks with all subnormal but nonzero numbers are not encoded correctly because of reciprocal overflow, which can be avoided using the ZFP_WITH_DAZ compile-time macro. However, there are even larger, normal numbers that cause problems for zfp due to how they are currently converted to integers here:

/* map floating-point number x to integer relative to exponent e */
static Scalar
_t1(quantize, Scalar)(Scalar x, int e)
{
  return LDEXP(x, ((int)(CHAR_BIT * sizeof(Scalar)) - 2) - e);
}

/* forward block-floating-point transform to signed integers */
static void
_t1(fwd_cast, Scalar)(Int* iblock, const Scalar* fblock, uint n, int emax)
{
  /* compute power-of-two scale factor s */
  Scalar s = _t1(quantize, Scalar)(1, emax);
  /* compute p-bit int y = s*x where x is floating and |y| <= 2^(p-2) - 1 */
  do
    *iblock++ = (Int)(s * *fblock++);
  while (--n);
}

When the largest (in magnitude) normal value in a block is strictly smaller than 2^-98 ≈ 3.2e-30 for floats or 2^-962 ≈ 2.6e-290 for doubles, the computation of s overflows (even if ZFP_WITH_DAZ is enabled). While very small, both of these numbers are well above the smallest normal numbers FLT_MIN = 2^-126 and DBL_MIN = 2^-1022, respectively.
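The thresholds above can be checked with a small sketch. The helpers scale_float and scale_double are hypothetical stand-ins for quantize(1, emax), not zfp functions, and the sketch assumes 32-bit IEEE floats, 64-bit IEEE doubles, and that emax follows the frexp convention (|x| < 2^emax for every x in the block):

```c
#include <limits.h>  /* CHAR_BIT */
#include <math.h>    /* ldexpf, ldexp */

/* Hypothetical stand-in for quantize(1, emax) with Scalar = float.
   Computes s = 2^(30 - emax), assuming a 32-bit float. */
static float scale_float(int emax)
{
  return ldexpf(1.0f, ((int)(CHAR_BIT * sizeof(float)) - 2) - emax);
}

/* Hypothetical stand-in for quantize(1, emax) with Scalar = double.
   Computes s = 2^(62 - emax), assuming a 64-bit double. */
static double scale_double(int emax)
{
  return ldexp(1.0, ((int)(CHAR_BIT * sizeof(double)) - 2) - emax);
}
```

Under these assumptions, scale_float(-98) asks for 2^128 and scale_double(-962) asks for 2^1024, both of which exceed the largest finite value of their type, so ldexp returns infinity. One exponent higher (emax = -97 or emax = -961), the scale factor is still finite.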

A potential, though less performant, solution is to divide by 1 / s instead of multiplying by s for numbers in this range, computing 1 / s directly via ldexp, which is guaranteed not to underflow. Note that s (and 1 / s) is always an integer power of two, so division and multiplication give the same result (when no overflow occurs). This solution is more general than ZFP_WITH_DAZ, as it also correctly handles all-subnormal blocks.
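A minimal sketch of the division idea, for float only. The function name fwd_cast_div_float is hypothetical and this is not the zfp implementation. It assumes a 32-bit IEEE float, n >= 1 (as in the original loop), and that subnormals are not flushed to zero. It has only been reasoned through for the exponents mentioned below and does not attempt to verify behavior over the full exponent range:

```c
#include <limits.h>  /* CHAR_BIT */
#include <math.h>    /* ldexpf */
#include <stdint.h>  /* int32_t */

/* Hypothetical variant of fwd_cast for float that divides by r = 1/s
   instead of multiplying by s, so that s itself is never formed. */
static void fwd_cast_div_float(int32_t* iblock, const float* fblock,
                               unsigned n, int emax)
{
  /* r = 1/s = 2^(emax - 30), computed directly via ldexpf */
  float r = ldexpf(1.0f, emax - ((int)(CHAR_BIT * sizeof(float)) - 2));
  /* y = x / r; r is a power of two, so this matches s * x whenever s is finite */
  do
    *iblock++ = (int32_t)(*fblock++ / r);
  while (--n);
}
```

For example, with emax = -100 the multiplicative scale factor would be 2^130 and overflow, whereas r = 2^-130 is representable (as a subnormal), and a block value of 0.75 * 2^-100 maps to 0.75 * 2^30. With emax = 0, where s = 2^30 is finite, dividing by r gives the same integers as multiplying by s.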

@lindstro lindstro added the bug label Aug 13, 2023