Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bizarre decimal-to-double behavior difference on same machine, same code #102149

Closed
cyanite opened this issue May 13, 2024 · 19 comments · Fixed by #102179
Closed

Bizarre decimal-to-double behavior difference on same machine, same code #102149

cyanite opened this issue May 13, 2024 · 19 comments · Fixed by #102179
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@cyanite
Copy link

cyanite commented May 13, 2024

Description

I made a dotnet class library with a single method:

public static double ConvertForMe(decimal d)
{
       return (double)d;
}

I then wrote a small console program that calls this, like this:

var dec = -4619.7403751459050600M;
var doubleDll = DoubleConversion.DoubleConversion.ConvertForMe(dec);
Console.WriteLine($"Decimal:   {dec}");
Console.WriteLine($"Double dll: {doubleDll}");.

I get:

Decimal:   -4619,7403751459050600
Double dll: -4619,740375145904

That's fine, I guess, and this can be reproduced on various machines and online dotnet playgrounds. However, on the same machine using the same, previously compiled, class library, but calling it from inside a larger .NET 8-based server we have running there in test, we instead get:

Decimal:   -4619,7403751459050600
Double dll: -4619,740375145905

Which amounts to a difference in the LSB in the Int64 representation of the double (using BitConverter). Since this is financial software, people are wondering about this, and I can't explain it. This only seems to happen on this particular machine (and only when invoked via our larger server, using Assembly.Load and IoC).

Reproduction Steps

As described above. We haven't been able to reproduce the abnormal result in any other contexts so far.

Expected behavior

The version ending in 4. Well, it's hard to say. I think the version ending in 5 seems more correct, but every other context, machine and online playground seems to give the one ending in 4.

Actual behavior

When we call the method in the dll from our (ultimately .NET 8 console program started as Windows service), we get the version ending in 5.

Regression?

No response

Known Workarounds

No response

Configuration

The problem happens on the same machine, but the environment is slightly different. Common environment variables that might influence this (but that are present both in the 4 and 5 case):

DOTNET_gcServer: 1

Environment variables unique to the case where we get the abnormal behavior:

COMPlus_ReadyToRun: 0
CORECLR_ENABLE_PROFILING: 1
CORECLR_PROFILER: {some guid}
CORECLR_PROFILER_PATH_64: some-path-to-dynatrace/.../oneagentloader.dll

I have tried to manually set the COMPlus variable but it doesn't reproduce the abnormal behavior. I haven't investigated if the tracing can be related.

Other configuration:
.NET 8.0.2
Microsoft Windows 10.0.14393

Both programs run as framework-dependent.

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 13, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 13, 2024
@huoyaoyuan huoyaoyuan added area-System.Numerics and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 13, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@huoyaoyuan
Copy link
Member

The actual output doesn't change. What has changed is the formatting of double. You can use BitConverter.DoubleToInt64Bits to verify the underlying bits of double.

See also #59982 etc.

@cyanite
Copy link
Author

cyanite commented May 13, 2024

Right, I will check that. The weird part isn't how the number converts, but how it converts, or formats, differently on the same machine, though. But I will check the bits to see if they are identical. Also, it does seem to be related to Dynatrace in some way, which I'll also check.

@cyanite
Copy link
Author

cyanite commented May 13, 2024

@huoyaoyuan it seems it's not just the string formatting that's different. The raw bits in the "4" case are C0B20BBD8939BE6E whereas they are C0B20BBD8939BE6F in the "5" case.

The abnormal case does, however, stop when Dynatrace is not used, for some reason.

@huoyaoyuan
Copy link
Member

What's your .NET Framework version installed on your system? I can't reproduce it with .NET Framework 4.8.1.

@cyanite
Copy link
Author

cyanite commented May 13, 2024

I assume we have .NET 4.8 installed, I don't know. Whatever comes with Windows 10 version 14393 (which is from 2016, "Redstone"). But all these programs are .NET Core (.NET 8). The installed runtimes on the machine are

Microsoft.AspNetCore.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 8.0.2 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

@cyanite
Copy link
Author

cyanite commented May 13, 2024

All right, the situation with "the problem disappered when Dynatrace was removed" isn't quite that clear. It's much more bizzare:

When the server is started, and the test is done, the normal result is returned (the call is made from a client, which provides the number; the server sends back a long string with output in various formats including the raw bits).

When the test is repeated, without restarting the server (which runs as a Windows service), the correct value is still returned... for a while. After about 1-2 minutes, it starts returning the abnormal value (i.e. the "5" case) instead, from the same running program.

@jkotas
Copy link
Member

jkotas commented May 13, 2024

Looks like a problem with tiered compilation. Repro:

for (;;)
{
    var dec = -4619.7403751459050600M;
    var doubleDll = ConvertForMe(dec);
    Console.WriteLine($"Decimal:   {dec}");
    Console.WriteLine($"Double dll: {doubleDll}");
}


static double ConvertForMe(decimal d)
{
       return (double)d;
}

It starts with -4619.740375145904 and changes to -4619.740375145905 after some time.

@jkotas jkotas added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Numerics labels May 13, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@cyanite
Copy link
Author

cyanite commented May 13, 2024

@jkotas that sounds promising. I can't exactly reproduce that myself (I also tried with variously switching tiered compilation off or asking for aggressive optimization for the method, or nothing at all), but would the cause then be that the two JIT compilers have a behaviroial difference?

@jkotas
Copy link
Member

jkotas commented May 13, 2024

I can't exactly reproduce that myself

Try:

dotnet new console
<copy and paste my snippet into Program.cs>
dotnet run -c release

would the cause then be that the two JIT compilers have a behaviroial difference?

Yes, it looks like a bug in the JIT compiler.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label May 13, 2024
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 8.0.x, 9.0.0 May 13, 2024
@JulieLeeMSFT
Copy link
Member

@EgorBo, PTAL.

@EgorBo
Copy link
Member

EgorBo commented May 13, 2024

From a quick look, it looks to be somewhere in CoreLib behaving differently for R2R and Tier0 or Tier1. Most likely VarR8FromDec function

@Clockwork-Muse
Copy link
Contributor

Since this is financial software

As a side question, why are you using double here, then? Why not keep it in decimal for the entire flow state? Obviously this is potentially a bug we should fix, but my next reaction would be seeing if I could remove the class of problem entirely.

@EgorBo
Copy link
Member

EgorBo commented May 13, 2024

Alright, so looks like the issue is in "unsigned int -> float" conversion difference for AVX512 (JIT) and SSE2 (crossgen)

@EgorBo
Copy link
Member

EgorBo commented May 13, 2024

Minimal repro:

using System;
using System.Runtime.CompilerServices;

class Prog
{
    public static unsafe void Main()
    {
        double dbl = Test(9303915604039947368);
        Console.WriteLine(*(ulong*)&dbl);
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization |
                MethodImplOptions.NoInlining)]
    static double Test(ulong x) => x;
}

It prints different values with DOTNET_EnableAVX512F=0 and DOTNET_EnableAVX512F=1 on a CPU with AVX512F

@EgorBo
Copy link
Member

EgorBo commented May 13, 2024

Test AVX512 codegen:

; Assembly listing for method Prog:Test(ulong):double (FullOpts)
       vcvtusi2sd xmm0, rcx
       ret

Test no-AVX512F:

; Assembly listing for method Prog:Test(ulong):double (FullOpts)
       vxorps   xmm0, xmm0, xmm0
       vcvtsi2sd xmm0, xmm0, rcx
       test     rcx, rcx
       jge      SHORT G_M52525_IG03
       vaddsd   xmm0, xmm0, qword ptr [reloc @RWD00]
G_M52525_IG03:
       ret      

RWD00  	dq	43F0000000000000h

@tannergooding does this ring a bell to you? it sounds like AVX512's impl is correct so probably we had a bug in our pre-AVX512 impl?

@tannergooding
Copy link
Member

#43895 tracks the issue, it's known and something we should fix.

It's just been pushed out release after release as "low priority". Given we normalized the floating-point to integer conversions this release, it'd be good for us to also fix the integer to floating-point conversions.

@cyanite
Copy link
Author

cyanite commented May 14, 2024

Since this is financial software

As a side question, why are you using double here, then? Why not keep it in decimal for the entire flow state? Obviously this is potentially a bug we should fix, but my next reaction would be seeing if I could remove the class of problem entirely.

Right. It's life insurance mathematics. Double is used to do expensive calculations that would take too long with decimal. This isn't a money amount as such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants