JIT produces different asm from IL emit than from source #89685

Open
timcassell opened this issue Jul 29, 2023 · 11 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@timcassell

While refactoring BenchmarkDotNet to call benchmark methods directly instead of through a delegate (dotnet/BenchmarkDotNet#2334), I ran into an issue where the InProcessEmitToolchain is producing different results than the default toolchain. I disassembled it to try to figure out why it was different, and found the only difference is the call instruction.

Default toolchain

call      qword ptr [BenchmarkDotNet.Autogenerated.Runnable_0.__Overhead()]

InProcessEmit

call      BenchmarkDotNet.Autogenerated.Runnable_0.__Overhead()

It wouldn't really be an issue if the workload call also used the same call instruction, but it doesn't, so the overhead measurement is off.

call      qword ptr [ActualWork.IncrementField()]

Is there any way I can make the asm match so we can get correct measurements?

call-direct-default-asm.md
call-direct-inprocess-asm.md
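
To illustrate why a mismatched call instruction skews the result, here is a minimal sketch of the overhead subtraction. It assumes the reported time is the workload time minus the overhead time, and that the two call forms can differ in cost. `OverheadSubtraction`, `NetNanoseconds`, and all of the numbers are hypothetical, not BenchmarkDotNet's API or measurements from this issue.

```csharp
using System;

public static class OverheadSubtraction
{
    // Hypothetical helper: net cost per operation after subtracting the measured overhead.
    public static double NetNanoseconds(double workloadNs, double overheadNs)
        => workloadNs - overheadNs;

    public static void Main()
    {
        // Illustrative numbers only, not measurements from this issue.
        double workloadNs = 1.50;           // workload loop: call + actual work
        double matchedOverheadNs = 1.25;    // overhead loop using the same call form as the workload
        double mismatchedOverheadNs = 1.00; // overhead loop using a different (here: cheaper) call form

        // With matching call forms, only the actual work remains after subtraction.
        Console.WriteLine(NetNanoseconds(workloadNs, matchedOverheadNs));

        // With mismatched call forms, the call-cost difference is attributed to the workload.
        Console.WriteLine(NetNanoseconds(workloadNs, mismatchedOverheadNs));
    }
}
```

Under these assumed numbers the second result is larger than the first even though the work is identical, which is the kind of skew described above.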

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 29, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 29, 2023
@ghost

ghost commented Jul 29, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member

EgorBo commented Jul 29, 2023

Managed calls are always expected to be indirect (square brackets), so it's not clear to me what produced the direct calls. Perhaps those are direct calls to jump stubs?

@MichalPetryka
Contributor

> Managed calls are always expected to be indirect (square brackets), so it's not clear to me what produced the direct calls. Perhaps those are direct calls to jump stubs?

Maybe it's related to the fact that ILEmit is not tiered?

@timcassell
Author

timcassell commented Jul 29, 2023

> Maybe it's related to the fact that ILEmit is not tiered?

Would that matter here, though? The OverheadActionUnroll etc. methods are annotated with AggressiveOptimization, and the __OverheadWrapper and __WorkloadWrapper methods are annotated with NoOptimization, so there should be no tiering involved.
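
For reference, a minimal sketch of what those annotations look like in C#. `RunnableSketch` and its members are hypothetical stand-ins, not the actual generated code; only the `MethodImplOptions` values are real.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the generated runnable type; all names are illustrative.
public class RunnableSketch
{
    public int Field;

    // Stand-in for the benchmarked method.
    public void IncrementField() => Field++;

    // NoOptimization asks the JIT not to optimize this method.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public void WorkloadWrapper() => IncrementField();

    // AggressiveOptimization asks the JIT for fully optimized code up front,
    // so the method is not expected to go through the tiers.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public void WorkloadActionUnroll(long invokeCount)
    {
        for (long i = 0; i < invokeCount; i++)
        {
            WorkloadWrapper();
        }
    }

    public static void Main()
    {
        var runnable = new RunnableSketch();
        runnable.WorkloadActionUnroll(4);
        Console.WriteLine(runnable.Field);
    }
}
```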

@EgorBo
Member

EgorBo commented Jul 30, 2023

> > Managed calls are always expected to be indirect (square brackets), so it's not clear to me what produced the direct calls. Perhaps those are direct calls to jump stubs?
>
> Maybe it's related to the fact that ILEmit is not tiered?

They're indirect not because of tiering, but because of stubs and potential ReJIT profiler sessions.

@JulieLeeMSFT JulieLeeMSFT added the question Answer questions and provide assistance, not an issue with source code or documentation. label Jul 31, 2023
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Jul 31, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 31, 2023
@timcassell
Author

I tried making the wrapper method static and passing in the instance for a virtual call.
I tried making a separate class between the benchmark class and generated class in the hierarchy.
I tried making them completely separate classes (no hierarchical relationship).

No matter what I tried, I could not get the overhead and workload calls to have the same assembly call.

This is only an issue in net7.0+; net6.0 has matching assembly for both methods (it uses the direct calls without qword ptr).

@timcassell
Author

@EgorBo This issue impacts #89940 (it's part of the fix in my PR).

@EgorBo
Member

EgorBo commented Aug 9, 2023

> I tried making the wrapper method static and passing in the instance for a virtual call. I tried making a separate class between the benchmark class and generated class in the hierarchy. I tried making them completely separate classes (no hierarchical relationship).
>
> No matter what I tried, I could not get the overhead and workload calls to have the same assembly call.
>
> This is only an issue in net7.0+; net6.0 has matching assembly for both methods (it uses the direct calls without qword ptr).

Are there any steps on how to reproduce this locally?

@timcassell
Author

> Are there any steps on how to reproduce this locally?

Are you able to pull my fork/branch and check it? If not, I can try to create a simple repro.

@EgorBo
Member

EgorBo commented Aug 9, 2023

> > Are there any steps on how to reproduce this locally?
>
> Are you able to pull my fork/branch and check it? If not, I can try to create a simple repro.

I can clone it but it'd be nice to have exact steps on how to build it and reproduce 🙂

@timcassell
Author

> exact steps on how to build it and reproduce

In BenchmarkDotNet.IntegrationTests.ManualRunning there is a test named NonEmptyBenchmarksReportsNonZeroTimeAndZeroAllocated_InProcess. Remove the Skip reason and run it on net7.0 with typeof(ActualWork).

