Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stackalloc in separate method #2374

Merged
merged 1 commit into from
Mar 25, 2024
Merged

stackalloc in separate method #2374

merged 1 commit into from
Mar 25, 2024

Conversation

timcassell
Copy link
Collaborator

Fixes #2366

@AndreyAkinshin
Copy link
Member

@timcassell I do not fully understand this fix and why the original code leads to #2366 (since memory randomization is disabled by default). Could you please provide additional explanations for this problem?

@adamsitnik could you please take a look?

@AndreyAkinshin AndreyAkinshin requested review from adamsitnik and removed request for adamsitnik July 19, 2023 08:54
@timcassell
Copy link
Collaborator Author

timcassell commented Jul 19, 2023

I do not fully understand this fix and why the original code leads to #2366 (since memory randomization is disabled by default). Could you please provide additional explanations for this problem?

Since the benchmark disassembled code was identical between net6 and net7, and I got sporadic measurements with memory randomization, I had a hunch that was the cause. So I moved it to a separate method and tested again, and the measurements were fixed on my machine.

I disassembled the Engine.RunIteration method, but it was too much for me to make sense of. So I also disassembled stackalloc by itself, and the code is indeed different between net6 and net7. I can't really explain why it causes the discrepancy when it's used like this Span<byte> stackMemory = randomizeMemory ? stackalloc byte[random.Next(32)] : Span<byte>.Empty; when it's disabled, but that's what I measured on my machine.

[Edit] My guess is the Span<byte> is actually 1.5 words (1 pointer + 1 int), so the action(invokeCount / unrollFactor); invocation is misaligned.

[Edit2] It is not the Span<byte>. I tried changing it to byte* and keep it in the same method, and I got the same results. It's the stackalloc itself.

@timcassell
Copy link
Collaborator Author

timcassell commented Jul 19, 2023

I wrote a benchmark to compare the difference between master and this PR. It doesn't explain why it makes the result match expected for net7 compared to net6, but it does show that execution is more stable.

Method Runtime Mean Error StdDev Ratio RatioSD Code Size
RandomStackInline .NET 6.0 9.997 ns 0.2196 ns 0.3078 ns 1.00 0.00 200 B
RandomStackInline .NET 7.0 7.519 ns 0.1542 ns 0.1287 ns 0.75 0.02 202 B
RandomStackNoInline .NET 6.0 5.805 ns 0.0134 ns 0.0104 ns 1.00 0.00 60 B
RandomStackNoInline .NET 7.0 5.497 ns 0.0077 ns 0.0068 ns 0.95 0.00 62 B
code

public unsafe class Benchmarks
{
    private readonly Random random = new Random(12345);
    private readonly Consumer consumer = new Consumer();
    private readonly Func<object> func;
    public bool randomizeMemory;

    public Benchmarks() => func = DefaultClass;

    public object DefaultClass() => default;

    [Benchmark]
    public void RandomStackInline()
    {
        Span<byte> stackMemory = randomizeMemory ? stackalloc byte[random.Next(32)] : Span<byte>.Empty;
        consumer.Consume(func());
        Consume(stackMemory);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Consume(in Span<byte> _) { }

    [Benchmark]
    public void RandomStackNoInline()
    {
        if (randomizeMemory)
        {
            ExecuteWithRandomStack();
        }
        else
        {
            Execute();
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void ExecuteWithRandomStack()
    {
        byte* stackMemory = stackalloc byte[random.Next(32)];
        consumer.Consume(func());
        Consume(stackMemory);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Execute()
    {
        consumer.Consume(func());
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Consume(byte* _) { }
}

The reason must be a combination of the other code in Engine.RunIteration (but even that I'm unsure of, because the issue only manifested with a [GlobalSetup] method).

@AndreyAkinshin
Copy link
Member

Let's wait for a review by @adamsitnik

@timcassell
Copy link
Collaborator Author

timcassell commented Mar 16, 2024

@AndreyAkinshin Can we merge this? It's very low risk and removes potential unknown side-effects in the common case, even though we don't fully understand those side effects. I'm thinking a future follow-up will be to move the stackalloc into the source generator after the refactor in #2111 that moves the clock into the generated code (so we avoid the branch altogether).

P.S. I wonder if that side effect could be contributing to those measurements you obtained in #2334.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unable to repro the issue (see the details below), but the change is very simple and it should definitely not hurt anything.

WithGlobalSetup.DefaultClass: Job-CYKZFQ(LaunchCount=9)
Runtime = .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 0.000 ns, StdErr = 0.000 ns (NaN%), N = 130, StdDev = 0.000 ns
Min = 0.000 ns, Q1 = 0.000 ns, Median = 0.000 ns, Q3 = 0.000 ns, Max = 0.000 ns
IQR = 0.000 ns, LowerFence = 0.000 ns, UpperFence = 0.000 ns
ConfidenceInterval = [0.000 ns; 0.000 ns] (CI 99.9%), Margin = 0.000 ns (NaN% of Mean)
Skewness = NaN, Kurtosis = NaN, MValue = 2
-------------------- Histogram --------------------
[-0.500 ns ; 0.500 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
---------------------------------------------------

WithoutGlobalSetup.DefaultClass: Job-CYKZFQ(LaunchCount=9)
Runtime = .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2; GC = Concurrent Workstation
Mean = 0.000 ns, StdErr = 0.000 ns (NaN%), N = 132, StdDev = 0.000 ns
Min = 0.000 ns, Q1 = 0.000 ns, Median = 0.000 ns, Q3 = 0.000 ns, Max = 0.000 ns
IQR = 0.000 ns, LowerFence = 0.000 ns, UpperFence = 0.000 ns
ConfidenceInterval = [0.000 ns; 0.000 ns] (CI 99.9%), Margin = 0.000 ns (NaN% of Mean)
Skewness = NaN, Kurtosis = NaN, MValue = 2
-------------------- Histogram --------------------
[-0.500 ns ; 0.500 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet v0.13.13-develop (2024-03-25), Windows 11 (10.0.22631.3296/23H2/2023Update/SunValley3)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK 9.0.100-alpha.1.23531.2
  [Host]     : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2
  Job-CYKZFQ : .NET 8.0.3 (8.0.324.11423), X64 RyuJIT AVX2

LaunchCount=9
Type Method Mean Error
WithGlobalSetup DefaultClass 0.0 ns 0.0 ns
WithoutGlobalSetup DefaultClass 0.0 ns 0.0 ns
// * Warnings *
ZeroMeasurement
  WithGlobalSetup.DefaultClass: LaunchCount=9    -> The method duration is indistinguishable from the empty method duration
  WithoutGlobalSetup.DefaultClass: LaunchCount=9 -> The method duration is indistinguishable from the empty method duration

@timcassell timcassell merged commit 4ab69be into master Mar 25, 2024
12 checks passed
@timcassell timcassell deleted the fix-stackalloc-alignment branch March 25, 2024 15:16
@timcassell timcassell added this to the v0.14.0 milestone Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

default(object) takes more time than expected in net7.0
3 participants