
Replace BufferCache with ArrayPool<T> #8841

Open · wants to merge 1 commit into base: main

Conversation

@onkrot commented Feb 24, 2024

Fixes #

Main PR

Description

Replace internal BufferCache with generic ArrayPool<T>
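
Roughly, each BufferCache get/release pair becomes an ArrayPool<T>.Shared Rent/Return pair. A minimal sketch of the pattern (the ushort case; the wrapper method and try/finally are illustrative, not copied from the actual diff):

using System.Buffers;

internal static class Example
{
    internal static void UseScratchBuffer(int length)
    {
        // Before: ushort[] buffer = BufferCache.GetUShorts(length);
        //         ... use buffer ...
        //         BufferCache.ReleaseUShorts(buffer);

        // After: note that Rent may hand back an array longer than requested,
        // so consumers must work with buffer[0..length], not buffer.Length.
        ushort[] buffer = ArrayPool<ushort>.Shared.Rent(length);
        try
        {
            // ... fill and consume buffer[0..length] ...
        }
        finally
        {
            ArrayPool<ushort>.Shared.Return(buffer);
        }
    }
}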

Customer Impact

Small performance improvement

Regression

None

Testing

None

Risk

Low risk


@onkrot requested a review from a team as a code owner Feb 24, 2024
@dotnet-policy-service bot added the PR metadata and Community Contribution labels Feb 24, 2024
@pchaurasia14 (Contributor) commented:

@onkrot - Thank you for this PR. Looks nice. Were you able to benchmark the perf impact of this change at your end?

@onkrot (Author) commented Feb 25, 2024

For the benchmark, I made several assumptions:

  1. Single-threaded environment
  2. No other code interacts with the buffer cache
  3. The number of glyphs per font can be arbitrary but does not exceed 65536

So, with all of the above in mind, I came up with this benchmark:

using BenchmarkDotNet.Attributes;
using MS.Internal.FontCache;
using System.Buffers;

namespace Bench
{
    public class BenchCache
    {
        [Params(100, 200)]
        public int FontCount { get; set; }

        readonly Random random = new(0);

        [Benchmark]
        public void BufferCacheUint()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = BufferCache.GetUInts(random.Next(65536));
                BufferCache.ReleaseUInts(array);
            }
        }

        [Benchmark]
        public void BufferCacheUshort()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = BufferCache.GetUShorts(random.Next(65536));
                BufferCache.ReleaseUShorts(array);
            }
        }

        [Benchmark]
        public void BufferCacheGlyph()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = BufferCache.GetGlyphMetrics(random.Next(65536));
                BufferCache.ReleaseGlyphMetrics(array);
            }
        }

        [Benchmark]
        public void ArrayPoolUint()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = ArrayPool<uint>.Shared.Rent(random.Next(65536));
                ArrayPool<uint>.Shared.Return(array);
            }
        }

        [Benchmark]
        public void ArrayPoolUshort()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = ArrayPool<ushort>.Shared.Rent(random.Next(65536));
                ArrayPool<ushort>.Shared.Return(array);
            }
        }

        [Benchmark]
        public void ArrayPoolGlyph()
        {
            for (int i = 0; i < FontCount; i++)
            {
                var array = ArrayPool<GlyphMetrics>.Shared.Rent(random.Next(65536));
                ArrayPool<GlyphMetrics>.Shared.Return(array);
            }
        }
    }
}

which yielded these results:

// * Summary *

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3155/23H2/2023Update/SunValley3)
AMD Ryzen 9 7950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100-preview.1.24101.4
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
| Method            | FontCount | Mean         | Error      | StdDev      |
|------------------ |---------- |-------------:|-----------:|------------:|
| BufferCacheUint   | 100       |   115.958 us |  1.3134 us |   1.0967 us |
| BufferCacheUshort | 100       |    80.768 us |  1.4098 us |   1.3187 us |
| BufferCacheGlyph  | 100       | 1,220.442 us | 23.6995 us |  42.1259 us |
| ArrayPoolUint     | 100       |     1.324 us |  0.0022 us |   0.0020 us |
| ArrayPoolUshort   | 100       |     1.326 us |  0.0030 us |   0.0024 us |
| ArrayPoolGlyph    | 100       |     1.318 us |  0.0022 us |   0.0019 us |
| BufferCacheUint   | 200       |   231.818 us |  1.8254 us |   1.4251 us |
| BufferCacheUshort | 200       |   162.415 us |  1.3721 us |   1.0712 us |
| BufferCacheGlyph  | 200       | 2,415.900 us | 47.8346 us | 103.9884 us |
| ArrayPoolUint     | 200       |     2.646 us |  0.0076 us |   0.0064 us |
| ArrayPoolUshort   | 200       |     2.807 us |  0.0101 us |   0.0084 us |
| ArrayPoolGlyph    | 200       |     2.637 us |  0.0060 us |   0.0050 us |

@onkrot (Author) commented Feb 25, 2024

In the ideal scenario (element count < 1024), BufferCache is faster:

| Method            | FontCount | Mean     | Error     | StdDev    | Allocated |
|------------------ |---------- |---------:|----------:|----------:|----------:|
| BufferCacheUint   | 100       | 1.075 us | 0.0042 us | 0.0038 us |         - |
| BufferCacheUshort | 100       | 1.086 us | 0.0156 us | 0.0130 us |         - |
| BufferCacheGlyph  | 100       | 1.037 us | 0.0032 us | 0.0027 us |         - |
| ArrayPoolUint     | 100       | 1.323 us | 0.0037 us | 0.0029 us |         - |
| ArrayPoolUshort   | 100       | 1.496 us | 0.0024 us | 0.0020 us |         - |
| ArrayPoolGlyph    | 100       | 1.366 us | 0.0250 us | 0.0234 us |         - |
| BufferCacheUint   | 200       | 2.093 us | 0.0057 us | 0.0047 us |         - |
| BufferCacheUshort | 200       | 2.205 us | 0.0065 us | 0.0061 us |         - |
| BufferCacheGlyph  | 200       | 2.358 us | 0.0446 us | 0.0478 us |         - |
| ArrayPoolUint     | 200       | 2.731 us | 0.0450 us | 0.0399 us |         - |
| ArrayPoolUshort   | 200       | 2.638 us | 0.0068 us | 0.0053 us |         - |
| ArrayPoolGlyph    | 200       | 2.676 us | 0.0076 us | 0.0060 us |         - |

@onkrot (Author) commented Feb 26, 2024

The results with BufferCache's maximum cached array size increased to 1024 * 1024 (the default maximum size for ArrayPool<T>.Shared):

| Method            | FontCount | Mean       | Error    | StdDev   | Allocated |
|------------------ |---------- |-----------:|---------:|---------:|----------:|
| BufferCacheUint   | 100       |   600.2 ns |  7.01 ns |  6.56 ns |         - |
| BufferCacheUshort | 100       |   596.4 ns | 11.35 ns | 11.15 ns |         - |
| BufferCacheGlyph  | 100       |   747.6 ns | 11.35 ns | 10.62 ns |         - |
| ArrayPoolUint     | 100       |   729.9 ns | 11.08 ns | 10.37 ns |         - |
| ArrayPoolUshort   | 100       |   731.0 ns |  8.61 ns |  8.05 ns |         - |
| ArrayPoolGlyph    | 100       |   732.3 ns | 11.12 ns | 10.40 ns |         - |
| BufferCacheUint   | 200       | 1,200.8 ns | 23.06 ns | 21.57 ns |         - |
| BufferCacheUshort | 200       | 1,195.4 ns | 23.28 ns | 22.86 ns |         - |
| BufferCacheGlyph  | 200       | 1,187.0 ns | 17.25 ns | 16.13 ns |         - |
| ArrayPoolUint     | 200       | 1,533.5 ns |  1.18 ns |  0.99 ns |         - |
| ArrayPoolUshort   | 200       | 1,466.3 ns |  1.93 ns |  1.81 ns |         - |
| ArrayPoolGlyph    | 200       | 1,456.8 ns | 18.23 ns | 17.05 ns |         - |

@@ -632,7 +642,7 @@ out idealWidth
 GlyphTypeface glyphTypeface = TryGetGlyphTypeface();
 Invariant.Assert(glyphTypeface != null);

-MS.Internal.Text.TextInterface.GlyphMetrics[] glyphMetrics = BufferCache.GetGlyphMetrics(charBufferRange.Length);
+MS.Internal.Text.TextInterface.GlyphMetrics[] glyphMetrics = ArrayPool<MS.Internal.Text.TextInterface.GlyphMetrics>.Shared.Rent(charBufferRange.Length);
@lindexi (Contributor) commented Mar 1, 2024:

Thank you for your contribution. Sorry, I don't believe that putting GlyphMetrics into the ArrayPool is a genuine optimization. Unlike ushort or byte, a GlyphMetrics array is only used in this one context, so it cannot be shared with other modules through ArrayPool. Creating a pooled GlyphMetrics array requires additional objects such as SharedArrayPool<GlyphMetrics>, which seems to cost more than it saves. More importantly, the GlyphMetrics array is acquired and released very frequently: virtually every modern WPF application renders text, so these arrays are constantly rented and returned. In this case it might be worth caching the array in a local variable instead; that is probably the cheapest approach with the highest benefit (a sketch of that idea follows the table below).

| Type                                                                                      | Count | Size | ReferenceSize |
|------------------------------------------------------------------------------------------ |------:|-----:|--------------:|
| SharedArrayPool+Partitions<GlyphMetrics>[]                                                 |     1 |  240 |           240 |
| GlyphMetrics[]                                                                             |     1 |  152 |           152 |
| ConditionalWeakTable+Entry<SharedArrayPool+ThreadLocalArray<GlyphMetrics>[], Object>[]     |     1 |  152 |           152 |
| ConditionalWeakTable+Container<SharedArrayPool+ThreadLocalArray<GlyphMetrics>[], Object>   |     1 |   56 |           264 |
| ConditionalWeakTable<SharedArrayPool+ThreadLocalArray<GlyphMetrics>[], Object>             |     1 |   40 |           328 |
| SharedArrayPool<GlyphMetrics>                                                              |     1 |   40 |           608 |
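
A minimal sketch of the suggested field-cached buffer (the type and member names are hypothetical, not taken from the WPF source):

using System;

// Illustrative only: one reusable buffer per consumer, grown on demand and
// never returned to a shared pool; this keeps the GlyphMetrics array local to
// its single call site instead of routing it through SharedArrayPool<T>.
internal sealed class ScratchBuffer<T>
{
    private T[] _array = Array.Empty<T>();

    public T[] Get(int minimumLength)
    {
        if (_array.Length < minimumLength)
            _array = new T[minimumLength];
        return _array;
    }
}

The call site would then hold a ScratchBuffer<GlyphMetrics> and call Get(charBufferRange.Length) instead of renting from a pool.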

@onkrot (Author) replied:

OK, let me group the proposed solutions:

  1. GlyphMetrics[] - an approach similar to BufferCache, without automatic release
  2. ConditionalWeakTable - automatic release of the ArrayPool? SharedArrayPool already has a mechanism to detect unused arrays and free them on a garbage collection event
  3. SharedArrayPool - the proposed solution

Also, perhaps we could use a stackalloc'd array here when the element count is small enough (maybe 150, i.e. roughly 4096 / sizeof(GlyphMetrics)); a sketch of that follows.
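
A rough sketch of the stackalloc idea (GlyphMetricsStub stands in for the real MS.Internal.Text.TextInterface.GlyphMetrics struct, the 150-element threshold is only the rough 4096 / sizeof(GlyphMetrics) estimate above, and the struct is assumed to be unmanaged):

using System;
using System.Buffers;

internal struct GlyphMetricsStub
{
    public int LeftSideBearing;
    public int AdvanceWidth;
    public int RightSideBearing;
    public int TopSideBearing;
    public int AdvanceHeight;
    public int BottomSideBearing;
    public int VerticalOriginY;
}

internal static class GlyphMetricsBuffers
{
    private const int StackLimit = 150;   // ~4096 bytes / sizeof(GlyphMetricsStub)

    internal static void Process(int glyphCount)
    {
        // Rent only when the count is too large for the stack.
        GlyphMetricsStub[] rented = glyphCount <= StackLimit
            ? null
            : ArrayPool<GlyphMetricsStub>.Shared.Rent(glyphCount);

        Span<GlyphMetricsStub> metrics = rented is null
            ? stackalloc GlyphMetricsStub[StackLimit]
            : rented.AsSpan(0, glyphCount);

        try
        {
            // ... fill and consume metrics[..glyphCount] ...
        }
        finally
        {
            if (rented is not null)
                ArrayPool<GlyphMetricsStub>.Shared.Return(rented);
        }
    }
}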

@ThomasGoulet73 (Contributor) commented:

@onkrot Thank you for this PR. Unfortunately, your benchmark is not representative, because BufferCache will not cache arrays with more than 1024 items (see the limit here). So in your benchmark BufferCache mostly bypasses its cached arrays and allocates on the heap instead. I would also discourage the use of Random in a benchmark; it can make the results unreliable.
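
For reference, the fixed-size, Random-free variant might look like this sketch (only the uint pair is shown; the other benchmarks would change the same way):

using BenchmarkDotNet.Attributes;
using MS.Internal.FontCache;
using System.Buffers;

public class BenchCacheFixed
{
    private const int BufferSize = 1024; // BufferCache's caching limit

    [Params(100, 200)]
    public int FontCount { get; set; }

    [Benchmark]
    public void BufferCacheUint()
    {
        for (int i = 0; i < FontCount; i++)
        {
            var array = BufferCache.GetUInts(BufferSize);
            BufferCache.ReleaseUInts(array);
        }
    }

    [Benchmark]
    public void ArrayPoolUint()
    {
        for (int i = 0; i < FontCount; i++)
        {
            var array = ArrayPool<uint>.Shared.Rent(BufferSize);
            ArrayPool<uint>.Shared.Return(array);
        }
    }
}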

Running your benchmark with a fixed buffer size of 1024 on my machine shows that BufferCache is faster than ArrayPool (probably because ArrayPool is a generic pool, whereas BufferCache only has to handle its 3 supported array types):

| Method            | FontCount | Mean       | Error   | StdDev  |
|------------------ |---------- |-----------:|--------:|--------:|
| BufferCacheUint   | 100       |   601.3 ns | 1.65 ns | 1.47 ns |
| BufferCacheUshort | 100       |   601.0 ns | 1.12 ns | 1.05 ns |
| BufferCacheGlyph  | 100       |   590.1 ns | 1.30 ns | 1.16 ns |
| ArrayPoolUint     | 100       |   761.9 ns | 1.69 ns | 1.58 ns |
| ArrayPoolUshort   | 100       |   741.3 ns | 1.76 ns | 1.65 ns |
| ArrayPoolGlyph    | 100       |   743.2 ns | 1.69 ns | 1.58 ns |
| BufferCacheUint   | 200       | 1,208.8 ns | 1.38 ns | 1.29 ns |
| BufferCacheUshort | 200       | 1,210.1 ns | 2.28 ns | 2.02 ns |
| BufferCacheGlyph  | 200       | 1,199.1 ns | 1.71 ns | 1.60 ns |
| ArrayPoolUint     | 200       | 1,522.8 ns | 2.72 ns | 2.41 ns |
| ArrayPoolUshort   | 200       | 1,615.8 ns | 3.82 ns | 3.57 ns |
| ArrayPoolGlyph    | 200       | 1,480.6 ns | 1.75 ns | 1.56 ns |

@ThomasGoulet73 (Contributor) commented:

After looking back at this PR, it seems I missed the fact that you already mention the 1024-item limit, so you can disregard my comment. I still think my suggestion of removing Random from your benchmark applies, though.

For this PR, it looks like the benefit is improved performance for larger arrays at the cost of reduced performance for smaller arrays?

@onkrot (Author) commented Mar 8, 2024

@ThomasGoulet73 As the benchmark shows, yes. But BufferCache has another flaw that I could not capture in the benchmarks: its cached array grows linearly. With unfavorable input, it will allocate a new array on every request. ArrayPool, on the other hand, uses exponentially sized buckets and can handle the same requests with far fewer allocations.
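
To make the growth argument concrete, here is a small illustrative model. It assumes BufferCache keeps one cached array per element type and reallocates whenever a request exceeds the cached length (exact-fit growth), and contrasts that with a power-of-two bucketing policy like ArrayPool<T>'s; the counts come from this model, not from the real classes:

using System;
using System.Numerics;

internal static class GrowthModel
{
    internal static void Main()
    {
        int exactFitAllocations = 0, bucketAllocations = 0;
        int cachedExactLength = 0, cachedBucketLength = 0;

        // "Bad input": each request is one element larger than the previous one.
        for (int requested = 1; requested <= 10_000; requested++)
        {
            // Exact-fit cache (BufferCache-style model): any growth reallocates.
            if (requested > cachedExactLength)
            {
                cachedExactLength = requested;
                exactFitAllocations++;
            }

            // Power-of-two buckets (ArrayPool-style model): growth happens only
            // when the request crosses into a larger bucket.
            int bucket = (int)BitOperations.RoundUpToPowerOf2((uint)requested);
            if (bucket > cachedBucketLength)
            {
                cachedBucketLength = bucket;
                bucketAllocations++;
            }
        }

        Console.WriteLine($"Exact-fit allocations:    {exactFitAllocations}"); // 10000
        Console.WriteLine($"Power-of-two allocations: {bucketAllocations}");   // 15
    }
}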

@lindexi (Contributor) commented Mar 8, 2024

@onkrot With typical business logic, it is hard to hit that bad input.
