Extend PayloadReader APIs to meet the WinForms requirements #11341

adamsitnik · 2024-05-09T18:02:45Z

@JeremyKuhne I was working on the MSRCs today and managed to make only a small progress with transition to SafePayloadReader APIs. What I have here compiles, but you can revert the last revert to have a temporary state where some APIs have been transitioned and there are compiler errors.

Microsoft Reviewers: Open in CodeFlow

- RecordMap needs to remain internal, but implement IReadOnlyDictionary<int, SerializationRecord> so it can be returned from a public method without exposing the type itself - introduce new PayloadReader.Read overload that can return RecordMap via out parameter - make SerializationRecord.ObjectId public - make MemberReferenceRecord public, expose Reference property

- extend ArrayRecord with ElementTypeName and ElementTypeLibraryName public properties

…needed for writing to System.Windows.Forms where it's used (System.Windows.Forms compiles)

- stop using internal APIs - fix the bugs: - update the record map, so mapping system class of common primtive types to PrimitiveTypeRecord<T> is visible also to those who access it via Id - add missing support for mapping to PrimitiveTypeRecord for IntPtr and UIntPtr - use GetSerializationRecord when unwrapping - remove FormatterTypeStyle.TypesWhenNeeded test (not supported) - remove tests for internal APIs tested in PayloadReader project: - RecordMapTests - NullRecordTests - ClassInfoTests - MemberTypeInfo

- check records by Ids to detect custom comparers for Hashtable - ObjectRecordDeserializer needs to handle ArrayRecords and raw primitive values - add support for jagged arrays with custom offsets (just to rejct them later) - materialize arrays of primitive types using the provided API - when mapping System Classes to PrimitiveTypeRecord<T> the Ids need to be preserved - don't iterate over the same member twice - support arrays of nullable primitives (don't represent them as ArrayRecord<ClassRecord> because they consists of MemberPrimitiveTypedRecord and NullsRecords)

…eds a discussion whether we want to support that or not)

… using any of the internal APIs

Winforms.sln

...rivate.Windows.Core/src/System/Windows/Forms/BinaryFormat/BinaryFormattedObjectExtensions.cs

...e.Windows.Core/src/System/Windows/Forms/BinaryFormat/Deserializer/ArrayRecordDeserializer.cs

adamsitnik · 2024-05-14T18:18:02Z

...c/System/Windows/Forms/BinaryFormat/Deserializer/ClassRecordSerializationInfoDeserializer.cs

            }
+            while (_memberNamesIterator.MoveNext());


the order is no longer guaranteed to be preserved and I know it looks less clean than before, but I really hope that I don't need to expose an array (we are using dictionary to store the member names for security reasons:

https://github.com/adamsitnik/SafePayloadReader/blob/f9beee7176ce0bf29fdea794d4f79de5c024103c/System.Runtime.Serialization.BinaryFormat/Infos/ClassInfo.cs#L40-L45

It does have to be maintained for this to be compatible.

I'm not sure guarding against a lot of member names and member data is that useful. Are they really going to be able to DOS with this? Wouldn't an upper cap on number of members be more practical? I can't imagine any real-world class having more than 64 members that get serialized, picking some large number beyond that might be more useful.

It does have to be maintained for this to be compatible.

I tried to write a failing test for that and I've failed, it seems that the order is preserved as long as we don't mutate the dictionary afterwards: https://stackoverflow.com/a/64596145/5852046 But is it safe to rely on that assumption? I am not convinced.

We could use OrderedDictionary which offers O(1) lookups, but the inserts can be O(n) at worst, but the same is true for Dictionary. (source)

Wouldn't an upper cap on number of members be more practical?

I was thinking about introducing such limit: https://github.com/adamsitnik/SafePayloadReader/blob/f9beee7176ce0bf29fdea794d4f79de5c024103c/System.Runtime.Serialization.BinaryFormat/PayloadOptions.cs#L14

@GrabYourPitchforks Thoughts?

...stem.Private.Windows.Core/src/System/Windows/Forms/BinaryFormat/Deserializer/Deserializer.cs

adamsitnik · 2024-05-14T18:22:20Z

src/System.Private.Windows.Core/tests/BinaryFormatTests/FormatTests/Common/BasicObjectTests.cs

@@ -75,7 +75,7 @@ public void DeserializeStoredObjects(object value, TypeSerializableValue[] seria
            // Explicitly not supporting offset arrays
            from value in BinaryFormatterTests.RawSerializableObjects()
            from FormatterAssemblyStyle assemblyFormat in new[] { FormatterAssemblyStyle.Full, FormatterAssemblyStyle.Simple }
-            from FormatterTypeStyle typeFormat in new[] { FormatterTypeStyle.TypesAlways, FormatterTypeStyle.TypesWhenNeeded, FormatterTypeStyle.XsdString }
+            from FormatterTypeStyle typeFormat in new[] { FormatterTypeStyle.TypesAlways/*, FormatterTypeStyle.TypesWhenNeeded, FormatterTypeStyle.XsdString*/ }


@JeremyKuhne this is important and we need to discuss whether we want to support it (I hope not, as according to my understanding it requires loading types which Payload Reader needs to avoid)

I don't want to support it, just made sure it was possible to do so.

The thing we would want to add back here is FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdString. That always forces strings to go as records.

Note that with that option tests fail again. There is what I believe to be a bug in the serializer where it will put out duplicate negative ids. My RecordMap allows for that as they aren't supposed to be referenced. I didn't try to constrain further, this may only happen for the root record, or there might be some other constraint.

I don't want to support it, just made sure it was possible to do so.

Awesome! I've changed the code to throw NotSupportedException and removed the related code.

FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdString

Thanks for sharing that! I had no idea that FormatterTypeStyle an be used as [Flags] (it's not marked as such...)

There is what I believe to be a bug in the serializer where it will put out duplicate negative ids

Thanks a lot for the hint, it has saved me a lot of debugging

src/System.Private.Windows.Core/tests/BinaryFormatTests/FormatTests/Common/PrimitiveTests.cs

adamsitnik · 2024-05-14T18:23:43Z

...ystem.Private.Windows.Core/tests/BinaryFormatTests/FormatTests/FormattedObject/ArrayTests.cs

-    [MemberData(nameof(ArrayInfo_ParseSuccessData))]
-    public void ArrayInfo_Parse_Success(Stream stream, int expectedId, int expectedLength)
-    {
-        using BinaryReader reader = new(stream);


I've removed these and similar tests as they belong to Payload Reader test suite

They need to be fully covered, so should at least have a tracking issue to make sure we don't lose any of these scenarios.

...m.Private.Windows.Core/tests/BinaryFormatTests/FormatTests/FormattedObject/HashTableTests.cs

src/System.Runtime.Serialization.BinaryFormat/Records/ArrayRecord.cs

src/System.Runtime.Serialization.BinaryFormat/Records/BinaryArrayRecord.cs

src/System.Runtime.Serialization.BinaryFormat/Records/RectangularOrCustomOffsetArrayRecord.cs

src/System.Runtime.Serialization.BinaryFormat/Records/SerializationRecord.cs

src/System.Runtime.Serialization.BinaryFormat/Records/SystemClassWithMembersAndTypesRecord.cs

src/System.Runtime.Serialization.BinaryFormat/Utils/RecordMap.cs

src/System.Windows.Forms/src/Properties/AssemblyInfo.cs

src/System.Windows.Forms/src/System/Windows/Forms/BinaryFormat/ArrayInfo.cs

JeremyKuhne · 2024-05-14T19:52:14Z

...ate.Windows.Core/src/System/Windows/Forms/BinaryFormat/BinaryFormattedObject.TypeResolver.cs

-                return _lastType;
-            }
-
-            _lastLibraryId = libraryId;


Last retrieved is an important perf optimization, we should try and keep that.

libraryId == _lastLibraryId was much faster when the API was accepting Id libraryId (integer). Now the public API provides AssemblyNameInfo and we need to compare strings to check for equality (in this particular case).
I doubt it would move the needle now, as what we have (a dictionary lookup) is very fast (O(1))

JeremyKuhne · 2024-05-14T20:04:07Z

...e.Windows.Core/src/System/Windows/Forms/BinaryFormat/Deserializer/ArrayRecordDeserializer.cs

-            _array = _arrayType is BinaryArrayType.Rectangular
-                ? Array.CreateInstance(_elementType, binaryArray.Lengths.ToArray())
-                : Array.CreateInstance(_elementType, arrayRecord.Length);
+        int[] lengths = new int[arrayRecord.Rank];


This should be stackalloc'ed, should have made that more explicit in my own code.

This should be stackalloc'ed

But there is no overload for Array.CreateInstance that accepts a span (only arrays?)?

From apisof.net:

A feature request is in order, perhaps. You can allocate arrays on the stack with https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-12#inline-arrays.

....Windows.Core/src/System/Windows/Forms/BinaryFormat/Deserializer/ObjectRecordDeserializer.cs

JeremyKuhne · 2024-05-14T20:22:31Z

src/System.Private.Windows.Core/tests/BinaryFormatTests/FormatTests/Common/NullRecordTests.cs

-
-namespace FormatTests.Common;
-
-public abstract class NullRecordTests<T> : SerializationTest<T> where T : ISerializer


These bad input tests need to be retained. As a quick hack you can make them just have the raw data streams.

JeremyKuhne · 2024-05-14T21:51:07Z

src/System.Runtime.Serialization.BinaryFormat/Records/ArraySinglePrimitiveRecord.cs

ReadPrimitiveTypes should be replaced with the implementation I have in BinaryReaderExtensions.ReadPrimitiveArray. ToArray needs to just pass back the existing array. Critical for the allocations. We need to be able to do the same for BinaryArray when it is primitive as well (think byte[,]), but that can come as a work item as long as the object model allows for it. I seem to recall some single dimensional primitive arrays ending up in BinaryArray as well, but I'm not positive.

ToArray needs to just pass back the existing array

I am going to offer that in the API proposal. Something like MoveToImmutable or a boolean flag for ToArray. Do you have any preferences?

ReadPrimitiveTypes should be replaced with the implementation I have in BinaryReaderExtensions.ReadPrimitiveArray

That implementation may be faster, but it has some bugs:

It assumes that every Stream can return Length and Position while non-seekable streams will simply throw:

winforms/src/System.Private.Windows.Core/src/System/IO/BinaryReaderExtensions.cs

Lines 51 to 55 in 31176a4

internal static long Remaining(this BinaryReader reader)

{

Stream stream = reader.BaseStream;

return stream.Length - stream.Position;

}

For certain types it works only for big endian:

winforms/src/System.Private.Windows.Core/src/System/IO/BinaryReaderExtensions.cs

Line 129 in 31176a4

if (sizeof(T) != 1 && !BitConverter.IsLittleEndian)

My goal is to get the API design accepted and merged. Once we get there we can talk about perf (if there are not other more important things to work on). What is important is that all the mentioned optimizations can be achieved with current public API shape.

- FormatterTypeStyle.TypeAlways is a must have, remove unsupported RecordTypes from the public enum and throw NotSupportedException when they are provided - add support for Formatters.FormatterTypeStyle.XsdString - enable tests for FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdStringr

…Name

JeremyKuhne · 2024-05-15T16:26:54Z

...e.Windows.Core/src/System/Windows/Forms/BinaryFormat/Deserializer/ArrayRecordDeserializer.cs

+            ArrayType.Rectangular => _elementType.MakeArrayType(arrayRecord.Rank),
+            _ => _elementType.MakeArrayType()
+        };
+        // Tricky part: for arrays of classes/structs the following record allocates and array of class records


@adamsitnik Is this allocating an additional array or is it just returning the internal data structure?

adamsitnik requested a review from a team as a code owner May 9, 2024 18:02

dotnet-policy-service bot assigned adamsitnik May 9, 2024

adamsitnik requested a review from JeremyKuhne May 9, 2024 18:05

adamsitnik added 3 commits May 10, 2024 18:02

extend PayloadReader public API surface:

297381c

- extend ArrayRecord with ElementTypeName and ElementTypeLibraryName public properties

switch to the new API (System.Private.Windows.Core compiles)

25e123e

remove code that won't be needed for reading, move code that will be …

c6f612d

…needed for writing to System.Windows.Forms where it's used (System.Windows.Forms compiles)

adamsitnik force-pushed the switch branch from a385c85 to c6f612d Compare May 10, 2024 18:00

adamsitnik added 7 commits May 13, 2024 18:19

disable some of the FormatterTypeStyle.TypesWhenNeeded tests (this ne…

7f409ae

…eds a discussion whether we want to support that or not)

move the payload reader code to separate project to ensure we are not…

85c01ff

… using any of the internal APIs

silence the compiler warnings

7b7d485

MemberReferenceRecord does not need to be public

c88443f

rename ClassRecord.GetObject to GetRawValue

c63a18c