Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend PayloadReader APIs to meet the WinForms requirements #11341

Open
wants to merge 14 commits into
base: SafePayloadReader
Choose a base branch
from

Conversation

adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented May 9, 2024

@JeremyKuhne I was working on the MSRCs today and managed to make only a small progress with transition to SafePayloadReader APIs. What I have here compiles, but you can revert the last revert to have a temporary state where some APIs have been transitioned and there are compiler errors.

Microsoft Reviewers: Open in CodeFlow

- RecordMap needs to remain internal, but implement IReadOnlyDictionary<int, SerializationRecord> so it can be returned from a public method without exposing the type itself
- introduce new PayloadReader.Read overload that can return RecordMap via out parameter
- make SerializationRecord.ObjectId public
- make MemberReferenceRecord public, expose Reference property
- extend ArrayRecord with ElementTypeName and ElementTypeLibraryName public properties
…needed for writing to System.Windows.Forms where it's used (System.Windows.Forms compiles)
- stop using internal APIs
- fix the bugs:
	- update the record map, so mapping system class of common primtive types to PrimitiveTypeRecord<T> is visible also to those who access it via Id
	- add missing support for mapping to PrimitiveTypeRecord for IntPtr and UIntPtr
	- use GetSerializationRecord when unwrapping
- remove FormatterTypeStyle.TypesWhenNeeded test (not supported)
- remove tests for internal APIs tested in PayloadReader project:
	- RecordMapTests
	- NullRecordTests
	- ClassInfoTests
	- MemberTypeInfo
- check records by Ids to detect custom comparers for Hashtable
- ObjectRecordDeserializer needs to handle ArrayRecords and raw primitive values
- add support for jagged arrays with custom offsets (just to rejct them later)
- materialize arrays of primitive types using the provided API
- when mapping System Classes to PrimitiveTypeRecord<T> the Ids need to be preserved
- don't iterate over the same member twice
- support arrays of nullable primitives (don't represent them as ArrayRecord<ClassRecord> because they consists of MemberPrimitiveTypedRecord and NullsRecords)
…eds a discussion whether we want to support that or not)
}
while (_memberNamesIterator.MoveNext());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the order is no longer guaranteed to be preserved and I know it looks less clean than before, but I really hope that I don't need to expose an array (we are using dictionary to store the member names for security reasons:

https://github.com/adamsitnik/SafePayloadReader/blob/f9beee7176ce0bf29fdea794d4f79de5c024103c/System.Runtime.Serialization.BinaryFormat/Infos/ClassInfo.cs#L40-L45

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does have to be maintained for this to be compatible.

I'm not sure guarding against a lot of member names and member data is that useful. Are they really going to be able to DOS with this? Wouldn't an upper cap on number of members be more practical? I can't imagine any real-world class having more than 64 members that get serialized, picking some large number beyond that might be more useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does have to be maintained for this to be compatible.

I tried to write a failing test for that and I've failed, it seems that the order is preserved as long as we don't mutate the dictionary afterwards: https://stackoverflow.com/a/64596145/5852046 But is it safe to rely on that assumption? I am not convinced.

We could use OrderedDictionary which offers O(1) lookups, but the inserts can be O(n) at worst, but the same is true for Dictionary. (source)

Wouldn't an upper cap on number of members be more practical?

I was thinking about introducing such limit: https://github.com/adamsitnik/SafePayloadReader/blob/f9beee7176ce0bf29fdea794d4f79de5c024103c/System.Runtime.Serialization.BinaryFormat/PayloadOptions.cs#L14

@GrabYourPitchforks Thoughts?

@@ -75,7 +75,7 @@ public void DeserializeStoredObjects(object value, TypeSerializableValue[] seria
// Explicitly not supporting offset arrays
from value in BinaryFormatterTests.RawSerializableObjects()
from FormatterAssemblyStyle assemblyFormat in new[] { FormatterAssemblyStyle.Full, FormatterAssemblyStyle.Simple }
from FormatterTypeStyle typeFormat in new[] { FormatterTypeStyle.TypesAlways, FormatterTypeStyle.TypesWhenNeeded, FormatterTypeStyle.XsdString }
from FormatterTypeStyle typeFormat in new[] { FormatterTypeStyle.TypesAlways/*, FormatterTypeStyle.TypesWhenNeeded, FormatterTypeStyle.XsdString*/ }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeremyKuhne this is important and we need to discuss whether we want to support it (I hope not, as according to my understanding it requires loading types which Payload Reader needs to avoid)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to support it, just made sure it was possible to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing we would want to add back here is FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdString. That always forces strings to go as records.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with that option tests fail again. There is what I believe to be a bug in the serializer where it will put out duplicate negative ids. My RecordMap allows for that as they aren't supposed to be referenced. I didn't try to constrain further, this may only happen for the root record, or there might be some other constraint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to support it, just made sure it was possible to do so.

Awesome! I've changed the code to throw NotSupportedException and removed the related code.

FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdString

Thanks for sharing that! I had no idea that FormatterTypeStyle an be used as [Flags] (it's not marked as such...)

There is what I believe to be a bug in the serializer where it will put out duplicate negative ids

Thanks a lot for the hint, it has saved me a lot of debugging

[MemberData(nameof(ArrayInfo_ParseSuccessData))]
public void ArrayInfo_Parse_Success(Stream stream, int expectedId, int expectedLength)
{
using BinaryReader reader = new(stream);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed these and similar tests as they belong to Payload Reader test suite

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need to be fully covered, so should at least have a tracking issue to make sure we don't lose any of these scenarios.

return _lastType;
}

_lastLibraryId = libraryId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last retrieved is an important perf optimization, we should try and keep that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libraryId == _lastLibraryId was much faster when the API was accepting Id libraryId (integer). Now the public API provides AssemblyNameInfo and we need to compare strings to check for equality (in this particular case).
I doubt it would move the needle now, as what we have (a dictionary lookup) is very fast (O(1))

_array = _arrayType is BinaryArrayType.Rectangular
? Array.CreateInstance(_elementType, binaryArray.Lengths.ToArray())
: Array.CreateInstance(_elementType, arrayRecord.Length);
int[] lengths = new int[arrayRecord.Rank];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be stackalloc'ed, should have made that more explicit in my own code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be stackalloc'ed

But there is no overload for Array.CreateInstance that accepts a span (only arrays?)?

From apisof.net:

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A feature request is in order, perhaps. You can allocate arrays on the stack with https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-12#inline-arrays.


namespace FormatTests.Common;

public abstract class NullRecordTests<T> : SerializationTest<T> where T : ISerializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bad input tests need to be retained. As a quick hack you can make them just have the raw data streams.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadPrimitiveTypes should be replaced with the implementation I have in BinaryReaderExtensions.ReadPrimitiveArray. ToArray needs to just pass back the existing array. Critical for the allocations. We need to be able to do the same for BinaryArray when it is primitive as well (think byte[,]), but that can come as a work item as long as the object model allows for it. I seem to recall some single dimensional primitive arrays ending up in BinaryArray as well, but I'm not positive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToArray needs to just pass back the existing array

I am going to offer that in the API proposal. Something like MoveToImmutable or a boolean flag for ToArray. Do you have any preferences?

ReadPrimitiveTypes should be replaced with the implementation I have in BinaryReaderExtensions.ReadPrimitiveArray

That implementation may be faster, but it has some bugs:

  1. It assumes that every Stream can return Length and Position while non-seekable streams will simply throw:

    internal static long Remaining(this BinaryReader reader)
    {
    Stream stream = reader.BaseStream;
    return stream.Length - stream.Position;
    }

  2. For certain types it works only for big endian:

if (sizeof(T) != 1 && !BitConverter.IsLittleEndian)

My goal is to get the API design accepted and merged. Once we get there we can talk about perf (if there are not other more important things to work on). What is important is that all the mentioned optimizations can be achieved with current public API shape.

- FormatterTypeStyle.TypeAlways is a must have, remove unsupported RecordTypes from the public enum and throw NotSupportedException when they are provided
- add support for Formatters.FormatterTypeStyle.XsdString
- enable tests for FormatterTypeStyle.TypesAlways | FormatterTypeStyle.XsdStringr
ArrayType.Rectangular => _elementType.MakeArrayType(arrayRecord.Rank),
_ => _elementType.MakeArrayType()
};
// Tricky part: for arrays of classes/structs the following record allocates and array of class records
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamsitnik Is this allocating an additional array or is it just returning the internal data structure?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants