[API Proposal]: Method for creating a List #102100

Mr0N · 2024-05-10T21:31:36Z

Background and motivation

runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/List.cs

Line 25 in dc98263

internal T[] _items; // Do not rename (binary serialization)

I suggest adding a method to the List class that would allow creating a new instance of List without copying the array, but simply inserting the array into the middle of the List

API Proposal

var obj = new int[10];
var array = List<int>.CreateAndInsertArrayToList(obj);
class List<T>
{
    public static List<T> CreateAndInsertArrayToList(T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
    }
}

API Usage

Alternative Designs

No response

Risks

No response

dotnet-policy-service · 2024-05-10T21:32:01Z

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

MihaZupan · 2024-05-10T22:13:40Z

The same idea has been rejected in #80311

CyrusNajmabadi · 2024-05-11T01:29:16Z

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

huoyaoyuan · 2024-05-11T02:23:47Z

This can only be possible if we implement and enforce some ownership moving semantic.

Mr0N · 2024-05-11T05:57:21Z

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

Mr0N · 2024-05-11T05:59:16Z

This can only be possible if we implement and enforce some ownership moving semantic.

Well, it's faster if there's an array, for example, and I won't be using it anymore, but I need a List.

Mr0N · 2024-05-11T06:02:02Z

The same idea has been rejected in #80311

Well, it seems like a completely different api

MihaZupan · 2024-05-11T06:13:22Z

Different name, the same functionality

huoyaoyuan · 2024-05-11T06:20:18Z

it's faster

It need to be safe in the first place, to guarantee nobody can misuse the array after.

ImmutableArray.CreateUnsafe does the same thing for ImmutableArray, but clearly put into the dangerous zone. Further more, ImmutableArray gets higher priority for performance than List because of its heavy usage in compiler.

Mr0N · 2024-05-11T06:27:51Z

It need to be safe in the first place, to guarantee nobody can misuse the array after.

Well, then it's already the problem of whoever will be using it; the method doesn't guarantee safety in using the array for anyone. For instance, you can pass the array by "ref" to emphasize this.

class List<T>
{
    public static List<T> CreateAndInsertArrayToList(ref T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
    }
}

Mr0N · 2024-05-11T06:35:34Z

ImmutableArray.CreateUnsafe does the same thing for ImmutableArray, but clearly put into the dangerous zone. Further more, ImmutableArray gets higher priority for performance than List because of its heavy usage in compiler.

Well, there are fewer functions than in a List

huoyaoyuan · 2024-05-11T06:37:50Z

Well, then it's already the problem of whoever will be using it

If the caller need to pay attention around it, the method should be explicitly noted as unsafe: name it like CreateUnsafe, or put it under some unsafe place like CollectionsMarshal.

Unsafe operation also requires more judgement about its benefit. In this case, why the caller gets an array in the first place? Can it be avoided and let the caller add items into a list?

This comes the difference with ImmutableArray:
For ImmutableArray, there wasn't a path for a user to create one ImmutableArray with least overhead possible: ImmutableArray.Add create new instances for each element. ImmutableArrayBuilder, and the space for adding elements with undetermined count.
For List, there is an optimal path for caller: use new List(capacity) to create and set the capacity, then use Add, AddRange, or CollectionsMarshal.AsSpan together with CollectionsMarshal.SetCount to set the elements.

Hope this helps you understanding the judgement.

Mr0N · 2024-05-11T06:42:18Z

For List, there is an optimal path for caller: use new List(capacity) to create and set the capacity, then use Add, AddRange, or CollectionsMarshal.AsSpan together with CollectionsMarshal.SetCount to set the elements.

Sure, but I don't like that, for example, I have an array, I won't use it anymore, I create a list, and it takes this array and rewrites it into a new array and puts it into the list. So, resources are spent on copying, and additional memory is used.

Well, there are all sorts of arrays, but most people use lists.

Mr0N · 2024-05-11T06:44:15Z

If the caller need to pay attention around it, the method should be explicitly noted as unsafe: name it like CreateUnsafe, or put it under some unsafe place like CollectionsMarshal.

Well, I don't know about the name, but it's better that this class is in List.

Mr0N · 2024-05-11T06:46:38Z

Unsafe operation also requires more judgement about its benefit. In this case, why the caller gets an array in the first place? Can it be avoided and let the caller add items into a list?

For this, there's ToList if you need to use the array further.

CyrusNajmabadi · 2024-05-11T16:00:18Z

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

That doesn't address my point. This would require arrays to having an array as the backing field. It would prevent changing that in the future.

Mr0N · 2024-05-11T17:36:05Z

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

That doesn't address my point. This would require arrays to having an array as the backing field. It would prevent changing that in the future.

Why replace an array? And what can be used instead of an array?

CyrusNajmabadi · 2024-05-11T18:16:33Z

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

Mr0N · 2024-05-11T18:34:25Z

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

copy the array from the method I'm talking about into a new storage . Although the array is in a List there since 2000

Mr0N · 2024-05-11T18:36:10Z

class List<T>
{
    public static List<T> CreateAndInsertArrayToList(ref T[] array)
    {
        return new List<T>(array);
    }
    LinkedList _items;
    private List(T[] array)
    {
        foreach(var item in array){
        .....
        }
    }
}

MihaZupan · 2024-05-11T18:37:13Z

That already exists - the List(IEnumerable<T> collection) constructor, which will copy the items for you.

Mr0N · 2024-05-11T18:39:32Z

That already exists - the List(IEnumerable<T> collection) constructor, which will copy the items for you.

I mention in case of a change in the array to some abstract data storage type of the future; however, I highly doubt such a thing will exist

CyrusNajmabadi · 2024-05-11T19:04:17Z

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

copy the array from the method I'm talking about into a new storage .

That would break the very semantics of the method you're proposing. If you want to copy, there are methods for that.

Although the array is in a List there since 2000

As an impl detail. It can be changed if runtime wants to optimize this (which would have wide benefits). This API would prevent that

CyrusNajmabadi · 2024-05-11T19:42:24Z

however, I highly doubt such a thing will exist

It seems very reasonable for this to happen. I would save an extra allocation in the most commonly used collection type.

danmoseley · 2024-05-11T20:08:19Z

a length and the elements inline without an array indirection.

Do we have a proposal for that? It would need GC and runtime work too special case lists similar to strings?

CyrusNajmabadi · 2024-05-11T21:44:23Z

It would need GC and runtime work too special case lists similar to strings

Yes. I would def expect that. But a growable/shrinkable, low overhead, array impl seems like a good fit (esp. given how widespread usage of List<T> is.

Mr0N · 2024-05-12T06:58:07Z

It seems very reasonable for this to happen. I would save an extra allocation in the most commonly used collection type.

If you replace the array with something else, it will disrupt the operation of the List; this collection may not work properly in some cases. If there are any new super modern types besides the array, most likely you will need to write a new List.

CyrusNajmabadi · 2024-05-12T07:25:23Z

If you replace the array with something else, it will disrupt the operation of the List;

Why would it disrupt anything?

this collection may not work properly in some cases.

What cases would it not work properly in?

If there are any new super modern types besides the array, most likely you will need to write a new List.

Why would you need to do that. The impl of List<T> is not specified. Only the contracts of the methods it exposes. None of that contract requires that the backing system be an array.

CyrusNajmabadi · 2024-05-12T07:26:53Z

@Mr0N You'll see that in no way does the documentation of List<T> require an array in any way. You could implement it in any number of ways. And the runtime could certainly optimize this to be not backed by an array if the benefit was thought to be high enough.

Mr0N · 2024-05-12T10:49:32Z

@Mr0N You'll see that in no way does the documentation of List<T> require an array in any way. You could implement it in any number of ways. And the runtime could certainly optimize this to be not backed by an array if the benefit was thought to be high enough.

Yes, they still use reflection.
And if, for example, you change the array to something else, what guarantee is there that this second thing will work for 100 thousand programs that use it? Each program has different usage of this class. There's multihread and so on.

colejohnson66 · 2024-05-12T11:26:54Z

Think of it this way: if the backing array was changed to LinkedList<T>, the contract surface would still function fine (despite the horrendous performance degradation). The API surface makes almost no guarantees about having allocation-less behavior.

However, your proposals entire purpose is to provide a constructor that avoids allocation. Changing the backing array to a LinkedList<T> would violate the guarantee that your proposal provides, as a LinkedList<T> have to be allocated then and elements copied into it. It would end up being the same as using the IEnumerable<T> constructor.

You even mention this yourself by saying the constructor would change from _items=items to just insert one-by-one. You ignore the API contract your proposal implies.

In other words, your proposal would force the array-backed items to be “part” of the API surface. The team would be unable to replace it without violating the contract.

On the contrary, the IEnumerable<T> constructor makes no guarantees about allocations. Changing the backing array to a LinkedList<T> would still fulfill the contract (or lack thereof) of that constructor.

To put it shortly: your proposal is not just an API guarantee (which is easy to maintain), but a performance guarantee. If this API were to exist, the only way to maintain its performance is to force the backing store away to always be an array. Just because it’s been that way for ~25 years doesn’t mean it won’t change.

Lastly, why is this API necessary? This feels like an XY problem. Is performance of new List<int>(myArray) that bad in your application? If so, you should consider the possibility that your architecture is flawed. Specifically, instead of allocating an array and “inserting” it into a list, just allocate the list in the first place and use CollectionsMarshal.SetCount.

Mr0N · 2024-05-12T12:36:52Z

Think of it this way: if the backing array was changed to LinkedList<T>, the contract surface would still function fine (despite the horrendous performance degradation). The API surface makes almost no guarantees about having allocation-less behavior.

However, your proposals entire purpose is to provide a constructor that avoids allocation. Changing the backing array to a LinkedList<T> would violate the guarantee that your proposal provides, as a LinkedList<T> have to be allocated then and elements copied into it. It would end up being the same as using the IEnumerable<T> constructor.

You even mention this yourself by saying the constructor would change from _items=items to just insert one-by-one. You ignore the API contract your proposal implies.

In other words, your proposal would force the array-backed items to be “part” of the API surface. The team would be unable to replace it without violating the contract.

On the contrary, the IEnumerable<T> constructor makes no guarantees about allocations. Changing the backing array to a LinkedList<T> would still fulfill the contract (or lack thereof) of that constructor.

To put it shortly: your proposal is not just an API guarantee (which is easy to maintain), but a performance guarantee. If this API were to exist, the only way to maintain its performance is to force the backing store away to always be an array. Just because it’s been that way for ~25 years doesn’t mean it won’t change.

Lastly, why is this API necessary? This feels like an XY problem. Is performance of new List<int>(myArray) that bad in your application? If so, you should consider the possibility that your architecture is flawed. Specifically, instead of allocating an array and “inserting” it into a list, just allocate the list in the first place and use CollectionsMarshal.SetCount.

No contracts are being ignored here. I propose a static method and a private constructor.
A method that will create a new instance of the class without memory allocation

colejohnson66 · 2024-05-12T12:42:54Z

A method that will create a new instance of the class without memory allocation

That is the definition of an API contract. Saying that the user can use it to avoid allocation is a guarantee the runtime would then have to keep.

In other words, how will you avoid allocations when the backing array no longer exists? As I mentioned, should it be replaced with LinkedList<T>, your API would then have to allocate, violating the contract.

Mr0N · 2024-05-12T13:03:42Z

A method that will create a new instance of the class without memory allocation

That is the definition of an API contract. Saying that the user can use it to avoid allocation is a guarantee the runtime would then have to keep.

In other words, how will you avoid allocations when the backing array no longer exists? As I mentioned, should it be replaced with LinkedList<T>, your API would then have to allocate, violating the contract.

If we change the name of the array inside the List now, somes percentage of programs will error because some may be using reflection.

Sure, over time, we can change the array to a LinkedList, but how do we prove that this implementation definitely adheres to the current contract? Because there are various situations, such as behavior in multithreading and so on.

colejohnson66 · 2024-05-12T13:08:56Z

Reflection for private fields is not part of the API contract. It's an implementation detail, and implementation details are not part of the contract. If implementation details were part of the contract, performance improvements in many scenarios would be impossible.

Programs that use reflection to access the array are misbehaving.

CyrusNajmabadi · 2024-05-12T15:55:02Z

Yes, they still use reflection.

Who is "they"?

Using reflection to get at internal impl details and take a dependency on them is not a supported scenario.

And if, for example, you change the array to something else, what guarantee is there that this second thing will work for 100 thousand programs that use it?

The documented contract of List<T> that I linked to. That's the guarantee.

There's multihread and so on.

Why would this be relevant? Either programs are using the stated contract or not. If they are, they will continue to work. If they are not, then they may break with any change at any time.

CyrusNajmabadi · 2024-05-12T15:57:30Z

but how do we prove that this implementation definitely adheres to the current contract?

Simple. The contract is specified in the documentation. You then test the new impl against that documentation.

The runtime already has tons of tests that do this already.

That's how the runtime already makes changes today, including completely revising the internal implementation of types. If that breaks someone using reflection, that's fine as that's not a supported scenario.

CyrusNajmabadi · 2024-05-12T16:00:26Z

Because there are various situations, such as behavior in multithreading and so on.

The runtime already states what are contracts and what are not. List, for example, provides no guarantees around unsynchronized multi threaded reading and writing. Since there is no contract there, if code takes a dependency on behavior such that it breaks off the impl changes, that's a problem with that code, not the impl.

Again, your view here would preclude the runtime making any changes (including optimizing things). They can and do make changes all the time to impl details that are not part of the contract of a type.

Mr0N · 2024-05-12T16:43:05Z

Doesn't this constructor bind the array to a List?
List<string> dinosaurs = new List<string>(4);
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.-ctor?view=net-8.0#system-collections-generic-list-1-ctor(system-int32)

Mr0N · 2024-05-12T16:47:15Z

This constructor sets the initial length of the array; my method sets the initial array.

CyrusNajmabadi · 2024-05-12T16:47:22Z

Doesn't this constructor bind the array to a List?
List<string> dinosaurs = new List<string>(4);

No. There is no array there at all. All that says is that it:

Initializes a new instance of the List class that is empty and has the specified initial capacity.

The capacity of a List is the number of elements that the List can hold. As elements are added to a List, the capacity is automatically increased as required by reallocating the internal array.

If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List.

The capacity can be decreased by calling the TrimExcess method or by setting the Capacity property explicitly. Decreasing the capacity reallocates memory and copies all the elements in the List.

This constructor is an O(1) operation.

And all that capacity says is:

Gets or sets the total number of elements the internal data structure can hold without resizing.

All that matters with that is:

Retrieving the value of this property is an O(1) operation; setting the property is an O(n) operation, where n is the new capacity.

CyrusNajmabadi · 2024-05-12T16:47:48Z

This constructor sets the initial length of the array; my method sets the initial array.

Again, as an implementation detail. It could pick a different implementation as long as it abides by the stated contract which people can depend on.

Mr0N · 2024-05-12T16:55:47Z

No. There is no array there at all. All that says is that it:

Well, it's as if the programmers who wrote it were thinking about an array. If there's something else there, it might not work correctly.

CyrusNajmabadi · 2024-05-12T17:16:30Z

Yes. That's a 'remark'. A remark gives info, but not a contract.

If there's something else there, it might not work correctly.

It would still work correctly, as long as it abides by the stated contract. Again, this is how the runtiem works. They change internal impl details all the time. Especially for performance. I'm not sure how much more plainly or clearly i can state this.

Mr0N · 2024-05-12T17:29:45Z

Yes. That's a 'remark'. A remark gives info, but not a contract.

If there's something else there, it might not work correctly.

It would still work correctly, as long as it abides by the stated contract. Again, this is how the runtiem works. They change internal impl details all the time. Especially for performance. I'm not sure how much more plainly or clearly i can state this.

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

Mr0N · 2024-05-12T17:35:42Z

public static List<T> CreateAndInsertArrayToListUnsafe(ref T[] array)

colejohnson66 · 2024-05-12T17:44:54Z

How do you expect the team to change the implementation when your proposal, by design, requires a specific implementation?

Mr0N · 2024-05-12T17:50:11Z

class List<T>
{
    public static List<T> CreateAndInsertArrayToListUnsafe(ref T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
        //Here, in theory, we provide additional information for initialization, increase the size, and so on.
    }
}

CyrusNajmabadi · 2024-05-12T18:27:41Z

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

Mr0N · 2024-05-12T19:02:40Z

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

The difference between this and that is that there's IEnumerable there, while here it's an array, and this method is "Unsafe," meaning it's much faster than that constructor, only it's unsafe.

CyrusNajmabadi · 2024-05-12T19:21:48Z

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

The difference between this and that is that there's IEnumerable there, while here it's an array, and this method is "Unsafe," meaning it's much faster than that constructor, only it's unsafe.

But this would now mean we added a method directly documented to be fast, and it would become slow in a future release, causing code to fall off a perf cliff.

No code could take advantage of this as the only reason to would be perf gain, but that perf could easily become a perf-pitfall.

If a method exists for performance, it needs to not regress, as otherwise people can't use it.

That's why ImmutableArray allows this. Because there is a contract about its impl, and guarantees around the marshaling methods. These won't change because it would be a disastrous perf problem for all the code that took a dependency on the documented performance characteristics there.

Clockwork-Muse · 2024-05-12T22:09:40Z

and this method is "Unsafe," meaning it's much faster than that constructor

As a side note, unsafe doesn't automatically make code faster, if that's what you're thinking. All it means is that certain constraints might be relaxed, or that certain behavior has to be carefully observed. There are a number of APIs where there's no speed gain (just that certain operations become possible at all).

Mr0N added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 10, 2024

dotnet-issue-labeler bot added the area-System.Collections label May 10, 2024

dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 10, 2024

Mr0N changed the title ~~[API Proposal]:~~ [API Proposal]: Method for creating a List May 10, 2024

dotnet deleted a comment from Mr0N May 12, 2024

eiriktsarpalis closed this as not planned Won't fix, can't repro, duplicate, stale May 13, 2024

dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label May 13, 2024

[API Proposal]: Method for creating a List #102100

[API Proposal]: Method for creating a List #102100

Comments

Mr0N commented May 10, 2024

Background and motivation

API Proposal

API Usage

Alternative Designs

Risks

dotnet-policy-service bot commented May 10, 2024

MihaZupan commented May 10, 2024

CyrusNajmabadi commented May 11, 2024

huoyaoyuan commented May 11, 2024

Mr0N commented May 11, 2024 • edited

Mr0N commented May 11, 2024

Mr0N commented May 11, 2024

MihaZupan commented May 11, 2024

huoyaoyuan commented May 11, 2024 • edited

Mr0N commented May 11, 2024

Mr0N commented May 11, 2024

huoyaoyuan commented May 11, 2024

Mr0N commented May 11, 2024

Mr0N commented May 11, 2024

Mr0N commented May 11, 2024

CyrusNajmabadi commented May 11, 2024

Mr0N commented May 11, 2024

CyrusNajmabadi commented May 11, 2024

Mr0N commented May 11, 2024

Mr0N commented May 11, 2024

MihaZupan commented May 11, 2024

Mr0N commented May 11, 2024

CyrusNajmabadi commented May 11, 2024 • edited

CyrusNajmabadi commented May 11, 2024

danmoseley commented May 11, 2024

CyrusNajmabadi commented May 11, 2024

Mr0N commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

Mr0N commented May 12, 2024

colejohnson66 commented May 12, 2024 • edited

Mr0N commented May 12, 2024

colejohnson66 commented May 12, 2024 • edited

Mr0N commented May 12, 2024

colejohnson66 commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

Mr0N commented May 12, 2024

Mr0N commented May 12, 2024

CyrusNajmabadi commented May 12, 2024 • edited

CyrusNajmabadi commented May 12, 2024

Mr0N commented May 12, 2024

CyrusNajmabadi commented May 12, 2024

Mr0N commented May 12, 2024

Mr0N commented May 12, 2024 • edited

colejohnson66 commented May 12, 2024

Mr0N commented May 12, 2024 • edited

CyrusNajmabadi commented May 12, 2024

Mr0N commented May 12, 2024

CyrusNajmabadi commented May 12, 2024 • edited

Clockwork-Muse commented May 12, 2024

Mr0N commented May 11, 2024 •

edited

huoyaoyuan commented May 11, 2024 •

edited

CyrusNajmabadi commented May 11, 2024 •

edited

colejohnson66 commented May 12, 2024 •

edited

colejohnson66 commented May 12, 2024 •

edited

CyrusNajmabadi commented May 12, 2024 •

edited

Mr0N commented May 12, 2024 •

edited

Mr0N commented May 12, 2024 •

edited

CyrusNajmabadi commented May 12, 2024 •

edited