Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Method for creating a List #102100

Closed
Mr0N opened this issue May 10, 2024 · 52 comments
Closed

[API Proposal]: Method for creating a List #102100

Mr0N opened this issue May 10, 2024 · 52 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections

Comments

@Mr0N
Copy link

Mr0N commented May 10, 2024

Background and motivation

internal T[] _items; // Do not rename (binary serialization)

I suggest adding a method to the List class that would allow creating a new instance of List without copying the array, but simply inserting the array into the middle of the List

API Proposal

var obj = new int[10];
var array = List<int>.CreateAndInsertArrayToList(obj);
class List<T>
{
    public static List<T> CreateAndInsertArrayToList(T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
    }
}

API Usage

Alternative Designs

No response

Risks

No response

@Mr0N Mr0N added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 10, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

@Mr0N Mr0N changed the title [API Proposal]: [API Proposal]: Method for creating a List May 10, 2024
@MihaZupan
Copy link
Member

The same idea has been rejected in #80311

@CyrusNajmabadi
Copy link
Member

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

@huoyaoyuan
Copy link
Member

This can only be possible if we implement and enforce some ownership moving semantic.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

This can only be possible if we implement and enforce some ownership moving semantic.

Well, it's faster if there's an array, for example, and I won't be using it anymore, but I need a List.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

The same idea has been rejected in #80311

Well, it seems like a completely different api

@MihaZupan
Copy link
Member

Different name, the same functionality

@huoyaoyuan
Copy link
Member

huoyaoyuan commented May 11, 2024

it's faster

It need to be safe in the first place, to guarantee nobody can misuse the array after.

ImmutableArray.CreateUnsafe does the same thing for ImmutableArray, but clearly put into the dangerous zone. Further more, ImmutableArray gets higher priority for performance than List because of its heavy usage in compiler.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

It need to be safe in the first place, to guarantee nobody can misuse the array after.

Well, then it's already the problem of whoever will be using it; the method doesn't guarantee safety in using the array for anyone. For instance, you can pass the array by "ref" to emphasize this.

class List<T>
{
    public static List<T> CreateAndInsertArrayToList(ref T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
    }
}

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

ImmutableArray.CreateUnsafe does the same thing for ImmutableArray, but clearly put into the dangerous zone. Further more, ImmutableArray gets higher priority for performance than List because of its heavy usage in compiler.

Well, there are fewer functions than in a List

@huoyaoyuan
Copy link
Member

Well, then it's already the problem of whoever will be using it

If the caller need to pay attention around it, the method should be explicitly noted as unsafe: name it like CreateUnsafe, or put it under some unsafe place like CollectionsMarshal.

Unsafe operation also requires more judgement about its benefit. In this case, why the caller gets an array in the first place? Can it be avoided and let the caller add items into a list?

This comes the difference with ImmutableArray:
For ImmutableArray, there wasn't a path for a user to create one ImmutableArray with least overhead possible: ImmutableArray.Add create new instances for each element. ImmutableArrayBuilder, and the space for adding elements with undetermined count.
For List, there is an optimal path for caller: use new List(capacity) to create and set the capacity, then use Add, AddRange, or CollectionsMarshal.AsSpan together with CollectionsMarshal.SetCount to set the elements.

Hope this helps you understanding the judgement.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

For List, there is an optimal path for caller: use new List(capacity) to create and set the capacity, then use Add, AddRange, or CollectionsMarshal.AsSpan together with CollectionsMarshal.SetCount to set the elements.

Sure, but I don't like that, for example, I have an array, I won't use it anymore, I create a list, and it takes this array and rewrites it into a new array and puts it into the list. So, resources are spent on copying, and additional memory is used.

Well, there are all sorts of arrays, but most people use lists.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

If the caller need to pay attention around it, the method should be explicitly noted as unsafe: name it like CreateUnsafe, or put it under some unsafe place like CollectionsMarshal.

Well, I don't know about the name, but it's better that this class is in List.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

Unsafe operation also requires more judgement about its benefit. In this case, why the caller gets an array in the first place? Can it be avoided and let the caller add items into a list?

For this, there's ToList if you need to use the array further.

@CyrusNajmabadi
Copy link
Member

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

That doesn't address my point. This would require arrays to having an array as the backing field. It would prevent changing that in the future.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

Wouldn't this mean that List would be required to have an array as the backing store? That seems problematic for any future development in this space.

user creates a List not through the constructor but through a static method, they should understand that this array will be used in the List.

That doesn't address my point. This would require arrays to having an array as the backing field. It would prevent changing that in the future.

Why replace an array? And what can be used instead of an array?

@CyrusNajmabadi
Copy link
Member

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

copy the array from the method I'm talking about into a new storage . Although the array is in a List there since 2000

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

class List<T>
{
    public static List<T> CreateAndInsertArrayToList(ref T[] array)
    {
        return new List<T>(array);
    }
    LinkedList _items;
    private List(T[] array)
    {
        foreach(var item in array){
        .....
        }
    }
}

@MihaZupan
Copy link
Member

That already exists - the List(IEnumerable<T> collection) constructor, which will copy the items for you.

@Mr0N
Copy link
Author

Mr0N commented May 11, 2024

That already exists - the List(IEnumerable<T> collection) constructor, which will copy the items for you.

I mention in case of a change in the array to some abstract data storage type of the future; however, I highly doubt such a thing will exist

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented May 11, 2024

Any other impl.

Or a specialized one from the runtime. For example a length and the elements inline without an array indirection.

Your proposal would force the impl to stay an array forever.

copy the array from the method I'm talking about into a new storage .

That would break the very semantics of the method you're proposing. If you want to copy, there are methods for that.

Although the array is in a List there since 2000

As an impl detail. It can be changed if runtime wants to optimize this (which would have wide benefits). This API would prevent that

@CyrusNajmabadi
Copy link
Member

however, I highly doubt such a thing will exist

It seems very reasonable for this to happen. I would save an extra allocation in the most commonly used collection type.

@danmoseley
Copy link
Member

a length and the elements inline without an array indirection.

Do we have a proposal for that? It would need GC and runtime work too special case lists similar to strings?

@CyrusNajmabadi
Copy link
Member

It would need GC and runtime work too special case lists similar to strings

Yes. I would def expect that. But a growable/shrinkable, low overhead, array impl seems like a good fit (esp. given how widespread usage of List<T> is.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

It seems very reasonable for this to happen. I would save an extra allocation in the most commonly used collection type.

If you replace the array with something else, it will disrupt the operation of the List; this collection may not work properly in some cases. If there are any new super modern types besides the array, most likely you will need to write a new List.

@CyrusNajmabadi
Copy link
Member

If you replace the array with something else, it will disrupt the operation of the List;

Why would it disrupt anything?

this collection may not work properly in some cases.

What cases would it not work properly in?

If there are any new super modern types besides the array, most likely you will need to write a new List.

Why would you need to do that. The impl of List<T> is not specified. Only the contracts of the methods it exposes. None of that contract requires that the backing system be an array.

@CyrusNajmabadi
Copy link
Member

@Mr0N You'll see that in no way does the documentation of List<T> require an array in any way. You could implement it in any number of ways. And the runtime could certainly optimize this to be not backed by an array if the benefit was thought to be high enough.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

@Mr0N You'll see that in no way does the documentation of List<T> require an array in any way. You could implement it in any number of ways. And the runtime could certainly optimize this to be not backed by an array if the benefit was thought to be high enough.

Yes, they still use reflection.
And if, for example, you change the array to something else, what guarantee is there that this second thing will work for 100 thousand programs that use it? Each program has different usage of this class. There's multihread and so on.

@colejohnson66
Copy link

colejohnson66 commented May 12, 2024

Think of it this way: if the backing array was changed to LinkedList<T>, the contract surface would still function fine (despite the horrendous performance degradation). The API surface makes almost no guarantees about having allocation-less behavior.

However, your proposals entire purpose is to provide a constructor that avoids allocation. Changing the backing array to a LinkedList<T> would violate the guarantee that your proposal provides, as a LinkedList<T> have to be allocated then and elements copied into it. It would end up being the same as using the IEnumerable<T> constructor.

You even mention this yourself by saying the constructor would change from _items=items to just insert one-by-one. You ignore the API contract your proposal implies.

In other words, your proposal would force the array-backed items to be “part” of the API surface. The team would be unable to replace it without violating the contract.

On the contrary, the IEnumerable<T> constructor makes no guarantees about allocations. Changing the backing array to a LinkedList<T> would still fulfill the contract (or lack thereof) of that constructor.

To put it shortly: your proposal is not just an API guarantee (which is easy to maintain), but a performance guarantee. If this API were to exist, the only way to maintain its performance is to force the backing store away to always be an array. Just because it’s been that way for ~25 years doesn’t mean it won’t change.

Lastly, why is this API necessary? This feels like an XY problem. Is performance of new List<int>(myArray) that bad in your application? If so, you should consider the possibility that your architecture is flawed. Specifically, instead of allocating an array and “inserting” it into a list, just allocate the list in the first place and use CollectionsMarshal.SetCount.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

Think of it this way: if the backing array was changed to LinkedList<T>, the contract surface would still function fine (despite the horrendous performance degradation). The API surface makes almost no guarantees about having allocation-less behavior.

However, your proposals entire purpose is to provide a constructor that avoids allocation. Changing the backing array to a LinkedList<T> would violate the guarantee that your proposal provides, as a LinkedList<T> have to be allocated then and elements copied into it. It would end up being the same as using the IEnumerable<T> constructor.

You even mention this yourself by saying the constructor would change from _items=items to just insert one-by-one. You ignore the API contract your proposal implies.

In other words, your proposal would force the array-backed items to be “part” of the API surface. The team would be unable to replace it without violating the contract.

On the contrary, the IEnumerable<T> constructor makes no guarantees about allocations. Changing the backing array to a LinkedList<T> would still fulfill the contract (or lack thereof) of that constructor.

To put it shortly: your proposal is not just an API guarantee (which is easy to maintain), but a performance guarantee. If this API were to exist, the only way to maintain its performance is to force the backing store away to always be an array. Just because it’s been that way for ~25 years doesn’t mean it won’t change.

Lastly, why is this API necessary? This feels like an XY problem. Is performance of new List<int>(myArray) that bad in your application? If so, you should consider the possibility that your architecture is flawed. Specifically, instead of allocating an array and “inserting” it into a list, just allocate the list in the first place and use CollectionsMarshal.SetCount.

No contracts are being ignored here. I propose a static method and a private constructor.
A method that will create a new instance of the class without memory allocation

@colejohnson66
Copy link

colejohnson66 commented May 12, 2024

A method that will create a new instance of the class without memory allocation

That is the definition of an API contract. Saying that the user can use it to avoid allocation is a guarantee the runtime would then have to keep.

In other words, how will you avoid allocations when the backing array no longer exists? As I mentioned, should it be replaced with LinkedList<T>, your API would then have to allocate, violating the contract.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

A method that will create a new instance of the class without memory allocation

That is the definition of an API contract. Saying that the user can use it to avoid allocation is a guarantee the runtime would then have to keep.

In other words, how will you avoid allocations when the backing array no longer exists? As I mentioned, should it be replaced with LinkedList<T>, your API would then have to allocate, violating the contract.

If we change the name of the array inside the List now, somes percentage of programs will error because some may be using reflection.

Sure, over time, we can change the array to a LinkedList, but how do we prove that this implementation definitely adheres to the current contract? Because there are various situations, such as behavior in multithreading and so on.

@colejohnson66
Copy link

Reflection for private fields is not part of the API contract. It's an implementation detail, and implementation details are not part of the contract. If implementation details were part of the contract, performance improvements in many scenarios would be impossible.

Programs that use reflection to access the array are misbehaving.

@CyrusNajmabadi
Copy link
Member

Yes, they still use reflection.

Who is "they"?

Using reflection to get at internal impl details and take a dependency on them is not a supported scenario.

And if, for example, you change the array to something else, what guarantee is there that this second thing will work for 100 thousand programs that use it?

The documented contract of List<T> that I linked to. That's the guarantee.

There's multihread and so on.

Why would this be relevant? Either programs are using the stated contract or not. If they are, they will continue to work. If they are not, then they may break with any change at any time.

@CyrusNajmabadi
Copy link
Member

but how do we prove that this implementation definitely adheres to the current contract?

Simple. The contract is specified in the documentation. You then test the new impl against that documentation.

The runtime already has tons of tests that do this already.

That's how the runtime already makes changes today, including completely revising the internal implementation of types. If that breaks someone using reflection, that's fine as that's not a supported scenario.

@CyrusNajmabadi
Copy link
Member

Because there are various situations, such as behavior in multithreading and so on.

The runtime already states what are contracts and what are not. List, for example, provides no guarantees around unsynchronized multi threaded reading and writing. Since there is no contract there, if code takes a dependency on behavior such that it breaks off the impl changes, that's a problem with that code, not the impl.

Again, your view here would preclude the runtime making any changes (including optimizing things). They can and do make changes all the time to impl details that are not part of the contract of a type.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

Doesn't this constructor bind the array to a List?
List<string> dinosaurs = new List<string>(4);
https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.-ctor?view=net-8.0#system-collections-generic-list-1-ctor(system-int32)

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

This constructor sets the initial length of the array; my method sets the initial array.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented May 12, 2024

Doesn't this constructor bind the array to a List?
List<string> dinosaurs = new List<string>(4);

No. There is no array there at all. All that says is that it:

Initializes a new instance of the List class that is empty and has the specified initial capacity.

The capacity of a List is the number of elements that the List can hold. As elements are added to a List, the capacity is automatically increased as required by reallocating the internal array.

If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List.

The capacity can be decreased by calling the TrimExcess method or by setting the Capacity property explicitly. Decreasing the capacity reallocates memory and copies all the elements in the List.

This constructor is an O(1) operation.

And all that capacity says is:

Gets or sets the total number of elements the internal data structure can hold without resizing.

All that matters with that is:

Retrieving the value of this property is an O(1) operation; setting the property is an O(n) operation, where n is the new capacity.

@CyrusNajmabadi
Copy link
Member

This constructor sets the initial length of the array; my method sets the initial array.

Again, as an implementation detail. It could pick a different implementation as long as it abides by the stated contract which people can depend on.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

No. There is no array there at all. All that says is that it:

image

Well, it's as if the programmers who wrote it were thinking about an array. If there's something else there, it might not work correctly.

@dotnet dotnet deleted a comment from Mr0N May 12, 2024
@CyrusNajmabadi
Copy link
Member

Yes. That's a 'remark'. A remark gives info, but not a contract.

If there's something else there, it might not work correctly.

It would still work correctly, as long as it abides by the stated contract. Again, this is how the runtiem works. They change internal impl details all the time. Especially for performance. I'm not sure how much more plainly or clearly i can state this.

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

Yes. That's a 'remark'. A remark gives info, but not a contract.

If there's something else there, it might not work correctly.

It would still work correctly, as long as it abides by the stated contract. Again, this is how the runtiem works. They change internal impl details all the time. Especially for performance. I'm not sure how much more plainly or clearly i can state this.

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

public static List<T> CreateAndInsertArrayToListUnsafe(ref T[] array)

@colejohnson66
Copy link

How do you expect the team to change the implementation when your proposal, by design, requires a specific implementation?

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

class List<T>
{
    public static List<T> CreateAndInsertArrayToListUnsafe(ref T[] array)
    {
        return new List<T>(array);
    }
    T[] _items;
    private List(T[] array)
    {
        _items = array;
        //Here, in theory, we provide additional information for initialization, increase the size, and so on.
    }
}

@CyrusNajmabadi
Copy link
Member

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

@Mr0N
Copy link
Author

Mr0N commented May 12, 2024

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

The difference between this and that is that there's IEnumerable there, while here it's an array, and this method is "Unsafe," meaning it's much faster than that constructor, only it's unsafe.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented May 12, 2024

let them change the implementation; the semantics of this method are that it takes the initial data for "List", which is stored in an array

List already has a way to do that. It is the list constructor that takes another collection.

The semantics of that constructor are exactly that it takes the initial data for "List",

The difference between this and that is that there's IEnumerable there, while here it's an array, and this method is "Unsafe," meaning it's much faster than that constructor, only it's unsafe.

But this would now mean we added a method directly documented to be fast, and it would become slow in a future release, causing code to fall off a perf cliff.

No code could take advantage of this as the only reason to would be perf gain, but that perf could easily become a perf-pitfall.

If a method exists for performance, it needs to not regress, as otherwise people can't use it.

That's why ImmutableArray allows this. Because there is a contract about its impl, and guarantees around the marshaling methods. These won't change because it would be a disastrous perf problem for all the code that took a dependency on the documented performance characteristics there.

@Clockwork-Muse
Copy link
Contributor

and this method is "Unsafe," meaning it's much faster than that constructor

As a side note, unsafe doesn't automatically make code faster, if that's what you're thinking. All it means is that certain constraints might be relaxed, or that certain behavior has to be carefully observed. There are a number of APIs where there's no speed gain (just that certain operations become possible at all).

@eiriktsarpalis eiriktsarpalis closed this as not planned Won't fix, can't repro, duplicate, stale May 13, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests

8 participants