[System.Linq] Consider adding runtime checks for IReadOnlyCollection<T> in input sources #42254

Open
cmeyertons opened this issue Sep 15, 2020 · 24 comments
Labels
area-System.Linq tenet-performance Performance related issue

Comments

@cmeyertons

There are many places in the Linq / Collection code that check whether an IEnumerable<T> is an ICollection<T> in order to perform optimizations (e.g. presizing a new array).

For example, List.cs.

Because ICollection<T> implements IReadOnlyCollection<T>, IReadOnlyCollection<T> should be used exclusively in these scenarios, to support custom IReadOnlyCollection<T> implementations that don't necessarily want to expose Add(T item).

Currently, collection authors have to implement ICollection<T> to take advantage of these performance gains, leaving Add throwing NotImplementedException to convey proper usage.

@cmeyertons cmeyertons added the tenet-performance Performance related issue label Sep 15, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Sep 15, 2020
@ghost

ghost commented Sep 15, 2020

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@stephentoub
Member

stephentoub commented Sep 15, 2020

Because ICollection<T> implements IReadOnlyCollection<T>

It doesn't.
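The relationship can be confirmed with reflection. A minimal sketch (the class and method names are illustrative, not from the thread): the generic ICollection<T> interface derives only from IEnumerable<T> and IEnumerable, while concrete types such as List<T> implement IReadOnlyCollection<T> separately.

```csharp
using System;
using System.Collections.Generic;

public static class InterfaceRelations
{
    // Does the ICollection<T> interface itself derive from IReadOnlyCollection<T>?
    public static bool CollectionIsReadOnlyCollection<T>() =>
        typeof(IReadOnlyCollection<T>).IsAssignableFrom(typeof(ICollection<T>));

    // Does the concrete List<T> class implement IReadOnlyCollection<T>?
    public static bool ListIsReadOnlyCollection<T>() =>
        typeof(IReadOnlyCollection<T>).IsAssignableFrom(typeof(List<T>));
}
```

`InterfaceRelations.CollectionIsReadOnlyCollection<int>()` returns false and `InterfaceRelations.ListIsReadOnlyCollection<int>()` returns true, so a custom type that implements only ICollection<T> would not pass an `is IReadOnlyCollection<T>` check.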

@EgorBo
Member

EgorBo commented Sep 15, 2020

If I recall correctly, suggestions to add fast paths for IReadOnlyCollection here and there (not to replace the ICollection ones) were rejected several times because casts to covariant interfaces were super slow. However, as far as I know those performance issues have been fixed (cast caches, inlined checks), so maybe it is worth checking whether we can add them in some places?

@danmoseley
Copy link
Member

@davidwrighton are these kinds of "optimistic checks for interfaces" indeed much cheaper now than in the past?

@EgorBo
Member

EgorBo commented Sep 15, 2020

cc @VSadov

@cmeyertons
Author

Because ICollection<T> implements IReadOnlyCollection<T>

It doesn't.

As I was! Egg on my face for sure. Apologies, I thought this would be a drop-in request. Thanks for the quick replies.

@EgorBo
Member

EgorBo commented Sep 15, 2020

@danmosemsft a quick benchmark:

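using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class InterfaceCastBenchmarks
{
    static IEnumerable<string> strings = new List<string>(); // runtime type: List<string>

    [Benchmark]
    public bool IsCollection() => strings is ICollection<string>;

    // Despite its name, this measures the IReadOnlyCollection<T> check.
    [Benchmark]
    public bool IsReadOnlyList() => strings is IReadOnlyCollection<string>;
}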

.NET Core 2.2:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  2.637 ns | 0.0038 ns | 0.0035 ns |
| IsReadOnlyList  | 41.492 ns | 0.0911 ns | 0.0808 ns |

.NET Core 3.0:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  1.069 ns | 0.0008 ns | 0.0007 ns |
| IsReadOnlyList  | 40.578 ns | 0.0316 ns | 0.0264 ns |

.NET 5.0:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|   IsCollection  |  1.121 ns | 0.0010 ns | 0.0009 ns |
| IsReadOnlyList  |  2.976 ns | 0.0094 ns | 0.0088 ns |

Related PR: dotnet/coreclr#23548

@VSadov
Member

VSadov commented Sep 15, 2020

Covariant interfaces are not super slow now.

Cost can vary for both regular interface casts and for fancy ones. A regular interface cast is a linear search, but typically does not need to search far. A cached cast may need to deal with hash collisions, but typically just gets a cached value.

As a very rough estimate, a fancy cast can be counted as about 2x the cost of a regular interface cast.

In the past, the cost of complicated casts was technically unbounded: as you nested variant generics, the cost would go up considerably. Thus library owners avoided them.

@danmoseley
Member

Thanks @EgorBo that is indeed much faster.

@davidwrighton
Member

It is definitely faster than before, but there is still a non-zero cost to performance when making the suggested change.

I have a few possible concerns here.

  1. What about customers that only implement ICollection<T> and not IReadOnlyCollection<T>?
  2. If we mitigate concern #1 by having checks for both ICollection<T> and IReadOnlyCollection<T>, how much of a penalty does making the LINQ functions larger have?
  3. Do we have any concerns around customers who may have implemented IReadOnlyCollection<T> in such a way that it does not match the behavior of IEnumerable<T>? My guess is that we would not treat such scenarios specially, but it is a real possibility that customers with custom-written collections have incorrect implementations in code that hasn't been tested.
  4. As @VSadov notes, the performance impact is now much less severe, but it's not nothing.

@EgorBo
Member

EgorBo commented Sep 15, 2020


A good example is Linq's Count: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Linq/src/System/Linq/Count.cs#L11-L46

The IReadOnlyCollection<T> check used to add a penalty to the O(N) foreach-based fallback (note the other fast paths before it), but now that penalty is ~15 times smaller.
But of course it depends on how often users pass an IROC there; that's not for me to answer 🙂
Currently, Count on an input that implements only IReadOnlyCollection<T> leads to an O(N) loop, which can be quite slow for large collections.
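To make the trade-off concrete, here is a simplified sketch of a Count with the extra check added. CountWithReadOnlyCheck and ReadOnlyView<T> are hypothetical names, and the real Enumerable.Count has additional internal fast paths that are omitted here. The proposed IReadOnlyCollection<T> test sits after the existing ones, so inputs that already hit a fast path are unaffected and only the O(N) fallback pays for the extra cast.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CountSketch
{
    // Hypothetical Count variant with an extra IReadOnlyCollection<T> fast path.
    public static int CountWithReadOnlyCheck<TSource>(this IEnumerable<TSource> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));

        // Fast paths of the kind LINQ already uses.
        if (source is ICollection<TSource> genericCollection) return genericCollection.Count;
        if (source is ICollection nonGenericCollection) return nonGenericCollection.Count;

        // Proposed extra check: O(1) for types that implement only IReadOnlyCollection<T>.
        // This (potentially covariant) cast is the added cost debated in this thread.
        if (source is IReadOnlyCollection<TSource> readOnlyCollection) return readOnlyCollection.Count;

        // Fallback: O(N) enumeration.
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext()) count++;
            }
        }
        return count;
    }
}

// Hypothetical collection that implements IReadOnlyCollection<T> but not ICollection<T>.
public sealed class ReadOnlyView<T> : IReadOnlyCollection<T>
{
    private readonly List<T> _items;
    public ReadOnlyView(IEnumerable<T> items) => _items = new List<T>(items);
    public int Count => _items.Count;
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```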

@AndyAyersMS
Member

There were also perf issues with limited numbers of "fast" dictionary slots (see #11971) that should now be (largely) mitigated by the dynamic dictionary expansion added in 5.0.

@EgorBo
Member

EgorBo commented Sep 15, 2020

I found quite a few complaints about, and rejected attempts at, optimizing LINQ for IReadOnlyCollection<T>:

#28651 - LINQ results implicit support for IReadOnlyCollection
#27517 - Performance: Make constructor List(IEnumerable collection) know about IReadOnlyCollection
#26679 - Linq ToDictionary() should presize for IReadOnlyCollection
#14366 - System.Linq performance improvement suggestions (mentions IROC<>)
#24793 - Respect IReadOnlyList in the BCL
#23910 - Add optimized path for IReadOnlyCollection/IReadOnlyList in System.Linq
#18714 - Consider checking for IReadOnlyCollection in Enumerable.ToArray
#27516 - Performance of LINQ .Any() - type check to leverage .Count property? (mentions IROC)
dotnet/corefx#28472 - Check for IReadOnlyCollection
#43001 - LINQ IEnumerable extension methods should add special case IReadOnlyCollection<T>

@huoyaoyuan
Member

  1. What about customers that only implement ICollection<T> and not IReadOnlyCollection<T>?

This can be expanded to different scenarios:

  1. The collection implements ICollection<T>, but is exposed as IReadOnlyCollection<T> in the public surface.
    This should be very common. LINQ will not be impacted here.
  2. The collection is immutable and only implements IReadOnlyCollection<T>.
    This depends on the actual implementation type:
  • ReadOnlyCollection<T> (including ReadOnlyObservableCollection<T>): while it's designed to be read-only, it still implements the non-read-only interfaces, so it is not affected.
  • ImmutableArray<T>: it has its own LINQ extension methods to avoid boxing, so the default LINQ implementation matters less here.
  • Custom read-only collections: these would definitely be impacted, but it should be a relatively uncommon scenario.
  3. The collection implements ICollection<TDerived>, but is exposed as IReadOnlyCollection<TBase>, and LINQ is called with <TBase>.
    This is the scenario most likely to see a performance impact. Covariant interface checks are slower, but they are what make a fast path possible here.
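The last scenario above can be reproduced directly. A minimal sketch (Animal, Cat and VarianceDemo are illustrative names): because ICollection<T> is invariant while IReadOnlyCollection<out T> is covariant, a List<Cat> seen as IEnumerable<Animal> fails the ICollection<Animal> test but passes the IReadOnlyCollection<Animal> one.

```csharp
using System.Collections.Generic;

public class Animal { }
public sealed class Cat : Animal { }

public static class VarianceDemo
{
    public static (bool IsCollection, bool IsReadOnlyCollection) Check()
    {
        // A List<Cat> viewed as IEnumerable<Animal>, as when a LINQ method runs with TSource = Animal.
        IEnumerable<Animal> animals = new List<Cat> { new Cat(), new Cat() };

        // ICollection<T> is invariant: List<Cat> is not an ICollection<Animal>.
        bool isCollection = animals is ICollection<Animal>;

        // IReadOnlyCollection<out T> is covariant: List<Cat> is an IReadOnlyCollection<Animal>.
        bool isReadOnlyCollection = animals is IReadOnlyCollection<Animal>;

        return (isCollection, isReadOnlyCollection);
    }
}
```

`VarianceDemo.Check()` returns `(false, true)`. Note that List<T> also implements the non-generic ICollection, which some LINQ methods (Count among them) already test for, so this scenario mainly matters for code paths that only check ICollection<T>.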

@stephentoub stephentoub added this to the Future milestone Sep 18, 2020
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Sep 18, 2020
@eiriktsarpalis
Member

There are a few other interfaces used to determine IEnumerable counts, also not in a subtype relationship with either ICollection<T> or IReadOnlyCollection<T>. Would it make sense to include those as well?

Next step should be to enumerate a list of methods that could benefit from specialization.


@adamsitnik
Member

Related to #31001

@weitzhandler
Contributor

Related: #23337.

@eiriktsarpalis eiriktsarpalis changed the title Reconsider code paths where ICollection<T> is only used to access count for IReadonlyCollection<T> [System.Linq] Consider adding runtime checks for IReadOnlyCollection<T> in input sources Oct 29, 2021
@BlinD-HuNTeR

BlinD-HuNTeR commented Jan 6, 2022

Hello everyone! I'm not sure if someone else thought about this before, but I just had an idea that could solve this problem. Why not introduce a new, non-generic interface to the BCL named "ICountable", with nothing more than a "Count" property? Then just make ICollection, ICollection<T> and IReadOnlyCollection<T> all implement this interface. That would easily solve the problem with covariant casts, since we don't even have type parameters anymore. And we could even simplify all the code paths with just a test for "ICountable".
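A sketch of this proposal, assuming a hypothetical ICountable interface (it does not exist in the BCL). Because the interface has no type parameter, the consumer's check is a single non-variant type test regardless of the element type. User code cannot retrofit ICollection, ICollection<T> or IReadOnlyCollection<T> onto ICountable, so here only a type that opts in explicitly (the hypothetical CountableList<T>) is recognized.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical non-generic interface; NOT part of the BCL.
public interface ICountable
{
    int Count { get; }
}

// Hypothetical read-only collection that opts into ICountable.
public sealed class CountableList<T> : IReadOnlyCollection<T>, ICountable
{
    private readonly List<T> _items;
    public CountableList(IEnumerable<T> items) => _items = new List<T>(items);

    // One public property implicitly implements both IReadOnlyCollection<T>.Count and ICountable.Count.
    public int Count => _items.Count;

    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CountableHelpers
{
    // A single non-generic type test; no variance check is involved.
    public static bool TryGetCount<T>(IEnumerable<T> source, out int count)
    {
        if (source is ICountable countable)
        {
            count = countable.Count;
            return true;
        }
        count = 0;
        return false;
    }
}
```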

@huoyaoyuan
Member

Why not introduce a new, non-generic interface

Adding more interfaces can make things a mess and worse. Not all classes will implement the new interface, so an additional interface check may be required.

@elgonzo

elgonzo commented Jan 13, 2022

Is there any progress on the issue?

Even with .NET 6.0, Linq functions which should be able to take advantage of indexed random-access collections, such as Skip(int), don't seem to be able to handle custom read-only collections that for example implement IReadOnlyList<T> but not IList<T> (...and why should they?) without unnecessarily poor performance.
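What such an indexed fast path could look like, as a hypothetical SkipIndexed helper (not a BCL API): when the source is an IReadOnlyList<T>, it starts reading at index `count` instead of calling MoveNext() `count` times, much as the built-in Skip in .NET Core already does for IList<T>. Other sources fall back to sequential skipping.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyListSkip
{
    // Hypothetical Skip variant with an IReadOnlyList<T> fast path; argument validation is eager.
    public static IEnumerable<T> SkipIndexed<T>(this IEnumerable<T> source, int count)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        return source is IReadOnlyList<T> list ? SkipList(list, count) : SkipSequential(source, count);
    }

    // Indexed path: jump straight to the first element to yield.
    private static IEnumerable<T> SkipList<T>(IReadOnlyList<T> list, int count)
    {
        for (int i = Math.Max(count, 0); i < list.Count; i++)
            yield return list[i];
    }

    // Sequential path: advance the enumerator `count` times, then yield the rest.
    private static IEnumerable<T> SkipSequential<T>(IEnumerable<T> source, int count)
    {
        using IEnumerator<T> e = source.GetEnumerator();
        while (count > 0 && e.MoveNext()) count--;
        while (e.MoveNext()) yield return e.Current;
    }
}
```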

@davidwrighton
Member

No, there hasn't been any progress. The general conclusion is that we can't add new interface checks here without changing the interface checking mechanism. We've been kicking around the idea of an optimized type switch operation for a few years, but it would almost certainly make the most common case a little bit slower in exchange for allowing more scenarios to have roughly equivalent performance. However, we haven't built out that low level feature enough to see the practical impact on changing the common patterns in the Linq codebase.

@eiriktsarpalis
Member

Which is why we intentionally skipped checks for IReadOnlyCollection<T> in the new TryGetNonEnumeratedCount method (see #54764).
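The effect of that decision is easy to observe (TryGetNonEnumeratedCount requires .NET 6 or later; this reflects its behavior in .NET 6 through at least .NET 8). A minimal sketch with a hypothetical ReadOnlyOnlyCollection<T> that implements IReadOnlyCollection<T> but not ICollection<T>: the method reports a count for a List<T>, but returns false for the read-only-only type even though its Count is O(1).

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection implementing IReadOnlyCollection<T> but not ICollection<T>.
public sealed class ReadOnlyOnlyCollection<T> : IReadOnlyCollection<T>
{
    private readonly List<T> _items;
    public ReadOnlyOnlyCollection(IEnumerable<T> items) => _items = new List<T>(items);
    public int Count => _items.Count;
    public IEnumerator<T> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class NonEnumeratedCountDemo
{
    public static (bool ListHasCount, bool ReadOnlyHasCount) Check()
    {
        var items = new[] { 1, 2, 3 };
        bool listHasCount = new List<int>(items).TryGetNonEnumeratedCount(out _);
        bool readOnlyHasCount = new ReadOnlyOnlyCollection<int>(items).TryGetNonEnumeratedCount(out _);
        return (listHasCount, readOnlyHasCount);
    }
}
```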

One possible alternative avenue to explore is introducing a common base interface for exposing the count, which should be possible using DIMs. Here's a sketch of that idea. We've generally resisted retrofitting old interfaces with DIMs so far though, since they can be susceptible to both source and runtime breaking changes.
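The DIM-based idea might look roughly like the following. The linked sketch is not reproduced here; IHasCount, IMyReadOnlyCollection<T> and Numbers are hypothetical stand-ins, since a real change would target the existing BCL interfaces, which user code cannot modify. The derived interface supplies a default implementation of the base interface's Count (C# 8 / .NET Core 3.0 or later), so an implementer written only against IMyReadOnlyCollection<T> becomes an IHasCount without adding any code.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical common base interface; not part of the BCL.
public interface IHasCount
{
    int Count { get; }
}

// Hypothetical stand-in for an existing generic interface retrofitted onto IHasCount.
public interface IMyReadOnlyCollection<out T> : IEnumerable<T>, IHasCount
{
    new int Count { get; }

    // Default implementation of the base member, forwarding to the derived Count.
    int IHasCount.Count => Count;
}

public sealed class Numbers : IMyReadOnlyCollection<int>
{
    private readonly int[] _values;
    public Numbers(params int[] values) => _values = values;

    // Only the derived Count is implemented; IHasCount.Count comes from the default implementation.
    int IMyReadOnlyCollection<int>.Count => _values.Length;

    public IEnumerator<int> GetEnumerator() => ((IEnumerable<int>)_values).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

`((IHasCount)new Numbers(1, 2, 3)).Count` returns 3 through the default implementation. One known hazard of this pattern: if a class implements two derived interfaces that each supply a default for IHasCount.Count and the class does not implement it itself, there is no most specific implementation, which is one way such retrofits can break existing code.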

@Ultrafeel

Ultrafeel commented Dec 22, 2023

@elgonzo consider this: ICollection<TSource>.IsReadOnly. Why not simply implement ICollection<TSource>?

custom read-only collections that for example implement IReadOnlyList but not IList (...and why should they?)

P.S. I must admit this.
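A sketch of the approach suggested here, with a hypothetical ReadOnlyBag<T>: the type implements ICollection<T> so existing ICollection<T> fast paths apply, reports IsReadOnly = true, and hides the mutators behind explicit interface implementations that throw NotSupportedException (the exception the BCL's own ReadOnlyCollection<T> uses for this) rather than NotImplementedException.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only collection that still implements ICollection<T>.
public sealed class ReadOnlyBag<T> : ICollection<T>, IReadOnlyCollection<T>
{
    private readonly T[] _items;
    public ReadOnlyBag(IEnumerable<T> items) => _items = new List<T>(items).ToArray();

    public int Count => _items.Length;
    public bool IsReadOnly => true;

    public bool Contains(T item) => Array.IndexOf(_items, item) >= 0;
    public void CopyTo(T[] array, int arrayIndex) => _items.CopyTo(array, arrayIndex);
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Mutators required by ICollection<T>; explicit so they stay off the public surface.
    void ICollection<T>.Add(T item) => throw new NotSupportedException("Collection is read-only.");
    void ICollection<T>.Clear() => throw new NotSupportedException("Collection is read-only.");
    bool ICollection<T>.Remove(T item) => throw new NotSupportedException("Collection is read-only.");
}
```

The trade-off is that Add, Clear and Remove remain reachable through an ICollection<T> reference and fail only at run time, which is exactly what implementing IReadOnlyCollection<T> alone would avoid.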
