Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SortedSet.IsSubsetOf not working as expected #102118

Open
vicsharp-shibusa opened this issue May 11, 2024 · 5 comments · May be fixed by #102249
Open

SortedSet.IsSubsetOf not working as expected #102118

vicsharp-shibusa opened this issue May 11, 2024 · 5 comments · May be fixed by #102249

Comments

@vicsharp-shibusa
Copy link

Description

I don't believe that SortedSet.IsSubsetOf is working as expected. Given this code, shouldn't ss.IsSubsetOf(l) return True?

I'm suspicious that the problem is in my CompareTo function, but I can't see the problem.

I also added a separate loop thinking that the set had to be found sorted inside the other IEnumerable . . . but that doesn't work either.

Can anyone explain why this behavior is happening?

List<Point> l = new();
SortedSet<Point> ss = new();
HashSet<Point> h = new();

for (int i = 0; i < 100; i++)
{
    var p = new Point(i, i);
    l.Add(p);
    l.Add(p);
    h.Add(p);
    h.Add(p);
    ss.Add(p);
    ss.Add(p);
}

// additional loop to get a sorted set.
for (int i = 0; i < 100; i++)
{
    l.Add(new Point(i, i));
}

Console.WriteLine(l.Count); // 300
Console.WriteLine(h.Count); // 100
Console.WriteLine(ss.Count); // 100
Console.WriteLine(h.IsSubsetOf(l));  // True (as expected)
Console.WriteLine(h.IsProperSubsetOf(l)); // False

// This is the line in question
Console.WriteLine(ss.IsSubsetOf(l)); // False ?????????

Console.WriteLine(ss.IsProperSubsetOf(l)); // False

readonly struct Point : IComparable<Point>, IEquatable<Point>
{
    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    public double X { get; }
    public double Y { get; }

    public int CompareTo(Point other)
    {
        if (X == other.X)
        {
            return Y.CompareTo(other.Y);
        }
        else
        {
            return X.CompareTo(other.X);
        }
    }

    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(X, Y);
    }
}

Reproduction Steps

List<Point> l = new();
SortedSet<Point> ss = new();

for (int i = 0; i < 100; i++)
{
    var p = new Point(i, i);
    l.Add(p);
    l.Add(p);
    ss.Add(p);
}

Console.WriteLine(ss.IsSubsetOf(l));

Expected behavior

ss.IsSubsetOf(l) should be True

Actual behavior

ss.IsSubsetOf(l)) is False

Regression?

No response

Known Workarounds

If I remove the second l.Add(p), it works as expected.

Configuration

Windows 11, .NET 8, x64

$ dotnet --list-sdks
8.0.204 [C:\Program Files\dotnet\sdk]

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 11, 2024
@gurustron
Copy link

gurustron commented May 11, 2024

The problem seems to come from the assumption that underlying tree is always balanced.

Minimal repro (for me, .NET 8):

HashSet<int> h = new();
SortedSet<int> ss = new();
var num = 18;
for (int i = 0; i < num; i++)
{
	h.Add(i);
	ss.Add(i);
}

var data = Enumerable.Range(0, num).Append(17);
var hashSetEquals = h.SetEquals(data);
var sortedSetEquals = ss.SetEquals(data);
Console.WriteLine(hashSetEquals);
Console.WriteLine(sortedSetEquals);

Demo @sharplab.io

@gurustron
Copy link

gurustron commented May 11, 2024

From my analysis @Stackoverflow (please correct me if I got something wrong):

As far as I can see simple element count is used to determine the size for BitHelper:

int originalLastIndex = Count;
int intArrayLength = BitHelper.ToIntArrayLength(originalLastIndex);
BitHelper bitHelper = intArrayLength <= StackAllocThreshold ?
    new BitHelper(span.Slice(0, intArrayLength), clear: true) :
    new BitHelper(new int[intArrayLength], clear: false);

But index of the found item is determined the following way:

internal virtual int InternalIndexOf(T item)
{
    Node? current = root;
    int count = 0;
    while (current != null)
    {
        int order = comparer.Compare(item, current.Item);
        if (order == 0)
        {
            return count;
        }

        current = order < 0 ? current.Left : current.Right;
        count = order < 0 ? (2 * count + 1) : (2 * count + 2);
    }

    return -1;
}

Which "overflows" the calculated BitHelper's capacity which results in bitHelper.IsMarked always returning false:

internal bool IsMarked(int bitPosition)
{
  Debug.Assert(bitPosition >= 0);

  uint bitArrayIndex = (uint)bitPosition / IntSize;

  // Workaround for https://github.com/dotnet/runtime/issues/72004
  Span<int> span = _span;
  return
	  bitArrayIndex < (uint)span.Length &&
	  (span[(int)bitArrayIndex] & (1 << ((int)((uint)bitPosition % IntSize)))) != 0;
}

Which lead to this code always marking and counting duplicates:

int index = InternalIndexOf(item);
if (index >= 0)
{
    if (!bitHelper.IsMarked(index))
    {
	    if (hashSet.ContainsKey(item))
	    {
		    Console.WriteLine($"Here {item}");
	    }
	    else
	    {
		    hashSet.Add(item, index);
	    }
	    // item hasn't been seen yet
	    bitHelper.MarkBit(index);
	    uniqueFoundCount++;
    }
}

@danmoseley
Copy link
Member

Same on .NET Framework?

@gurustron
Copy link

gurustron commented May 12, 2024

@danmoseley seems so:

https://dotnetfiddle.net/zXkZuS

Also it seems that this happens not only in the "unbalanced" case. If number of elements is divisible by 32 (the bit size of int used in the calculation):

HashSet<int> h = new HashSet<int>();
SortedSet<int> ss = new SortedSet<int>();
var num = 32;
for (int i = 0; i < num; i++)
{
    h.Add(i);
    ss.Add(i);
}

var data = Enumerable.Repeat(0, 2).SelectMany(i => Enumerable.Range(0, num));
var hashSetEquals = h.SetEquals(data);
var sortedSetEquals = ss.SetEquals(data);
Console.WriteLine(hashSetEquals);
Console.WriteLine(sortedSetEquals);

// balance set
ss = new SortedSet<int>(h.ToList());  // ss = new SortedSet<int>(ss.ToList());
sortedSetEquals = ss.SetEquals(data);
Console.WriteLine("Balanced SortedSet:");
Console.WriteLine(sortedSetEquals); // FALSE too

Demo @sharplab.io

https://dotnetfiddle.net/x9YQHT

Also it seems that for all collection sizes bigger than 64 it happens also. Checked several of them:

https://dotnetfiddle.net/B4mSJp

@Smaug123
Copy link
Contributor

By the way, the following property-based test catches this instantly:

open FsCheck
open System.Collections.Generic
open NUnit.Framework

[<TestFixture>]
module Foo =
    [<Test>]
    let ``SortedSet test, positive case`` () =
        let property (sub : byte list) (super : byte list) =
            let super = sub @ super // or `super @ sub`; any standard way to construct a superset will demonstrate the problem, as will randomising this list
            (SortedSet sub).IsSubsetOf super
            |> (=) ((Set.ofList sub).IsSubsetOf (Set.ofList super))

        Check.QuickThrowOnFailure property

There's already a usage of FsCheck in the .NET runtime tests; it would be very little effort to add dramatically more coverage to the existing tests by converting them to property-based tests.

@eiriktsarpalis eiriktsarpalis added bug and removed untriaged New issue has not been triaged by the area owner labels May 13, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone May 13, 2024
lilinus added a commit to lilinus/runtime that referenced this issue May 14, 2024
@lilinus lilinus linked a pull request May 15, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants