Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set and Map performance and correctness #46

Open
BridgeAR opened this issue Aug 18, 2017 · 14 comments
Open

Set and Map performance and correctness #46

BridgeAR opened this issue Aug 18, 2017 · 14 comments
Labels

Comments

@BridgeAR
Copy link

I was curious how well chai would handle big sets and maps and checked a couple of things and also stumbled across this PR and I am sorry to say that these benchmarks are somewhat flawed.

The Node.js assert module was never used for comparison because the benchmark required a in-existing function that would be replaced with deep-eql in that case. The result is that the node and deep-eql results are actually both from the same library so they should be pretty much identical. However, that is not always the case and they partially differ by a big margin e.g.

buffer                               x 1,166,637 ops/sec ±6.09% (73 runs sampled)
buffer                        (node) x 611,182 ops/sec ±2.47% (65 runs sampled)
object literal                       x 160,121 ops/sec ±2.89% (81 runs sampled)
object literal                (node) x 392,734 ops/sec ±2.38% (85 runs sampled)

My main concern with these benchmarks is though, that only very simply objects are used. Having nice numbers for those is awesome but they are normally not an issue. If a complex object is used though, this can be really bad.

// Set with 10000 numbers
const actual = new Set(Array(1e4).fill(1).map((_, i) => len - i - 1))
const expected = new Set(Array(1e4).fill(1).map((_, i) => i))

This time compared to Node.js 8.4 (this time for real)

set                                  x 3.89 ops/sec ±1.26% (58 runs sampled)
set                           (node) x 1,387 ops/sec ±0.33% (138 runs sampled)

So Node.js is more then 350 times as fast in this case.

While checking Sets and Maps further I also noticed the following:
When using Sets twice the work is done that is needed in general because the key and the value are both the value when returned in a forEach.

Worse though: comparing Sets or Maps with Objects does not even work properly!

const deepEql = require('./')
const a = new Set([{}, {a:5}])
const b = new Set([{a:5}, {}])
deepEql(a,b) // false

Sorting an array with objects is not possible and that is why this bug exists.

I suggest to have a look at a PR from me that would improve the situation and fix the bug as well nodejs/node#14258.

As chai currently does not have a loose equal comparison the algorithm should be pretty straight forward (otherwise there is quite some extra handling for that).

@meeber meeber added the bug label Aug 21, 2017
@meeber
Copy link
Contributor

meeber commented Aug 21, 2017

Thanks for the issue @BridgeAR. Definitely looks like you've uncovered some problems here. Chai is a 100% community-driven project; we'd love a PR if you can carve out the time!

@BridgeAR
Copy link
Author

I can not promise that I have time for that anytime soon. I have a couple of things that I am working on at the moment.

@escKeyStroke
Copy link

Writing tests for the project I'm working on, I found:

deepEql( new Set([new Set([1,2]), new Set([3,4])]) ,
         new Set([new Set([4,3]), new Set([2,1])]) ); // returns false

So I had to switch to Node's assert.deepEqual
I also learned that Node was not able to compare sets properly until v8.x: nodejs/node#13347

@keithamus
Copy link
Member

@escKeyStroke you want that to return true? Sets are ordered, why would it return true?

@BridgeAR
Copy link
Author

@keithamus

deepEql(new Set([1, 2]), new Set([2, 1])) // passes

This is highly inconsistent to the object version and about Sets being ordered - it is true that it is possible to iterate over the elements in insertion order but due to the nature of Map and Set I would argue it is way more logical to see those as being unordered (as Node.js currently does).

@keithamus
Copy link
Member

This should be fixed then. I feel like we have had this discussion before but I'd challenge that sets should only equal with the same order. Can you give a use-case for why you'd want to compare two unordered sets?

@BridgeAR
Copy link
Author

I personally do not have a use case at hand as I personally never needed to compare sets or maps. But in general I can imagine that people just want to collect data in a set / map and compare this with predefined values in a test and I would argue they are equal, no matter the order of the objects.

@escKeyStroke
Copy link

Take my bicycle, disassemble it and put the pieces in a box. When I (or the app I'm writing, or whatever) see inside the box, I will recognize my bike no matter the order in which you put the pieces inside.

I am using Set in the sense of https://en.wikipedia.org/wiki/Set_(mathematics) "the order in which the elements of a set are listed is irrelevant", which I guess is the sense in which ECMAScript specified.

A discussion you might want to participate in:
"the ECMAScript spec does not specify an OrderedSet but a similar data structure exists in multiple other languages ... the popular ImmutableJS provides an OrderedSet".
https://stackoverflow.com/questions/33089695/how-can-i-sort-an-es6-set

@keithamus
Copy link
Member

Okay, I am enough convinced. However this may be derailing the original issue. Let's focus on fixing the original issue and hopefully fix the other ones alongside.

@keithamus
Copy link
Member

One of the issues we may have here to do with performance - is that IE11 has no way to iterate on sets other than forEach. We could use feature detection here, but keep in mind we must maintain compatibility with IE11 for this.

@meeber
Copy link
Contributor

meeber commented Oct 20, 2017

It's a judgement call but I agree that this module shouldn't consider insertion order when comparing deep equality of Sets and Maps.

@olsonpm
Copy link

olsonpm commented Mar 1, 2018

Just wanted to add that javascript objects never had iteration order defined by the spec which proved to cause a lot of bugs. This is why they specified iteration order for set and map - to prevent this conversation.

Although maps and sets have an iteration order, they are not ordered themselves and cannot be re-ordered without removing and re-inserting values. I feel it would be very unexpected to compare values by insertion order for these constructs.

@BridgeAR
Copy link
Author

BridgeAR commented Mar 1, 2018

@keithamus about performance concerns: this is indeed bad. I would probably do a feature detection in this case. That should be easy. Another way to improve the performance for forEach for the case that they are unequal is to use try catch and throw undefined or null (note: it is important to throw that instead of an error because the stack trace is unnecessary overhead).

For the fast path you can pretty much use my implementation as it is in Node.js. Only the loose equal comparison should be removed in that case (and that makes the implementation much easier). With that implementation the complexity for object cases is lower. Primitive cases are actually O(n) in there.

@nicbou
Copy link

nicbou commented Jan 21, 2022

Am I missing something, or are Sets always unequal, regardless of order?

assert.equal(new Set(), new Set()) returns false for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants