Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues when copying between Koloboke sets #63

Open
gauravagarwal opened this issue Jan 8, 2018 · 2 comments
Open

Performance issues when copying between Koloboke sets #63

gauravagarwal opened this issue Jan 8, 2018 · 2 comments

Comments

@gauravagarwal
Copy link

gauravagarwal commented Jan 8, 2018

Hi, due to the way the iteration is performed and new entries are added, If one iterates over one set and copies to another set, the performance is very slow.
Consider the following code snippet:

            final Random r = new Random();
            final int num = (int) (1024 * 1024 * 2.1);
            final HashLongSet set1 = HashLongSets.newMutableSet();
            for (int i = 0; i < num; i++) {
                final long oid = r.nextLong();
                set1.add(oid);
            }

            System.out.println("populated first set..");

            final HashLongSet set2 = HashLongSets.newMutableSet();
            final LongCursor cursor = set1.cursor();
            while (cursor.moveNext()) {
                set2.add(cursor.elem());
            }
            System.out.println("populated first set..");

Is there any way to accelerate the population of second set in this case? I understand that if I knew the expected set size upfront, I could have used it on second set construction and made things faster - but that is not always possible - I could have inserted some conditions in between that determined which output set the value needs to be inserted to, or thrown away completely.

@leventov
Copy link
Owner

I don't understand what you are asking. Please try to re-formulate.

What is "very slow", to what figure do you compare? What do you expect from set copying?

@gauravagarwal
Copy link
Author

Hi - I have better understanding of the problem now and understand that this probably cannot be improved as it is.
What I meant is following:

input: set1
for k in set1 {
 if (cond1(k) {s2.add(k)}
 else if (cond2(k) {s3.add(k)}
 else if (cond3(k) {s4.add(k)}
}

With the above code structure, the addition will be much slow due to the fact that we are iterating over the elements in set1 in reverse order and then when adding to set2, set3, set4, it is leading to a lot of collisions.

Each addition call degenerates into an O(n) operation where n being current size of the set being added to.

I was hoping if there could be an efficient set iteration order that does not lead to the degenerate set add() performance when adding to another koloboke set.

Please let me know if this makes sense; if not, I will post a code snippet that exactly compiles and provides sense of absolute "slowness" I am referring to.

PS:
I got around this problem by copying the elements from set1 into a long[] and then shuffling the long[] and iterating over it (instead of iterating over the set cursor() directly) when adding to set2/set3/set4...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants