Create bench comparison to CRoaring #204

Open
saik0 opened this issue Feb 11, 2022 · 3 comments

Comments

@saik0
Contributor

saik0 commented Feb 11, 2022

No description provided.

@saik0 saik0 mentioned this issue Feb 11, 2022
16 tasks
@Ted-Jiang

I ran some tests using roaring-rs and croaring-rs in DataFusion, doing a count-distinct over one column (same logic in both: insert one value at a time). Just FYI 😊

1million_rows_10thousand_distinct.parquet

bitmap distinct 

(roaring-rs)
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.052 seconds

(croaring-rs)
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 10000                           |
+---------------------------------+
1 row in set. Query took 0.038 seconds.

1million_1million.parquet

roaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.175 seconds (roaring-rs).

croaring-rs
+---------------------------------+
| BITMAPCOUNTDISTINCT(test.value) |
+---------------------------------+
| 631504                          |
+---------------------------------+
1 row in set. Query took 0.052 seconds (croaring-rs).
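
For anyone who wants to reproduce the shape of this test outside DataFusion, here is a minimal sketch of the "insert one value at a time, then count" loop with a timer around it. Everything in it is an assumption made for illustration: the `DistinctSet` trait, the `FlatBitset` stand-in (a plain std-only bitset, not a roaring bitmap), and the synthetic column that replaces the parquet files. To compare the real crates, one would implement the trait on top of `RoaringBitmap::insert` / `RoaringBitmap::len` from roaring-rs and `Bitmap::add` / `Bitmap::cardinality` from croaring-rs.

```rust
use std::time::{Duration, Instant};

/// Hypothetical abstraction over "a set of u32 values" so that different
/// bitmap implementations can be timed with the same loop.
trait DistinctSet {
    fn insert_value(&mut self, v: u32);
    fn distinct(&self) -> u64;
}

/// Std-only stand-in for a bitmap: one bit per possible value.
/// This is NOT a roaring bitmap; it only keeps the sketch dependency-free.
struct FlatBitset {
    words: Vec<u64>,
}

impl FlatBitset {
    /// Allocates enough 64-bit words to hold values in 0..=max_value.
    fn with_capacity(max_value: u32) -> Self {
        let n_words = (max_value as usize / 64) + 1;
        FlatBitset { words: vec![0; n_words] }
    }
}

impl DistinctSet for FlatBitset {
    fn insert_value(&mut self, v: u32) {
        self.words[(v / 64) as usize] |= 1u64 << (v % 64);
    }

    fn distinct(&self) -> u64 {
        self.words.iter().map(|w| w.count_ones() as u64).sum()
    }
}

/// Inserts the values one at a time, then reads the cardinality.
/// Returns the distinct count and the elapsed wall-clock time.
fn count_distinct_timed<S: DistinctSet>(set: &mut S, values: &[u32]) -> (u64, Duration) {
    let start = Instant::now();
    for &v in values {
        set.insert_value(v);
    }
    let n = set.distinct();
    (n, start.elapsed())
}

/// Synthetic column: `rows` values cycling through `distinct` different ones.
fn synthetic_column(rows: u32, distinct: u32) -> Vec<u32> {
    (0..rows).map(|i| i % distinct).collect()
}

fn main() {
    // Roughly mirrors the first data set: 1 million rows, 10 thousand distinct values.
    let column = synthetic_column(1_000_000, 10_000);
    let mut set = FlatBitset::with_capacity(10_000);
    let (n, took) = count_distinct_timed(&mut set, &column);
    println!("distinct = {n}, took {:.3} s", took.as_secs_f64());
}
```

A single wall-clock run like this is noisy, so timings from it should be treated as rough; building with `--release` and averaging several runs gives a fairer comparison.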

@tonyabracadabra
Contributor

Thanks for making this! Can I conclude that croaring-rs is empirically faster?

@Ted-Jiang

In my test case, yes. Going through FFI gave better performance, but that was with last year's version; the Rust version may have improved a lot since then!
