Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant time len #182

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

saik0
Copy link
Contributor

@saik0 saik0 commented Feb 7, 2022

Fixes #45
Closes #105

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use the constant len in select too?

Comment on lines +20 to +22
fn new(bitmap: &RoaringBitmap) -> Iter {
let size_hint = bitmap.len();
Iter { inner: bitmap.containers.iter().flatten(), size_hint }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain to me the fact #179 removes the size_hint implementation and here we continue to track the length of the bitmap? I feel lost.

Copy link
Contributor Author

@saik0 saik0 Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#176 Has the reasoning

The code it replaced would unconditionally panic. To guess the original author's intent: it was a runtime check that the count is tracked at the outer iterator. Panicking broke some new proptests, so we implemented the correct behavior. However, it is still true that the size hint is unreachable from the exposed API. The iterator is doing extra work the end user can never observe.

I suspect the regression went unnoticed since #125 plus this issue is still a net perf gain.

Proposing we just remove the size hints altogether and return the default (0, None). It will satisfy the validity requirements of tests without needlessly slowing down iteration.

We track the len of the RoaringBitmap here, therefore the the BitmapStore iter size_hint is unreachable by the external api. However, it should not panic, because it is reachable by tests, so it can return the default (0, None) which indicates no size hint.

Comment on lines 58 to +62
pub fn push(&mut self, value: u64) -> bool {
let (hi, lo) = util::split(value);
self.map.entry(hi).or_insert_with(RoaringBitmap::new).push(lo)
let pushed = self.map.entry(hi).or_insert_with(RoaringBitmap::new).push(lo);
self.len += pushed as u64;
pushed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushes value in the treemap only if it is greater than the current maximum value.

This is wrong BTW 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! Please open an issue.

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use the new constant len in select too?

@saik0
Copy link
Contributor Author

saik0 commented Feb 7, 2022

I'm not sure what the right approach to testing this is.

@Kerollmops
Copy link
Member

Kerollmops commented Feb 7, 2022

I'm not sure what the right approach to testing this is.

Do you mean benchmarking the constant length or checking that select which uses len() is valid?
You already added tests for the select method so... Nothing more to do.

However, checking that the length is valid is maybe a little bit complex, we could maybe add an iter().count() == .len() after all of our proptests or something?

@saik0
Copy link
Contributor Author

saik0 commented Feb 7, 2022

However, checking that the length is valid is maybe a little bit complex

This is what I meant. I'm trying to think of a clean way to debug_assert that each time the len changes we compare it against the computed len

@Kerollmops
Copy link
Member

This is what I meant. I'm trying to think of a clean way to debug_assert that each time the len changes we compare it against the computed len

Wouldn't it be ok to directly do debug_assert_eq!(self.iter().count(), self.len())? As the macro expansion is done at compile-time, therefore the call to iter/count is only done for debug builds.

I just hope that we haven't implemented the count method on our iterator to only call self.len() 😂

@Kerollmops
Copy link
Member

Kerollmops commented Feb 7, 2022

I just checked in the playground and the macro expansion indeed works like that: it didn't panic when compiling in release.

@saik0
Copy link
Contributor Author

saik0 commented Feb 7, 2022

I just hope that we haven't implemented the count method on our iterator to only call self.len()

Of course it does. Why should it recompute the cached len?

Edit: It's an exact size iterator whose size hint is initialized by RoaringBitmap::len
I believe ExactSizeIterator overrides count to return the exact size

@Kerollmops
Copy link
Member

Of course it does. Why should it recompute the cached len?

Ok so we should trick by doing a fold instead.

@saik0
Copy link
Contributor Author

saik0 commented Feb 7, 2022

Of course it does. Why should it recompute the cached len?

Ok so we should trick by doing a fold instead.

We'd probably have to wrap it in a black_box. LLVM is very clever.

@@ -375,8 +386,6 @@ impl RoaringBitmap {
/// assert_eq!(rb.rank(10), 2)
/// ```
pub fn rank(&self, value: u32) -> u64 {
// if len becomes cached for RoaringBitmap: return len if len > value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. Should have been return len if len > max

Computing max is not constant time for BitmapStore

This was referenced Feb 10, 2022
@saik0
Copy link
Contributor Author

saik0 commented Feb 10, 2022

Thinking we should assert invariants #197 before we merge this to be sure we didn't introduce a regression

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

len should take constant-time
2 participants