Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify In-Place operation semantics #308

Open
jacksonrnewhouse opened this issue Jun 2, 2021 · 4 comments
Open

Clarify In-Place operation semantics #308

jacksonrnewhouse opened this issue Jun 2, 2021 · 4 comments

Comments

@jacksonrnewhouse
Copy link

Been fooling around with optimizing in-place operations, in particular if we can do better with run containers. My interest is in solutions that are performant both in terms of calculations and allocations. Right now, the semantics for container calls like ior() and iand() are not well specified. They always return a container, that container will always be the union/intersection and it will usually, but not always, be the receiving struct. This ambiguity is mainly handled by always using the returned value, such as at https://github.com/RoaringBitmap/roaring/blob/master/roaring.go#L731 or https://github.com/RoaringBitmap/roaring/blob/master/roaring.go#L554. However, in at least one place we perform a naked ior(), which breaks if the underlying container is not updated with the union result.

My suggestion is to change the semantics so that iand() etc. return nil when the container was actually updated in-place, and a non-nil result value when it wasn't. The downside is that the code would be slightly more verbose. For example, instead of

c1 := rb.highlowcontainer.getWritableContainerAtIndex(pos1)
c2 := x2.highlowcontainer.getContainerAtIndex(pos2)
diff := c1.iand(c2)
if !diff.isEmpty() {
	rb.highlowcontainer.replaceKeyAndContainerAtIndex(intersectionsize, s1, diff, false)
	intersectionsize++
}

it'd be

c1 := rb.highlowcontainer.getWritableContainerAtIndex(pos1)
c2 := x2.highlowcontainer.getContainerAtIndex(pos2)
diff := c1.iand(c2)
if diff == nil && !c1.isEmpty {
        intersectionsize++
} else if !diff.isEmpty() {
	rb.highlowcontainer.replaceKeyAndContainerAtIndex(intersectionsize, s1, diff, false)
	intersectionsize++
}

The main advantages are that it would be more explicit and not have to always reset the value in the bitmap.

It'd be a good piece of work to refactor everywhere, so figured I'd get feedback first.

@lemire
Copy link
Member

lemire commented Jun 3, 2021

The AndAny function may, in fact, be broken. I have opened an issue #309

@lemire
Copy link
Member

lemire commented Jun 3, 2021

Right now, the semantics for container calls like ior() and iand() are not well specified.

This may not be explicit but I think it is well defined. Both ior() and iand() should return the container with the desired answer (always), with the possibility that this container could be the referent in some cases.

... which is not to say that we can't do better... maybe we should....

The main advantages are that it would be more explicit and not have to always reset the value in the bitmap.

Can you elaborate?

It is typically faster and simpler to unconditionally set a value. Additional branches make the code harder to read, to maintain and typically slower.

I am not saying you are wrong. However, it seems like if we are going to refactor (a time intensive proposal), one would like to get simpler code as a result... not more complicated code.

if we really want to make it explicit, then I submit to you that we can think of other means, like...

inplace, answer := mycontainer.ior(othercontainer)

That would be quite explicit then...

Please understand that I am not arguing... I just want to push back a little so I can better understand what you are proposing.

@jacksonrnewhouse
Copy link
Author

I realized AndAny's issues when I refactored array.ior(run) to potentially return a new run container. At first I thought I had found a bug but because of when those unchecked ior calls are made it will always work. BUT, that is only because of a bunch of implementation dependent choices, and the general contract allows for implementations that break it.

Then, I looked at the various implementations, and they seemed to generally attempt to modify the receiver. I'd also be happy to just make explicit that receiver may not be modified

@lemire
Copy link
Member

lemire commented Jun 3, 2021

I'd also be happy to just make explicit that receiver may not be modified

That is the contract for or and and. The difference with iand and ior is that you allow the receiver/referent to be modified. If it was modified, then it is returned, otherwise a new container is returned. With or and and, a new container is always returned. It is crucially important in some cases to be able to do that... otherwise you always need to make copies...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants