Add pick generator specialized for indexed sequences #874

Open
wants to merge 2 commits into
base: main

lambdani

Hi! Would there be interest in a pick generator specialized for IndexedSeqs? When choosing k elements from a sequence of n elements, the idea is to draw an index in the inclusive range [0, n-1], then another in [0, n-2], and so on down to [0, n-k]. These indices must then be translated back to the full range [0, n-1] while avoiding repetitions. For this, one can use a modified order statistic tree that selects the i-th non-negative integer not present in the tree.

This should pick k elements in O(k log k) time, using O(k) extra space for the tree. Additionally, the elements should be permuted in random order.

The names are horrible, but I couldn't come up with better ones. Any help with that would be appreciated if you think it's worth adding this generator to ScalaCheck. What do you think?
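
For illustration, the index translation described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in this PR: it replaces the order statistic tree with a plain scan over a SortedSet, so each translation costs O(k) (O(k^2) overall) rather than O(log k). The names PickIndexedSketch, nthFree and pickIndices are hypothetical and chosen only for this example.

```scala
import scala.collection.immutable.SortedSet
import scala.util.Random

// Hypothetical sketch; a SortedSet scan stands in for the order statistic tree.
object PickIndexedSketch {

  /** The i-th (0-based) non-negative integer that is not contained in `taken`. */
  def nthFree(i: Int, taken: SortedSet[Int]): Int =
    // Visit taken indices in ascending order. Each one at or below the
    // running candidate shifts the candidate up by one; later (larger)
    // entries leave it unchanged.
    taken.foldLeft(i) { (candidate, t) =>
      if (t <= candidate) candidate + 1 else candidate
    }

  /** Draws k distinct indices from [0, n-1], in random order. */
  def pickIndices(k: Int, n: Int, rng: Random): Vector[Int] = {
    require(0 <= k && k <= n, "k must be between 0 and n")
    val start = (SortedSet.empty[Int], Vector.empty[Int])
    val (_, picked) = (0 until k).foldLeft(start) { case ((taken, acc), j) =>
      val raw = rng.nextInt(n - j)  // uniform in [0, n-1-j]
      val idx = nthFree(raw, taken) // map onto the indices not yet taken
      (taken + idx, acc :+ idx)
    }
    picked
  }
}
```

Mapping the resulting indices over the IndexedSeq (for example with `pickIndices(k, seq.length, rng).map(seq)`) would then yield the picked elements. Because each draw is uniform over the indices still free, the result should already be in random order with no separate shuffle.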

@satorg
Contributor

satorg commented Apr 2, 2022

@lambdani just to clarify: is the suggested behavior different from this one?

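import org.scalacheck.{Arbitrary, Gen}

// Shuffle the whole sequence using a generated seed, then pick n elements from it.
def shuffledPick[T](n: Int, seq: IndexedSeq[T]): Gen[collection.Seq[T]] = {
  Arbitrary.arbitrary[Long].flatMap { seed =>
    val shuffledSeq = new scala.util.Random(seed).shuffle(seq)
    Gen.pick(n, shuffledSeq)
  }
}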

I mean, I realize it works in a different way, but I'm not sure whether the results the two provide would actually differ.

@lambdani
Author

lambdani commented Apr 2, 2022

No, it should have the same behavior. The only difference should be asymptotic efficiency (O(k log k) vs O(n)). But I don't know whether it's faster in practice for enough use cases, or even then whether it's worth the added complexity. I could try to write some benchmarks, but it's OK if you think it's not worth it :-).

Added tests to check red-black tree invariants.
Member

rossabaker left a comment


Thanks! A quick benchmark would be interesting, to see roughly at what collection size the break-even point falls.

*
* The elements are guaranteed to be permuted in random order.
*/
def indexedPick[T](n: Int, l: IndexedSeq[T]): Gen[collection.Seq[T]] = {
Member


Names that sort close to their relatives improve discoverability: how about pickIndexed?

/** A generator that randomly picks a given number of elements from an IndexedSeq
*
* The elements are guaranteed to be permuted in random order.
*/
Member


A quick comment on the runtime improvement over pick would be helpful. Perhaps also that it doesn't repeat elements.

3 participants