Add pick generator specialized for indexed sequences #874

lambdani · 2022-02-27T19:37:10Z

Hi! Would there be interest in a pick generator specialized for IndexedSeqs? When choosing k elements from a sequence with n elements, the idea is to choose an index in the inclusive range [0, n-1], then another one in [0, n-2], and so on down to [0, n-k]. These indices must then be translated to the whole range [0, n-1] while avoiding repetitions. For this, one can use a modified version of an order statistic tree that selects the i-th non-negative integer not present in the tree.

This should pick k elements in O(k log k) time, using O(k) extra space for the tree. Additionally, the elements should be permuted in random order.
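To make the index translation concrete, here is a minimal sketch in plain Scala. It is not the code from this PR: the names `PickIndexedSketch`, `nthFree`, and `pickIndices` are made up for illustration, and the order statistic tree is replaced by a sorted `ArrayBuffer` with a linear scan, so this version runs in O(k^2) in the worst case rather than O(k log k). The translation rule itself is the one described above.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical sketch; names are illustrative and not taken from the PR.
object PickIndexedSketch {

  /** Returns the r-th (0-based) non-negative integer that is not in `taken`.
    * `taken` must be sorted in ascending order without duplicates.
    * This is a linear scan; an order statistic tree would answer the same
    * query in O(log k).
    */
  def nthFree(r: Int, taken: scala.collection.IndexedSeq[Int]): Int = {
    var idx = r
    var i = 0
    // Every taken value at or below the candidate shifts it up by one.
    while (i < taken.length && taken(i) <= idx) {
      idx += 1
      i += 1
    }
    idx
  }

  /** Picks k distinct indices from [0, n-1]. */
  def pickIndices(k: Int, n: Int, rng: Random): Vector[Int] = {
    require(0 <= k && k <= n, "k must be between 0 and n")
    val taken = ArrayBuffer.empty[Int] // kept sorted ascending
    val out = Vector.newBuilder[Int]
    for (j <- 0 until k) {
      val r = rng.nextInt(n - j)       // uniform in [0, n-1-j]
      val idx = nthFree(r, taken)      // translate to an unused index
      val pos = taken.indexWhere(_ > idx)
      if (pos < 0) taken += idx else taken.insert(pos, idx)
      out += idx
    }
    out.result()
  }
}
```

Because each step draws uniformly from the indices that are still free, the indices come out in the order they were drawn, which is already a random order. Mapping them through the sequence, for example `PickIndexedSketch.pickIndices(3, seq.length, rng).map(seq)`, would give the picked elements.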

The names are horrible, but I couldn't come up with better ones. Any help with that would be appreciated if you think it's worth adding this generator to ScalaCheck. What do you think?

satorg · 2022-04-02T06:35:26Z

@lambdani just to clarify: is the suggested behavior different from this one?

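import org.scalacheck.{Arbitrary, Gen}

// Shuffle the whole sequence with a generated seed, then pick n elements from it.
def shuffledPick[T](n: Int, seq: IndexedSeq[T]): Gen[collection.Seq[T]] = {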
  Arbitrary.arbitrary[Long].flatMap { seed =>
    val shuffledSeq = new scala.util.Random(seed).shuffle(seq)
    Gen.pick(n, shuffledSeq)
  }
}

I mean, I realize it works in a different way, but I'm not sure whether the two approaches would differ in the results they produce.

lambdani · 2022-04-02T09:10:45Z

No, it should have the same behavior. The only difference should be asymptotic efficiency (O(k log k) vs O(n)). But I don't know if it's faster in practice for enough use cases, and even then if it's worth the added complexity. I could try to write some benchmarks, but it's OK if you think it's not worth it :-).

Added tests to check red-black tree invariants.

rossabaker

Thanks! A quick benchmark would be interesting to see approximately what size collection is the break-even point.

rossabaker · 2022-08-02T19:56:57Z

src/main/scala/org/scalacheck/Gen.scala

+   *
+   * The elements are guaranteed to be permuted in random order.
+   */
+  def indexedPick[T](n: Int, l: IndexedSeq[T]): Gen[collection.Seq[T]] = {


Names that sort close to their relatives improve discoverability: how about pickIndexed?

rossabaker · 2022-08-02T19:57:47Z

src/main/scala/org/scalacheck/Gen.scala

+  /** A generator that randomly picks a given number of elements from an IndexedSeq
+   *
+   * The elements are guaranteed to be permuted in random order.
+   */


A quick comment on the runtime improvement over pick would be helpful. Perhaps also that it doesn't repeat elements.

Add pick generator specialized for indexed sequences

e153167

Fix error in red-black tree implementation

f9d22b4

