Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ArrayOps bugs (by avoiding ArraySeq#array, which does not guarantee element type) #9641

Merged
merged 1 commit into from Jun 2, 2021

Conversation

som-snytt
Copy link
Contributor

@som-snytt som-snytt commented May 25, 2021

In a way,

Fixes scala/bug#12403

(which regressed in #9365)

@scala-jenkins scala-jenkins added this to the 2.13.7 milestone May 25, 2021
@SethTisue SethTisue added library:collections PRs involving changes to the standard collection library prio:blocker release blocker (used only by core team, only near release time) release-notes worth highlighting in next release notes labels May 25, 2021
@SethTisue
Copy link
Member

SethTisue commented Jun 1, 2021

On the face of it, this looks like a quick merge. But (after discussion with Dale) I actually want to slow down and understand if a better fix is possible.

According to the Scaladoc for ArraySeq, ArraySeq is within its rights to be using an Array[Object] here:

/** The underlying array. Its element type does not have to be equal to the element type of this ArraySeq. A primitive

  • ArraySeq can be backed by an array of boxed values and a reference ArraySeq can be backed by an array of a supertype
  • or subtype of the element type. */

so if empty is within contract here, why is it wrong for intersect to use it?

Must empty be a footgun?

@SethTisue
Copy link
Member

SethTisue commented Jun 1, 2021

in ArrayOps.scala we see:

  def intersect[B >: A](that: Seq[B]): Array[A] =
    mutable.ArraySeq.make(xs).intersect(that).array.asInstanceOf[Array[A]]

this seems incorrect to me -- it assumes behavior from ArraySeq that ArraySeq explicitly does not promise -- if we fixed this instead, then Martijn's code would be correct and the change in this PR would not be needed. (right?)

one thing I don't understand here yet is why ArraySeq claims this freedom for itself. you can't make an ArraySeq without a ClassTag, so why does ArraySeq need the freedom to perhaps be using an Array[java.lang.Double] or Array[AnyRef] internally instead of the Array[Double] one would normally expect?

@SethTisue
Copy link
Member

maybe the @scala/collections crew would like to pile on, here

@NthPortal
Copy link
Contributor

I am inclined to agree with Seth that the thing that needs fixing is the ArrayOps code, not the base implementation of intersect. I'm especially not fond of the fact that there's no apparent reason to use newSpecificBuilder.result() over empty, and that someone might innocently refactor it back, to much confusion.

@som-snytt
Copy link
Contributor Author

Yes, my comment was it fixes the issue "in a way" because there are alternatives and the quick fix is not compelling.

ArrayOps says "The remaining methods are provided for completeness but they delegate to mutable.ArraySeq implementations which may not provide the best possible performance." So all those calls could rebuild the desired array as necessary without violating the lightbend support contract.

The original sin is

  // This is reused for all calls to empty.
  private[this] val EmptyArraySeq  = new ofRef[AnyRef](new Array[AnyRef](0))
  def empty[T : ClassTag]: ArraySeq[T] = EmptyArraySeq.asInstanceOf[ArraySeq[T]]

#6648
scala/bug#10851
where lrytz asks

Also, can we avoid this problem here:

scala> a.array
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [I

Here is the unused (-Xlint would have told you that) tag:

scala> ArraySeq.make(Array(42)).array.getClass.getComponentType
val res16: Class[_] = int

scala> ArraySeq.empty(ArraySeq.make(Array(42)).elemTag).array.getClass.getComponentType
val res17: Class[_] = class java.lang.Object

@johnynek
Copy link
Contributor

johnynek commented Jun 1, 2021

First, is it important to do the def empty optimization? It seems strange to me that we accept ClassTag[A] on that method, but then discard it. I am skeptical that allocation is really costing us much, and I would consider just always using the ClassTag that comes in (note both immutable and mutable do discard the ClassTag in empty).

Second, the ArrayOps intersect implementation should be rewritten to avoid the cast IMO. The cast is assuming something is too strong. I think something as simple as:

val bs = that.toSet
val resultSize = count(bs)
val result = new Array[A](resultSize)
var idx = 0
var cnt = 0
while (cnt < resultSize) {
  val a = apply(idx)
  if (bs(a)) {
    result(cnt) = a
    cnt += 1
  }
  idx += 1
}
result

Third, the change in this PR also seems worthwhile to me, except I think we should weaken the tests so that we aren't testing the classtypes (since I don't think that is part of the signature).

@som-snytt
Copy link
Contributor Author

Thanks, I'll do a quickie PR with re-building the result in ArrayOps as necessary. I don't mind doing it as an exercise, if that is also not desirable.

@som-snytt
Copy link
Contributor Author

I don't have a knowledgeable opinion about empty and caching it.

Let me add that "knowledgeable" is a very strange word. What does it even mean?

@dwijnand
Copy link
Member

dwijnand commented Jun 1, 2021

The original sin is

  // This is reused for all calls to empty.
  private[this] val EmptyArraySeq  = new ofRef[AnyRef](new Array[AnyRef](0))
  def empty[T : ClassTag]: ArraySeq[T] = EmptyArraySeq.asInstanceOf[ArraySeq[T]]

It was less sinful originally when it was def empty[T <: AnyRef]: WrappedArray[T] = ....

First, is it important to do the def empty optimization? It seems strange to me that we accept ClassTag[A] on that method, but then discard it. I am skeptical that allocation is really costing us much, and I would consider just always using the ClassTag that comes in (note both immutable and mutable do discard the ClassTag in empty).

Yes and no? fdb1e69 states it avoids allocating 100+ million empty arrays but at the same time how it didn't make as much as a difference as it sounds like it should. 🤷🏼‍♂️

src/library/scala/collection/ArrayOps.scala Outdated Show resolved Hide resolved
src/library/scala/collection/ArrayOps.scala Outdated Show resolved Hide resolved
src/library/scala/collection/ArrayOps.scala Outdated Show resolved Hide resolved
@NthPortal
Copy link
Contributor

please note that the problematic method is ArraySeq#empty not ArraySeq.empty

@NthPortal
Copy link
Contributor

thoughts about adding a private intersect (and diff, etc.) method in one of the companion objects that takes the collections and a factory or builder, so that ArrayOps doesn't need to mutable.ArraySeq#intersect returning an array with the same element type, but we also don't duplicate code?

Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM otherwise, arraySeq.array is doing what it says, so such workarounds are needed if we want to use it.

I'm also fine if someone wants to come up with a solution that doesn't use ArraySeq.

src/library/scala/collection/ArrayOps.scala Outdated Show resolved Hide resolved
@som-snytt
Copy link
Contributor Author

I simplified to just do @johnynek toArray. Did I over-simplify? Not copying the result may be unexpected if it is exactly the input array. Do we expect (xs op ys) ne xs?

I leveraged the comment about "these are for completeness not efficiency".

@NthPortal 's last comment went over my head, so I'll leave that for the collections collective.

@lrytz
Copy link
Member

lrytz commented Jun 2, 2021

Do we expect (xs op ys) ne xs?

Yes, I think so, but toArray does create a copy, so it should be good.

Copy link
Member

@SethTisue SethTisue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR is now mergeable as-is. Along the way some questions and suggestions came up about possible future PRs in the same area, but this PR seems like the right minimal fix to me.

Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the title of the PR to "Avoid using ArraySeq#array inside of ArrayOps" or something similar.

@SethTisue SethTisue changed the title Avoid using ArraySeq.empty with fixed element type Fix ArrayOps bugs (by avoiding using ArraySeq#array, which does not guarantee element type) Jun 2, 2021
@lrytz lrytz merged commit 074cae1 into scala:2.13.x Jun 2, 2021
@johnynek
Copy link
Contributor

johnynek commented Jun 2, 2021

what's the story with changes like this wrt to scala 3?

I guess it has the same bug, right?

@dwijnand
Copy link
Member

dwijnand commented Jun 2, 2021

Periodically scala 3 bumps their scala-library version. So the Scala 3 release after 2.13.7 will have this.

@som-snytt som-snytt deleted the issue/12403 branch June 2, 2021 20:55
@SethTisue SethTisue changed the title Fix ArrayOps bugs (by avoiding using ArraySeq#array, which does not guarantee element type) Fix ArrayOps bugs (by avoiding ArraySeq#array, which does not guarantee element type) Oct 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
library:collections PRs involving changes to the standard collection library prio:blocker release blocker (used only by core team, only near release time) release-notes worth highlighting in next release notes
Projects
None yet
8 participants