
Update ArrayBuffer#mutationCount more precisely #9786

Merged
merged 2 commits into scala:2.13.x from NthPortal:topic/ArrayBuffer-cleanup on Oct 28, 2021

Conversation

NthPortal
Contributor

Update `ArrayBuffer#mutationCount` only when elements
of the buffer are changed or moved, and not when the
backing array is resized without changing the collection.

`s.c.m.PriorityQueue`, which uses an `ArrayBuffer` as part
of its implementation, does not track mutation perfectly.

Add `insertAll` benchmarks for `ArrayBuffer`.
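
For context, here is a minimal sketch (mine, not from the PR) of the checked-iteration behavior that `mutationCount` drives since #9258, with the resize case this change stops counting:

```
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val it  = buf.iterator
buf += 4     // changes the collection, so mutationCount is bumped
// it.next() // throws java.util.ConcurrentModificationException on 2.13.7+

val buf2 = ArrayBuffer(1, 2, 3)
val it2  = buf2.iterator
buf2.sizeHint(100) // resizes the backing array only; with this change it
                   // no longer counts as a mutation
it2.next()         // OK: the collection itself was not changed
```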
@NthPortal NthPortal added the library:collections PRs involving changes to the standard collection library label Oct 13, 2021
@scala-jenkins scala-jenkins added this to the 2.13.7 milestone Oct 13, 2021
@NthPortal NthPortal requested a review from a team Oct 15, 2021
@SethTisue SethTisue modified the milestones: 2.13.7, 2.13.8 Oct 26, 2021
@lrytz
Member

LGTM 👍

@NthPortal
Contributor Author

@SethTisue do you want to hold off on merging this until after 2.13.7 releases, or are you okay with merging it now and getting it into 2.13.7?

I would very weakly prefer these changes to make 2.13.7, because they're tweaks to #9258, which is new in 2.13.7. However, I don't feel strongly about it.

@SethTisue SethTisue modified the milestones: 2.13.8, 2.13.7 Oct 28, 2021
@SethTisue SethTisue merged commit 8fde4e3 into scala:2.13.x Oct 28, 2021
@NthPortal NthPortal deleted the topic/ArrayBuffer-cleanup branch October 28, 2021 00:29
@NthPortal
Contributor Author

thank you Seth 💜

dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Nov 13, 2021
…wn from a test in MLEventSuite

### What changes were proposed in this pull request?

This PR mitigates a `ConcurrentModificationException` that is sometimes thrown from a test.
Recently, I noticed the exception being thrown from the following part of the test `pipeline read/write events` in `MLEventSuite` when Scala 2.13 is used.
```
events.map(JsonProtocol.sparkEventToJson).foreach { event =>
  assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
}
```

We can also see this issue in the scheduled build:
https://github.com/apache/spark/runs/4196812399?check_suite_focus=true#step:9:17616

I think the root cause is that the `ArrayBuffer` (`events`) is updated asynchronously by the following listener.
```
private val listener: SparkListener = new SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: MLEvent => events.append(e)
    case _ =>
  }
}
```
You can easily reproduce this issue by applying the following diff to the commit hash 4d29bec.
```
diff --git a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
index f2343b7a88..ff63639e00 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
@@ -42,7 +42,9 @@ class MLEventsSuite
   private val events = mutable.ArrayBuffer.empty[MLEvent]
   private val listener: SparkListener = new SparkListener {
     override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
-      case e: MLEvent => events.append(e)
+      case e: MLEvent =>
+        events.append(e)
+        Thread.sleep(500)
       case _ =>
     }
   }
@@ -235,11 +237,13 @@ class MLEventsSuite
       }
       // Test if they can be ser/de via JSON protocol.
       assert(events.nonEmpty)
-      events.map(JsonProtocol.sparkEventToJson).foreach { event =>
-        assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
-      }
+        events.map { x =>
+          Thread.sleep(500)
+          JsonProtocol.sparkEventToJson(x)
+        }.foreach { event =>
+          assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
+        }
       sc.listenerBus.waitUntilEmpty(timeoutMillis = 10000)
-
       events.clear()
       val pipelineReader = Pipeline.read
       assert(events.isEmpty)
```

This is a kind of race condition, but I think we can mitigate it by retrying.
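
For illustration, a hedged sketch of what retrying could look like with ScalaTest's `Eventually` (the actual change in this PR may differ):

```
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Re-run the whole read if a concurrent append invalidates the iterator;
// eventually() retries the block until it passes or the timeout expires.
eventually(timeout(10.seconds)) {
  events.map(JsonProtocol.sparkEventToJson).foreach { event =>
    assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
  }
}
```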

Actually, I had never seen this issue when we used Scala 2.13.5, but we recently upgraded to 2.13.7.
Scala 2.13.7 includes an update that detects concurrent modification (and thus throws `ConcurrentModificationException`) more precisely:
scala/scala#9786

### Why are the changes needed?

For test stability.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually modified the test code, inserting sleeps as in the diff shown above, and confirmed that no `ConcurrentModificationException` is thrown.

Closes #34583 from sarutak/fix-concurrent-modifiation-exception-mlevent.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Dec 17, 2021
…hrown from tests in SparkContextSuite

### What changes were proposed in this pull request?

This PR fixes an issue where some tests in `SparkContextSuite` can throw `ConcurrentModificationException` with Scala 2.13.
https://github.com/apache/spark/runs/4543047740?check_suite_focus=true#step:9:20851
The cause seems to be the same as SPARK-37315 (#34583):
> Scala 2.13.7 includes an update that detects concurrent modification (and thus throws ConcurrentModificationException) more precisely:
> scala/scala#9786

You can easily reproduce this issue by applying the following diff to `master`.
```
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index bc809f11cc..e5dde84c6e 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1130,9 +1130,11 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
       sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?" +
         "invalidParam1=foo&invalidParam2=boo")
       assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
-      assert(logAppender.loggingEvents.exists(_.getRenderedMessage.contains(
-        "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
-          "`invalidParam1=foo&invalidParam2=boo`.")))
+      assert(logAppender.loggingEvents.exists { x =>
+        Thread.sleep(1000)
+        x.getRenderedMessage.contains(
+          "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
+            "`invalidParam1=foo&invalidParam2=boo`.")})
     }
   }
```
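
One hedged mitigation (not necessarily the approach this PR takes) is to snapshot the buffer into an immutable collection before scanning it; the copy can itself race with an append, so a retry such as ScalaTest's `eventually` may still be needed around it:

```
// Hypothetical sketch: copy the mutable buffer to an immutable Seq first so
// the scan below cannot observe a concurrent append mid-traversal.
val snapshot = logAppender.loggingEvents.toSeq
assert(snapshot.exists(_.getRenderedMessage.contains(
  "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
    "`invalidParam1=foo&invalidParam2=boo`.")))
```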

### Why are the changes needed?

Fix the flaky test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Locally checked that no `ConcurrentModificationException` is thrown even when sleeps are inserted as in the diff shown above.

Closes #34922 from sarutak/fix-concurrent-access-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
…cums

### What changes were proposed in this pull request?
This PR fixes a data race around concurrent access to `TaskMetrics.externalAccums`. The race occurs between the `executor-heartbeater` thread and the thread executing the task. This data race is not known to cause issues on 2.12, but on 2.13 it leads to an uncaught exception in the `executor-heartbeater` thread (LuciferYang bisected this to first cause failures in Scala 2.13.7; one possible reason could be scala/scala#9786, a tweak to scala/scala#9258), which means that the executor will eventually be terminated due to missing heartbeats.

This fix of using `CopyOnWriteArrayList` is cherry-picked from #37206, where it was suggested as a fix by LuciferYang, since `TaskMetrics.externalAccums` is also accessed from outside the class `TaskMetrics`. The old PR was closed because at that point there was no clear understanding of the race condition.

JoshRosen commented here #37206 (comment), saying that there should be no such race because all accumulators should be deserialized as part of task deserialization here: https://github.com/apache/spark/blob/0cc96f76d8a4858aee09e1fa32658da3ae76d384/core/src/main/scala/org/apache/spark/executor/Executor.scala#L507-L508 and therefore no writes should occur while the heartbeat thread reads the accumulators. But my understanding is that this is incorrect, as accumulators will also be deserialized as part of the taskBinary here: https://github.com/apache/spark/blob/169f828b1efe10d7f21e4b71a77f68cdd1d706d6/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala#L87-L88 which will happen while the heartbeater thread is potentially reading the accumulators.

This can happen both due to user code using accumulators (see the new test case) and when using the DataFrame/Dataset API, as SQL metrics will also be `externalAccums`. One way metrics will be sent as part of the taskBinary is when the dep is a `ShuffleDependency`: https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/core/src/main/scala/org/apache/spark/Dependency.scala#L85 with a `ShuffleWriteProcessor` that comes from https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L411-L422
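
A hedged sketch of the described `CopyOnWriteArrayList` fix (the class and method names below are illustrative, not the exact Spark code):

```
import java.util.concurrent.CopyOnWriteArrayList
import scala.jdk.CollectionConverters._

class TaskMetricsSketch {
  // Writers copy the backing array on each add; readers iterate over an
  // immutable snapshot. The heartbeat thread can therefore read while the
  // task thread registers accumulators, with no ConcurrentModificationException.
  private val _externalAccums = new CopyOnWriteArrayList[AnyRef]

  def registerAccumulator(acc: AnyRef): Unit = _externalAccums.add(acc)

  // Expose a stable snapshot for the heartbeat thread.
  def externalAccums: Seq[AnyRef] = _externalAccums.asScala.toSeq
}
```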

### Why are the changes needed?
The current code has a data race.

### Does this PR introduce _any_ user-facing change?
It will fix an uncaught exception in the `executor-heartbeater` thread when using Scala 2.13.

### How was this patch tested?
This patch adds a new test case that, before the fix was applied, consistently produced the uncaught exception in the heartbeater thread when using Scala 2.13.

Closes #40663 from eejbyfeldt/SPARK-39696.

Lead-authored-by: Emil Ejbyfeldt <eejbyfeldt@liveintent.com>
Co-authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
(same commit message as above; cherry picked from commit 6ce0822)
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
(same commit message as above; cherry picked from commits 6ce0822 and b2ff4c4)
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
(same commit message as above; cherry picked from commit 6ce0822)