
Update ArrayBuffer#mutationCount more precisely #9786

Merged
merged 2 commits into scala:2.13.x from NthPortal:topic/ArrayBuffer-cleanup on Oct 28, 2021

Conversation

NthPortal
Contributor

Update `ArrayBuffer#mutationCount` only when elements
of the buffer are changed or moved, and not when the
backing array is resized without changing the collection.

`s.c.m.PriorityQueue`, which uses an `ArrayBuffer` as part
of its implementation, does not track mutation perfectly.

Add `insertAll` benchmarks for `ArrayBuffer`.
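
For context, here is a minimal sketch (mine, not from the PR) of the checked-iteration behavior that `mutationCount` drives since #9258, with the resize case this change stops counting:

```
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val it  = buf.iterator
buf += 4     // changes the collection, so mutationCount is bumped
// it.next() // throws java.util.ConcurrentModificationException on 2.13.7+

val buf2 = ArrayBuffer(1, 2, 3)
val it2  = buf2.iterator
buf2.sizeHint(100) // resizes the backing array only; with this change it
                   // no longer counts as a mutation
it2.next()         // OK: the collection itself was not changed
```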
@NthPortal NthPortal added the library:collections PRs involving changes to the standard collection library label Oct 13, 2021
@scala-jenkins scala-jenkins added this to the 2.13.7 milestone Oct 13, 2021
@NthPortal NthPortal requested a review from a team Oct 15, 2021
@SethTisue SethTisue modified the milestones: 2.13.7, 2.13.8 Oct 26, 2021
@lrytz
Member

LGTM 👍

@NthPortal
Contributor Author

@SethTisue do you want to hold off on merging this until after 2.13.7 releases, or are you okay with merging it now and getting it into 2.13.7?

I would very weakly prefer these changes to make 2.13.7, because they're tweaks to #9258, which is new in 2.13.7. However, I don't feel strongly about it.

@SethTisue SethTisue modified the milestones: 2.13.8, 2.13.7 Oct 28, 2021
@SethTisue SethTisue merged commit 8fde4e3 into scala:2.13.x Oct 28, 2021
@NthPortal NthPortal deleted the topic/ArrayBuffer-cleanup branch October 28, 2021 00:29
@NthPortal
Contributor Author

thank you Seth 💜

dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Nov 13, 2021
…wn from a test in MLEventSuite

### What changes were proposed in this pull request?

This PR mitigates a `ConcurrentModificationException` that is sometimes thrown from a test.
Recently, I noticed the exception being thrown from the following part of the test `pipeline read/write events` in `MLEventSuite` when Scala 2.13 is used.
```
events.map(JsonProtocol.sparkEventToJson).foreach { event =>
  assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
}
```

We can also see this issue in the scheduled build:
https://github.com/apache/spark/runs/4196812399?check_suite_focus=true#step:9:17616

I think the root cause is that the `ArrayBuffer` (`events`) is updated asynchronously by the following listener.
```
private val listener: SparkListener = new SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: MLEvent => events.append(e)
    case _ =>
  }
}
```
You can easily reproduce this issue by applying the following diff to the commit hash 4d29bec.
```
diff --git a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
index f2343b7a88..ff63639e00 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala
@@ -42,7 +42,9 @@ class MLEventsSuite
   private val events = mutable.ArrayBuffer.empty[MLEvent]
   private val listener: SparkListener = new SparkListener {
     override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
-      case e: MLEvent => events.append(e)
+      case e: MLEvent =>
+        events.append(e)
+        Thread.sleep(500)
       case _ =>
     }
   }
@@ -235,11 +237,13 @@ class MLEventsSuite
       }
       // Test if they can be ser/de via JSON protocol.
       assert(events.nonEmpty)
-      events.map(JsonProtocol.sparkEventToJson).foreach { event =>
-        assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
-      }
+        events.map { x =>
+          Thread.sleep(500)
+          JsonProtocol.sparkEventToJson(x)
+        }.foreach { event =>
+          assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
+        }
       sc.listenerBus.waitUntilEmpty(timeoutMillis = 10000)
-
       events.clear()
       val pipelineReader = Pipeline.read
       assert(events.isEmpty)
```

This is a kind of race condition, but I think we can mitigate it by retrying.
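
For illustration, a hedged sketch of what retrying could look like with ScalaTest's `Eventually` (the actual change in this PR may differ):

```
import org.scalatest.concurrent.Eventually._
import org.scalatest.time.SpanSugar._

// Re-run the whole read if a concurrent append invalidates the iterator;
// eventually() retries the block until it passes or the timeout expires.
eventually(timeout(10.seconds)) {
  events.map(JsonProtocol.sparkEventToJson).foreach { event =>
    assert(JsonProtocol.sparkEventFromJson(event).isInstanceOf[MLEvent])
  }
}
```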

Actually, I had never seen this issue when we used Scala 2.13.5, but we recently upgraded to 2.13.7.
Scala 2.13.7 includes an update that detects concurrent modification (and thus throws `ConcurrentModificationException`) more precisely:
scala/scala#9786

### Why are the changes needed?

For test stability.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I manually modified the test code, inserting sleeps as in the diff shown above, and confirmed that no `ConcurrentModificationException` is thrown.

Closes #34583 from sarutak/fix-concurrent-modifiation-exception-mlevent.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Dec 17, 2021
…hrown from tests in SparkContextSuite

### What changes were proposed in this pull request?

This PR fixes an issue where some tests in `SparkContextSuite` can throw `ConcurrentModificationException` with Scala 2.13.
https://github.com/apache/spark/runs/4543047740?check_suite_focus=true#step:9:20851
The cause seems to be the same as SPARK-37315 (#34583):
> Scala 2.13.7 includes an update that detects concurrent modification (and thus throws ConcurrentModificationException) more precisely:
> scala/scala#9786

You can easily reproduce this issue by applying the following diff to `master`.
```
diff --git a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
index bc809f11cc..e5dde84c6e 100644
--- a/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SparkContextSuite.scala
@@ -1130,9 +1130,11 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
       sc.addJar("ivy://org.apache.hive:hive-storage-api:2.7.0?" +
         "invalidParam1=foo&invalidParam2=boo")
       assert(sc.listJars().exists(_.contains("org.apache.hive_hive-storage-api-2.7.0.jar")))
-      assert(logAppender.loggingEvents.exists(_.getRenderedMessage.contains(
-        "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
-          "`invalidParam1=foo&invalidParam2=boo`.")))
+      assert(logAppender.loggingEvents.exists { x =>
+        Thread.sleep(1000)
+        x.getRenderedMessage.contains(
+          "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
+            "`invalidParam1=foo&invalidParam2=boo`.")})
     }
   }
```
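
One hedged mitigation (not necessarily the approach this PR takes) is to snapshot the buffer into an immutable collection before scanning it; the copy can itself race with an append, so a retry such as ScalaTest's `eventually` may still be needed around it:

```
// Hypothetical sketch: copy the mutable buffer to an immutable Seq first so
// the scan below cannot observe a concurrent append mid-traversal.
val snapshot = logAppender.loggingEvents.toSeq
assert(snapshot.exists(_.getRenderedMessage.contains(
  "Invalid parameters `invalidParam1,invalidParam2` found in Ivy URI query " +
    "`invalidParam1=foo&invalidParam2=boo`.")))
```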

### Why are the changes needed?

Fix the flaky test.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Locally checked that no `ConcurrentModificationException` is thrown even when sleeps are inserted as in the diff shown above.

Closes #34922 from sarutak/fix-concurrent-access-issue.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
…cums

### What changes were proposed in this pull request?
This PR fixes a data race around concurrent access to `TaskMetrics.externalAccums`. The race occurs between the `executor-heartbeater` thread and the thread executing the task. This data race is not known to cause issues on 2.12, but on 2.13 it leads to an uncaught exception in the `executor-heartbeater` thread (LuciferYang bisected this to first cause failures in Scala 2.13.7; one possible reason could be scala/scala#9786, a tweak to scala/scala#9258), which means that the executor will eventually be terminated due to missing heartbeats.

This fix of using `CopyOnWriteArrayList` is cherry-picked from #37206, where it was suggested as a fix by LuciferYang, since `TaskMetrics.externalAccums` is also accessed from outside the class `TaskMetrics`. The old PR was closed because at that point there was no clear understanding of the race condition.

JoshRosen commented here #37206 (comment), saying that there should be no such race because all accumulators should be deserialized as part of task deserialization here: https://github.com/apache/spark/blob/0cc96f76d8a4858aee09e1fa32658da3ae76d384/core/src/main/scala/org/apache/spark/executor/Executor.scala#L507-L508 and therefore no writes should occur while the heartbeat thread reads the accumulators. But my understanding is that this is incorrect, as accumulators will also be deserialized as part of the taskBinary here: https://github.com/apache/spark/blob/169f828b1efe10d7f21e4b71a77f68cdd1d706d6/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala#L87-L88 which will happen while the heartbeater thread is potentially reading the accumulators.

This can happen both due to user code using accumulators (see the new test case) and when using the DataFrame/Dataset API, as SQL metrics will also be `externalAccums`. One way metrics will be sent as part of the taskBinary is when the dep is a `ShuffleDependency`: https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/core/src/main/scala/org/apache/spark/Dependency.scala#L85 with a `ShuffleWriteProcessor` that comes from https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala#L411-L422
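
A hedged sketch of the described `CopyOnWriteArrayList` fix (the class and method names below are illustrative, not the exact Spark code):

```
import java.util.concurrent.CopyOnWriteArrayList
import scala.jdk.CollectionConverters._

class TaskMetricsSketch {
  // Writers copy the backing array on each add; readers iterate over an
  // immutable snapshot. The heartbeat thread can therefore read while the
  // task thread registers accumulators, with no ConcurrentModificationException.
  private val _externalAccums = new CopyOnWriteArrayList[AnyRef]

  def registerAccumulator(acc: AnyRef): Unit = _externalAccums.add(acc)

  // Expose a stable snapshot for the heartbeat thread.
  def externalAccums: Seq[AnyRef] = _externalAccums.asScala.toSeq
}
```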

### Why are the changes needed?
The current code has a data race.

### Does this PR introduce _any_ user-facing change?
It will fix an uncaught exception in the `executor-heartbeater` thread when using Scala 2.13.

### How was this patch tested?
This patch adds a new test case that, before the fix was applied, consistently produced the uncaught exception in the heartbeater thread when using Scala 2.13.

Closes #40663 from eejbyfeldt/SPARK-39696.

Lead-authored-by: Emil Ejbyfeldt <eejbyfeldt@liveintent.com>
Co-authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
(same commit message as above; cherry picked from commit 6ce0822)
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Apr 7, 2023
(same commit message as above; cherry picked from commits 6ce0822 and b2ff4c4)
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
(same commit message as above; cherry picked from commit 6ce0822)