Array.min in 2.12.12 vs 2.13.3 #12107

kamilkloch · 2020-08-06T14:00:20Z

Scala 2.12.12:

> println(Array(2.0, 1.0, Double.NaN).min)
NaN

Scala 2.13.3:

> println(Array(2.0, 1.0, Double.NaN).min)
1.0

Which one is correct? ;)

Jasper-M · 2020-08-06T14:52:22Z

There's some history on this topic #11844

eed3si9n · 2020-08-06T15:03:31Z

scala/scala#8721 is the PR that came out of it and it should summarize the situation.

NthPortal · 2020-08-09T02:44:51Z

Welcome to Scala 2.12.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_265).
Type in expressions for evaluation. Or try :help.

scala> Array(2.0, 1.0, Double.NaN, 3.0).min
res0: Double = 3.0

2.13.3 is the one that's correct (or at the very least more correct). I don't believe we can backport a fix (though I could be mistaken)

eed3si9n · 2020-08-09T02:56:46Z

Array(2.0, 1.0, Double.NaN, 3.0).min sounds like a clear bug in Scala 2.12.12, which I think we could backport a fix to return NaN.

However, in the original example, I would argue that both are "correct" in the sense that they work as intended.

Scala 2.12.12 conforms to IEEE 754 floating point semantics, like java.lang.Math.min(1.0, Double.NaN) returning NaN.

Scala 2.13.3 on the other hand conforms to total ordering like java.lang.Double.compare being able to sort an array of Double.

…d if built with Scala 2.13 ### What changes were proposed in this pull request? This PR fixes an issue that the histogram and timeline aren't rendered in the `Streaming Query Statistics` page if we built Spark with Scala 2.13. ![before-fix-the-issue](https://user-images.githubusercontent.com/4736016/100612855-f543d700-3356-11eb-90d9-ede57b8b3f4f.png) ![NaN_Error](https://user-images.githubusercontent.com/4736016/100612879-00970280-3357-11eb-97cf-43978bbe2d3a.png) The reason is [`maxRecordRate` can be `NaN`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala#L371) for Scala 2.13. The `NaN` is the result of [`query.recentProgress.map(_.inputRowsPerSecond).max`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala#L372) when the first element of `query.recentProgress.map(_.inputRowsPerSecond)` is `NaN`. Actually, the comparison logic for `Double` type was changed in Scala 2.13. scala/bug#12107 scala/scala#6410 So this issue happens as of Scala 2.13. The root cause of the `NaN` is [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala#L164). This `NaN` seems to be an initial value of `inputTimeSec` so I think `Double.PositiveInfinity` is suitable rather than `NaN` and this change can resolve this issue. ### Why are the changes needed? To make sure we can use the histogram/timeline with Scala 2.13. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? First, I built with the following commands. ``` $ /dev/change-scala-version.sh 2.13 $ build/sbt -Phive -Phive-thriftserver -Pscala-2.13 package ``` Then, ran the following query (this is brought from #30427 ). ``` import org.apache.spark.sql.streaming.Trigger val query = spark .readStream .format("rate") .option("rowsPerSecond", 1000) .option("rampUpTime", "10s") .load() .selectExpr("*", "CAST(CAST(timestamp AS BIGINT) - CAST((RAND() * 100000) AS BIGINT) AS TIMESTAMP) AS tsMod") .selectExpr("tsMod", "mod(value, 100) as mod", "value") .withWatermark("tsMod", "10 seconds") .groupBy(window($"tsMod", "1 minute", "10 seconds"), $"mod") .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value")) .writeStream .format("console") .trigger(Trigger.ProcessingTime("5 seconds")) .outputMode("append") .start() ``` Finally, I confirmed that the timeline and histogram are rendered. ![after-fix-the-issue](https://user-images.githubusercontent.com/4736016/100612736-c9285600-3356-11eb-856d-7e53cc656c36.png) ``` Closes #30546 from sarutak/ss-nan. Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>

SethTisue closed this as completed Aug 7, 2020

sarutak mentioned this issue Nov 30, 2020

[SPARK-33607][SS][WEBUI] Input Rate timeline/histogram aren't rendered if built with Scala 2.13 apache/spark#30546

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Array.min in 2.12.12 vs 2.13.3 #12107

Array.min in 2.12.12 vs 2.13.3 #12107

kamilkloch commented Aug 6, 2020 •

edited

Jasper-M commented Aug 6, 2020

eed3si9n commented Aug 6, 2020

NthPortal commented Aug 9, 2020

eed3si9n commented Aug 9, 2020

Array.min in 2.12.12 vs 2.13.3 #12107

Array.min in 2.12.12 vs 2.13.3 #12107

Comments

kamilkloch commented Aug 6, 2020 • edited

Jasper-M commented Aug 6, 2020

eed3si9n commented Aug 6, 2020

NthPortal commented Aug 9, 2020

eed3si9n commented Aug 9, 2020

kamilkloch commented Aug 6, 2020 •

edited