{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1715221937.0","currentOid":""},"activityList":{"items":[{"before":"5b965f70c057cb478896feea2456fc59267596df","after":"b5584221cfc2d3cb052c082d8a94b4a00ccf4ed4","ref":"refs/heads/master","pushedAt":"2024-05-12T23:45:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48245][SQL] Fix typo in BadRecordException class doc\n\n### What changes were proposed in this pull request?\nFix typo in `BadRecordException` class doc\n\n### Why are the changes needed?\nTo avoid annoyance\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46542 from vladimirg-db/vladimirg-db/fix-typo-in-bad-record-exception-doc.\n\nAuthored-by: Vladimir Golubev \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48245][SQL] Fix typo in BadRecordException class doc"}},{"before":"57b207774382e3a35345518ede5cfc028885f90b","after":"5b965f70c057cb478896feea2456fc59267596df","ref":"refs/heads/master","pushedAt":"2024-05-12T23:26:54.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48239][INFRA] Update the release docker image to follow what we use in Github Action jobs\n\n### What changes were proposed in this pull request?\n\nWe have Github Action jobs to test package building and doc generation, but the execution environment is different from what we use for the release process.\n\nThis PR updates the release docker image to follow what we use in Github Action: https://github.com/apache/spark/blob/master/dev/infra/Dockerfile\n\nNote: it's not exactly the same, as I have to do some modification to make it usable for the release process. In the future we should have a better way to unify these two docker files.\n\n### Why are the changes needed?\n\nto make us be able to release\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nmanually\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46534 from cloud-fan/re.\n\nAuthored-by: Wenchen Fan \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48239][INFRA] Update the release docker image to follow what w…"}},{"before":"f699f556d8a09bb755e9c8558661a36fbdb42e73","after":"57b207774382e3a35345518ede5cfc028885f90b","ref":"refs/heads/master","pushedAt":"2024-05-11T12:41:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48240][DOCS] Replace `Local[..]` with `\"Local[...]\"` in the docs\n\n### What changes were proposed in this pull request?\nThe pr aims to replace `Local[..]` with `\"Local[...]\"` in the docs\n\n### Why are the changes needed?\n1.When I recently switched from `bash` to `zsh` and executed command `./bin/spark-shell --master local[8]` on local, the following error will be printed:\n\"image\"\n\n2.Some descriptions in the existing documents have been written as `--master \"local[n]\"`, eg:\nhttps://github.com/apache/spark/blob/f699f556d8a09bb755e9c8558661a36fbdb42e73/docs/index.md?plain=1#L49\n\n3.The root cause is: https://blog.peiyingchi.com/2017/03/20/spark-zsh-no-matches-found-local/\n\"image\"\n\n### Does this PR introduce _any_ user-facing change?\nYes, with the `zsh` becoming the mainstream of shell, avoid the confusion of spark users when submitting apps with `./bin/spark-shell --master \"local[n]\" ...` or `./bin/spark-sql --master \"local[n]\" ...`, etc\n\n### How was this patch tested?\nManually test\nWhether the user uses `bash` or `zsh`, the above `--master \"local[n]\"` can be executed successfully in the expected way.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46535 from panbingkun/SPARK-48240.\n\nAuthored-by: panbingkun \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48240][DOCS] Replace Local[..] with \"Local[...]\" in the docs"}},{"before":"d16a4f4c98d5e6a44ff783e20a9f2f2f80c009f3","after":"1e0fc1ef96aa6f541134224f1ba626f234442e74","ref":"refs/heads/branch-3.4","pushedAt":"2024-05-11T02:54:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script\n\n### What changes were proposed in this pull request?\nThe pr aims to delete the dir `dev/pr-deps` after executing `test-dependencies.sh`.\n\n### Why are the changes needed?\nWe'd better clean the `temporary files` generated at the end.\nBefore:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\nAfter:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nManually test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46531 from panbingkun/minor_test-dependencies.\n\nAuthored-by: panbingkun \nSigned-off-by: Dongjoon Hyun \n(cherry picked from commit f699f556d8a09bb755e9c8558661a36fbdb42e73)\nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48237][BUILD] Clean up dev/pr-deps at the end of `test-depen…"}},{"before":"c048653435f9b7c832f79d38a504a145a17654c0","after":"e9a1b4254419c751e612cd5e5c56f111b41399e7","ref":"refs/heads/branch-3.5","pushedAt":"2024-05-11T02:54:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script\n\n### What changes were proposed in this pull request?\nThe pr aims to delete the dir `dev/pr-deps` after executing `test-dependencies.sh`.\n\n### Why are the changes needed?\nWe'd better clean the `temporary files` generated at the end.\nBefore:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\nAfter:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nManually test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46531 from panbingkun/minor_test-dependencies.\n\nAuthored-by: panbingkun \nSigned-off-by: Dongjoon Hyun \n(cherry picked from commit f699f556d8a09bb755e9c8558661a36fbdb42e73)\nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48237][BUILD] Clean up dev/pr-deps at the end of `test-depen…"}},{"before":"d82458f15539eef8df320345a7c2382ca4d5be8a","after":"f699f556d8a09bb755e9c8558661a36fbdb42e73","ref":"refs/heads/master","pushedAt":"2024-05-11T02:54:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48237][BUILD] Clean up `dev/pr-deps` at the end of `test-dependencies.sh` script\n\n### What changes were proposed in this pull request?\nThe pr aims to delete the dir `dev/pr-deps` after executing `test-dependencies.sh`.\n\n### Why are the changes needed?\nWe'd better clean the `temporary files` generated at the end.\nBefore:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\nAfter:\n```\nsh dev/test-dependencies.sh\n```\n\"image\"\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nManually test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46531 from panbingkun/minor_test-dependencies.\n\nAuthored-by: panbingkun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48237][BUILD] Clean up dev/pr-deps at the end of `test-depen…"}},{"before":"5b3b8a90638c49fc7ddcace69a85989c1053f1ab","after":"d82458f15539eef8df320345a7c2382ca4d5be8a","ref":"refs/heads/master","pushedAt":"2024-05-10T23:31:49.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API\n\n### What changes were proposed in this pull request?\n\nThis is a follow-up PR for https://github.com/apache/spark/pull/46487 to add missing tags for the `dataSource` API.\n\n### Why are the changes needed?\n\nTo address comments from a previous PR.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nExisting test\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46530 from allisonwang-db/spark-48205-followup.\n\nAuthored-by: allisonwang-db \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48205][SQL][FOLLOWUP] Add missing tags for the dataSource API"}},{"before":"726ef8aa66ea6e56b739f3b16f99e457a0febb81","after":"5b3b8a90638c49fc7ddcace69a85989c1053f1ab","ref":"refs/heads/master","pushedAt":"2024-05-10T22:48:10.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48236][BUILD] Add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars\n\n### What changes were proposed in this pull request?\n\nThis PR aims to add `commons-lang:commons-lang:2.6` back to support legacy Hive UDF jars . This is a partial revert of SPARK-47018 .\n\n### Why are the changes needed?\n\nRecently, we dropped `commons-lang:commons-lang` during Hive upgrade.\n- #46468\n\nHowever, only Apache Hive 2.3.10 or 4.0.0 dropped it. In other words, Hive 2.0.0 ~ 2.3.9 and Hive 3.0.0 ~ 3.1.3 requires it. As a result, all existing UDF jars built against those versions requires `commons-lang:commons-lang` still.\n\n- https://github.com/apache/hive/pull/4892\n\nFor example, Apache Hive 3.1.3 code:\n- https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java#L21\n```\nimport org.apache.commons.lang.StringUtils;\n```\n\n- https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java#L42\n```\nreturn StringUtils.strip(val, \" \");\n```\n\nAs a result, Maven CIs are broken.\n- https://github.com/apache/spark/actions/runs/9032639456/job/24825599546 (Maven / Java 17)\n- https://github.com/apache/spark/actions/runs/9033374547/job/24835284769 (Maven / Java 21)\n\nThe root cause is that the existing test UDF jar `hive-test-udfs.jar` was built from old Hive (before 2.3.10) libraries which requires `commons-lang:commons-lang:2.6`.\n```\nHiveUDFDynamicLoadSuite:\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF\n20:21:25.129 WARN org.apache.spark.SparkContext: The JAR file:///home/runner/work/spark/spark/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:33327/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version.\n\n*** RUN ABORTED ***\nA needed class was not found. This could be due to an error in your runpath. Missing class: org/apache/commons/lang/StringUtils\n java.lang.NoClassDefFoundError: org/apache/commons/lang/StringUtils\n at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43)\n at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75)\n at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170)\n at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)\n at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)\n at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)\n at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)\n at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:184)\n at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:164)\n at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:185)\n ...\n Cause: java.lang.ClassNotFoundException: org.apache.commons.lang.StringUtils\n at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:445)\n at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:593)\n at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)\n at org.apache.hadoop.hive.contrib.udf.example.GenericUDFTrim2.performOp(GenericUDFTrim2.java:43)\n at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBaseTrim.evaluate(GenericUDFBaseTrim.java:75)\n at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:170)\n at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)\n at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)\n at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)\n at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)\n ...\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nTo support the existing customer UDF jars.\n\n### How was this patch tested?\n\nManually.\n\n```\n$ build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.hive.HiveUDFDynamicLoadSuite test\n...\nHiveUDFDynamicLoadSuite:\n14:21:56.034 WARN org.apache.hadoop.hive.metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0\n\n14:21:56.035 WARN org.apache.hadoop.hive.metastore.ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore dongjoon127.0.0.1\n\n14:21:56.041 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException\n\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDF\n14:21:57.576 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version.\n\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDF\n14:21:58.314 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version.\n\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDAF\n14:21:58.943 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version.\n\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (UDAF\n14:21:59.333 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.\n\n14:21:59.364 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist\n\n14:21:59.370 WARN org.apache.hadoop.hive.metastore.HiveMetaStore: Location: file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src specified for non-external table:src\n\n14:21:59.718 WARN org.apache.hadoop.hive.metastore.ObjectStore: Failed to get database global_temp, returning NoSuchObjectException\n\n14:21:59.770 WARN org.apache.spark.SparkContext: The JAR file:///Users/dongjoon/APACHE/spark-merge/sql/hive/src/test/noclasspath/hive-test-udfs.jar at spark://localhost:55526/jars/hive-test-udfs.jar has been added already. Overwriting of added jar is not supported in the current version.\n\n- Spark should be able to run Hive UDF using jar regardless of current thread context classloader (GENERIC_UDTF\n14:22:00.403 WARN org.apache.hadoop.hive.common.FileUtils: File file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src does not exist; Force to delete it.\n\n14:22:00.404 ERROR org.apache.hadoop.hive.common.FileUtils: Failed to delete file:/Users/dongjoon/APACHE/spark-merge/sql/hive/target/tmp/warehouse-49291492-9d48-4360-a354-ace73a2c76ce/src\n\n14:22:00.441 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist\n\n14:22:00.453 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.\n\n14:22:00.537 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist\n\nRun completed in 8 seconds, 612 milliseconds.\nTotal number of tests run: 5\nSuites: completed 2, aborted 0\nTests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0\nAll tests passed.\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nCloses #46528 from dongjoon-hyun/SPARK-48236.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48236][BUILD] Add commons-lang:commons-lang:2.6 back to supp…"}},{"before":"2225aa1dab0fdb358ce032e07057a54aaf4e456f","after":"726ef8aa66ea6e56b739f3b16f99e457a0febb81","ref":"refs/heads/master","pushedAt":"2024-05-10T22:34:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48230][BUILD] Remove unused `jodd-core`\"\n\nThis reverts commit d8151186d79459fbde27a01bd97328e73548c55a.","shortMessageHtmlLink":"Revert \"[SPARK-48230][BUILD] Remove unused jodd-core\""}},{"before":"a6632ffa16f6907eba96e745920d571924bf4b63","after":"2225aa1dab0fdb358ce032e07057a54aaf4e456f","ref":"refs/heads/master","pushedAt":"2024-05-10T18:02:21.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48144][SQL] Fix `canPlanAsBroadcastHashJoin` to respect shuffle join hints\n\n### What changes were proposed in this pull request?\n\n* Currently, `canPlanAsBroadcastHashJoin` incorrectly returns that a join can be planned as a BHJ, even though the join contains a SHJ.\n* To fix this, add some logic that checks whether the join contains a SHJ hint before checking if the join can be broadcasted.\n* Also made a small refactor to the `JoinSelectionHelperSuite` to make it a bit more readable.\n\n### Why are the changes needed?\n\n* `canPlanAsBroadcastHashJoin` should be in sync with the join selection in `SparkStrategies`. Currently, it is not in sync.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, semi / anti joins that could not have been planned as broadcasts would now not be pushed through aggregates anymore. Generally, this would be a performance improvement.\n\n### How was this patch tested?\n\n* Added UTs to check that a join with a SHJ hint is not marked as being planned as a BHJ.\n* Added tests to keep `canPlanAsBroadcastHashJoin` and the `JoinSelection` codepath in sync.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\n* No\n\nCloses #46401 from fred-db/fix-hint.\n\nAuthored-by: fred-db \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48144][SQL] Fix canPlanAsBroadcastHashJoin to respect shuffl…"}},{"before":"5beaf85cd5ef2b84a67ebce712e8d73d1e7d41ff","after":"a6632ffa16f6907eba96e745920d571924bf4b63","ref":"refs/heads/master","pushedAt":"2024-05-10T16:37:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser\n\n# What changes were proposed in this pull request?\nNew lightweight exception for control-flow between UnivocityParser and FalureSafeParser to speed-up malformed CSV parsing.\n\nThis is a different way to implement these reverted changes: https://github.com/apache/spark/pull/46478\n\nThe previous implementation was more invasive - removing `cause` from `BadRecordException` could break upper code, which unwraps errors and checks the types of the causes. This implementation only touches `FailureSafeParser` and `UnivocityParser` since in the codebase they are always used together, unlike `JacksonParser` and `StaxXmlParser`. Removing stacktrace from `BadRecordException` is safe, since the cause itself has an adequate stacktrace (except pure control-flow cases).\n\n### Why are the changes needed?\nParsing in `PermissiveMode` is slow due to heavy exception construction (stacktrace filling + string template substitution in `SparkRuntimeException`)\n\n### Does this PR introduce _any_ user-facing change?\nNo, since `FailureSafeParser` unwraps `BadRecordException` and correctly rethrows user-facing exceptions in `FailFastMode`\n\n### How was this patch tested?\n- `testOnly org.apache.spark.sql.catalyst.csv.UnivocityParserSuite`\n- Manually run csv benchmark\n- Manually checked correct and malformed csv in sherk-shell (org.apache.spark.SparkException is thrown with the stacktrace)\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46500 from vladimirg-db/vladimirg-db/use-special-lighweight-exception-for-control-flow-between-univocity-parser-and-failure-safe-parser.\n\nAuthored-by: Vladimir Golubev \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48143][SQL] Use lightweight exceptions for control-flow betwee…"}},{"before":"c5b6ec734bd0c47551b59f9de13c6323b80974b2","after":"5beaf85cd5ef2b84a67ebce712e8d73d1e7d41ff","ref":"refs/heads/master","pushedAt":"2024-05-10T15:24:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source exactly once\n\n### What changes were proposed in this pull request?\nFix the flakiness in python streaming source exactly once test. The last executed batch may not be recorded in query progress, which cause the expected rows doesn't match. This fix takes the uncompleted batch into account and relax the condition\n\n### Why are the changes needed?\nFix flaky test.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nTest change.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46481 from chaoqin-li1123/fix_python_ds_test.\n\nAuthored-by: Chaoqin Li \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47793][TEST][FOLLOWUP] Fix flaky test for Python data source e…"}},{"before":"73bb619d45b2d0699ca4a9d251eea57c359f275b","after":"c5b6ec734bd0c47551b59f9de13c6323b80974b2","ref":"refs/heads/master","pushedAt":"2024-05-10T15:22:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI\n\n### What changes were proposed in this pull request?\n\nThis PR makes it do not add log link for unmanaged AM in Spark UI.\n\n### Why are the changes needed?\n\nAvoid start driver error messages:\n```\n24/03/18 04:58:25,022 ERROR [spark-listener-group-appStatus] scheduler.AsyncEventQueue:97 : Listener AppStatusListener threw an exception\njava.lang.NumberFormatException: For input string: \"null\"\n\tat java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) ~[?:?]\n\tat java.lang.Integer.parseInt(Integer.java:668) ~[?:?]\n\tat java.lang.Integer.parseInt(Integer.java:786) ~[?:?]\n\tat scala.collection.immutable.StringLike.toInt(StringLike.scala:310) ~[scala-library-2.12.18.jar:?]\n\tat scala.collection.immutable.StringLike.toInt$(StringLike.scala:310) ~[scala-library-2.12.18.jar:?]\n\tat scala.collection.immutable.StringOps.toInt(StringOps.scala:33) ~[scala-library-2.12.18.jar:?]\n\tat org.apache.spark.util.Utils$.parseHostPort(Utils.scala:1105) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.ProcessSummaryWrapper.(storeTypes.scala:609) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.LiveMiscellaneousProcess.doUpdate(LiveEntity.scala:1045) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.LiveEntity.write(LiveEntity.scala:50) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.AppStatusListener.update(AppStatusListener.scala:1233) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.AppStatusListener.onMiscellaneousProcessAdded(AppStatusListener.scala:1445) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.status.AppStatusListener.onOtherEvent(AppStatusListener.scala:113) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) ~[scala-library-2.12.18.jar:?]\n\tat scala.util.DynamicVariable.withValue(DynamicVariable.scala:62) ~[scala-library-2.12.18.jar:?]\n\tat org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96) ~[spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356) [spark-core_2.12-3.5.1.jar:3.5.1]\n\tat org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96) [spark-core_2.12-3.5.1.jar:3.5.1]\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nManual testing:\n```shell\nbin/spark-sql --master yarn --conf spark.yarn.unmanagedAM.enabled=true\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #45565 from wangyum/SPARK-47441.\n\nAuthored-by: Yuming Wang \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47441][YARN] Do not add log link for unmanaged AM in Spark UI"}},{"before":"7ef0440ef22161a6160f7b9000c70b26c84eecf7","after":"73bb619d45b2d0699ca4a9d251eea57c359f275b","ref":"refs/heads/master","pushedAt":"2024-05-10T14:45:31.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48235][SQL] Directly pass join instead of all arguments to getBroadcastBuildSide and getShuffleHashJoinBuildSide\n\n### What changes were proposed in this pull request?\n\n* Refactor getBroadcastBuildSide and getShuffleHashJoinBuildSide to pass the join as argument instead of all member variables of the join separately.\n\n### Why are the changes needed?\n\n* Makes to code easier to read.\n\n### Does this PR introduce _any_ user-facing change?\n\n* no\n\n### How was this patch tested?\n\n* Existing UTs\n\n### Was this patch authored or co-authored using generative AI tooling?\n\n* No\n\nCloses #46525 from fred-db/parameter-change.\n\nAuthored-by: fred-db \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48235][SQL] Directly pass join instead of all arguments to get…"}},{"before":"259760a5c5e26e33b2ee46282aeb63e4ea701020","after":"7ef0440ef22161a6160f7b9000c70b26c84eecf7","ref":"refs/heads/master","pushedAt":"2024-05-10T14:39:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48146][SQL] Fix aggregate function in With expression child assertion\n\n### What changes were proposed in this pull request?\n\nIn https://github.com/apache/spark/pull/46034, there was a complicated edge case where common expression references in aggregate functions in the child of a `With` expression could become dangling. An assertion was added to avoid that case from happening, but the assertion wasn't fully accurate as a query like:\n```\nselect\n id between max(if(id between 1 and 2, 2, 1)) over () and id\nfrom range(10)\n```\nwould fail the assertion.\n\nThis PR fixes the assertion to be more accurate.\n\n### Why are the changes needed?\n\nThis addresses a regression in https://github.com/apache/spark/pull/46034.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nAdded unit tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46443 from kelvinjian-db/SPARK-48146-agg.\n\nAuthored-by: Kelvin Jiang \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48146][SQL] Fix aggregate function in With expression child as…"}},{"before":"256a23883d901c78cf82b4c52e3373322309b8d1","after":"259760a5c5e26e33b2ee46282aeb63e4ea701020","ref":"refs/heads/master","pushedAt":"2024-05-10T10:45:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48228][PYTHON][CONNECT][FOLLOWUP] Also apply `_validate_pandas_udf` in MapInXXX\n\n### What changes were proposed in this pull request?\nAlso apply `_validate_pandas_udf` in MapInXXX\n\n### Why are the changes needed?\nto make sure validation in `pandas_udf` is also applied in MapInXXX\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46524 from zhengruifeng/missing_check_map_in_xxx.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Ruifeng Zheng ","shortMessageHtmlLink":"[SPARK-48228][PYTHON][CONNECT][FOLLOWUP] Also apply `_validate_pandas…"}},{"before":"d8151186d79459fbde27a01bd97328e73548c55a","after":"256a23883d901c78cf82b4c52e3373322309b8d1","ref":"refs/heads/master","pushedAt":"2024-05-10T08:12:40.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48232][PYTHON][TESTS] Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build\n\n### What changes were proposed in this pull request?\n\nThis PR avoids importing `scipy.sparse` directly which hangs indeterministically specifically with Python 3.12\n\n### Why are the changes needed?\n\nTo fix the build with Python 3.12 https://github.com/apache/spark/actions/runs/9022174253/job/24804919747\nI was able to reproduce this in my local but a bit indeterministic.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, test-only.\n\n### How was this patch tested?\n\nManually tested in my local.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46522 from HyukjinKwon/SPARK-48232.\n\nAuthored-by: Hyukjin Kwon \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48232][PYTHON][TESTS] Fix 'pyspark.sql.tests.connect.test_conn…"}},{"before":"2df494fd4e4e64b9357307fb0c5e8fc1b7491ac3","after":"d8151186d79459fbde27a01bd97328e73548c55a","ref":"refs/heads/master","pushedAt":"2024-05-10T08:09:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48230][BUILD] Remove unused `jodd-core`\n\n### What changes were proposed in this pull request?\n\nRemove a jar that has CVE https://github.com/advisories/GHSA-jrg3-qq99-35g7\n\n### Why are the changes needed?\n\nPreviously, `jodd-core` came from Hive transitive deps, while https://github.com/apache/hive/pull/5151 (Hive 2.3.10) cut it out, so we can remove it from Spark now.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46520 from pan3793/SPARK-48230.\n\nAuthored-by: Cheng Pan \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48230][BUILD] Remove unused jodd-core"}},{"before":"33cac4436e593c9c501c5ff0eedf923d3a21899c","after":"2df494fd4e4e64b9357307fb0c5e8fc1b7491ac3","ref":"refs/heads/master","pushedAt":"2024-05-10T06:03:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48158][SQL] Add collation support for XML expressions\n\n### What changes were proposed in this pull request?\nIntroduce collation awareness for XML expressions: from_xml, schema_of_xml, to_xml.\n\n### Why are the changes needed?\nAdd collation support for XML expressions in Spark.\n\n### Does this PR introduce _any_ user-facing change?\nYes, users should now be able to use collated strings within arguments for XML functions: from_xml, schema_of_xml, to_xml.\n\n### How was this patch tested?\nE2e sql tests.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46507 from uros-db/xml-expressions.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48158][SQL] Add collation support for XML expressions"}},{"before":"dc4911725baa8b9e5f3c095c27a569b98c0bd8a3","after":"c048653435f9b7c832f79d38a504a145a17654c0","ref":"refs/heads/branch-3.5","pushedAt":"2024-05-10T05:55:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion`\n\n### What changes were proposed in this pull request?\n\n`spark.network.remoteReadNioBufferConversion` was introduced in https://github.com/apache/spark/commit/2c82745686f4456c4d5c84040a431dcb5b6cb60b, to allow disable [SPARK-24307](https://issues.apache.org/jira/browse/SPARK-24307) for safety, while during the whole Spark 3 period, there are no negative reports, it proves that [SPARK-24307](https://issues.apache.org/jira/browse/SPARK-24307) is solid enough, I propose to mark it deprecated in 3.5.2 and remove in 4.1.0 or later\n\n### Why are the changes needed?\n\nCode clean up\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46047 from pan3793/SPARK-47847.\n\nAuthored-by: Cheng Pan \nSigned-off-by: Dongjoon Hyun \n(cherry picked from commit 33cac4436e593c9c501c5ff0eedf923d3a21899c)\nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConve…"}},{"before":"8ccc8b92be50b1d5ef932873403e62e28c478781","after":"33cac4436e593c9c501c5ff0eedf923d3a21899c","ref":"refs/heads/master","pushedAt":"2024-05-10T05:55:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConversion`\n\n### What changes were proposed in this pull request?\n\n`spark.network.remoteReadNioBufferConversion` was introduced in https://github.com/apache/spark/commit/2c82745686f4456c4d5c84040a431dcb5b6cb60b, to allow disable [SPARK-24307](https://issues.apache.org/jira/browse/SPARK-24307) for safety, while during the whole Spark 3 period, there are no negative reports, it proves that [SPARK-24307](https://issues.apache.org/jira/browse/SPARK-24307) is solid enough, I propose to mark it deprecated in 3.5.2 and remove in 4.1.0 or later\n\n### Why are the changes needed?\n\nCode clean up\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46047 from pan3793/SPARK-47847.\n\nAuthored-by: Cheng Pan \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47847][CORE] Deprecate `spark.network.remoteReadNioBufferConve…"}},{"before":"9bb15db85e53b69b9c0ba112cd1dd93d8213eea4","after":"8ccc8b92be50b1d5ef932873403e62e28c478781","ref":"refs/heads/master","pushedAt":"2024-05-10T05:07:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of pyspark DataStreamReader methods\n\n### What changes were proposed in this pull request?\n\nThe docstrings of the pyspark DataStream Reader methods `csv()` and `text()` say that the `path` parameter can be a list, but actually when a list is passed an error is raised.\n\n### Why are the changes needed?\n\nDocumentation is wrong.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes. Fixes documentation.\n\n### How was this patch tested?\n\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46416 from chloeh13q/fix/streamread-docstring.\n\nAuthored-by: Chloe He \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48201][DOCS][PYTHON] Make some corrections in the docstring of…"}},{"before":"b371e7dd88009195740f8f5b591447441ea43d0b","after":"9bb15db85e53b69b9c0ba112cd1dd93d8213eea4","ref":"refs/heads/master","pushedAt":"2024-05-10T05:01:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48228][PYTHON][CONNECT] Implement the missing function validation in ApplyInXXX\n\n### What changes were proposed in this pull request?\nImplement the missing function validation in ApplyInXXX\n\nhttps://github.com/apache/spark/pull/46397 fixed this issue for `Cogrouped.ApplyInPandas`, this PR fix remaining methods.\n\n### Why are the changes needed?\nfor better error message:\n\n```\nIn [12]: df1 = spark.range(11)\n\nIn [13]: df2 = df1.groupby(\"id\").applyInPandas(lambda: 1, StructType([StructField(\"d\", DoubleType())]))\n\nIn [14]: df2.show()\n```\n\nbefore this PR, an invalid function causes weird execution errors:\n```\n24/05/10 11:37:36 ERROR Executor: Exception in task 0.0 in stage 10.0 (TID 36)\norg.apache.spark.api.python.PythonException: Traceback (most recent call last):\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py\", line 1834, in main\n process()\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py\", line 1826, in process\n serializer.dump_stream(out_iter, outfile)\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py\", line 531, in dump_stream\n return ArrowStreamSerializer.dump_stream(self, init_stream_yield_batches(), stream)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py\", line 104, in dump_stream\n for batch in iterator:\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/sql/pandas/serializers.py\", line 524, in init_stream_yield_batches\n for series in iterator:\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py\", line 1610, in mapper\n return f(keys, vals)\n ^^^^^^^^^^^^^\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py\", line 488, in \n return lambda k, v: [(wrapped(k, v), to_arrow_type(return_type))]\n ^^^^^^^^^^^^^\n File \"/Users/ruifeng.zheng/Dev/spark/python/lib/pyspark.zip/pyspark/worker.py\", line 483, in wrapped\n result, return_type, _assign_cols_by_name, truncate_return_schema=False\n ^^^^^^\nUnboundLocalError: cannot access local variable 'result' where it is not associated with a value\n\n\tat org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:523)\n\tat org.apache.spark.sql.execution.python.PythonArrowOutput$$anon$1.read(PythonArrowOutput.scala:117)\n\tat org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:479)\n\tat org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)\n\tat scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:601)\n\tat scala.collection.Iterator$$anon$9.hasNext(Iterator.scala:583)\n\tat org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)\n\tat org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)\n\tat org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)\n\tat org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)\n\tat org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:896)\n\n\t...\n```\n\nAfter this PR, the error happens before execution, which is consistent with Spark Classic, and\n much clear\n```\nPySparkValueError: [INVALID_PANDAS_UDF] Invalid function: pandas_udf with function type GROUPED_MAP or the function in groupby.applyInPandas must take either one argument (data) or two arguments (key, data).\n\n```\n\n### Does this PR introduce _any_ user-facing change?\nyes, error message changes\n\n### How was this patch tested?\nadded tests\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46519 from zhengruifeng/missing_check_in_group.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48228][PYTHON][CONNECT] Implement the missing function validat…"}},{"before":"2d609bfd37ae9a0877fb72d1ba0479bb04a2dad6","after":"b371e7dd88009195740f8f5b591447441ea43d0b","ref":"refs/heads/master","pushedAt":"2024-05-10T04:47:08.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48224][SQL] Disallow map keys from being of variant type\n\n### What changes were proposed in this pull request?\n\nThis PR disallows map keys from being of variant type. Therefore, SQL statements like `select map(parse_json('{\"a\": 1}'), 1)`, which would work earlier, will throw an exception now.\n\n### Why are the changes needed?\n\nAllowing variant to be the key type of a map can result in undefined behavior as this has not been tested.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, users could use variants as keys in maps earlier. However, this PR disallows this possibility.\n\n### How was this patch tested?\n\nUnit tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46516 from harshmotw-db/map_variant_key.\n\nAuthored-by: Harsh Motwani \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48224][SQL] Disallow map keys from being of variant type"}},{"before":"1138b2a68b5408e6d079bdbce8026323694628e5","after":"2d609bfd37ae9a0877fb72d1ba0479bb04a2dad6","ref":"refs/heads/master","pushedAt":"2024-05-10T04:31:52.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10\n\n### What changes were proposed in this pull request?\n\nThis PR aims to bump Spark's built-in Hive from 2.3.9 to Hive 2.3.10, with two additional changes:\n\n- due to API breaking changes of Thrift, `libthrift` is upgraded from `0.12` to `0.16`.\n- remove version management of `commons-lang:2.6`, it comes from Hive transitive deps, Hive 2.3.10 drops it in https://github.com/apache/hive/pull/4892\n\nThis is the first part of https://github.com/apache/spark/pull/45372\n\n### Why are the changes needed?\n\nBump Hive to the latest version of 2.3, prepare for upgrading Guava, and dropping vulnerable dependencies like Jackson 1.x / Jodd\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass GA. (wait for sunchao to complete the 2.3.10 release to make jars visible on Maven Central)\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #45372\n\nCloses #46468 from pan3793/SPARK-47018.\n\nLead-authored-by: Cheng Pan \nCo-authored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47018][BUILD][SQL] Bump built-in Hive to 2.3.10"}},{"before":"32b2827b964bd4a4accb60b47ddd6929f41d4a89","after":"1138b2a68b5408e6d079bdbce8026323694628e5","ref":"refs/heads/master","pushedAt":"2024-05-10T03:51:34.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin\n\n### What changes were proposed in this pull request?\n`${java.version}` and `${java.version}` (https://github.com/apache/spark/pull/46024/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R117)\nare equivalent duplicate configuration, so remove `${java.version}`.\nhttps://maven.apache.org/plugins/maven-compiler-plugin/examples/set-compiler-release.html\n\n### Why are the changes needed?\nSimplify the code and facilitates subsequent configuration iterations.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass the CIs.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46024 from zml1206/remove_duplicate_configuration.\n\nAuthored-by: zml1206 \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[MINOR][BUILD] Remove duplicate configuration of maven-compiler-plugin"}},{"before":"a41d0ae79b432e2757379fc56a0ad2755f02e871","after":"32b2827b964bd4a4accb60b47ddd6929f41d4a89","ref":"refs/heads/master","pushedAt":"2024-05-10T03:47:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecated` in `SQLImplicits`\n\n### What changes were proposed in this pull request?\nIn the `sql` module, some functions in `SQLImplicits` have already been marked as `deprecated` in the function comments after SPARK-19089.\n\nThis pr adds `deprecated` type annotation marks to them. Since SPARK-19089 occurred in Spark 2.2.0, the `since` field of `deprecated` is filled in as `2.2.0`.\n\nAt the same time, these `deprecated` marks have also been synchronized to the corresponding functions in `SQLImplicits` in the `connect` module.\n\n### Why are the changes needed?\nMark deprecated functions with `deprecated` in `SQLImplicits`\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass Github Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46029 from LuciferYang/deprecated-SQLImplicits.\n\nLead-authored-by: YangJie \nCo-authored-by: yangjie01 \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-47834][SQL][CONNECT] Mark deprecated functions with `@deprecat…"}},{"before":"9a2818820f11f9bdcc042f4ab80850918911c68c","after":"a41d0ae79b432e2757379fc56a0ad2755f02e871","ref":"refs/heads/master","pushedAt":"2024-05-10T03:23:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition\n\n### What changes were proposed in this pull request?\n\nRename `FIELDS_ALREADY_EXISTS` to `FIELD_ALREADY_EXISTS`.\n\n### Why are the changes needed?\n\nThough it's not meant to be a proper English sentence, `FIELDS_ALREADY_EXISTS` is grammatically incorrect. It should either be \"fields already exist[]\" or \"field[] already exists\". I opted for the latter.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, it changes the name of an error condition.\n\n### How was this patch tested?\n\nCI only.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46510 from nchammas/SPARK-48176-field-exists-error.\n\nAuthored-by: Nicholas Chammas \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48176][SQL] Adjust name of FIELD_ALREADY_EXISTS error condition"}},{"before":"012d19d8e9b28f7ce266753bcfff4a76c9510245","after":"9a2818820f11f9bdcc042f4ab80850918911c68c","ref":"refs/heads/master","pushedAt":"2024-05-10T01:58:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Gem lock file\n\n### What changes were proposed in this pull request?\n\nSync the version of Bundler that we are using across various scripts and documentation. Also refresh the Gem lock file.\n\n### Why are the changes needed?\n\nWe are seeing inconsistent build behavior, likely due to the inconsistent Bundler versions.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nCI + the preview release process.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46512 from nchammas/bundler-sync.\n\nAuthored-by: Nicholas Chammas \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48222][INFRA][DOCS] Sync Ruby Bundler to 2.4.22 and refresh Ge…"}},{"before":"71f0eda71bc169a5245f4412ec0957728025a66c","after":"012d19d8e9b28f7ce266753bcfff4a76c9510245","ref":"refs/heads/master","pushedAt":"2024-05-09T23:58:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos\n\n### What changes were proposed in this pull request?\nDocument the requirement of seed in protos\n\n### Why are the changes needed?\nthe seed should be set at client side\n\ndocument it to avoid cases like https://github.com/apache/spark/pull/46456\n\n### Does this PR introduce _any_ user-facing change?\nno\n\n### How was this patch tested?\nci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46518 from zhengruifeng/doc_random.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48227][PYTHON][DOC] Document the requirement of seed in protos"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAER-LCAgA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}