Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48168][SQL] Add bitwise shifting operators support #46440

Closed
wants to merge 16 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented May 7, 2024

What changes were proposed in this pull request?

This PR introduces three bitwise shifting operators as aliases for existing shifting functions.

Why are the changes needed?

The bit shifting functions named in alphabet form vary from one platform to anthor. Take our shiftleft as an example,

  • Hive, shiftleft (where we copied it from)
  • MsSQL Server LEFT_SHIFT
  • MySQL, N/A
  • PostgreSQL, N/A
  • Presto, bitwise_left_shift

The bit shifting operators share a much more common and consistent way for users to port their queries.

For self-consistent with existing bit operators in Spark, AND &, OR |, XOR ^ and NOT ~, we now add <<, >> and >>>.

For other systems that we can refer to:

https://learn.microsoft.com/en-us/sql/t-sql/functions/left-shift-transact-sql?view=sql-server-ver16
https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html

Does this PR introduce any user-facing change?

Yes, new operators were added but no behavior change

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label May 7, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this causes SQL parser errors. Could you make the CI happy? :)

- SPARK-43333: Stable field names when converting Union type *** FAILED *** (266 milliseconds)
[info]   org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '<'. SQLSTATE: 42601 (line 1, pos 57)
[info] 
[info] == SQL ==
[info] field1 struct<member_array: array<float>, member_map: map<string, int>>
[info] ---------------------------------------------------------^^^

@yaooqinn
Copy link
Member Author

yaooqinn commented May 8, 2024

Thank you @dongjoon-hyun
Let me look into this issue, it might take a while

@yaooqinn yaooqinn marked this pull request as draft May 8, 2024 03:43
@dongjoon-hyun
Copy link
Member

No problem, @yaooqinn . We have enough time for 4.0.0. :)

@dongjoon-hyun
Copy link
Member

It seems that TPCDS golden files are affected still.

[info] *** 23 TESTS FAILED ***
[error] Failed: Total 3499, Failed 23, Errors 0, Passed 3476, Ignored 4
[error] Failed tests:
[error] 	org.apache.spark.sql.SQLQuerySuite
[error] 	org.apache.spark.sql.TPCDSV1_4_PlanStabilitySuite
[error] 	org.apache.spark.sql.TPCDSV2_7_PlanStabilityWithStatsSuite
[error] 	org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite
[error] 	org.apache.spark.sql.TPCDSV2_7_PlanStabilitySuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful

@yaooqinn yaooqinn marked this pull request as ready for review May 9, 2024 05:03
@yaooqinn
Copy link
Member Author

yaooqinn commented May 9, 2024

Thank you @dongjoon-hyun for providing the logs

@@ -980,9 +981,16 @@ valueExpression
| left=valueExpression operator=AMPERSAND right=valueExpression #arithmeticBinary
| left=valueExpression operator=HAT right=valueExpression #arithmeticBinary
| left=valueExpression operator=PIPE right=valueExpression #arithmeticBinary
| left=valueExpression shiftOperator right=valueExpression #shiftExpression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. #shiftExpression is not aligned.

> SELECT _FUNC_(2, 1);
> SELECT shiftleft(2, 1);
4
> SELECT 2 << 1;
4
""",
since = "1.5.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 1.5.0 doesn't match with the new usage base << exp.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being late, I have been figuring out the syntax issues.

I have made some clarifications in the note field.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New SQL syntax looks incorrect to me. Could you revise more?

shiftOperator
: LT LT
| GT GT
| GT GT GT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax is wrong because we should not allow spaces between LT and GT.

For example, < < is a syntax error in other DBMS.

postgres=# SELECT 2 < < 1;
ERROR:  syntax error at or near "<"
LINE 1: SELECT 2 < < 1;

But, this PR allows the following.

scala> sql("SELECT 2 < < 1").show()
+--------+
|(2 << 1)|
+--------+
|       4|
+--------+

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a way to retrieve the whitespace back to fail such a case. Tokenize these operators works but it fails to parse nested map and struct types

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule has been refined and whitespace-discrete operators now result in error as expected

@yaooqinn
Copy link
Member Author

cc @cloud-fan

@@ -799,6 +799,9 @@ object FunctionRegistry {
expression[BitwiseNot]("~"),
expression[BitwiseOr]("|"),
expression[BitwiseXor]("^"),
expression[ShiftLeft]("<<", true, Some("4.0.0")),
expression[ShiftRight](">>", true, Some("4.0.0")),
expression[ShiftRightUnsigned](">>>", true, Some("4.0.0")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do they exist in other databases?

cc @srielau

Copy link
Member Author

@yaooqinn yaooqinn May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.sqlite.org/lang_expr.html
https://www.ibm.com/docs/en/i/7.3?topic=expressions-bitwise-left-right-shift-operators

More evidences.

BTW, >>> is not seen from any other systems but added for self-consistency

* GT token and we do nothing.
*/
public void decComplexTypeLevelCounter() {
if (complex_type_level_counter > 0) complex_type_level_counter++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the method comment , shall it be complex_type_level_counter--; ? BTW, is there a test case for it, e.g. array + shiftleft ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Thank you!

new tests added in the last commits

Copy link
Contributor

@ulysses-you ulysses-you left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm if tests pass

@ulysses-you
Copy link
Contributor

thanks, merged to master(4.0.0)

@@ -1,2 +1,2 @@
Project [shiftleft(cast(b#0 as int), 2) AS shiftleft(b, 2)#0]
Project [(cast(b#0 as int) << 2) AS (b << 2)#0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep the pretty string unchanged? if people explicitly use shiftleft function, then the generated alias name should still be shiftleft.

@cloud-fan
Copy link
Contributor

Have we considered the operator precedence? Does it follow C? If yes let's add tests and document it.
https://en.cppreference.com/w/c/language/operator_precedence

@yaooqinn
Copy link
Member Author

Hi @cloud-fan Thank you for the check.

I have made a follow-up to fix the precedence of these operators, the new precedence is between '+/-' and '&' with a left-to-right logic. #46753

As for the documentation you mentioned, it's orthogonal as there is no existing doc for this topic yet. So, I created SPARK-48426 to follow.

@yaooqinn yaooqinn deleted the SPARK-48168 branch May 27, 2024 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants