-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48168][SQL] Add bitwise shifting operators support #46440
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this causes SQL parser errors. Could you make the CI happy? :)
- SPARK-43333: Stable field names when converting Union type *** FAILED *** (266 milliseconds)
[info] org.apache.spark.sql.catalyst.parser.ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '<'. SQLSTATE: 42601 (line 1, pos 57)
[info]
[info] == SQL ==
[info] field1 struct<member_array: array<float>, member_map: map<string, int>>
[info] ---------------------------------------------------------^^^
Thank you @dongjoon-hyun |
No problem, @yaooqinn . We have enough time for 4.0.0. :) |
…lass sun.util.calendar.ZoneInfo'
…sible: class sun.util.calendar.ZoneInfo'" This reverts commit 6b5a1ae.
It seems that TPCDS golden files are affected still.
|
Thank you @dongjoon-hyun for providing the logs |
@@ -980,9 +981,16 @@ valueExpression | |||
| left=valueExpression operator=AMPERSAND right=valueExpression #arithmeticBinary | |||
| left=valueExpression operator=HAT right=valueExpression #arithmeticBinary | |||
| left=valueExpression operator=PIPE right=valueExpression #arithmeticBinary | |||
| left=valueExpression shiftOperator right=valueExpression #shiftExpression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. #shiftExpression
is not aligned.
> SELECT _FUNC_(2, 1); | ||
> SELECT shiftleft(2, 1); | ||
4 | ||
> SELECT 2 << 1; | ||
4 | ||
""", | ||
since = "1.5.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 1.5.0
doesn't match with the new usage base << exp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being late, I have been figuring out the syntax issues.
I have made some clarifications in the note field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New SQL syntax looks incorrect to me. Could you revise more?
shiftOperator | ||
: LT LT | ||
| GT GT | ||
| GT GT GT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax is wrong because we should not allow spaces between LT
and GT
.
For example, < <
is a syntax error in other DBMS.
postgres=# SELECT 2 < < 1;
ERROR: syntax error at or near "<"
LINE 1: SELECT 2 < < 1;
But, this PR allows the following.
scala> sql("SELECT 2 < < 1").show()
+--------+
|(2 << 1)|
+--------+
| 4|
+--------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't found a way to retrieve the whitespace back to fail such a case. Tokenize these operators works but it fails to parse nested map and struct types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This rule has been refined and whitespace-discrete operators now result in error as expected
cc @cloud-fan |
@@ -799,6 +799,9 @@ object FunctionRegistry { | |||
expression[BitwiseNot]("~"), | |||
expression[BitwiseOr]("|"), | |||
expression[BitwiseXor]("^"), | |||
expression[ShiftLeft]("<<", true, Some("4.0.0")), | |||
expression[ShiftRight](">>", true, Some("4.0.0")), | |||
expression[ShiftRightUnsigned](">>>", true, Some("4.0.0")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do they exist in other databases?
cc @srielau
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://learn.microsoft.com/en-us/sql/t-sql/functions/left-shift-transact-sql?view=sql-server-ver16
https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html
I remember attaching these examples in the pull request description, but they disappeared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.sqlite.org/lang_expr.html
https://www.ibm.com/docs/en/i/7.3?topic=expressions-bitwise-left-right-shift-operators
More evidences.
BTW, >>>
is not seen from any other systems but added for self-consistency
* GT token and we do nothing. | ||
*/ | ||
public void decComplexTypeLevelCounter() { | ||
if (complex_type_level_counter > 0) complex_type_level_counter++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the method comment , shall it be complex_type_level_counter--;
? BTW, is there a test case for it, e.g. array + shiftleft
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Thank you!
new tests added in the last commits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm if tests pass
thanks, merged to master(4.0.0) |
@@ -1,2 +1,2 @@ | |||
Project [shiftleft(cast(b#0 as int), 2) AS shiftleft(b, 2)#0] | |||
Project [(cast(b#0 as int) << 2) AS (b << 2)#0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we keep the pretty string unchanged? if people explicitly use shiftleft
function, then the generated alias name should still be shiftleft
.
Have we considered the operator precedence? Does it follow C? If yes let's add tests and document it. |
Hi @cloud-fan Thank you for the check. I have made a follow-up to fix the precedence of these operators, the new precedence is between '+/-' and '&' with a left-to-right logic. #46753 As for the documentation you mentioned, it's orthogonal as there is no existing doc for this topic yet. So, I created SPARK-48426 to follow. |
What changes were proposed in this pull request?
This PR introduces three bitwise shifting operators as aliases for existing shifting functions.
Why are the changes needed?
The bit shifting functions named in alphabet form vary from one platform to anthor. Take our shiftleft as an example,
The bit shifting operators share a much more common and consistent way for users to port their queries.
For self-consistent with existing bit operators in Spark,
AND &
,OR |
,XOR ^
andNOT ~
, we now add<<
,>>
and>>>
.For other systems that we can refer to:
https://learn.microsoft.com/en-us/sql/t-sql/functions/left-shift-transact-sql?view=sql-server-ver16
https://www.postgresql.org/docs/9.4/functions-bitstring.html
https://dev.mysql.com/doc/refman/8.0/en/bit-functions.html
Does this PR introduce any user-facing change?
Yes, new operators were added but no behavior change
How was this patch tested?
new tests
Was this patch authored or co-authored using generative AI tooling?
no