-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: LIKE expression with invalid string literals returns a parse error instead of panicking #1046
base: main
Are you sure you want to change the base?
fix: LIKE expression with invalid string literals returns a parse error instead of panicking #1046
Conversation
…or instead of panic Signed-off-by: Calum Murray <cmurray@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
For reference, here's the spec for like-operation [ref]
like-operation ::= expression not-operator? like-operator string-literal
string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"')
@@ -115,4 +115,11 @@ tests: | |||
result: false | |||
- name: With type coercion from bool (4) | |||
expression: "FALSE LIKE 'fal%'" | |||
result: true | |||
result: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, in cases where the first operand is a string, does the spec mandate that we do a string insensitive compare? I see where it talks about doing so for ceSQL keywords, but not values. Did I miss it? Should we add testcases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I'm not sure if this is covered in the spec. Perhaps we can open an issue in the spec repo and clarify what should happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine if we had a testcase FALSE LIKE 'FAL%'
, the test would fail - but maybe the spec should clarify whether or not that should actually happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duglin thinking about this some more, what is happening here is that the FALSE
value is coerced to a string first, and then the string comparison is made.
Looking at the spec, I think there are two parts we need to clarify (I'll open a PR in a bit):
- That the LIKE expression casts the left operand to a
string
before evaluating the comparison - That boolean values cast to lower case strings
"true"
and"false"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually more wondering whether: "foo" = "FOO"
is meant to return true or false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think that would return false
. My understanding is that the case insensitivity is only for the CESQL keywords, not for the actual values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a "spec complete" perspective, is that what we want? Or is it expected that everyone will use a lower() type of built-in function if they need it? Which is fine, I just wanted to double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think having case sensitive values is easier to understand then the other way around. Normally in SQL I would expect that a string comparison on values would only be true if they match exactly, and would use something like lower() or upper() if I wanted a case insenstive comparison. @pierDipi not sure if you have any other ideas here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using mySQL for my xRegistry impl and found out that it does case insensitive compares by default. Go figure! I expected the opposite, like you :-) See: https://makingdatameaningful.com/is-sql-case-sensitive/
I'm ok either way the spec decides to go, but I think it should state its decision explicitly so it's clear. Today, it's kind of implied by the presence of the lower() and upper() funcs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I've only ever used postgres which is case sensitive by default! I agree, let's clarify in the spec that string comparison is case sensitive instead of just implying it with the LOWER() and UPPER() functions.
Aside from my minor question, LGTM |
cc @embano1 |
do we have a testcase for this? |
@duglin yes, the new tck test in this PR runs and passes with this change, but prior to the change it would panic |
Fixes #931
When visiting the nodes of the AST while parsing the LIKE expression, it looks like we assumed that we would have a valid string literal. However, it is possible that there is no valid string literal, in which case we have to handle that and return an error rather than continuing to parse (and subsequently panicking)