Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: LIKE expression with invalid string literals returns a parse error instead of panicking #1046

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Cali0707
Copy link
Contributor

@Cali0707 Cali0707 commented May 1, 2024

Fixes #931

When visiting the nodes of the AST while parsing the LIKE expression, it looks like we assumed that we would have a valid string literal. However, it is possible that there is no valid string literal, in which case we have to handle that and return an error rather than continuing to parse (and subsequently panicking)

…or instead of panic

Signed-off-by: Calum Murray <cmurray@redhat.com>
@Cali0707 Cali0707 requested a review from a team as a code owner May 1, 2024 14:52
@Cali0707
Copy link
Contributor Author

Cali0707 commented May 1, 2024

cc @duglin @pierDipi @lionelvillard

Copy link
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

For reference, here's the spec for like-operation [ref]

like-operation ::= expression not-operator? like-operator string-literal

string-literal ::= ( "'" ( [^'] | "\'" )* "'" ) | ( '"' ( [^"] | '\"' )* '"')

@@ -115,4 +115,11 @@ tests:
result: false
- name: With type coercion from bool (4)
expression: "FALSE LIKE 'fal%'"
result: true
result: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, in cases where the first operand is a string, does the spec mandate that we do a string insensitive compare? I see where it talks about doing so for ceSQL keywords, but not values. Did I miss it? Should we add testcases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I'm not sure if this is covered in the spec. Perhaps we can open an issue in the spec repo and clarify what should happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine if we had a testcase FALSE LIKE 'FAL%', the test would fail - but maybe the spec should clarify whether or not that should actually happen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duglin thinking about this some more, what is happening here is that the FALSE value is coerced to a string first, and then the string comparison is made.

Looking at the spec, I think there are two parts we need to clarify (I'll open a PR in a bit):

  1. That the LIKE expression casts the left operand to a string before evaluating the comparison
  2. That boolean values cast to lower case strings "true" and "false"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually more wondering whether: "foo" = "FOO" is meant to return true or false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think that would return false. My understanding is that the case insensitivity is only for the CESQL keywords, not for the actual values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a "spec complete" perspective, is that what we want? Or is it expected that everyone will use a lower() type of built-in function if they need it? Which is fine, I just wanted to double check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think having case sensitive values is easier to understand then the other way around. Normally in SQL I would expect that a string comparison on values would only be true if they match exactly, and would use something like lower() or upper() if I wanted a case insenstive comparison. @pierDipi not sure if you have any other ideas here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using mySQL for my xRegistry impl and found out that it does case insensitive compares by default. Go figure! I expected the opposite, like you :-) See: https://makingdatameaningful.com/is-sql-case-sensitive/

I'm ok either way the spec decides to go, but I think it should state its decision explicitly so it's clear. Today, it's kind of implied by the presence of the lower() and upper() funcs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I've only ever used postgres which is case sensitive by default! I agree, let's clarify in the spec that string comparison is case sensitive instead of just implying it with the LOWER() and UPPER() functions.

@duglin
Copy link
Contributor

duglin commented May 3, 2024

Aside from my minor question, LGTM

@Cali0707
Copy link
Contributor Author

cc @embano1

@duglin
Copy link
Contributor

duglin commented May 14, 2024

do we have a testcase for this?

@Cali0707
Copy link
Contributor Author

do we have a testcase for this?

@duglin yes, the new tck test in this PR runs and passes with this change, but prior to the change it would panic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Panic if LIKE pattern is not a quoted string
3 participants