Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SimpleSQLGrammar issue with anti slashes at the end of simple quoted string #5854

Open
1 task done
jgarec opened this issue Apr 26, 2024 · 7 comments
Open
1 task done

Comments

@jgarec
Copy link
Contributor

jgarec commented Apr 26, 2024

Search first

Description

Multiple SQL statements with anti slashes In simple quoted string are not splitted as expected.

Steps To Reproduce

Considering this simple sql file :

insert into MY_TABLE(MY_COLUMN) values('\D:\');insert into MY_TABLE(MY_COLUMN) values('D:\');

and this changeset :

databaseChangeLog:
-  changeSet:
    id:  insert-data
    author:  liquibase-docs
    changes:
    -  sqlFile:
        splitStatements:  true
        path:  data/data.sql
        stripComments:  true

Liquibase won't split the 2 statements. It seems not to be a problem for postgresql but it fails for Oracle : ORA-00933: SQL command not properly ended

Expected/Desired Behavior

Multiple statements splitted correctly

Liquibase Version

4.27.0

Database Vendor & Version

Oracle

Liquibase Integration

cli

Liquibase Extensions

No response

OS and/or Infrastructure Type/Provider

No response

Additional Context

UpdateSQL log shows a block of multiple statements :

-- Changeset data/changelog-data.yaml::insert-data::liquibase-docs
SET SEARCH_PATH TO public, "$user","public";

insert into MY_TABLE(MY_COLUMN) values('\D:\');insert into MY_TABLE(MY_COLUMN) values('D:\');

INSERT INTO public.databasechangelog (ID, AUTHOR, FILENAME, DATEEXECUTED, ORDEREXECUTED, MD5SUM, DESCRIPTION, COMMENTS, EXECTYPE, CONTEXTS, LABELS, LIQUIBASE, DEPLOYMENT_ID) VALUES ('insert-data', 'liquibase-docs', 'data/changelog-data.yaml', NOW(), 2, '9:76d7a3a15fd65a8ee3cc5014971c2d31', 'sqlFile path=data/data.sql', '', 'EXECUTED', NULL, NULL, '4.27.0', '4156402276');

Trying to do some debug, here is the result of the parser :

'insert into MY_TABLE(MY_COLUMN) values('\D:\');insert into MY_TABLE(MY_COLUMN) values('D:\');'
    <S_IDENTIFIER>                  : 'insert'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'into'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'MY_TABLE'
    <SYMBOL>                        : '('
    <S_IDENTIFIER>                  : 'MY_COLUMN'
    <SYMBOL>                        : ')'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'values'
    <SYMBOL>                        : '('
    <S_CHAR_LITERAL>                : ''\D:\');insert into MY_TABLE(MY_COLUMN) values(''
    <S_IDENTIFIER>                  : 'D'
    <SYMBOL>                        : ':'
    <SYMBOL>                        : '\'
    <SYMBOL>                        : '''
    <SYMBOL>                        : ')'
    <SYMBOL>                        : ';'

If I remove one specific anti slash, it's ok :

'insert into MY_TABLE(MY_COLUMN) values('\D:');insert into MY_TABLE(MY_COLUMN) values('D:\');'
    <S_IDENTIFIER>                  : 'insert'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'into'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'MY_TABLE'
    <SYMBOL>                        : '('
    <S_IDENTIFIER>                  : 'MY_COLUMN'
    <SYMBOL>                        : ')'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'values'
    <SYMBOL>                        : '('
    <S_CHAR_LITERAL>                : ''\D:''
    <SYMBOL>                        : ')'
    <SYMBOL>                        : ';'
    <S_IDENTIFIER>                  : 'insert'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'into'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'MY_TABLE'
    <SYMBOL>                        : '('
    <S_IDENTIFIER>                  : 'MY_COLUMN'
    <SYMBOL>                        : ')'
    <WHITESPACE>                    : ' '
    <S_IDENTIFIER>                  : 'values'
    <SYMBOL>                        : '('
    <S_CHAR_LITERAL>                : ''D:\''
    <SYMBOL>                        : ')'
    <SYMBOL>                        : ';'

Are you willing to submit a PR?

  • [] I'm willing to submit a PR (Thank you!)
@jgarec jgarec changed the title SimpleSQLGrammar issue with anti slashes in simple quoted string SimpleSQLGrammar issue with anti slashes at the end of simple quoted string Apr 26, 2024
@jasonlyle88
Copy link
Contributor

I've got some other things I have to look into and get sorted, but I will try to take a look at this as well.

I'm also having some issues with my setup right now so I can't do this, but it might be worth doing a quick test of chging the definition of ESC_ANY_CHAR in the liquibase-standard/src/main/javacc/liquibase/util/grammar/SimpleSqlGrammar.jj file to escape any character OTHER than a single quote (I think < #ESC_ANY_CHAR: "\\" ~["'"] >) and see if the tests still all pass and if your issue is resolved!

@jgarec
Copy link
Contributor Author

jgarec commented May 3, 2024

No it breaks some tests:

[ERROR]   SimpleSqlGrammarTest.test:24 Condition not satisfied:

tokens == expected
|      |  |
|      |  ['''',  , sometimesEquals,  , '\'']
|      false
['''',  , sometimesEquals,  , '\', ']

[ERROR]   SimpleSqlGrammarTest.test:24 Condition not satisfied:

tokens == expected
|      |  |
|      |  ['\'',  , sometimesEquals,  , "'"]
|      false
['\', ' sometimesEquals "', "]

[ERROR]   SimpleSqlGrammarTest.test:24 Condition not satisfied:

tokens == expected
|      |  |
|      |  [mysql,  , escaped,  , quotes,  , ' \' ']
|      false
[mysql,  , escaped,  , quotes,  , ' \',  , ']

[ERROR]   SimpleSqlGrammarTest.test:24 Condition not satisfied:

tokens == expected
|      |  |
|      |  [mysql,  , escaped,  , quotes,  , '\'']
|      false
[mysql,  , escaped,  , quotes,  , '\', ']

[ERROR]   SimpleSqlGrammarTest.test:24 Condition not satisfied:

tokens == expected
|      |  |
|      |  [stringwith,  , escquote,  , delim,  , newline,  , 'a\'b;c
|      |  d']
|      false
[stringwith,  , escquote,  , delim,  , newline,  , 'a\', b, ;, c, 
, d, ']

[INFO] 
[ERROR] Tests run: 39, Failures: 5, Errors: 0, Skipped: 0

@jasonlyle88
Copy link
Contributor

jasonlyle88 commented May 3, 2024

Ah, yup, that makes sense. The grammar was designed to think \' is an escaped quote because apparently MySQL allows that form of quote escaping. However, since Oracle (and I think most other DBs) do not allow that, you have a changeset with valid quoting for Oracle that is parsing incorrectly. Unfortunately, the system right now just has a single grammar that is used for all databases the same way. While I did the work to fix issues in the issue/pr you mentions (which is why I was quick to jump on this), I think the core Liquibase team will need to get involved here to decide how to handle the situation where we have rules lashing across different databases.

If you need a work around in the meantime @jgarec , you can switch your strings to use the oracle q quoting syntax since it will move the \ character away from the ' character

insert into MY_TABLE(MY_COLUMN) values(q'[\D:\]');insert into MY_TABLE(MY_COLUMN) values(q'[D:\]');

Basically, the first ; is ignored because the grammar believes it is part of a string, and is therefore not considered to be an eligible endDelimiter. By using the q quoting syntax, the \ character is moved away from the ' character so that the grammar does not get confused and think the \' characters are an escaped quote.

Not ideal having to change your code, but wanted to at least provide a work around if you need one!

@tati-qalified
Copy link
Contributor

Hi @jgarec and @jasonlyle88, thank you for reporting this issue and for the insightful discussion.
I could replicate this bug for Oracle, MySQL, MSSQL and PostgreSQL.
It doesn't affect the execution for MSSQL nor PostgreSQL, but it does for Oracle and MySQL.

For MySQL, \' is processed as the string value of the simple quote, so the statement is considered part of the text. It can be seen by the colours here:
image
Inverting the order of the statements doesn't fix the problem.

For Oracle, the error is not with the escaped character, but with the statements not being split. Running the generated insert into MY_TABLE(MY_COLUMN) values('\D:\');insert into MY_TABLE(MY_COLUMN) values('D:\'); will output an error: SQL Error [933] [42000]: ORA-00933: unexpected token at or near ;.
However, running either statement separately works with no issues.

@jgarec I see that you're willing to submit a PR. Feel free to do so, and you can ask any questions you may have to our development team. Please include some test cases to verify that the fix doesn't break for other DBs - the 4 I mentioned above would be enough to begin with.

Thank you!
Tatiana

@jgarec
Copy link
Contributor Author

jgarec commented May 8, 2024

Hello @tati-qalified, yes I was interested to submit a PR but I think it's too complicated for me to fix this.
Maybe it's necessary to have dedicated grammar depending of the dbms ? Not sure this could be handled with the same grammar

@jasonlyle88
Copy link
Contributor

Hey @tati-qalified ,

Unfortunately, the problem is more nuanced than your observations. If you take a look at the liquibase-standard/src/main/javacc/liquibase/util/grammar/SimpleSqlGrammar.jj file, you will see that there is a comment talking about the \' escaping:

/* Valid in Postgres and MySQL (if NO_BACKSLASH_ESCAPES not enabled), NOT valid in Oracle or MSSQL */

So MySQL can accept that as a way to escape a single quote (or not) depending on a setting for the database. Unfortunately syntax highlighting isn't a good indicator here.

And while you are correct that the Oracle error being presented is because the statements are not split correctly, you are actually addressing a symptom (the failure to split the statement correctly), and not the root cause. The root cause here is that the parsing engine is not considering the first ; character a delimiter, even though it should be considered a delimiter in this case. The ; character is not considered a delimiter because the parsers believes it is part of a quoted string.

This can be seen if you add @jgarec 's example to the unit tests in the liquibase-standard/src/test/groovy/liquibase/util/grammar/SimpleSqlGrammarTest.groovy file. When I updated this unit test last, I had it output each part that the grammar identifies and what type the parser identifies it as.

For all of these reasons, this is why I suggested the core liquibase team needs to get involved here to decide on a course of action. While the easiest solution is to say that the \' quote escaping mechanism should not be supported, this could cause regression issues for consumers with changesets that worked before with this type of quote escaping that are no longer working.

The more correct, yet more complicated to implement solution, is a grammar that is DBMS aware as I stated above and @jgarec believes may be necessary as well based on his response. But how should this be implemented? Should it be a generic grammar that has overrides just for specific DBMS systems? or should it be fully defined grammars for each DBMS? Or a generic grammar that is used for all DBMS systems unless a full grammar is specifically created for a given DBMS?

With the grammar being a very core part of liquibase, these decisions can have long standing impacts on the product going forward. And while I have worked on this section of the code previously, I am not comfortable making a decision about what could be potentially such a large impact on the product as a whole.

Because of all these reasons, I'd love for the team to have some discussion here and additionally I don't believe the triaged label is appropriate.

Let me know if there are any further questions or concerns!

@tati-qalified
Copy link
Contributor

@jasonlyle88 thank you for your feedback. Firstly I want to reassure you that the triaged label doesn't mean that we are done with this issue - it's just an internal label. We welcome any further discussion regarding this bug.

And while you are correct that the Oracle error being presented is because the statements are not split correctly, you are actually addressing a symptom (the failure to split the statement correctly), and not the root cause.

Yes, I was presenting the problems with the output, that are caused by the bug, as it is a blocker for Oracle. I've found it's useful to mention what databases are affected and how. I apologize if my comment wasn't clear.

Usually the technical details about the fix are discussed in the submitted PR, though I understand if in this case it should happen before submitting it. Our development team is aware of this issue and willing to discuss it further. Let me know if you have any other questions or concerns.

Thank you,
Tatiana

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants