BBO - 2023-10-21 - FIX - normalize_query - proper handling of balanced single-quoted strings with escaped quotes inside #803

Draft: wants to merge 4 commits into base: master

bbourgier (Contributor)

### BBO - 2023-10-21 - FIX - proper handling of balanced single-quoted strings with escaped quotes inside
### Regexp s/\\'//gs removes \' (escaped quotes that are supposed to be inside quoted strings) from the query string
### BUT
### This simple regexp DOES NOT check that the quotes of the outer quoted string are balanced,
### and quotes within quotes disturb the next replacement (Regexp s/'[^']*'/\?/gs),
### which replaces everything quoted with a question mark (?)
### Example#01 - Simple Case: Query string is: call foo('foo1 is \'foo2\'')
### call foo('foo1 is \'foo2\'') >> call foo('foo1 is foo2') >> call foo(?)
### >> In this simple case, the replacement works ok
### Example#02 - BUG Case: Query string is: call foo(3, '\\some.site.com\folder\', 1000);
### The problem here is that we don't want to remove the \' because it really is a backslash followed by the closing quote
### and NOT an escaped quote inside another quoted string...
### Replacement sequence is:
### call foo(3, '\\some.site.com\folder\', 1000);
###   >> call foo(3, '\\some.site.com\folder, 1000);
###   >> and that's all :-( The remaining opening quote has no partner, so s/'[^']*'/\?/gs matches nothing and normalization does not work in this case
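To make the failure easy to reproduce, here is a minimal Python sketch of the two original substitutions. Python's `re` stands in for the Perl `s///gs` operators, and `normalize_original` is a hypothetical helper name, not the function in the code base:

```python
import re

def normalize_original(query: str) -> str:
    """Sketch of the two original substitutions described above."""
    # s/\\'//gs : drop every backslash+quote pair
    query = re.sub(r"\\'", "", query, flags=re.DOTALL)
    # s/'[^']*'/\?/gs : replace each single-quoted literal by ?
    query = re.sub(r"'[^']*'", "?", query, flags=re.DOTALL)
    return query

simple = r"call foo('foo1 is \'foo2\'')"
buggy = r"call foo(3, '\\some.site.com\folder\', 1000);"

print(normalize_original(simple))  # call foo(?)
print(normalize_original(buggy))   # call foo(3, '\\some.site.com\folder, 1000);
```

In the second case the first step eats the path's trailing `\'`, leaving a single unbalanced quote that the second step cannot pair.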
### Fix:
### In order to properly handle quoted strings with escaped quotes inside AND make sure outer quotes
### are balanced, we'll use a more complicated regexp.
### This regexp replaces balanced quoted strings (including empty strings),
### even WITH escaped quotes inside, by a question mark (?)
### Matching pattern: m/((?<![\\])['"])(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)/gs
### Replace pattern: s/((?<![\\])['"])(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)/\?/gs
###
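A minimal Python sketch of the proposed pattern, applied on its own to both examples. Assumptions: Python's `re` accepts this Perl pattern unchanged, `re.DOTALL` plays the role of Perl's `/s`, and `normalize_fixed` is a hypothetical helper, not the function in the code base:

```python
import re

# Pattern quoted in the PR; re.DOTALL stands in for Perl's /s modifier.
BALANCED_QUOTED = re.compile(
    r"""((?<![\\])['"])(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)""",
    re.DOTALL,
)

def normalize_fixed(query: str) -> str:
    # Replace every balanced quoted string (empty strings included) by ?
    return BALANCED_QUOTED.sub("?", query)

print(normalize_fixed(r"call foo('foo1 is \'foo2\'')"))                # call foo(?)
print(normalize_fixed(r"call foo(3, '\\some.site.com\folder\', 1000);"))  # call foo(3, ?, 1000);
```

Note that the closing `\1` is not guarded by the lookbehind: in Example#02 no unescaped closing quote exists, so backtracking lets the match end on the quote of the trailing `\'`, which is what makes the network path normalize.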

…rings with escaped quotes inside

…rings with escaped quotes inside

Description identical to commit 96eec7, plus a small modification of the regexp to properly handle the following case:
CALL function_foo ('\\some.site.com\folder\'   ,  'some_string');
(a quoted string AFTER a network path that ends with \')
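For reference, a Python sketch of what the first-commit pattern does on this case (`re` standing in for Perl; `FIRST_PATTERN` is a hypothetical name). Because the `\'` ending the path is treated as an escaped quote, the match keeps going and closes on the opening quote of `'some_string'`:

```python
import re

# Pattern from the first commit (same as in the PR description).
FIRST_PATTERN = re.compile(
    r"""((?<![\\])['"])(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)""",
    re.DOTALL,
)

query = r"CALL function_foo ('\\some.site.com\folder\'   ,  'some_string');"
print(FIRST_PATTERN.sub("?", query))
# CALL function_foo (?some_string');
```

The exact modified pattern is not shown in this PR, so this only illustrates the failure that motivated the change.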
…rings with escaped quotes inside

Description identical to commit 402c5ab, plus:
Only handle SINGLE-quoted strings, since the original pattern only handles SINGLE-quoted strings; this keeps the logic identical to the original.
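A hedged sketch of what the single-quote restriction could look like, assuming it simply narrows `['"]` to `'` (the final pattern is not shown in this PR; both pattern names are hypothetical). The original s/'[^']*'/\?/gs never touched double-quoted text, and neither does the narrowed pattern:

```python
import re

# Pattern from the first commit: single AND double quotes.
BOTH_QUOTES = re.compile(
    r"""((?<![\\])['"])(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)""", re.DOTALL
)
# Assumed single-quote-only variant: ['"] narrowed to '.
SINGLE_QUOTED = re.compile(
    r"((?<![\\])')(?:\1|((?:.(?!(?<![\\])\1))*.?)\1)", re.DOTALL
)

query = "SELECT \"Name\" FROM t WHERE a = 'x' AND b = ''"
print(BOTH_QUOTES.sub("?", query))    # SELECT ? FROM t WHERE a = ? AND b = ?
print(SINGLE_QUOTED.sub("?", query))  # SELECT "Name" FROM t WHERE a = ? AND b = ?
```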
… case after previous regexp processing

### This case remains because of the [^,\s]? part used previously to properly detect and split call parameters
### Remaining case: '<any_chars><spaces or commas>'
### Matching pattern: m/('[^']*[,\s]+')/gs
### Replace pattern: s/('[^']*[,\s]+')/\?/gs
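A small Python sketch of this cleanup step (`re` standing in for Perl; `TRAILING_SEP_QUOTED` and `cleanup_remaining` are hypothetical names):

```python
import re

# Cleanup pattern from this commit, with re.DOTALL standing in for Perl's /s.
# It matches a quoted run whose text ends with commas and/or whitespace.
TRAILING_SEP_QUOTED = re.compile(r"('[^']*[,\s]+')", re.DOTALL)

def cleanup_remaining(query: str) -> str:
    # Replace each remaining '<chars><spaces or commas>' by ?
    return TRAILING_SEP_QUOTED.sub("?", query)

print(cleanup_remaining("call foo('a, ', 1);"))  # call foo(?, 1);
```

It should only run after the main substitution: on a raw query such as `foo('a' , 'b')` it would pair the closing quote of `'a'` with the opening quote of `'b'` and produce `foo('a?b')`.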

Sorry for the successive commits.
My HTML page now looks identical to the original 12.2 one, EXCEPT for the network paths, which are now properly handled (in 2 steps).
bbourgier marked this pull request as draft on October 23, 2023, 21:59