Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(sql) single backslash inside a string literal seems to not be valid #1748

Closed
schallm opened this issue May 17, 2018 · 18 comments · Fixed by #2301
Closed

(sql) single backslash inside a string literal seems to not be valid #1748

schallm opened this issue May 17, 2018 · 18 comments · Fixed by #2301
Labels
good first issue Should be easier for first time contributors help welcome Could use help from community language
Milestone

Comments

@schallm
Copy link
Contributor

schallm commented May 17, 2018

I opened a ticket against VS Code for a colorization issue (microsoft/vscode#50059). They closed it as upstream to this project.

It boils down to the following gets incorrect highlighting due to the backslash.

```sql
select '\' + foo from bar
```

Same issue displays here on github:

select '\' + foo from bar

vs

select 'hello' + foo from bar
@joshgoebel
Copy link
Member

joshgoebel commented Oct 7, 2019

Are backslashes not used to escape things in SQL? I forget. Our grammars thinks that's what they are for.

@joshgoebel joshgoebel self-assigned this Oct 7, 2019
@schallm
Copy link
Contributor Author

schallm commented Oct 7, 2019

Nope, at least not in TSQL (MS Sql Server), if you need a single quote inside of a string, you just double the single quote.

MSSQL:
https://stackoverflow.com/questions/1586560/how-do-i-escape-a-single-quote-in-sql-server

MYSQL:
https://stackoverflow.com/questions/9596652/how-to-escape-apostrophe-in-mysql

POSTGRES:
https://www.postgresql.org/docs/8.2/runtime-config-compatible.html#GUC-BACKSLASH-QUOTE

The preferred, SQL-standard way to represent a quote mark is by doubling it ('') but PostgreSQL has historically also accepted \'. However, use of \' creates security risks because in some client character set encodings, there are multibyte characters in which the last byte is numerically equivalent to ASCII. If client-side code does escaping incorrectly then a SQL-injection attack is possible.

@joshgoebel
Copy link
Member

What about escaping newlines and other characters?

@joshgoebel
Copy link
Member

joshgoebel commented Oct 7, 2019

Mysql for example:

https://dev.mysql.com/doc/refman/8.0/en/string-literals.html

Screen Shot 2019-10-07 at 10 10 39 AM

So what you pasted looks invalid the way I read this. I think we need some more SQL people to weigh in...

@joshgoebel joshgoebel changed the title SQL: Backslash in a string causes coloring issue SQL: Backslash inside a string (discuss validity) Oct 7, 2019
@egor-rogov
Copy link
Collaborator

The problem is, it's vendor specific, but we have one-language-fits-all (except PostgreSQL and, hopefully, MSSQL).
In my opinion our (distant) goal is to provide different languages for major DBMS and leave in-core sql to be close to Standard SQL.

@joshgoebel
Copy link
Member

Well, it's worse than that even. See the MySQL docs... it's a CONFIGURABLE option... :-)

Sometimes \ means escape, and sometimes it means literal \. It's a per server setting. :-)

@joshgoebel
Copy link
Member

I have a feeling this might stay open a while until someone can explain how to handle both these correctly:

select '\' + foo from bar
select '\'' + foo from bar

Obviously if one SQL provider is 100% strict in how they handle this we could fix it in their own grammar, but right now having the escaping in the core sql seems better than removing it... maybe there is a simple way to deal with this we're just missing?

@jwk6
Copy link

jwk6 commented Oct 9, 2019

For Microsoft SQL Server and other RDMSes, the string ''' (backslash, single quote) is not a valid string literal. Thus VSCode displays an "Unclosed quotation mark after the character string '''' + foo from bar" error. If you escape the embedded single quote with a single quote, then there's no parsing error and the syntax highlighting is correct.

TSQLEscapedSingleQuote

See the "Character string constants" section here in the Docs - https://docs.microsoft.com/en-us/sql/t-sql/data-types/constants-transact-sql

Oracle - https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements003.htm

DB2 -https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/sqlref/src/tpc/db2z_apostrophesandquotesindelims.html#db2z_apostrophesandquotesindelims

Using the delimiter to escape itself is actually a nice solution, because SQL is often written by non-programmers and they don't have to remember to escape all the other special characters within the string. So, for example:

select 'So this
    is a perfectly valid
    string literal too.'  -- contains embedded CRLFs and tabs
     + foo
from bar

Bear in mind, if you "fix" the problem is highlight.js, then syntax highlighting will be "correct" but parsing will still report an error. As the PostgreSQL link above says "Note that in a standard-conforming string literal, \ just means \ anyway."

It seems to me that maybe MySQL by itself has gone off the rails a bit as far as string literals are concerned. Yes, I mean MySQL is non-standard conforming in this case.

@joshgoebel
Copy link
Member

So neither GitHub or Prism (I just checked) accepts this as proper SQL. So it seems mistaken or not we're in good company. I think we need a lot more info before we'd consider changing this.

Screen Shot 2019-10-11 at 8 44 05 AM

@joshgoebel joshgoebel added help welcome Could use help from community on hold and removed question labels Oct 13, 2019
@joshgoebel
Copy link
Member

@schallm Can you point to anything in the SQL spec documentation that this is indeed a SQL issue and not specific to one of the more specific grammars such as T-SQL? If not I plan to close this issue.

@joshgoebel joshgoebel changed the title SQL: Backslash inside a string (discuss validity) (sql) single backslash inside a string literal (discuss validity) Oct 22, 2019
@schallm
Copy link
Contributor Author

schallm commented Oct 22, 2019

From what I understand and others have commented, the SQL spec is actually defined as I'm requesting.

It seems to me that maybe MySQL by itself has gone off the rails a bit as far as string literals are concerned. Yes, I mean MySQL is non-standard conforming in this case.  @jwk6

Is there a publicly available version of the SQL spec that I can research and point to?  The only one I can find is not freely available.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 22, 2019

Is there a publicly available version of the SQL spec that I can research and point to?

Yeah, that's annoying. No idea. Not sure it would have to be the spec... but we need two things answered here:

  • What is "correct" according to spec?
  • How can our grammar know the difference if some popular DBs support BOTH?

For Microsoft SQL Server and other RDMSes...

Is a little vague for me. Sadly @jwk6 focused a lot of thought on how to escape quotes rather than the actual validity of \'. So far what we seem to know:

  • For MySQL it's both valid or invalid depending on how you set NO_BACKSLASH_ESCAPES. By default it's implied that \' is valid and would represent an escaped single quote.
  • MS SQL seems to disallow this (per jwk6)
  • PostgreSQL: Any other character following a backslash is taken literally. Thus, to include a backslash character, write two backslashes (\). Also, a single quote can be included in an escape string by writing \', in addition to the normal way of ''. https://www.postgresql.org/docs/9.2/sql-syntax-lexical.html
  • We're not sure yet on other DBs...

@joshgoebel
Copy link
Member

joshgoebel commented Oct 22, 2019

Interestingly postgres says this:

PostgreSQL also accepts "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote, e.g., E'foo'. Within an escape string, a backslash character () begins a C-like backslash escape sequence, in which the combination of backslash and following character(s) represent a special byte value, as shown in Table 4-1.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 22, 2019

Evidentially double quotes aren't part of standard SQL either though, lol...

https://stackoverflow.com/questions/881194/how-do-i-escape-special-characters-in-mysql/881252#881252

So I'm left a little unsure how to fix this since our SQL is already blurred and SQL can mean so many things... I think I've talked myself out of closing it though, but I still don't see a quick fix right now.

@egor-rogov
Copy link
Collaborator

First of all: @schallm as long as your issue concerns MSSQL, you have you tried tsql grammar (which is in its own repo https://github.com/highlightjs/highlightjs-tsql)? I believe it handles quotes correctly according to MSSQL.

Now, for the sake of discussion:

Is there a publicly available version of the SQL spec that I can research and point to?

No, but there are publicly available drafts which hopefully don't differ much from the actual standard.

we need two things answered here:
What is "correct" according to spec?

As I can see in my copy of the standard, backslash is just a regular symbol, it is not used for escaping.

How can our grammar know the difference if some popular DBs support BOTH?

There is no way, I'm afraid...

For PostgreSQL and MSSQL we already have different grammars, so I think we can exclude them from the discussion. This leaves us with:

  • Oracle (support some escape sequences like \n, but not \', as far as I remember)
  • MySQL (configurable)
  • DB2 (seems that doesn't support backslash escaping)
  • the standard (no backslash escaping)
  • something else?

So maybe it's worth removing backslash escaping from sql.

@joshgoebel
Copy link
Member

So maybe it's worth removing backslash escaping from sql.

I'm still not sure there is a perfect solution here... I worry that our "sql" is bundled by default and often serves as a stand-on for "ALL sqls" (which might have bearing here), meaning we shouldn't so quickly dismiss the fact that we have separate tsql and pgsql grammars...

But I also don't really have a strong opinion to not change it. No one really spoke up to defend it, so we can always change it and then see if people show up raising issues to revert it. :-)

@schallm Would you be willing to whip up a small PR for this change?

@joshgoebel joshgoebel added good first issue Should be easier for first time contributors language and removed on hold labels Oct 24, 2019
@joshgoebel joshgoebel changed the title (sql) single backslash inside a string literal (discuss validity) (sql) single backslash inside a string literal seems to not be valid Oct 24, 2019
@schallm
Copy link
Contributor Author

schallm commented Oct 24, 2019

I have not looked closely at what this would take, but I will take a look at it soon.

@joshgoebel
Copy link
Member

It shouldn't be too difficult. But you might have to look at both double and single quoted strings... I suppose your test case could be the issue actually brought up here...

@joshgoebel joshgoebel added this to the 9.15.12 milestone Oct 25, 2019
@joshgoebel joshgoebel added this to Others in Highlight.js Oct 25, 2019
@joshgoebel joshgoebel removed their assignment Oct 25, 2019
@joshgoebel joshgoebel moved this from Backlog to Has open PR in Highlight.js Dec 7, 2019
Highlight.js automation moved this from Has open PR to Done Dec 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Should be easier for first time contributors help welcome Could use help from community language
Projects
No open projects
Highlight.js
  
Done
Development

Successfully merging a pull request may close this issue.

4 participants