Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(ebnf) add backtick (grave symbol) as string character #2290

Merged
merged 4 commits into from Nov 29, 2019
Merged

(ebnf) add backtick (grave symbol) as string character #2290

merged 4 commits into from Nov 29, 2019

Conversation

vancluever
Copy link
Contributor

This adds backticks as an allowed character for terminal strings.

This is necessary for describing backslashes properly as using single or
double quotes seems to cause parsing issues with highlight.js. It also
has precedent in some EBNF-described specifications such as the Go
specification.

Copy link
Collaborator

@egor-rogov egor-rogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, although the wikipedia article on EBNF and some EBNF standards I've searched have no mention on backticks, Golang specification (https://golang.org/ref/spec) uses them.
PR looks good to me. @vancluever, please add a changelog entry.

@joshgoebel joshgoebel added language enhancement An enhancement or new feature labels Nov 21, 2019
@joshgoebel
Copy link
Member

Ping.

@vancluever
Copy link
Contributor Author

Hey @egor-rogov and @yyyc514, sorry for the delays on this one. Will add an entry shortly.

This adds backticks as an allowed character for terminal strings.

This is necessary for describing backslashes properly as using single or
double quotes seems to cause parsing issues with highlight.js. It also
has precedent in some EBNF-described specifications such as the Go
specification.
@vancluever
Copy link
Contributor Author

Changelog has been updated now!

@joshgoebel
Copy link
Member

This is necessary for describing backslashes properly as using single or
double quotes seems to cause parsing issues with highlight.js. It also
has precedent in some EBNF-described specifications such as the Go
specification.

I’m not sure I follow what you mean here?

@vancluever
Copy link
Contributor Author

I’m not sure I follow what you mean here?

This can be seen in the developer preview. Try doing a grammar that uses "\" or '\' and it should break parsing, even though there's no apparent reason why it should.

@joshgoebel
Copy link
Member

@vancluever Still not sure I follow. Can you make a reproducible example? You can fork my jsfiddle template:

https://jsfiddle.net/ajoshguy/a1kntyg4/

Example worth 1000 words. :)

@vancluever
Copy link
Contributor Author

@yyyc514 https://jsfiddle.net/ksv9Lqcz/ should give a good idea of what's going on!

@joshgoebel
Copy link
Member

joshgoebel commented Nov 25, 2019

Yes, indeed. EBNF is just reusing hljs.APOS_STRING_MODE, hljs.QUOTE_STRING_MODE which have \[char] as an escape sequence, so in your example the string isn't terminated since the quotes are escaped.

Does EBNF not have such escape sequences in strings? Or are they only valid in some types of strings but not others?

@vancluever
Copy link
Contributor Author

Can't comment for sure, but I don't know if there's any accepted standard, to be honest. But I have never really been able to find any sort of escaping standard, and I don't really know if a language for a context-free grammar should even have them.

Our Sentinel specification follows the example of the Go spec in the use of backticks, however, so we will need them regardless of whether or not we fix the escaping.

@joshgoebel
Copy link
Member

Escaping is mentioned here: https://www.w3.org/Notation.html and specifically:

If a double-quote character is used inside a string, it must appear as the sequence " (backslash followed by double quote).

That's regarding BNF, but still...

@joshgoebel
Copy link
Member

joshgoebel commented Nov 25, 2019

From Google's docs:

escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) 

I think the backtick is specifically for avoiding the need to double escape \ as \\. They seem to only consistently use it to enclose the backslash and prefer "" and '' in other places.

Another example where it looks like it's used for escaping:

Engelberg/instaparse#114

So I'm thinking our existing behavior with normal strings is correct and NOT an issue.

@egor-rogov Any insight to add here?

@joshgoebel
Copy link
Member

Holding this a bit longer since I'll adjust the commit text based on our conclusions here.

@egor-rogov
Copy link
Collaborator

I think the backtick is specifically for avoiding the need to double escape \ as \. They seem to only consistently use it to enclose the backslash and prefer "" and '' in other places.

I think so too.
But I see no harm in adding backticks as rightful string terminators. I think it's better than allow only backtick-backslash-backtick sequences.

@joshgoebel
Copy link
Member

joshgoebel commented Nov 29, 2019

But I see no harm in adding backticks as rightful string terminators.

Nor do I, but the original commit message make it sound like Highlight.js is BROKEN with regard to our standard string handling here. So I'm just trying to clear up if we still have an actual problem or if that original message was in error so that I can write a proper one when I merge this. :)

@joshgoebel joshgoebel merged commit eebe0cb into highlightjs:master Nov 29, 2019
@vancluever
Copy link
Contributor Author

Thanks @yyyc514 and @egor-rogov!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants