Incomplete EBNF Grammar #534

Open
jacobfriedman opened this issue Jul 8, 2022 · 9 comments

@jacobfriedman

whitespace = SPACE
           | TAB
           | LF
           | VT
           | FF
           | CR
           | FS
           | GS
           | RS
           | US

Unfortunately, trying to parse these raises the question, "Is there something I should know about?"
Please define the referenced names (VT, FF, CR, FS, etc.) in the grammar. Otherwise I can't parse 'SPACE' or 'TAB' either.
Granted, these are easy additions on my end, but the grammar just doesn't work out of the box.

Even if these definitions were added now, the same question would come up again for any other undefined name, and I would only find out by running the parser again. Nothing should be implicit in an EBNF grammar file.
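
For anyone hitting the same gap, here is a minimal sketch of the missing definitions. It assumes the names in the excerpt are the usual ASCII abbreviations for SPACE, the C0 control characters TAB, LF, VT, FF and CR, and the four information separators; the grammar itself does not say so. The names `WHITESPACE_CODE_POINTS`, `is_whitespace` and `skip_whitespace` are made up for illustration.

```python
# Assumed code points for the names on the right-hand side of `whitespace`.
# The grammar does not define them; these are the standard ASCII meanings.
WHITESPACE_CODE_POINTS = {
    "SPACE": 0x0020,
    "TAB": 0x0009,  # horizontal tab
    "LF": 0x000A,   # line feed
    "VT": 0x000B,   # vertical tab
    "FF": 0x000C,   # form feed
    "CR": 0x000D,   # carriage return
    "FS": 0x001C,   # file separator
    "GS": 0x001D,   # group separator
    "RS": 0x001E,   # record separator
    "US": 0x001F,   # unit separator
}

WHITESPACE_CHARS = frozenset(chr(cp) for cp in WHITESPACE_CODE_POINTS.values())


def is_whitespace(ch: str) -> bool:
    """Return True if `ch` is one of the characters listed in the excerpt."""
    return ch in WHITESPACE_CHARS


def skip_whitespace(text: str, pos: int = 0) -> int:
    """Return the index of the first non-whitespace character at or after `pos`."""
    while pos < len(text) and is_whitespace(text[pos]):
        pos += 1
    return pos
```

The excerpt above lists only ten names, so the sketch covers only those; any further alternatives in the full rule would need the same treatment.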

Thank you for the great work!

@jacobfriedman
Author

IdentifierStart = ID_Start / Pc
IdentifierPart = ID_Continue / Sc

ID_Start, ID_Continue, Pc, and Sc are not defined in the grammar either, which is also something we should know about.
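
A rough sketch of how the two rules above could be filled in with Python's standard `unicodedata` module. It assumes Pc and Sc mean the Unicode general categories "connector punctuation" and "currency symbol", and it only approximates ID_Start and ID_Continue through general categories: the real UAX #31 properties also involve Other_ID_Start, Other_ID_Continue and the Pattern_Syntax and Pattern_White_Space exclusions, which are ignored here. All function names are hypothetical.

```python
import unicodedata

# Approximation of ID_Start: letters plus letter numbers.
# Assumption: Other_ID_Start and the Pattern_* exclusions are ignored.
_ID_START_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"})

# Approximation of ID_Continue: ID_Start plus marks, decimal digits and
# connector punctuation. Assumption: Other_ID_Continue is ignored.
_ID_CONTINUE_CATEGORIES = _ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch: str) -> bool:
    """IdentifierStart = ID_Start / Pc (approximated by general category)."""
    category = unicodedata.category(ch)
    return category in _ID_START_CATEGORIES or category == "Pc"


def is_identifier_part(ch: str) -> bool:
    """IdentifierPart = ID_Continue / Sc (approximated by general category)."""
    category = unicodedata.category(ch)
    return category in _ID_CONTINUE_CATEGORIES or category == "Sc"


def is_identifier(text: str) -> bool:
    """Return True if `text` is one start character followed by part characters."""
    if not text:
        return False
    return is_identifier_start(text[0]) and all(
        is_identifier_part(ch) for ch in text[1:]
    )
```

The results depend on the Unicode version bundled with the interpreter, so edge cases may differ from whatever tables another implementation uses.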

@jacobfriedman
Author

jacobfriedman commented Jul 8, 2022


(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierStart = ID_Start
                | Sc
                | '_'
                | '‿'
                | '⁀'
                | '⁔'
                | '︳'
                | '︴'
                | '﹍'
                | '﹎'
                | '﹏'
                | '＿'
                ;
(* Based on the unicode identifier and pattern syntax
 *   (http://www.unicode.org/reports/tr31/)
 * And extended with a few characters.
 *)
IdentifierPart = ID_Continue
               | Sc
               ;
(* Any character except "`", enclosed within `backticks`. Backticks are escaped with double backticks. *)
EscapedSymbolicName = { '`', { ANY - ('`') }, '`' }- ;

from https://github.com/paul-english/nom-ebnf/blob/83e6b84c300c653b5aa315152bbfd2a44d9a671b/src/cypher.ebnf
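
The EscapedSymbolicName rule quoted above reads as "one or more backtick-delimited segments", which is how a doubled backtick ends up acting as an escape. A minimal sketch of that reading with a regular expression follows; it assumes ANY matches every character, including line breaks, and `parse_escaped_symbolic_name` is a hypothetical helper name.

```python
import re

# One or more segments of: backtick, any run of non-backtick characters, backtick.
# Two adjacent segments meet at a doubled backtick, which is the escape sequence.
_ESCAPED_NAME = re.compile(r"(?:`[^`]*`)+")


def parse_escaped_symbolic_name(text, pos=0):
    """Return (name, end_index) for an escaped name starting at `pos`, else None."""
    match = _ESCAPED_NAME.match(text, pos)
    if match is None:
        return None
    raw = match.group(0)
    # Drop the outer backticks, then collapse each doubled backtick to a single one.
    name = raw[1:-1].replace("``", "`")
    return name, match.end()
```

An unterminated name such as a lone opening backtick yields `None` rather than a partial match, because every segment needs its closing backtick.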

@jacobfriedman
Author

jacobfriedman commented Jul 8, 2022

Also mentioned in #331

@jacobfriedman
Author

Had to add additional rules outside this parser. Would have been nice to source those inside but hey, who wants to include the whole Unicode standard :(

Watch out for EOI (end of input) too.

@nadja-muller

Hi.
That is correct. If you are using a Java-compatible language, have a look at this class:
https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101
It includes the code point definitions of the character sets used in the grammar.

nadja-muller self-assigned this Jul 21, 2022

@jacobfriedman
Author

jacobfriedman commented Jul 21, 2022 via email

@vincent-karuri

Hi @jacobfriedman, do you have a (code) example of what you had to do to fix this? It's not completely apparent to me.

@vincent-karuri

> Hi. That is correct. If you are using a Java-compatible language, have a look at this class: https://github.com/opencypher/openCypher/blob/master/tools/grammar/src/main/java/org/opencypher/grammar/CharacterSet.java#L101 It includes the code point definitions of the character sets used in the grammar.

Also, @nadja-muller, could you clarify what the suggestion is for Java? Are we supposed to invoke this class directly, or use it as a reference to look up the definitions of ID_Start and ID_Continue and hardcode them into the EBNF?
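
The thread does not settle which reading was intended. For the second one (looking the sets up and hardcoding them), a sketch along these lines could produce the code point ranges to paste into a grammar. It uses Python's `unicodedata` instead of the Java CharacterSet class, so the output reflects the Unicode version bundled with the interpreter. `category_ranges` and `format_ranges` are hypothetical names, and the `U+XXXX..U+YYYY` notation is only a placeholder to adapt to whatever range syntax the target parser generator accepts.

```python
import sys
import unicodedata


def category_ranges(category):
    """Return inclusive (first, last) code point ranges with the given general category."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == category:
            if start is None:
                start = cp  # a new run begins here
        elif start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


def format_ranges(ranges):
    """Render ranges as 'U+XXXX' or 'U+XXXX..U+YYYY' strings (placeholder notation)."""
    return [
        f"U+{first:04X}" if first == last else f"U+{first:04X}..U+{last:04X}"
        for first, last in ranges
    ]
```

For example, `format_ranges(category_ranges("Pc"))` lists the connector punctuation characters, and the same call with `"Sc"` lists the currency symbols.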

@jacobfriedman
Author

jacobfriedman commented Mar 14, 2024 via email

nadja-muller removed their assignment Apr 4, 2024