ANTLR4 grammar - JavaScript - Issues with quotes and comments #392

Open
kenwebb opened this issue Jan 8, 2020 · 4 comments
@kenwebb

kenwebb commented Jan 8, 2020

Hi,

I have downloaded the ANTLR4 grammar from:

https://www.opencypher.org/resources
https://s3.amazonaws.com/artifacts.opencypher.org/M14/Cypher.g4

I am able to process Cypher.g4 for the JavaScript target:

java -jar antlr-4.7-complete.jar -Dlanguage=JavaScript Cypher.g4

ANTLR correctly generates five files:

 CypherLexer.js
 CypherLexer.tokens
 CypherListener.js
 CypherParser.js
 Cypher.tokens

My example web page (.html) includes the following JavaScript code:

  const antlr4 = require('xholon/lib/antlr4/index');
  const CypherLexer = require('xholon/lib/antlr4g/CypherLexer');
  const CypherParser = require('xholon/lib/antlr4g/CypherParser');
  const CypherListener = require('xholon/lib/antlr4g/CypherListener');
  const input = 'CREATE (fgh {ijk: 123.4})'; // OK
  const chars = new antlr4.InputStream(input);
  const lexer = new CypherLexer.CypherLexer(chars);
  const tokens = new antlr4.CommonTokenStream(lexer);
  const parser = new CypherParser.CypherParser(tokens);
  const tree = parser.oC_Cypher();
  updateTree(tree); // this is where I run my own code

In this simple example CREATE (fgh {ijk: 123.4}), I then process the Cypher tree and get the results that I expect (it creates a new node in my application).

But it fails with any Cypher statement that contains single quotes (e.g. 'abc'), double quotes (e.g. "def"), or comments (e.g. // this is a comment). For example:

const input = "CREATE (fgh {ijk: 'This is some text.'})"; // error

or

const input = 'CREATE (fgh {ijk: "This is some text."})'; // error

ErrorListener.js (part of the ANTLR4 distribution) reports (in the browser console window):

line 1:18 token recognition error at: ''Th'
line 1:37 token recognition error at: ''}'
line 1:21 no viable alternative at input 'CREATE (fgh {ijk: is'
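
For anyone reproducing this, it can help to capture these messages in code rather than read them from the console. The following is only a minimal sketch: `CollectingErrorListener` is a hypothetical helper, and it assumes the ANTLR4 JavaScript runtime accepts any object that provides the listener callbacks below (extending `antlr4.error.ErrorListener` would be the more conservative choice).

```javascript
// Hypothetical helper (not part of ANTLR4): records syntax errors
// instead of printing them to the console.
class CollectingErrorListener {
  constructor() {
    this.errors = [];
  }

  // Called by the lexer or parser for each error it reports.
  syntaxError(recognizer, offendingSymbol, line, column, msg, e) {
    this.errors.push({ line: line, column: column, msg: msg });
  }

  // Diagnostic callbacks that this sketch deliberately ignores.
  reportAmbiguity() {}
  reportAttemptingFullContext() {}
  reportContextSensitivity() {}
}

// Intended wiring, assuming the lexer and parser objects created above:
//   const listener = new CollectingErrorListener();
//   lexer.removeErrorListeners();
//   lexer.addErrorListener(listener);
//   parser.removeErrorListeners();
//   parser.addErrorListener(listener);
//   const tree = parser.oC_Cypher();
//   console.log(listener.errors);
```

With the listener attached to both the lexer and the parser, the token recognition errors and the "no viable alternative" error end up in one array, which makes it easier to compare the original grammar against a modified one.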

I have developed a temporary workaround by replacing content in Cypher.g4 with content from DOT.g4 (Graphviz dot language), which is included with the ANTLR4 distribution.
This lets me handle comments and double-quotes in openCypher, and allows me to continue exploring whether or not I will be able to use openCypher.

I hope this description of the issues I have found will be helpful,
Ken Webb

@Mats-SX
Member

Mats-SX commented Jan 10, 2020

Hello Ken and thanks for your report.

This is very odd. In our tests we are only using the Java target for testing the generated parser. It is able to parse your example queries just fine -- I wonder if there is some issue with JavaScript that we are not taking into account.

Which content edits did you perform in order to address the issue? I am not very familiar with JavaScript myself, so I don't know offhand whether ', " or other characters need special treatment to be handled correctly (assuming my hypothesis about JavaScript is correct).

All the best
Mats

@kenwebb
Author

kenwebb commented Jan 11, 2020

Hi Mats,

Thanks for your reply. I've uploaded a copy of my modified ANTLR grammar to:
Cypher.g4
If you look at the History, you can see how my version differs from the official openCypher file.

The main points are:

  1. Instead of the original StringLiteral and EscapedChar, I have:

       StringLiteral : '"' ( '\\"' | . )*? '"' ;

  2. Instead of the original Comment, I have:

       COMMENT
          : '/*' .*? '*/' -> skip
          ;

       LINE_COMMENT
          : '//' .*? '\r'? '\n' -> skip
          ;

  3. I don't handle single quotes yet, because I haven't had time to do it.
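
To illustrate what the replacement StringLiteral rule accepts, here is a rough JavaScript regex analogue. It is a sketch, not a substitute for the lexer: ANTLR's non-greedy matching is not identical to regex matching, `[\s\S]` stands in for ANTLR's `.`, and both the single-quoted alternative and the `firstStringLiteral` helper are hypothetical additions built by analogy.

```javascript
// Approximates:  StringLiteral : '"' ( '\\"' | . )*? '"' ;
// The second alternative is a hypothetical single-quoted counterpart.
const STRING_LITERAL = /"(?:\\"|[\s\S])*?"|'(?:\\'|[\s\S])*?'/;

// Hypothetical helper: returns the first quoted literal in a query,
// including its quotes, or null if there is none.
function firstStringLiteral(query) {
  const match = STRING_LITERAL.exec(query);
  return match ? match[0] : null;
}

console.log(firstStringLiteral('CREATE (fgh {ijk: "This is some text."})'));
console.log(firstStringLiteral("CREATE (fgh {ijk: 'This is some text.'})"));
```

The escaped-quote alternative is tried before the catch-all on each step, so a literal such as "a \"b\" c" is consumed as a single token rather than ending at the first escaped quote.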

I can't really speculate on exactly why my version works. My knowledge of ANTLR is limited. I decided to try substituting content from the grammar for DOT because I know that that grammar worked for me.

I program in Java and JavaScript, often both at the same time. I can't think off-hand of any specific difference that might be relevant here. Both languages use the same single line and multi-line comment characters. Java strings are delimited by double quotes, while JavaScript allows matching single or double quotes. Cypher looks pretty much the same as JavaScript in terms of comments and String quotes.

Ken

@Mats-SX
Member

Mats-SX commented Apr 3, 2020

Hello @kenwebb and thanks for reaching back.

I will leave this topic here for now, but this is a useful point to pick up from when we next plan work on the openCypher grammar.

@jacobfriedman

jacobfriedman commented Jul 14, 2022

I'm also wondering what that problem was.
