
LALR grammar based Cypher parser using the grammar rules from the openCypher project.



walter-weinmann/ocparse


ocparse - the openCypher parser written in Erlang


ocparse is a production-ready openCypher parser written in pure Erlang. It is closely aligned with the openCypher project and will be updated regularly as that project evolves. The openCypher project aims to deliver a full and open specification of the industry’s most widely adopted graph database query language: Cypher. The EBNF file published by the openCypher project is the basis for the LALR grammar that ocparse uses.

1. Usage

Example code:

MATCH (m:Movie)
WHERE m.title = 'The Matrix'
RETURN m

Parsing the example code:

1> {ok, {ParseTree, Tokens}} = ocparse:source_to_pt("MATCH (m:Movie) WHERE m.title = 'The Matrix' RETURN m").
{ok,
 {{cypher,
   {statement,
    {query,
     {regularQuery,
      {singleQuery,
       [{clause,
         {match,[],
          {pattern,
           [{patternPart,[],
             {anonymousPatternPart,{patternElement,{...},...}}}]},
          {where,
           {expression,
            {orExpression,{xorExpression,{andExpression,...},[]},[]}}}}},
        {clause,
         {return,[],
          {returnBody,
           {returnItems,[],[],[{returnItem,{...},...}]},
           [],[],[]}}}]},
      []}}},
   []},
  [{'MATCH',1},
   {'(',1},
   {'UNESCAPED_SYMBOLIC_NAME',1,"m"},
   {':',1},
   {'UNESCAPED_SYMBOLIC_NAME',5,"Movie"},
   {')',1},
   {'WHERE',1},
   {'UNESCAPED_SYMBOLIC_NAME',1,"m"},
   {'.',1},
   {'UNESCAPED_SYMBOLIC_NAME',5,"title"},
   {'=',1},
   {'STRING_LITERAL',1,"'The Matrix'"},
   {'RETURN',1},
   {'UNESCAPED_SYMBOLIC_NAME',1,"m"}]}}

Access the parse tree of the example code:

2> ParseTree.
{cypher,
 {statement,
  {query,
   {regularQuery,
    {singleQuery,
     [{clause,
       {match,[],
        {pattern,
         [{patternPart,[],
           {anonymousPatternPart,
            {patternElement,
             {nodePattern,{variable,...},{...},...},
             []}}}]},
        {where,
         {expression,
          {orExpression,
           {xorExpression,
            {andExpression,{notExpression,{...},...},[]},
            []},
           []}}}}},
      {clause,
       {return,[],
        {returnBody,
         {returnItems,[],[],
          [{returnItem,{expression,{orExpression,...}},[]}]},
         [],[],[]}}}]},
    []}}},
 []}

Access the token list of the example code:

3> Tokens.
[{'MATCH',1},
 {'(',1},
 {'UNESCAPED_SYMBOLIC_NAME',1,"m"},
 {':',1},
 {'UNESCAPED_SYMBOLIC_NAME',5,"Movie"},
 {')',1},
 {'WHERE',1},
 {'UNESCAPED_SYMBOLIC_NAME',1,"m"},
 {'.',1},
 {'UNESCAPED_SYMBOLIC_NAME',5,"title"},
 {'=',1},
 {'STRING_LITERAL',1,"'The Matrix'"},
 {'RETURN',1},
 {'UNESCAPED_SYMBOLIC_NAME',1,"m"}]
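Because the token list is an ordinary Erlang list of tuples, it can be inspected with a list comprehension. The following is a minimal sketch, assuming the token shapes shown above (two-element tuples for keywords and punctuation, three-element tuples for tokens that carry text). The module name token_demo and the function names/1 are hypothetical and not part of ocparse.

```erlang
-module(token_demo).
-export([names/1]).

%% Return the text of every UNESCAPED_SYMBOLIC_NAME token, in source order.
%% Assumes the three-element token shape {Category, Info, Text} shown in the
%% shell output above; the meaning of the second element is not interpreted.
names(Tokens) ->
    [Text || {'UNESCAPED_SYMBOLIC_NAME', _Info, Text} <- Tokens].
```

With the token list of the example code, token_demo:names(Tokens) returns ["m","Movie","m","title","m"].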

Convert the parse tree back to source code:

4> ocparse:pt_to_source_td(ParseTree).
<<"match (m :Movie) where m .title = 'The Matrix' return m">>
5> ocparse:pt_to_source_bu(ParseTree).
<<"match (m :Movie) where m .title = 'The Matrix' return m">>
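Since the regenerated text differs from the input in case and spacing, one way to check that nothing was lost is to parse the regenerated text again and compare parse trees. The sketch below assumes only the two calls shown above, with the result shapes seen in the shell session. The module roundtrip_demo is hypothetical, and the shape of a failed parse is not documented in this section, so any non-matching result is passed through unchanged.

```erlang
-module(roundtrip_demo).
-export([roundtrip/1, roundtrip/3]).

%% Convenience wrapper around the ocparse functions used above.
roundtrip(Source) ->
    roundtrip(fun ocparse:source_to_pt/1, fun ocparse:pt_to_source_td/1, Source).

%% Parse Source, regenerate text from the parse tree, and parse that text
%% again. Returns {ok, Regenerated} when both parse trees are identical.
%% Parse and Unparse are passed in so the logic can be tried with stubs.
roundtrip(Parse, Unparse, Source) ->
    case Parse(Source) of
        {ok, {ParseTree, _Tokens}} ->
            Regenerated = Unparse(ParseTree),
            %% Assumption: the parser is given a string, as in the example
            %% above, so the binary is converted first. binary_to_list/1 is
            %% only adequate for ASCII-only queries.
            case Parse(binary_to_list(Regenerated)) of
                {ok, {ParseTree, _}} -> {ok, Regenerated};
                Unexpected -> {mismatch, Unexpected}
            end;
        Failure ->
            {parse_failed, Failure}
    end.
```

Passing the parser and unparser as funs is a design choice for this sketch only; it keeps the comparison logic independent of whether ocparse is on the code path.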

Complete parse tree:

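The Erlang shell abbreviates deeply nested terms when it prints a result, which is why parts of the parse tree above appear as "...". The shell command rp(ParseTree). prints the term in full. The complete parse tree of the example code looks as follows: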

{cypher,
 {statement,
  {query,
   {regularQuery,
    {singleQuery,
     [{clause,
       {match,[],
        {pattern,
         [{patternPart,[],
           {anonymousPatternPart,
            {patternElement,
             {nodePattern,
              {variable,{symbolicName,"m"}},
              {nodeLabels,
               [{nodeLabel,
                 {labelName,
                  {schemaName,{symbolicName,"Movie"}}}}]},
              []},
             []}}}]},
        {where,
         {expression,
          {orExpression,
           {xorExpression,
            {andExpression,
             {notExpression,
              {comparisonExpression,
               {addOrSubtractExpression,
                {multiplyDivideModuloExpression,
                 {powerOfExpression,
                  {unaryAddOrSubtractExpression,
                   {stringListNullOperatorExpression,
                    {propertyOrLabelsExpression,
                     {atom,{variable,{symbolicName,"m"}}},
                     [{propertyLookup,
                       {propertyKeyName,
                        {schemaName,{symbolicName,"title"}}}}]},
                    []},
                   []},
                  []},
                 []},
                []},
               [{partialComparisonExpression,
                 {addOrSubtractExpression,
                  {multiplyDivideModuloExpression,
                   {powerOfExpression,
                    {unaryAddOrSubtractExpression,
                     {stringListNullOperatorExpression,
                      {propertyOrLabelsExpression,
                       {atom,
                        {literal,{stringLiteral,"'The Matrix'"}}},
                       []},
                      []},
                     []},
                    []},
                   []},
                  []},
                 "="}]},
              []},
             []},
            []},
           []}}}}},
      {clause,
       {return,[],
        {returnBody,
         {returnItems,[],[],
          [{returnItem,
            {expression,
             {orExpression,
              {xorExpression,
               {andExpression,
                {notExpression,
                 {comparisonExpression,
                  {addOrSubtractExpression,
                   {multiplyDivideModuloExpression,
                    {powerOfExpression,
                     {unaryAddOrSubtractExpression,
                      {stringListNullOperatorExpression,
                       {propertyOrLabelsExpression,
                        {atom,{variable,{symbolicName,"m"}}},
                        []},
                       []},
                      []},
                     []},
                    []},
                   []},
                  []},
                 []},
                []},
               []},
              []}},
            []}]},
         [],[],[]}}}]},
    []}}},
 []}
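The parse tree is built only from tuples, lists, atoms and strings, so a generic recursive walk is enough to extract information from it. The following is a minimal sketch that collects the text of every symbolicName node. It assumes the node shape {symbolicName, Name} seen in the tree above; the module pt_walk_demo and the function symbolic_names/1 are hypothetical and not part of ocparse.

```erlang
-module(pt_walk_demo).
-export([symbolic_names/1]).

%% Collect the text of every {symbolicName, Name} node, depth first,
%% in the order in which the nodes appear in the term.
symbolic_names({symbolicName, Name}) ->
    [Name];
symbolic_names(Node) when is_tuple(Node) ->
    %% Any other tuple: visit all of its elements, including the tag atom.
    symbolic_names(tuple_to_list(Node));
symbolic_names(Nodes) when is_list(Nodes) ->
    %% Child lists and plain strings both end up here; the characters of a
    %% string are integers and contribute nothing.
    lists:flatmap(fun symbolic_names/1, Nodes);
symbolic_names(_Leaf) ->
    [].
```

Applied to the complete parse tree above, pt_walk_demo:symbolic_names(ParseTree) returns ["m","Movie","m","title","m"].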

2. Documentation

The documentation for ocparse is available in the project Wiki.

3. Known issues of grammar support

Comment

The number of block comments (/* ... */) is limited to one per line.

Properties / Literal

The rule Properties has a higher precedence than the rule Literal.

SymbolicName

The following tokens may not be used as SymbolicName:

  ALL AND ANY AS ASC ASCENDING BY CONTAINS COUNT CREATE DECIMAL_INTEGER DELETE
  DESC DESCENDING DETACH DISTINCT ENDS ESCAPED_SYMBOLIC_NAME EXPONENT_DECIMAL_REAL
  EXTRACT FALSE FILTER HEX_INTEGER IN IS LIMIT MATCH MERGE NONE NOT NULL
  OCTAL_INTEGER ON OPTIONAL OR ORDER REGULAR_DECIMAL_REAL REMOVE RETURN SET
  SINGLE SKIP STARTS STRING_LITERAL TRUE UNESCAPED_SYMBOLIC_NAME UNION UNWIND
  WHERE WITH XOR

An exception is the use of the token COUNT as FunctionName.
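A caller that generates Cypher text may want to check a name against this list before using it as a SymbolicName. The sketch below makes two assumptions that are not stated above: that the entries ending in _INTEGER, _REAL, _LITERAL and _SYMBOLIC_NAME denote token classes rather than literal words (they are left out), and that keywords are matched without regard to case. The module reserved_demo is hypothetical, and the COUNT exception for FunctionName is not handled.

```erlang
-module(reserved_demo).
-export([is_reserved/1]).

%% Keyword entries copied from the list above. Token-class entries such as
%% DECIMAL_INTEGER or STRING_LITERAL are omitted (assumption, see lead-in).
-define(RESERVED,
        ["ALL","AND","ANY","AS","ASC","ASCENDING","BY","CONTAINS","COUNT",
         "CREATE","DELETE","DESC","DESCENDING","DETACH","DISTINCT","ENDS",
         "EXTRACT","FALSE","FILTER","IN","IS","LIMIT","MATCH","MERGE","NONE",
         "NOT","NULL","ON","OPTIONAL","OR","ORDER","REMOVE","RETURN","SET",
         "SINGLE","SKIP","STARTS","TRUE","UNION","UNWIND","WHERE","WITH",
         "XOR"]).

%% True if Name (a string) equals one of the keywords, ignoring case.
%% Case-insensitive matching is an assumption of this sketch.
is_reserved(Name) ->
    lists:member(string:uppercase(Name), ?RESERVED).
```

For example, reserved_demo:is_reserved("match") returns true and reserved_demo:is_reserved("movie") returns false.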

Unicode

Unicode is not supported with Dash, LeftArrowHead, RightArrowHead or UnescapedSymbolicName. Hence Dash is limited to the hyphen (-), LeftArrowHead is limited to '<' and RightArrowHead is limited to '>'.

4. Limitations of the test data generator

In the scripts test\gen_test.bat and test\gen_test_and_run.bat, the heap size has been changed to speed up test data generation. Adjust this setting if it does not suit your environment.

No test data is generated for the following rules:

FunctionInvocation

FunctionInvocation = FunctionName, [SP], '(', [SP], (D,I,S,T,I,N,C,T), ')' ;

MultiPartQuery

Instead of

MultiPartQuery = (ReadPart | (UpdatingStartClause, [SP], UpdatingPart)), With, [SP], { ReadPart, UpdatingPart, With, [SP] }, SinglePartQuery ;

only the following is used:

MultiPartQuery = (ReadPart | (UpdatingStartClause, [SP], UpdatingPart)), With, [SP], { ReadPart, With, [SP] }, SinglePartQuery ;

SchemaName

SchemaName = ... | ReservedWord ;

SymbolicName

SymbolicName = ... | (C,O,U,N,T) | (F,I,L,T,E,R) | (E,X,T,R,A,C,T) | (A,N,Y) | (N,O,N,E) | (S,I,N,G,L,E) ;

5. Acknowledgement

This project was inspired by the sqlparse project of the company K2 Informatics GmbH.