Shift right mistakenly parsed as type argument list #176

Sjord · 2021-04-06T16:09:13Z

var R = A < X >> B;

Actual:

Interpreted as A<X> > B

      (equals_value_clause [0, 6] - [0, 18]
        (binary_expression [0, 8] - [0, 18]
          (generic_name [0, 8] - [0, 15]
            (identifier [0, 8] - [0, 9])
            (type_argument_list [0, 10] - [0, 15]
              (identifier [0, 12] - [0, 13])))
          (identifier [0, 17] - [0, 18]))))))))

Expected:

Interpreted as A < (X >> B)

      (equals_value_clause [0, 6] - [0, 18]
        (binary_expression [0, 8] - [0, 18]
          (identifier [0, 8] - [0, 9])
          (binary_expression [0, 12] - [0, 18]
            (identifier [0, 12] - [0, 13])
            (identifier [0, 17] - [0, 18])))))))))

The text was updated successfully, but these errors were encountered:

initram · 2021-06-03T13:31:09Z

expression currently contain _simple_name, but I don't think generic_name is allowed as an expression. e.g. A<X> is never a valid expression by it self.

I have tried to replace _simple_name with _identifier_or_global, but there seems to be a lot of work to get the precedence correct for this to work in all cases. I got the test suite to run and also the example above to parse correctly, but the number of parse errors from the example repos role dramatically

An example of code that parses now, but that I could not get to parse with my changes were:

class SomeClass {
  public static Task<T?> Null<T>() where T : class => Default<T>();
}

Here it tries to parse Default<T>() as a binary expression along the lines of Default < T>().

tamasvajk · 2022-12-30T15:43:22Z

I'm struggling to understand what exactly is happening here. I understand that with the current grammar, expression A<X>>B can be parsed both as (A<X>) > B and A < (X >> B). The latter should be the correct one based on Roslyn.

Doing a tree-sitter parse -D shows the below:

Both versions have error_cost = 0. Why are these two cases condensed into a single one? Why is ErrorComparisonNone handled the same way as ErrorComparisonPreferLeft in parser.c? This is the comment a couple of lines above

// Examine each pair of stack versions, removing any versions that
// are clearly worse than another version. Ensure that the versions
// are ordered from most promising to least promising.

As far as I see the two stack versions are equally good. (I tried parsing the same input with a locally built tree-sitter, where the case ErrorComparisonNone was removed, and then I got select_earlier symbol:compilation_unit, over_symbol:compilation_unit, which (I think) is matching what one would get for equal dynamic precedence). I think this would be the correct tree-sitter behaviour, but my experience is quite limited.

Putting aside the above that I don't understand why certain things are happening in tree-sitter. How would one solve this conflicting parsing issue?

I see the following options:

The solution outlined above: Replace _simple_name with _identifier_or_global in expression, which seems to require some more changes to the grammar, because Default<T> is not going to be an expression any longer, so Dynamic<T>() doesn't parse as invocation_expression either.
Use prec.dynamic to prefer one case over the other. So binary_expression should have higher dynamic precedence than generic_name. This seems rather arbitrary, and it interferes with Generic type in declaration parsed as binary expression #243, which contains 2 binary expressions, so variable_declaration would need >2 dynamic precedence. This doesn't seem to scale too well for future precedence issues.

What are other options? None of the above seem viable/straightforward.

damieng · 2023-01-03T09:59:09Z

I wish I had more information or an answer for you but my experience in dealing with these awkward conflicts are also limited and there doesn't seem to be much else in the way of documentation or support available (Max has been quiet for quite some time).

I'd be okay going with either solution for a while to see how it pans out. We could always switch later if we needed to - maybe leave some comments as to why things are the way they are in the meantime.

maxbrunsfeld · 2023-01-03T22:00:04Z

I'm struggling to understand what exactly is happening here.

I believe right now, there are two valid parses, and they both have the same dynamic precedence (zero, the default), so Tree-sitter is having to just choose one semi-arbitrarily.

I agree with @initram - it seems like the grammar should be restructured so that generic_name is only allowed in certain positions (like method calls), not as an expression on its own.

initram · 2023-01-06T14:59:48Z

Looking at this again, it is not so straight forward. It seems that in some rare cases generic_name is a valid expression. Take for example this valid C# code:

var R = nameof(List<int>);

That is parsed as an invocation expression with a generic name as the argument expression by Roslyn.

It is really hard to find a good rule or precedence that solves all these issues with when it should be parsed as a generic name or not. One idea that I have is that generic names maybe should only be allowed as the top level expression, and not as part of a composite expression, but that may easier said than done.

We should add a bunch more test cases to cover some of all these use-cases of generic name and less-than and greather-than symbols. That way it is easier to experiment with new approaches and not have to rely on whether the "example repos" parse as well.

maxbrunsfeld · 2023-01-06T17:30:58Z

It seems that in some rare cases generic_name is a valid expression.

I see. One way to get this more correct might be to put a negative dynamic precedence on generic_name, so that when possible, we'll prefer not to use it, but then to compensate for this with a larger positive dynamic precedence on constructs where generic_name is normally used (like method calls, etc). This way, the order of preference in the event of a true ambiguity would be:

generic method calls etc
shift expressions
lastly, generic names on their own

tamasvajk · 2023-01-12T15:33:25Z

@initram I experimented with your approach a bit further. So changed _simple_name to _identifier_or_global in _lvalue_expression (previously _expression). As you found that was not enough, and generic method calls were not working any longer. So I started adjusting invocation_expression:

invocation_expression: $ => prec(PREC.INVOCATION, seq(
  field('function', $._invocation_function),
  field('arguments', $.argument_list)
)),

_invocation_function: $ => choice(
  $.this_expression,
  $.base_expression,
  $._name,
  $.postfix_unary_expression, // name!(), this could be more specific than postfix_unary_expression
  $.member_access_expression,
  $.conditional_access_expression,
  $.element_access_expression,
  $.invocation_expression,
  $.parenthesized_expression
),

This got me quite far. (There are 177 failing files in Roslyn, which is in line with my earlier checks, so there might not be any new failures.) The above list is somewhat arbitrary, I think all expression types that are in the Primary row of the precedence table would be needed. Let me know if you think this is a wrong assumption.

At the same time var a = b < c >> d; parses as below, which seems correct.

(compilation_unit [0, 0] - [0, 19]
  (global_statement [0, 0] - [0, 19]
    (local_declaration_statement [0, 0] - [0, 19]
      (variable_declaration [0, 0] - [0, 18]
        type: (implicit_type [0, 0] - [0, 3])
        (variable_declarator [0, 4] - [0, 18]
          (identifier [0, 4] - [0, 5])
          (equals_value_clause [0, 6] - [0, 18]
            (binary_expression [0, 8] - [0, 18]
              left: (identifier [0, 8] - [0, 9])
              right: (binary_expression [0, 12] - [0, 18]
                left: (identifier [0, 12] - [0, 13])
                right: (identifier [0, 17] - [0, 18])))))))))

Regarding

It seems that in some rare cases generic_name is a valid expression.

Do you know if this is specific to nameof()? If it is, we could special case it.

See the draft PR #285.

initram · 2023-01-13T09:49:57Z

@tamasvajk It seems promising. But why did you need to limit the expressions available for _invocation_function, why could it not just be (generic_name or _expression)? If it is needed to limit the expressions that are available to use as _invocation_function, then we need to make sure that there is no legal expression from _expression that is missing in _invocation_function. At least you are missing default expression, as this is valid C#: default(Func<int>)()

The only place that I can think of where generic_name is a legal expression is in nameof(). I could be wrong, and there are other cases where Roslyn accepts generic name as expressions for parsing purposes, but the code cannot compile for semantic reasons.

If you could add a few more tests to cover some of the invocation expressions that are not covered currently, and at the very least the shift right expression from this issue. That would help making sure that we do not regress in parsing in the future.

tamasvajk · 2023-01-13T12:51:01Z

But why did you need to limit the expressions available for _invocation_function

My initial goal was to limit it as much as possible in the hope that this way there wouldn't be too many new conflicts. The list grew a bit from what I initially added. So for grammar maintainability, your suggestion might be a lot better. I gave it a quick try

_invocation_function: $ => choice(
  $._expression,
  $.generic_name
),

which resulted in a couple more conflicts:

[$._invocation_function, $._simple_name],
[$._invocation_function, $.ref_expression],
[$._invocation_function, $.throw_expression],
[$._invocation_function, $.select_clause],
[$._invocation_function, $.assignment_expression],
[$._invocation_function, $.relational_pattern],
[$._invocation_function, $.group_clause],

and even the tests don't pass this way. For example var x = !Call(); is parsed as

(compilation_unit [0, 0] - [0, 16]
  (global_statement [0, 0] - [0, 16]
    (local_declaration_statement [0, 0] - [0, 16]
      (variable_declaration [0, 0] - [0, 15]
        type: (implicit_type [0, 0] - [0, 3])
        (variable_declarator [0, 4] - [0, 15]
          (identifier [0, 4] - [0, 5])
          (equals_value_clause [0, 6] - [0, 15]
            (invocation_expression [0, 8] - [0, 15]
              function: (prefix_unary_expression [0, 8] - [0, 13]
                (identifier [0, 9] - [0, 13]))
              arguments: (argument_list [0, 13] - [0, 15]))))))))

which I find rather odd. I don't understand why precedences are not working as I would expect. But this also suggests me that _invocation_function should be based on precedence as mentioned above. (The default() case that you mention above would also be covered as default is in the "Primary" precedence row.)

At the same time, default would also need to be special cased like nameof, because it can have a type as an argument. Similarly sizeof(Struct<int>) works too. I find it difficult to come up with a definitive list of such constructs.

tamasvajk · 2023-01-19T11:26:19Z

It seems that in some rare cases generic_name is a valid expression.

I realized that this is not specific to nameof, default and sizeof. generic_name can show up in some other places too as an expression. For example:

void M<T>(){}
var x = M<int>;

I'm closing the draft PR #285.

damieng added the bug label Apr 7, 2021

tamasvajk mentioned this issue Jan 11, 2023

Allow contextual keywords to be used as identifiers #276

Open

43 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shift right mistakenly parsed as type argument list #176

Shift right mistakenly parsed as type argument list #176

Sjord commented Apr 6, 2021

initram commented Jun 3, 2021

tamasvajk commented Dec 30, 2022

damieng commented Jan 3, 2023

maxbrunsfeld commented Jan 3, 2023

initram commented Jan 6, 2023

maxbrunsfeld commented Jan 6, 2023

tamasvajk commented Jan 12, 2023 •

edited

initram commented Jan 13, 2023

tamasvajk commented Jan 13, 2023

tamasvajk commented Jan 19, 2023

Shift right mistakenly parsed as type argument list #176

Shift right mistakenly parsed as type argument list #176

Comments

Sjord commented Apr 6, 2021

initram commented Jun 3, 2021

tamasvajk commented Dec 30, 2022

damieng commented Jan 3, 2023

maxbrunsfeld commented Jan 3, 2023

initram commented Jan 6, 2023

maxbrunsfeld commented Jan 6, 2023

tamasvajk commented Jan 12, 2023 • edited

initram commented Jan 13, 2023

tamasvajk commented Jan 13, 2023

tamasvajk commented Jan 19, 2023

tamasvajk commented Jan 12, 2023 •

edited