Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Java) Can't match nested generic type identifier #2641

Closed
ezksd opened this issue Jul 29, 2020 · 10 comments · Fixed by #3074
Closed

(Java) Can't match nested generic type identifier #2641

ezksd opened this issue Jul 29, 2020 · 10 comments · Fixed by #3074
Labels
bug help welcome Could use help from community language

Comments

@ezksd
Copy link
Contributor

ezksd commented Jul 29, 2020

Describe the issue

Can't match nested generic type identifier such as: List<Atomic<Integer>>. This will lead to a method that can't be matched correctly.

Which language seems to have the issue?

Java

Are you using highlight or highlightAuto?
highlightAuto

Sample Code to Reproduce

static Optional<List<Token>> parseAll(String s) {
    var i = 0;
    var list = new LinkedList();
    while (i < s.length()) {
        var r = parse(s, i);
        if (r.isPresent()) {
            list.add(r.get().a());
            i = r.get().b();
        } else {
            return Optional.empty();
        }
    }
    return Optional.of(list);
}

https://jsfiddle.net/tx5jLs2a/8/

Expected behavior

The method should be recognized as a hljs-function, not a hljs-keyword.

Additional context

The code at the 3rd row of languages/java.js:

  var JAVA_IDENT_RE = '[\u00C0-\u02B8a-zA-Z_$][\u00C0-\u02B8a-zA-Z_$0-9]*';
  var GENERIC_IDENT_RE = JAVA_IDENT_RE + '(<' + JAVA_IDENT_RE + '(\\s*,\\s*' + JAVA_IDENT_RE + ')*>)?';

It doesn't handle nested angle bracket, and I don't know how to solve recursive match by regular expression of javascript either;

@ezksd ezksd added bug help welcome Could use help from community language labels Jul 29, 2020
@joshgoebel
Copy link
Member

You probably wouldn't you've solve it with a nested recursive rule instead using "self" in our grammars. But we'd need a lot more examples, are the types always so simple as simple strings or can you use , and [ and all sorts of other things in there as well?

@ezksd
Copy link
Contributor Author

ezksd commented Jul 30, 2020

You probably wouldn't you've solve it with a nested recursive rule instead using "self" in our grammars. But we'd need a lot more examples, are the types always so simple as simple strings or can you use , and [ and all sorts of other things in there as well?

Apart from annotations, wild card types, type parameter list, and exceptions list are also not taken care of, for example:

public static <A,B,C> Tuple<A,B,C> fun(Future<Tuple<A,B,C>> future) throws Exceptions {

}

The first <A, B, C> and throw Exceptions are optional, especially, the identifier in angle bracket maybe ? extends AnotherType or ? super AnotherType, that's all.
Annotations like @Runtime can appear almost everywhere, as we are not using a compiler front-end, maybe just ignore it will be more realistic.

@joshgoebel
Copy link
Member

Are these things only used in function definitions?

@ezksd
Copy link
Contributor Author

ezksd commented Aug 2, 2020

Are these things only used in function definitions?

These may appear in class definitions too, but the class definition can determine by keywords class and interface, so
you can just ignore it.

According to the java method name convention, Type name should start with a capital letter,and the return type of a method should be one of as follows:

  1. reference type: start from a capital letter
  2. primitive type: byte short int long char float double (keywords)
  3. void

In Addition, maybe we just need to distinguish the method from class and filed. Field definition should end with ;, function doesn't.

@joshgoebel
Copy link
Member

joshgoebel commented Sep 11, 2020

With regex we have to limit ourselves to a specifi level of nesting, but what about something like:

  var THREE_DEEP_NESTED_GT_LT = '<.*?(<.*?(<.*?>)?>)?>';
  var GENERIC_IDENT_RE = JAVA_IDENT_RE +
    regex.optional(
      regex.either(THREE_DEEP_NESTED_GT_LT,
      '<' + JAVA_IDENT_RE + '(\\s*,\\s*' + JAVA_IDENT_RE + ')*>'
      )
    );

This allows the function to again be detected... but the types get no highlighting... because none of those things are currently keywords in Java... I wonder if that is something we are lacking?
Screen Shot 2020-09-11 at 8 46 43 AM

I have no idea what the second type declaration there is for? Are we declaring the type of the exception or something? And what is an <A,B,C> without any template name?

@joshwand
Copy link

These can occur in simple variable declarations too:

Map<String, List<String>> stringListMap = new HashMap<String, List<String>();
// or 
Map<String, List<String>> stringListMap = new HashMap<>()

// and can involve generic types that will happily break your regex:
Map<K, List<T>> myThing;

@joshgoebel
Copy link
Member

// and can involve generic types that will happily break your regex:

Screen Shot 2021-01-25 at 10 11 12 AM

Looks like it works to me?

@joshwand
Copy link

👍

@RunDevelopment
Copy link

Regarding the <.*?(<.*?(<.*?>)?>)?> regex:

Firstly, each of the 3 .*? can reach each other by repeating <. This means that the regex will reject any string "<".repeat(n) in O(n^3) time.

Secondly, the regex isn't correct.
image

We use this pattern for Prism. Adopted to your approach, we get: <(?:[^\r\n<>]|<(?:[^\r\n<>]|<[^\r\n<>]*>)*>)*>

image

@joshgoebel
Copy link
Member

This means that the regex will reject any string "<".repeat(n) in O(n^3) time.

Yeah I assumed I'd run into that and need to reduce the potential overlap. Thanks for the pointing out the edge case with a list though. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug help welcome Could use help from community language
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants