Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(cpp) Improve C++ class template headers #2728

Closed
wants to merge 5 commits into from

Conversation

klmr
Copy link
Contributor

@klmr klmr commented Oct 3, 2020

This fixes #2716 and it fixes #21021. In order to parse class and function headers correctly, it is insufficient to just skip greedily (or, worse, non-greedily) over /<.*>/, since C++ template declarations as well as template instantiations can contain nested template parameter lists. As a further complication, they can even contain the operators <, >, << and >> (as part of non-type template parameters or their defaults (in declarations)).

This PR attempts to be a minimal change — a much more comprehensive refactor of the C and C++ language definitions is somewhat overdue (see #2173 and #2146). But this will take time, and in the meantime the improvement from this PR is probably “good enough”.

This PR is a WIP: there are some open issues that I’ve commented inside the changes.


1 The failing code from the second issue isn’t added as a test case because hljs still fails to highlight it as a function declaration, but that’s unrelated to templates and rather because hljs currently doesn’t recognise operator overload definitions at all.

Copy link
Contributor Author

@klmr klmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request for comment:

src/languages/c-like.js Show resolved Hide resolved
src/languages/c-like.js Outdated Show resolved Hide resolved
template <int a, int b, bool c = (a > b), class T> void f1();
template <int a, int b, bool c = (a >> b), class T> void f2();

template <int a, int b, bool c = (a < b), class T> void g1();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing, and I don’t know why: I thought my rules would handle the nested parenthesised expression — and this works for the corresponding test case above, i.e. for (a > b).

Copy link
Member

@joshgoebel joshgoebel Oct 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd guess the < in a < b is starting a new TEMPLATE, messing everything up. I'm not sure I see anywhere in the code where you handle this possibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course! I actually don’t know how to handle this: I thought about adding ) to the end of the TEMPLATE_USE, mode; this works, but then the mode started by ( won’t terminate, even when I also set excludeEnd: true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the following to the TEMPLATE_USE mode seems to work:

    'on:begin': (matchData, response) => {
      const rest = matchData.input.slice(matchData.index + 1);
      let parens = 0;
      let brackets = 1; // start with `<`
      for (let i = 0; i < rest.length; i++) {
        const c = rest.charAt(i);
        switch (c) {
          case '(': parens++; break;
          case ')': parens--; break;
          case '<': brackets++; break;
          case '>': brackets--; break;
        }
        if (brackets == 0) return;
        if (parens == -1) {
          response.ignoreMatch();
          return;
        }
      }
    },

However, this feels like overkill (potentially also inefficient?); it also only works in cases where the comparison expression is nested inside parentheses, i.e. it works for this test case but not the next one.

If this approach is taken at all, we’d probably want to add some emergency stop characters (;, {) to the switch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is too much... and definitely has performance implications.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best maybe we can do here is liberal use of endsWithParent to make sure we escape if we hit a {}, line end, etc... and try to return to normal matching...

template <int a, int b, bool c = (a < b), class T> void g1();
class C;

template <int a, int b, bool c = a < b, class T> void g1();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is also failing, and there’s probably no fix, since it’s context sensitive: a<b, class T> could be a valid template parameter, if a were a template template parameter (i.e. if we had template <typename> typename a instead of int a). Of course in that case we’d be missing an additional terminating >, but I can’t see how to make the parser aware of this since it comes (potentially much) later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do someone like this with a callback that looks ahead and does code analysis, but that's definitely crossing over into "parsing" territory I think so I don't think we'd want to do that.

contains: [
{
begin: /</, end: />/,
endsWithParent: true,
Copy link
Member

@joshgoebel joshgoebel Oct 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so you know this endsWithParent does not auto-recurse... it simply means that within the scope of the <...> rule (line 230/231) that it will attempt to match not only the contains rules on line 235, but ALSO the end rule on line 228. But it doesn't go any deeper than that unless you added endsWithParent at every level...

I'm not sure if this has implications for your runaway guard on line 227 or not...

Copy link
Contributor Author

@klmr klmr Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! So, if I understand you correctly, I could equivalently remove endsWithParent, and change the end rule to /[>;{]/ (i.e. the same as the parent end)? In this case I should probably do that instead, and add the same end to the other contained rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well if the desire is that they end with the parent you'd probably want to just use that... that's what it does - pull the parent "end" down into the children... I was just pointing out that it only goes one layer deep... so if it got "stuck" in a child of a child then your "guard" would not save you... so that needs to be considered and might result in adding some more "false positive" edge case tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that works now. I needed to add a nested {begin: />/} to prevent a > inside a (…) from terminating the template declaration mode prematurely. That said, the behaviour on my unit tests is actually the same as before, I need to think of some example which works now and would previously have failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, I think I’ve just kicked the can down the road now. I think this fundamentally doesn’t work because I would need to nest the parent mode inside the mode that handles (…). With my newest change, the the two first lines parse correctly, but the template declaration in the third line ends prematurely, and the “class title” starts at class T rather than at class C:

template <int A = (B<C>), class T> class C;
template <int A = B<(c > D<E>)>, class T> class C;
template <int A = B<(c > D<(e > f)>)>, class T> class C;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  var TEMPLATE_USE = {};
  Object.assign(TEMPLATE_USE, {
    // .. can now use TEMPLATE_USE recursively

All you need is a reference. :-) Though I'm not sure we do this anywhere else and not sure it's a pattern we should start using now though. ...

I'm guessing the nesting here could be unlimited but is there some practical limits for MOST C++ code as far as what you'd see in the real world?

@klmr klmr changed the title enh(cpp) Improve C++ class headers enh(cpp) Improve C++ class template headers Oct 4, 2020
@joshgoebel
Copy link
Member

Closing this in favor of #2752. Thanks for all the hard work on this though - it's been appreciated! Hopefully you've learned a bit about the grammars and parser that will help you if you continue to contribute. :-)

If we find more failure cases we can always re-review, but this was getting way to complicated.

@joshgoebel joshgoebel closed this Oct 15, 2020
@klmr
Copy link
Contributor Author

klmr commented Oct 15, 2020

Yes, I’m definitely in favour of closing this for now and re-submitting another PR later (in particular, after splitting the definitions for C and C++), I just need to find some time …

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants