Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(handlebars) support for segment-literals and escaped mustaches #2184

Merged
merged 12 commits into from Oct 15, 2019

Conversation

nknapp
Copy link
Contributor

@nknapp nknapp commented Oct 12, 2019

This PR adds support for rather exotic expressions like

{{[property name containing spaces]}}
{{"property name containing spaces and [] square brackets"}}
{{'property name containing spaces and "" quotes'}}

{{helper [property name containing spaces]}}
{{property.[property name containing spaces]}}

and most importantly

{{"property containing a closing }} mustache"}} followed by xml {{and another mustache}}

I have set the relevance values such that no changes in auto-detection heuristics takes place.

Update: Added support for escaped mustaches as well:

\{{no-expression}} {{expression}}

\{{{no-expression}}} {{expression}}

\{{#no}} block \{{/no}} {{expression}}

\{{\{{no}}}} block \{{\{{/no}}}} {{expression}}

Update 2: looks like GitHub's syntax highlighter isn't supporting that.

@nknapp
Copy link
Contributor Author

nknapp commented Oct 12, 2019

The PR is based on the commits of #2182 so we should probably wait until that one is merged.

@joshgoebel
Copy link
Member

Lets not use functions if we don't need them. I think almost most of those cases can be normal variables. Only prefer the function when you need a unique object to make the parser happy.

For example no reason for EXPRESSION to be a function.

return {
className: 'name',
begin: /[a-zA-Z\.-]+/,
function handlebars(hljs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not change the intro, some of the existing build process is very picky about this... so just function(hljs) please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I like about functions, is that you can write them top-down in the order in which you would read the code.

function main() {
  if(isThis()) {
    doThat()
  }   
}

function isThis() {...}

function doThat() {...}

Mostly, functions are used the other way around,

function isThis() {...}

function doThat() {...}

function main() {
  if(isThis()) {
    doThat()
  }   
}

but I think it makes the code less readable, because when you start reading at the top, you get all the helper functions first and come to the actual calling code last.

But in that case, I actually started replace some functions by variables, and had all tests failing. I can try some more if you like.

@nknapp
Copy link
Contributor Author

nknapp commented Oct 12, 2019

To be honest, I'm not happy with changing this, because it really messes up the reading order.
I actually would move all functions to the bottom (behind the return) and even split more out of them.

Maybe I'll try what it does to the minified code.

Update: Yes, I believe it will increase the code size to have everything in a function.

@joshgoebel
Copy link
Member

joshgoebel commented Oct 12, 2019

To be honest, I'm not happy with changing this, because it really messes up the reading order.

To me it's a little bit of a consistency thing (with other grammars) and the functions do add a lot of boilerplate - but I'm not hard set myself, I could be easily persuaded. @egor-rogov Any strong opinions on this?

I do think it's perhaps a little overkill if you LITERALLY have just a variable and that then you should just deal with the order not being perfect. :-) This is why we need ES6 though then the new hoisting rules could help us without all this drama about functions. ;-) Plus then the => syntax makes it matter a whole lot less anyways. :-)

Are there any implications here about minification and what's possible or not (whether they are variables vs functions)? Ha, is that what you meant? Yeah, that'd be useful to know.

@nknapp
Copy link
Contributor Author

nknapp commented Oct 12, 2019

I just started writing a really long, but it is too long. Maybe I'll write a blog post about this.
The short version is: I try to apply principles from the book "Clean Code" by Robert C. Martin. But in this particular case it does not work because "Clean Code" is mostly about Java and byte size of the compiled program is not critical there.

So I agree with your request.

I would extract even more functions, but I what I have tried would result in 300 bytes more code.

In fact, if we find out why reusing the "starts" object is a problem, and fix it, then we should remove the other functions as well.

If you are interested: The minified code shows why that is is the case

hljs.registerLanguage("handlebars", function (e) {
  var t = {"builtin-name": "each in with if else unless bindattr action collection debugger log outlet template unbound view lookup"},
      a = {b: /("[^"]+"|'[^']+'|\[[^\]]+\]|\w+)/};

  function n() {
    return e.inherit(r(), {cN: "name"})
  }
...

The function, (), {, return, } of function n()... cannot be removed, whereas a={...} just requires a , before the a and a =. Extracting a variable is always smaller than extracting a function (in the resulting minified program)

@joshgoebel
Copy link
Member

I agree in general. I love Clean Code and clean code. Guess right now I think of the grammars more as data than code, though.

I also think we've been too paranoid in the past about size and gzip solves most of these problems. So I think mostly I was coming from a style and consistency standpoint when I made the original request to stick with variables. We'll see what Egor says. :-)

@joshgoebel joshgoebel added enhancement An enhancement or new feature language labels Oct 12, 2019
var BUILT_INS = {'builtin-name': 'each in with if else unless bindattr action collection debugger log outlet template unbound view lookup'};

var IDENTIFIER_PLAIN_OR_QUOTED = {
begin: /("[^"]+"|'[^']+'|\[[^\]]+\]|\w+)/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't back references be a better use here rather than repeating yourself 3-4 times?

Copy link
Contributor Author

@nknapp nknapp Oct 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If at all, only the first two alternatives could be combined using back-referencens.

Non-greedy matching would make it better readable (and it must be .*, not .+, because '', "" and [] are also allowed (in theory):

 /".*?"|'.*?'|\[.*?\]|\w+/

But I don't think that this is intuitive:

/(['"]).*?\1|\[.*?\]|\w+/

I actually wanted to use:

  var IDENTIFIER_PLAIN_OR_QUOTED = {
    begin: /\w+/,
    variants: [
      {begin: /"/, end: /"/},
      {begin: /'/, end: /'/},
      {begin: /\[/, end: /]/}
    ]
  };

but that didn't work. The main begin-pattern /\w+/ seemed to be ignored in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-greedy is more readable.

Variants are alternatives to the main block, not "in addition to" or "after"... If you just removed begin would the variants not work?

@joshgoebel joshgoebel changed the title Expression variants (handlebars) Expression variants Oct 13, 2019
@joshgoebel joshgoebel added this to the 9.15.12 milestone Oct 13, 2019
@nknapp nknapp changed the title (handlebars) Expression variants (handlebars) support for segment-literals and escaped mustaches Oct 13, 2019

\{{#no}} block \{{/no}} {{expression}}

\{{\{{no}}}} block \{{\{{/no}}}} {{expression}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want to skip a comment like snippet? Won't those rules still match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You are right, that one is still missing. I'll add that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A \ in front of the mustache is escaped by another \ in front of it.
This means that the expressions in

\\{{expression}}

\\\{{expression}}

\\\\{{expression}}

are all evaluated. I have added a commit that skips the escaping if another backslash appears in front of the backslash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean don’t the escape rules need to come before the comment rules to allow escaping a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a test for escaped comments and it passed without modification of the "handlebars.js" file.

I think, since the match of \{{ has a lower index than the matching {{, it will always be evaluated first.

Also, after writing my own review, I felt I had to extract the rules into there own variables with descriptive names. Sorry for all the changes that you didn't ask about.

I also added consistent escaping of { and } in the regular expressions of the language. I don't think that is really necessary and I thought about removing all backslashes at first. But since the regex are assembled into larger once, we cannot be sure. There ARE contexts where a { needs to be escaped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, since the match of {{ has a lower index than the matching {{, it will always be evaluated first.

Yeah, I thought I saw the comment regex coming FIRST, which is why i was curious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, after writing my own review, I felt I had to extract the rules into there own variables with descriptive names.

Do the names have meaning if you know Handlebars really well? I'm actually not sure it's a win (while normally I'd call it a win)... but to me I SEE the patterns and that means something to me, but I see the labels you've assigned them and that means nothing to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it were me I'd add comment examples beside them.

  STANDARD_BLOCK, // {{ ... }}

@joshgoebel
Copy link
Member

Sorry I might have just messed ya up :) You can just force push without my commit if it's no good.

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You all good now? This is looking pretty sharp.

@nknapp
Copy link
Contributor Author

nknapp commented Oct 15, 2019

I'm think its good like that

@nknapp
Copy link
Contributor Author

nknapp commented Oct 15, 2019

Thanks for all the review work you put in there.

@nknapp
Copy link
Contributor Author

nknapp commented Oct 15, 2019

Argh no. I noticed that I removed the yield keyword. It should be added again. Its not important for Handlebars but it was in there and if keyword cleanup is something we want to do, of should be a separate PR

- square brackets and quotes are used in Handlebars to escape
  identifiers, should they contain invalid characters. This
  commit makes sure that "abc}}abc" does not close the mustache-
  expression.

- keep values of "relevance" such that autodetection still keeps
  the same relevance values.
- called IDENTIFIER_PLAIN_OR_QUOTED now, because that is
  what it is.
@nknapp
Copy link
Contributor Author

nknapp commented Oct 15, 2019

I have reverted the keyword list and rebased on master. If the tests pass, from my point of view, you can merge.

@@ -6,63 +6,91 @@ Description: Matcher for Handlebars as well as EmberJS additions.
Website: https://handlebarsjs.com
Category: template
*/
function(hljs) {
function (hljs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of change just bit me today. :) Although a few of the other languages are also so different.

@joshgoebel
Copy link
Member

@nknapp Thanks for all the hard work!

@joshgoebel joshgoebel merged commit bdf87d2 into highlightjs:master Oct 15, 2019
@nknapp nknapp mentioned this pull request Oct 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants