Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stylus: Fix comment-like grammar(/**/ or //) in property url() and st… …ring display #2361

Merged
merged 5 commits into from May 6, 2020

Conversation

dev-itsheng
Copy link
Contributor

Yesterday I said that I find another bug of stylus, means the comment-like grammar in url() and string displays error.

image

I find that the token property-declaration should have higher priority to cover comment token and others, in fact, I think that property-declaration should match the whole line. Please consider the following example:

selector
    border /* border-width */ 1px /* border-style */ solid /* border-color */ #000

Before my commit, the second line will be devided into 7 parts, 3 comments and 4 other parts which have no token.

If add greedy: true in property-declaration, the second line will be completely matched. In order to match the comment in it, I add comment in inside property, which copied by the above code.

Similarily, I add greedy: true in url property, so it can match url(http://......) and url("http://......") instead of comment.

@dev-itsheng
Copy link
Contributor Author

By the way, I found that some regex can be written better, like:

'string': {
	pattern: /("|')(?:(?!\1)[^\\\r\n]|\\(?:\r\n|[\s\S]))*\1/,
	greedy: true
},

The "|' is equal to ["'], so I want to change it to (["'])......

another example:

inside['interpolation'] = {
	pattern: /\{[^\r\n}:]+\}/,
	// ...
}

The \ of \{ and \} is unnecessary, so I want to change it to /{[^\r\n}:]+}/.

But I'm worried that current writing has other special meanings, otherwise before I realized that, they could have been changed a long time ago.

@dev-itsheng
Copy link
Contributor Author

Yes, if we replace the comment in the inside object, it can influence every token which used inside, such as variable-declaration, statement, atrule-declaration and so on.

But your question, "Shouldn't this be part of inside (the variable) itself?", in my opinion, two kinds of comment could be different answer.

  • The block comment /* ... */ is a part of xxx-declaration, for example, display /* set the display types */ block, it is unnecessary for us to ignore the comment and concat two parts of declaration which devides by comment.
  • The line comment // ... couldn't be a part of xxx-declaration, because it can only appear in the last of line, can't affect the declaration.

But if we separate the line comment from declaration, it seems that the solution will be more and more difficult.

I try to replace the comment in the inside object and run test, the result tell me there is no error, but I think the quantity of test needs to be increased more to cover all kinds of situation.

Let me commit and push, then think it over. In fact, I have much thought about it and want to do something.

@RunDevelopment
Copy link
Member

The "|' is equal to ["'], so I want to change it to (["'])......

IMO, it really doesn't matter with what you go in this case. For 3 or more, yes please use a character class but for 2? For 2 characters it really doesn't matter (and a|b is one character less than [ab] but that's just nitpicking).

I also did a small benchmark and it seems like "|' is 10~15% faster than ["']? I have no idea and question my benchmark, so please don't take this seriously.

The \ of \{ and \} is unnecessary, so I want to change it to /{[^\r\n}:]+}/.

Yes, the \ is unnecessary but it's better to have it for two reasons: ambiguity und understandability. JS RegExp can parse a { as either a character of quantifier (e.g. {1}) depending on what follows. This makes these regexes a little harder to read because you have to mentally parse more of the regex to figure out what a { is.

Whenever I see a {, I think: "Special character, so it's a quantifier," but then it turns out, that it's not. It just makes regexes a little bit harder to read. But regexes are already kinda hard to read, so I think that we should do everything to make them readable and understandable.
There are also a few other characters, you don't have to escape but probably should. I.e. you don't have to escape ] outside of character classes or [ inside character classes.

@RunDevelopment
Copy link
Member

About comments:
I just notices that all other tokens that also use inside aren't greedy, so a /**/ comment will break them. Is that a problem?

@dev-itsheng
Copy link
Contributor Author

For other tokens, I havn't thought about and tested it. Because at the beginning, I only noticed the problem in title, just like the top image, and tried fo fix it.

I will consider all of them later, and design a solution.

@dev-itsheng
Copy link
Contributor Author

Another discuss about regex:

In your opinion, the \ of \{ or \} should be added, but I accidentally found another regex style not far away: https://github.com/PrismJS/prism/blob/master/components/prism-stylus.js#L31

Does this mean that all regex should be checked again?

@RunDevelopment
Copy link
Member

Does this mean that all regex should be checked again?

No. Having some unescaped {} isn't a huge problem. It's just not worth the effort to go through many patterns just to escape some characters. Too much effort.
I just try to point people in (what I think is) the right direction, so that when they later write some regexes, their regexes are just a bit better than before.

But again: It's not worth the effort to actively "fix" this in existing patterns.

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing and then I think we are ready to merge.

@@ -1,6 +1,13 @@
(function (Prism) {
var inside = {
'url': /url\((["']?).*?\1\)/i,
'comment': {
pattern: /(^|[^\\])(?:\/\*[\s\S]*?\*\/|\/\/.*)/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the comment pattern isn't always the first pattern, it should be greedy.

@dev-itsheng
Copy link
Contributor Author

I'm still thinking more cases about the relationship of comment and declaration in these days.

But about this, I'm testing more scenes, consider the follow test:

div::after
	content 'http://www.example.com' // comment
	background url(http://example.com) /* comment */
	background /* comment */ url("http://example.com")
	attribute '/* string */' /* comment */ '/* string */' /* 'comment in quote' */
        attribute '// string' // comment 'comment in quote'
        attribute 1px /* comment */ red // comment blue

Whether I add or not add greedy: true in 'comment' object, the results are same.

Could you give me a test which can describe the different of them?

@RunDevelopment
Copy link
Member

You're right, it won't change anything. My bad. I didn't notice that comment (from inside) is always preceded by a token that can only capture the start of a (sub)string. Making comment greedy won't do anything then.

@RunDevelopment RunDevelopment merged commit 0d65d6c into PrismJS:master May 6, 2020
@RunDevelopment
Copy link
Member

Thank you for contributing!

quentinvernot pushed a commit to TankerHQ/prismjs that referenced this pull request Sep 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants