Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stylus: Add color, entity, unit, change number and operator in inside object, Fix relational tests. #2368

Merged
merged 8 commits into from May 7, 2020

Conversation

dev-itsheng
Copy link
Contributor

I noticed some definitions in CSS are not exist in Stylus, then add them which copied from css-extras.

In addition, some modifications have been made.

  1. about number definition

    I noticed number definition in previous Stylus was /\b\d+(?:\.\d+)?%?/, it includes %, but in CSS-extra was /(^|[^\w.-]|)-?\d*\.?\d+/. In fact, the CSS types include <percent>, <integer>, <number> and more in CSS Level. Strictly speaking, they should be defined separately, but for code highlighting, we can think of % as unit, it is also the css-extra language's behavior.

    Of course, we can also do above 'strictly speaking' later.

  2. about operator definition

    I noticed the . in operator could refer to .. or ... in 1...5 and 1..5, it always repeated 2 or 3 times. If we don't change it, it would separate 1..5 into 1, . and .5, it may not be the result we expected.

    In fact, in my opinion, the variable-declaration, property-declaration and atrule-declaration should use the different inside property, because they have their own grammer. If we separate them, it might avoid more possible bugs.

After I do these modification, I changed all of affected tests, such as red in color red, px in 5px. so I believe that the previous tests can check the correctness of added code.

};
// 123 -123 .123 -.123 12.3 -12.3
var number = {
pattern: /(^|[^\w.-]|)-?\d*\.?\d+/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the lookbehind supposed to be (^|[^\w.-])?

@dev-itsheng
Copy link
Contributor Author

the number property is also copied from css-extra, I try to understand it, but failed.

ebe363f

@RunDevelopment
Copy link
Member

The previous issue with number was that it also matched the digits in __php20__ which is an internal marker used by Prism's PHP language. I won't go into the details but matching the 20 there as a number will break PHP code highlighting.
So I said: "You are only a number if you are not preceded by a digit, letter, dot, or minus." That's the (^|[^\w.-]) lookbehind. It's basically a \b but with support for dot and minus.

It should also work here.
(I don't know much about Stylus but it's similar to CSS, so it's probably fine.)

@dev-itsheng
Copy link
Contributor Author

I have read that issue, but don't understand how <img style="width:<?php echo (80 / count($images)) ?>%"/> become <img style="width:___PHP0___%" /> and the number how to break PHP code highlighting.

@dev-itsheng
Copy link
Contributor Author

And in your opinion, how should I modify the code

@RunDevelopment
Copy link
Member

I have read that issue, but don't understand how <img style="width:<?php echo (80 / count($images)) ?>%"/> become <img style="width:___PHP0___%" /> and the number how to break PHP code highlighting.

PHP uses Markup templating. Markup templating (MT) will replace all occurrences of some regex in code with a placeholder. PHP gives MT this regex to replace all PHP blocks with a placeholder.
In the case of the issue you mentioned, the placeholder is __PHP0__.
So after all PHP code blocks have been replaced with placeholders, MT will highlight the modified code with Markup. MT will then go through the highlighted modified code and re-insert the PHP code blocks (the code block has been highlighted separately). To do this, MT will search for the placeholders, so if MT can't find a placeholder, the code block will not be re-inserted. (This bit of magic allows us to easily implement languages embedded into HTML.)
In the issue, MT couldn't find a placeholder, so the placeholder remained there and was presented to the user.

So, why couldn't MT find the placeholder __PHP0__?.
MT assumes that Markup doesn't break up the placeholder into different tokens. It's fine if the placeholder is not tokenized or if it is part of a token (e.g. [__PHP0__] (I'm using [] to show tokens)) but if a placeholder is tokenized into different tokens (e.g. [__PH][P0__]), then MT can't find it.
CSS tokenized the style attribute and CSS's number pattern tokenized the 0 of __PHP0__. So the placeholder __PHP0__ was broken up into [__PHP][0][__].

All I had to do to fix the issue was to prevent the number pattern from matching 0. So I did just that be disallowing letters (and some other characters) in front of numbers.

And in your opinion, how should I modify the code

Sorry, but what do you want to change?

@dev-itsheng
Copy link
Contributor Author

I have already understand, it means the CSS tokenize wrongly handle the placeholder which PHP tokenize set.

But if we define a special placeholder which all tokenizes can distinguish and ignore it,
there is no similar error will occur in the future. Rather than defining a normal string in the template could make another possible error occured.

Sorry, but what do you want to change?

Back to this PR, you raised two questions. The first is about entity property, it seems to have been solved. The other is regex (^|[^\w.-]) in number property, which is copied from css-extra to avoid the above problem, but in my opinion, processing placeholder in markup language is a little dangerous thing, we cannot guarantee one day the Stylus language is used in another markup language in future.

If all of the placeholders are both grammar like ___PHP(number)___, use (^|[^\w.-]) is not bad. In fact, if I remove (^|[^\w.-]) and lookbehind: true, all of the tests can also be passed.

@RunDevelopment
Copy link
Member

which is copied from css-extra to avoid the above problem

There is an extra | (last character).

processing placeholder in markup language is a little dangerous thing

It is. But it is a simple solution.
In practice, this works really well because almost all languages have the concept of a literal or identifier that usually look like this /(?!\d)\w+/. If a language break up these literals, they usually have an issue somewhere. (e.g. all programming language I know will parse foo123 as one token)
So basically, all languages already have sequences they ignore. MT just uses this and assumes that it works.

For example, the CSS language before the number fix would break any literal or constant. If there was a CSS constant foo123, number would have broken that constant.
(Sure, foo123 is strange but there are a lot of CSS constants.)

@RunDevelopment
Copy link
Member

In fact, if I remove (^|[^\w.-]) and lookbehind: true, all of the tests can also be passed.

That's true. Maybe I should add a test that just checks that a literal is always tokenized as one?
We assume this, so we might as well verify it.

@dev-itsheng
Copy link
Contributor Author

I didn't notice this extra | before.

But in your opinion, (^|[^\w.-]) can avoid to match constant like foo123. Although there is no CSS constant (in my memory) like its grammar, but developers can define their own value, such as font-family, animation-name(such as test_05, animation4 in MDN ), so I think it should be kept.

So I removed the | and added some relative tests.

@RunDevelopment RunDevelopment merged commit 2c10ef8 into PrismJS:master May 7, 2020
@RunDevelopment
Copy link
Member

Thank you for contributing!

quentinvernot pushed a commit to TankerHQ/prismjs that referenced this pull request Sep 11, 2020
This adds `color`, `unit`, and `entity` tokens. The `number` and `operator` patterns were also improved.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants