Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(swift) Revamped keywords and added support for operators #2908

Merged
merged 20 commits into from Dec 15, 2020
Merged

(swift) Revamped keywords and added support for operators #2908

merged 20 commits into from Dec 15, 2020

Conversation

svanimpe
Copy link
Contributor

@svanimpe svanimpe commented Dec 4, 2020

I'm working on some significant improvements to the grammar, and this felt like something I could submit already.

  • I've redone the keywords as that list was very outdated. The new list clearly separates the actual (reserved) keywords from contextual keywords (that are keywords only in specific contexts). In the future, I hope to move some of these contextual keywords into specific modes, which will avoid false matches.
  • I've also fixed (Swift) left and right keywords and precedencegroup highligthing #2856 by leaving out some rarely used keywords that will result in more false than actual matches. These keywords relate to precedence groups and can be highlighted in a future PRECEDENCEGROUP mode.
  • I've implemented the grammar for operators so they can be highlighted. This will also enable me to identify operator function declarations in the future.
  • Note that the grammar for operators isn't 100% correct as I don't know how to match Unicode characters above \uffff.

@svanimpe
Copy link
Contributor Author

svanimpe commented Dec 5, 2020

@joshgoebel I also have a new grammar for attributes ready (so we can get rid of the hardcoded list), but that depends on this PR.

@svanimpe
Copy link
Contributor Author

svanimpe commented Dec 5, 2020

Hopefully, this will also allow me to fix #2895 eventually.

@svanimpe
Copy link
Contributor Author

svanimpe commented Dec 5, 2020

I did detect an issue here: some keywords, such as #available can be followed by parentheses with arguments, in which case the keyword matcher seems to add the arguments and doesn't match the keyword itself. For example

The text #available matches the keyword #available
The text #available(test) looks for a keyword #available(test) and doesn't match #available.

So I have more work to do here.

@joshgoebel How do you recommend I handle keywords with parentheses? The old grammar didn't match them and just happened to highlight keywords such as private(set) correctly as two separate keywords, because set is also listed as a contextual keyword (although that's referring to another context). However, a keyword like unowned(unsafe) wasn't matched, and also will not match as two keywords, because unsafe is not a keyword.

@joshgoebel
Copy link
Member

The text #available(test) looks for a keyword #available(test) and doesn't match #available.

It would only do that if () are part of keywords.$pattern... and if so that is the correct behavior... $pattern is used to scan the current buffer and if there are any matches they are checked against the list of verbatim keywords.

You probably do NOT want () included in $pattern. More "complex" keywords like unowned(unsafe) might better be handled with their own mode.

src/languages/swift.js Outdated Show resolved Hide resolved
Comment on lines 115 to 144
const operatorHead = either(
/[/=\-+!*%<>&|^~?]/,
/[\u00A1–\u00A7]/,
/[\u00A9\u00AB]/,
/[\u00AC\u00AE]/,
/[\u00B0\u00B1]/,
/[\u00B6\u00BB\u00BF\u00D7\u00F7]/,
/[\u2016–\u2017]/,
/[\u2020–\u2027]/,
/[\u2030–\u203E]/,
/[\u2041–\u2053]/,
/[\u2055–\u205E]/,
/[\u2190–\u23FF]/,
/[\u2500–\u2775]/,
/[\u2794–\u2BFF]/,
/[\u2E00–\u2E7F]/,
/[\u3001–\u3003]/,
/[\u3008–\u3020]/,
/[\u3030]/
);
const operatorCharacter = either(
operatorHead,
/[\u0300–\u036F]/,
/[\u1DC0–\u1DFF]/,
/[\u20D0–\u20FF]/,
/[\uFE00–\uFE0F]/,
/[\uFE20–\uFE2F]/,
// TODO: The following characters are also allowed, but the regex doesn't work as intended.
// For example, it also matches -10 as an operator.
// /[\u{E0100}–\u{E01EF}]/u
Copy link
Member

@joshgoebel joshgoebel Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. From what I'm reading this is the full range of what MIGHT be used for defining a custom operator... just because something could potentially be an operator doesn't mean it most assuredly IS an operator and should be highlighted everywhere as an operator... correct?

This "wideness" might make sense in the context of something like func [x] (a,b) where [x] is an operator being defined (and we know that by context), but I don't think it makes sense in general. Pretty sure in the standard case we should only highlight well-known built-in operators that Swift supports, like >, <, etc...

Or do those UTF-8 ranges have NO other uses in Swift at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these characters are used elsewhere in the grammar, such as <GenericParam>, or OptionalType?, but other than that, they're always operators.

I assume the set of "head" character for operators and other identifiers is disjunct. For example, if you look at the range A1-FF, the ones that are missing from the list here are listed explicitly as valid head characters for regular (non-operator) identifiers:

\u00A8\u00AA\u00AD\u00AF\u00B2–\u00B5\u00B7–\u00BA
\u00BC–\u00BE\u00C0–\u00D6\u00D8–\u00F6\u00F8–\u00FF

So I think this should work just fine, as long as the cases where operators are used as punctuation are handled by specific modes. In the case of generics and optional types, that's already the case.

src/languages/swift.js Outdated Show resolved Hide resolved
src/languages/swift.js Outdated Show resolved Hide resolved
src/languages/swift.js Outdated Show resolved Hide resolved
// /[\u{E0100}–\u{E01EF}]/u
);
const OPERATOR = {
className: 'operator',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these operators or keywords?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this question about as? as! try? try!?

If so, the grammar specifies them as operators and keywords at the same time, so I'm highlighting them as keywords, which is what other grammars do too, although most don't bother highlighting the question or exlamation mark. I preferred to include that in the match, otherwise it would be matched as a postfix operator.

Copy link
Member

@joshgoebel joshgoebel Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are keywords then you should remove "operator", it's not serving any purpose. We can't have them be both (not desirable).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll add them to list of "special" keywords (like the ones that have parentheses) instead, so I don't have to change the keywords $pattern to allow question marks or exclamation points.

My goal here was to group these keywords under the "operator" umbrella as well, in case a future grammar says "any operator is allowed here", which would then also include these keywords.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal here was to group these keywords under the "operator" umbrella as well, in case a future grammar says "any operator is allowed here", which would then also include these keywords.

I think you may be over thinking that. :)

@svanimpe
Copy link
Contributor Author

svanimpe commented Dec 5, 2020

You probably do NOT want () included in $pattern. More "complex" keywords like unowned(unsafe) might better be handled with their own mode.

I was thinking the same thing. And also to match the # keywords with the same rules as @ attributes, as they have the same grammar (w.r.t. the arguments that can follow). For example:

#available(iOS 14, *)
@available(iOS 14, *)

The first is a keyword, whereas the second is an attribute 🤷‍♂️

I can just assign them a different class (keyword instead of meta). That would reduce the list of keywords to only the basic, simple keywords.

Or should I use meta-keyword here? Will that highlight with the keyword color?

@joshgoebel
Copy link
Member

I can just assign them a different class (keyword instead of meta). That would reduce the list of keywords to only the basic, simple keywords. Or should I use meta-keyword here? Will that highlight with the keyword color?

You should assign them the most semantically correct class. You can never 100% assume anything about color - that is 100% up to themes and sometimes themes defy all reasoning. meta-keyword would seem a reasonable usage if they are truly special kind of keywords. Often we use that for compiler directives, special statements, etc... you can always look thru other grammars (to see how they use a classname) though this can always sometimes lead one astray. :)

# Conflicts:
#	CHANGES.md
#	src/languages/swift.js
#	test/markup/swift/multiline-string.expect.txt
@svanimpe
Copy link
Contributor Author

svanimpe commented Dec 8, 2020

@joshgoebel I'm running into a lot of issues with how keywords are handled in HLJS.

For example:

  • x.var() highlights var as a keyword, whereas it clearly is a method. This is allowed in Swift, because you can declare properties and methods with keyword names by wrapping them in backticks ( func `var`() { })
  • Any and Self highlight as normal types, whereas they should be keywords. This is due to the rule that keywords are only matched as leftovers. In this case, the TYPE mode already matches these.
  • Swift as lots of keywords with special characters. For example private(set), #if and @objc(name) are generally all highlighted as keywords and require special modes.

I'm starting to think I should just stop using the keywords: attribute and just match all keywords in a KEYWORD mode. That would remove a lot of "dancing around the keyword rules" that I currently have to do. What do you think?

@svanimpe
Copy link
Contributor Author

@joshgoebel I've made a lot of changes here:

  • I've moved all keywords to a KEYWORDS mode as that was the only way to highlight them properly. The new grammar handles all sorts of keywords now, even those with special characters, and those that require (or may have) a leading dot. Fixes (Swift) left and right keywords and precedencegroup highligthing #2856
  • Likewise, built-ins now also highlight properly. The new grammar should remove 99% of false positives. The remaining 1% are global functions that are overloads of a built-in, but those should be extremely rare (and acceptable to highlight as a built-in). Fixes (Swift) Letter c is a built-in? #2847
  • I've added support for quoted identifiers, as those shouldn't highlight as keywords.
  • I've also added support for implicit parameters and property wrapper projections ($0, $name, etc). Fixes (Swift) Shorthand parameter names are highlighted as numbers #2871
  • I've written a grammar for attributes and adopted the convention of highlighting built-in attributes as keywords, and user-defined attributes as metadata.
  • I've extended the number of modes supported in string interpolation, including expressions that contain parentheses.

I've also changed the author to myself, as there's hardly any original code left, so I should be the one the contact/blame in case there's an issue with the grammar 😅 Is that OK?

@joshgoebel
Copy link
Member

x.var() highlights var as a keyword, whereas it clearly is a method. This is allowed in Swift, because you can declare properties and methods with keyword names by wrapping them in backticks ( func var() { })

Yes, the "typical" solution here is to add a guard/rule for dispatch (to cover these cases), as I believe I've already mentioned several places. That is the direction the library is headed as well since we'd like to highlight dispatch in the future as a thing.

Any and Self highlight as normal types, whereas they should be keywords. This is due to the rule that keywords are only matched as leftovers. In this case, the TYPE mode already matches these.

I have less advice here. This is sort of unusual as most grammars do not match just any identifier like this. What I'd do though is add a special rule for the exceptions (Any, Self)...

Swift as lots of keywords with special characters. For example private(set), #if and @objc(name) are generally all highlighted as keywords and require special modes.

# isn't necessarily hard but pretty sure I already provided guidance that the others should indeed by handled with custom rules.

I'm starting to think I should just stop using the keywords: attribute and just match all keywords in a KEYWORD mode. That would remove a lot of "dancing around the keyword rules" that I currently have to do. What do you think?

Generally this should be done as a last resort. All three of the things you mention here should not be difficult to work-around and is already common in other grammars.

@joshgoebel
Copy link
Member

I've also changed the author to myself, as there's hardly any original code left, so I should be the one the contact/blame in case there's an issue with the grammar 😅 Is that OK?

No objection. I'll need some time to review the current state of things though.

@svanimpe
Copy link
Contributor Author

Well, I did go for that last resort, as that lets me match all keywords without too many exception modes. However, I think removing the keywords attribute generated some issues with language detection 😕 I have no idea how that works, so I'll just wait for your feedback.

Comment on lines 248 to 249
/frozen/,
/GKInspectable/,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they (in the context of the list) are literal strings, then prefer normal quoting not //. This would apply to almost all these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I mix strings and regexes in the same array, will that work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing isn't a problem. The underlying utility stuff and compiler usually handles strings and regex interchangeably without any issue. It's only an issue when doing something like a+b or string interpolation, which mostly we're trying to get away from.

// https://docs.swift.org/swift-book/ReferenceManual/zzSummaryOfTheGrammar.html
const DOT_KEYWORD = {
begin: concat(/\./, either(...Swift.dotKeywords, ...Swift.optionalDotKeywords)),
returnBegin: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer regex.lookahead to returnBegin whenever you can, we'd like to deprecate returnBegin in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this even possible with a lookahead? I'm doing this workaround because lookbehind isn't supported. I want to highlight only the keyword, if it's preceded by a dot (which I don't want to highlight). The rule should just be:

begin: concat(/(?<=\.)/, either(...Swift.dotKeywords, ...Swift.optionalDotKeywords))

But that doesn't work yet.

Lookbehind will let me remove a lot of workarounds from the code, so I'm hoping it'll be supported soon. I also have some experience writing TextMate grammars, and lookbehind was quite essential there too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookbehind will let me remove a lot of workarounds from the code, so I'm hoping it'll be supported soon.

Don't hold your breath. Even after Safari supports it we have to decide when we want to adopt it... right now we green field browsers going back 3-4 years... so if we waited that long it'll be a LONG time coming.

For a simple "pair" [".", something] you don't even need this muck at all just use begin/end:

  const DOT_KEYWORD = {
    begin: concat(/\./, lookahead(either(...Swift.dotKeywords, ...Swift.optionalDotKeywords))),
    end: either(...Swift.dotKeywords, ...Swift.optionalDotKeywords),
    skipBegin: true

If you want you could rebase onto #2834 and just use beforeMatch sugar (I think that's what I named it?). It's implemented differently still, but it does the same thing.

Copy link
Contributor Author

@svanimpe svanimpe Dec 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (in the next commit). But what about this one:

  const AVAILABLE = {
    begin: /(@|#)available/,
    returnBegin: true,
    contains: [
      {
        className: 'keyword',
        begin: /(@|#)available/
      },
      {
        begin: /\(/,
        end: /\)/,
        endsParent: true,
        keywords: Swift.availabilityKeywords.join(' '),
        contains: [
          OPERATOR,
          NUMBER,
          STRING
        ]
      }
    ]
  };

Here, I only want to highlight the keyword itself, but the match should also consume the argument list.

The rule still seems to work when I replace begin/returnBegin with begin: lookahead(/(@|#)available/), but I wonder if that's kosher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule still seems to work when I replace

Cause it's pretty much just two ways of doing the same thing. Just one rewinds the cursor and one avoids moving it forward.

You could collapse this to a single rule I think, couldn't you?

  const AVAILABLE = {
        begin: /(@|#)available\(/,
        keywords: "@available #available"
        // ...

@svanimpe
Copy link
Contributor Author

After setting some relevances to 0, the language detection issues seem resolved. All tests are back to green now :)

@@ -0,0 +1,301 @@
/* eslint-disable no-misleading-character-class */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add these, if there are issues remaining we need to resolve them.

export const identifierCharacter = either(
identifierHead,
/\d/,
/[\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm showing \uFE20-\uFE2F (the whole range) as problematic because it's a combining character. Unless you want to add a specific test that proves this works as it should we should remove it/comment it out. (I'm trusting that the linter is right on this and this is broken)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this works and is valid Swift code:

infix operator ~︦ // ~ 0xFE26

func ~︦ (lhs: Int, rhs: Int) -> Int {
    lhs + rhs
}

1 ~︦ 2

Screen Shot 2020-12-13 at 13 07 35

Should I add this as a unit test?

I assume this isn't a problem here because there are different rules for the first character of an operator. I'm not familiar with these ranges, but I assume the rules for operatorHead don't include combining characters?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't hurt. Do you have any idea what that warning is trying to tell us? Can the character also be used to make a compound character and then that doesn't work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I assume the rules for operatorHead don't include combining characters?

This one regex is the only one still being flagged and just that portion of it. So it's unique somehow.

Comment on lines 80 to 83
begin: Swift.operator
},
{
begin: `\\.(\\.|${Swift.operatorCharacter})+`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are both these cases accomplishing? I'm not sure I understand the second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first case is a regular operator: a head followed by zero or more valid characters.

The second is a dot-operator: only operators that start with a dot are allowed to use dots as characters (..., ...<, .*, etc). So there rule here is: a dot followed by one or more characters that may also include dots.

@joshgoebel
Copy link
Member

the language detection issues seem resolved. All tests are back to green now :)

Indeed, looking pretty good overall I think. I plan to spend a little time on this myself and see if the keywords can be simplified any.

@joshgoebel
Copy link
Member

joshgoebel commented Dec 15, 2020

I cleaned this up a little:

  • Simplified @|#available as previous suggested.
  • Moved the simpler keywords (most of keywords, literals, numericKeywords) back into the standard keywords framework

All tests are still green.

Any questions before we merge?

@svanimpe
Copy link
Contributor Author

I just wonder whether I'll run into issues again with the regular keyword system, as keywords (such as Any and Self) take priority over modes (such as TYPE), but regular keywords are matched after modes. However, the tests include Any and Self, so if those are still green, everything should be fine.

@joshgoebel
Copy link
Member

joshgoebel commented Dec 15, 2020

I just wonder whether I'll run into issues again with the regular keyword system, as keywords (such as Any and Self) take priority over modes (such as TYPE)

That's why Any and Self are still separate rules (to prevent TYPE from grabbing them). The keyword system should be preferred when it works, and then use modes to fill in the gaps. We'd like to extend the keyword system long-term where it makes sense not have every grammar build their own custom system. So forcing you to get used to working with it is part of the long-term plan. :-) There are a few issues open on keywords if you wanted to have a look and perhaps comment.

That said, cases like TYPE are probably not going away anytime soon because keywords are processed "in the gaps" and probably always will be (that's their distinction from modes). So the solution is just to handle Any and Self with their own rule that fires before type.

Long-term multi-patterns and look-behind would resolve a lot of your other keyword issues I think. Though you can really already do multi-patterns using either with $pattern for many cases. (as I do here)

@joshgoebel
Copy link
Member

@svanimpe Thanks for all the effort on this!

@joshgoebel joshgoebel merged commit c69b393 into highlightjs:master Dec 15, 2020
@svanimpe svanimpe deleted the swift-operators branch December 16, 2020 08:58
@svanimpe
Copy link
Contributor Author

@joshgoebel I ran into a problem with the new keyword regex: /\b\w+(\(\w+\))?/

I'm trying to match tuples like (name: Type, name: Type) to prevent name from being highlighted as a keyword, in case it matches any (which is allowed). However, because keywords with parentheses are now handled by the regular keyword system again, the tuple rule takes precedence and screws up the keyword matching.

I still feel like matching keywords last is only going to keep producing problems like this, which is why I moved them all into modes 😕 To solve my current problem, I'd have to revert some of your changes and change keywords with parentheses back to regexes, so they're included in the KEYWORD_MODES. Is that OK for you?

@joshgoebel
Copy link
Member

joshgoebel commented Dec 18, 2020

However, because keywords with parentheses are now handled by the regular keyword system again, the tuple rule takes precedence and screws up the keyword matching. ... revert some of your changes and change keywords with parentheses back to regexes

Please push the code that isn't working with a failing test and let me look at it. Often there are easy work-arounds for many of these kinds of problems. Off the top I wouldn't expect this to be hard to do.

Isn't a rule to match [a-z]+:ish sufficient... grabs all tuples... and keywords match the remainder?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Swift) left and right keywords and precedencegroup highligthing
2 participants