Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bash) Less false positives for keywords in arguments ie, --keyword-flag #2669

Merged
merged 1 commit into from Sep 5, 2020

Conversation

sirosen
Copy link
Contributor

@sirosen sirosen commented Sep 1, 2020

In bash, option, param, and command names may contain builtins as hyphen-delimited components. As in foo --set-bar.

In order to handle this, just add - to the pattern's set of word characters. Includes a new test for bash tokens containing keywords, which checks hyphens and underscores for a few common cases.

Against current master, the new test fails, which helps to confirm the fix.

resolves #2668


Aside: I really have to compliment you guys on the structure of the testsuite. Even with my very limited knowledge in this domain, adding a new failing test case and verifying it was a breeze. Thanks to everyone who works on this project!

@joshgoebel
Copy link
Member

Yes, nice!

Perhaps add an example of one or two of the -a (or whatever) single letter keywords just to make sure they are still highlighted properly. (I didn't see a test for that)

@joshgoebel joshgoebel changed the title Make bash treat '-' as a wordchar fix(bash) Less false positives for keywords in arguments ie, --keyword-flag Sep 2, 2020
@joshgoebel joshgoebel added enhancement An enhancement or new feature language labels Sep 2, 2020
@sirosen
Copy link
Contributor Author

sirosen commented Sep 3, 2020

I just got to adding such a test, and I was surprised to see that it failed:

# test keyword usages with hyhens at their start
if [ $VAR -ne 0 ]; then echo 'hi'; fi
[ $FOO -eq 0 ] && [ ! -f $BAR ] || exit 2

became

<span class="hljs-comment"># test keyword usages with hyhens at their start</span>
<span class="hljs-keyword">if</span> [ <span class="hljs-variable">$VAR</span> -ne 0 ]; <span class="hljs-keyword">then</span> <span class="hljs-built_in">echo</span> <span class="hljs-string">&#x27;hi&#x27;</span>; <span class="hljs-keyword">fi</span>
[ <span class="hljs-variable">$FOO</span> -eq 0 ] &amp;&amp; [ ! -f <span class="hljs-variable">$BAR</span> ] || <span class="hljs-built_in">exit</span> 2

I'm reading and trying to understand why this is happening, but I'm not sure I'll be able to figure it out.

@sirosen
Copy link
Contributor Author

sirosen commented Sep 3, 2020

Actually, I'm not sure if that's a failure or not? These terms (e.g. -f) don't appear to be marked up today -- at least, not when I test. I'm not sure what the _ section of the language spec, containing them, is for.
I'm happy to make relevant/related changes, but I need some direction on that.

@egor-rogov
Copy link
Collaborator

Hmm... Options (like -lt etc) just don't work and it looks like they never did. That's because \b (word boundary) doesn't treat hyphens as a part of a word. We have to replace \b with explicit definition of a boundary.
On the other hand, if we do, we'll get rather ugly elements like <span class="hljs-_">...</span>, which was supposed, as I understand, as relevance boosters. But do we really what it?

@joshgoebel
Copy link
Member

@egor-rogov Creating a new issue so this doesn't get bogged down in that.

@sirosen Thanks for looking into. :-)

And thanks for contributing! It's truly appreciated.

@joshgoebel
Copy link
Member

@sirosen Can you please update CHANGES.md and then we can get this merged?

sirosen added a commit to sirosen/highlight.js that referenced this pull request Sep 5, 2020
In bash, option, param, and command names may contain builtins as
hyphen-delimited components. As in `foo --set-bar`.

In order to handle this, just add `-` to the pattern's set of word
characters. Includes a new test for bash tokens containing keywords,
which checks hyphens and underscores for a few common cases.
@joshgoebel joshgoebel added this to the 10.2 milestone Sep 5, 2020
@sirosen
Copy link
Contributor Author

sirosen commented Sep 5, 2020

I pushed a commit with the changelog update, but also did a squash-rebase so that it will merge cleanly.

@joshgoebel
Copy link
Member

joshgoebel commented Sep 5, 2020

I pushed a commit with the changelog update, but also did a squash-rebase so that it will merge cleanly

For future, don't bother we almost always squash commits when merging anyways.

@joshgoebel joshgoebel merged commit 01f049b into highlightjs:master Sep 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An enhancement or new feature language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(shell) Shell tokenization doesn't treat hyphens as parts of words
3 participants