Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh(haskell) BinaryLiterals, NumericUnderscores, HexFloatLiterals #3150

Merged

Conversation

martijnbastiaan
Copy link
Contributor

@martijnbastiaan martijnbastiaan commented Apr 19, 2021

Changes

Added support for the following Haskell language extensions:

It slightly over-approximates. For example, it would highlight 10_. If that's an issue, I'll put more effort in the PR. IMO this isn't really a problem though, as it would be a syntax error anyway.

Edit: sorry for all the force pushes, should be done now :)

Checklist

  • Added markup tests
  • Updated the changelog at CHANGES.md

@martijnbastiaan martijnbastiaan force-pushed the haskell-numerical-underscores branch 2 times, most recently from cb8d5fe to b3f881e Compare April 19, 2021 08:05
@martijnbastiaan martijnbastiaan changed the title Add support for Haskell's NumericalUnderscores Add support for Haskell's BinaryLiterals/NumericalUnderscores Apr 19, 2021
@martijnbastiaan martijnbastiaan changed the title Add support for Haskell's BinaryLiterals/NumericalUnderscores Add support for Haskell's BinaryLiterals/NumericUnderscores Apr 19, 2021
Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, just needs a few changes!

}),

hljs.inherit(hljs.C_NUMBER_MODE, {
begin: '(-?)(\\b0[xX][a-fA-F0-9_]+|(\\b(\\d|_)+(\\.(\\d|_)*)?|\\.(\\d|_)+)([eE][-+]?(\\d|_)+)?)'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we do not match - since it could often by unary or binary and we do not try and figure out such distinctions. (very hard to do also without look-behind)

Also please add additional tests for some of the e variants, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there two variants here? (hex and regular?) It seems so. Please split them out into variants for readability and maintenance rather than using a more complex single regex. See many other grammars for examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I just copied it from C_NUMBER_MODE. I'll make a few tests though :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if it's exactly the same it of course shouldn't need to be copied at all... but if it's now specific to this language then we should clean it up into variants for future use - which is just the general policy on such things.

I'd suggest we could perhaps clean up C_NUMBER_MODE as well but too many grammars are likely dependent on it being exactly how it is now - but that doesn't mean we can't have nicer and easy to read matches in the individual grammars. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add such thoughts to the v12 list. v11 is already big enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah let me clarify a bit: I copied it and then added |_ to optionally match the underscores.

I'm a bit fuzzy on what you want me to do; should I just make a clean version for Haskell numbers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/highlightjs/highlight.js/blob/main/src/languages/swift.js#L121 for an example.

When we start adding custom numeric rules to grammars then we always do it as nicely as possible (for future readability and maintenance). That means separate variants for hex, binary, etc... (as you see in Swift) not just one large regex with multiple rules "hidden" inside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. I'll get to it tonight. Thanks for the quick responses :)

@martijnbastiaan martijnbastiaan force-pushed the haskell-numerical-underscores branch 3 times, most recently from 249ba7f to c2b2c2b Compare April 19, 2021 19:18
@martijnbastiaan
Copy link
Contributor Author

@joshgoebel I've rewritten Haskell's number parsing taking inspiration from Swift's implementation. I've included a whole bunch of test cases taken from the GHC documentation. As a happy coincidence, the highlighter now also supports the HexFloatLiterals language extension :). Let me know if there's anything more I can do.

@joshgoebel
Copy link
Member

Awesome. This looks MUCH much closer to mergeable. :-)

Not sure what all the valid/invalid is trying to accomplish since we randomly highlight some invalids? Is this sample code just copied and pasted from GHC? If so, ok, but we should add a comment saying so with a link to the source to explain what's going on and that the invalid can mostly be ignored - and that all the valid should be highlighted.

If it's not then I'm not a direct copy/paste I'm sure what valid having any invalids serves?

@martijnbastiaan
Copy link
Contributor Author

Yeah, you're right, leaving the invalids in is a bit silly. Removed!

@joshgoebel
Copy link
Member

Please remove the -? for consistency with every other grammar. We have no way to know if that - is unary or an operator, do we. I assume Haskell has the - operator for subtraction?

IE:

5-3

The - is NOT part of the number 3, it's an operator, correct?

@martijnbastiaan
Copy link
Contributor Author

Good point!

The - is NOT part of the number 3, it's an operator, correct?

That's absolutely correct. I've updated the code to reflec this. I've also added a comment to the negative number tests, in case someone else stumbles upon it in the future. Could you check whether the comment is accurate?

Copy link
Member

@joshgoebel joshgoebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great other than my last question!

Restructured Haskell's number parsing, taking a cue from Swift's
implementation. As a happy coincidence, this patch adds support for
Haskell's `HexFloatLiterals` language extension.
@joshgoebel joshgoebel changed the title Add support for Haskell's BinaryLiterals/NumericUnderscores enh(haskell) BinaryLiterals & NumericUnderscores Apr 20, 2021
@joshgoebel joshgoebel changed the title enh(haskell) BinaryLiterals & NumericUnderscores enh(haskell) BinaryLiterals, NumericUnderscores, HexFloatLiterals Apr 20, 2021
@joshgoebel
Copy link
Member

@martijnbastiaan Thanks for all the effort!

@joshgoebel joshgoebel merged commit b6e5b1a into highlightjs:main Apr 20, 2021
@martijnbastiaan martijnbastiaan deleted the haskell-numerical-underscores branch April 20, 2021 11:58
@martijnbastiaan
Copy link
Contributor Author

@joshgoebel Thanks for all the patience! This is what makes me love OSS ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants