Clarifying Lexical Comparison for Identifier Precedence (Again) #970

BenjaminHolland · 2023-08-31T04:35:50Z

I've looked at #832 and #561, and I'm still not sure how to resolve a comparison like

1.0.0-alpha.1
1.0.0-1.alpha

Is it safe to assume that the comparison here will be alphanumeric, even though one field in each comparison is numeric?

hoelzeli · 2023-09-03T14:32:54Z

I think so, yes. 11.4 is the relevant section:

[...] Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found [...]

So the prerelease version will be split into identifiers like so (written as Python lists, with non-numeric identifiers as strings):
1.0.0-alpha.1 --> prerelease identifiers: ["alpha", 1]
1.0.0-1.alpha --> prerelease identifiers: [1, "alpha"]

The comparisons would therefore be:

"alpha" <--> 1
1 <--> "alpha"

The second comparison will never be reached because "alpha" takes precedence over 1 (section 11.4.3), so 1.0.0-alpha.1 has higher precedence than 1.0.0-1.alpha.
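
To make the walk-through concrete, here is a minimal Python sketch of the section 11.4 rules as I read them. The function names `compare_identifiers` and `compare_prerelease` are made up for illustration and do not come from the spec or any library, and the input is assumed to be an already validated pre-release string (the part after the hyphen).

```python
def compare_identifiers(a: str, b: str) -> int:
    """Compare two pre-release identifiers; return -1, 0, or 1."""
    # isascii() guards against non-ASCII digits that str.isdigit() accepts.
    a_num = a.isascii() and a.isdigit()
    b_num = b.isascii() and b.isdigit()
    if a_num and b_num:
        # 11.4.1: identifiers consisting of only digits compare numerically.
        x, y = int(a), int(b)
        return (x > y) - (x < y)
    if a_num != b_num:
        # 11.4.3: numeric identifiers rank below non-numeric ones.
        return -1 if a_num else 1
    # 11.4.2: lexical comparison; for ASCII input, Python's code point
    # ordering is the same as ASCII sort order.
    return (a > b) - (a < b)


def compare_prerelease(a: str, b: str) -> int:
    """Compare two dot separated pre-release strings; return -1, 0, or 1."""
    ids_a, ids_b = a.split("."), b.split(".")
    for x, y in zip(ids_a, ids_b):
        result = compare_identifiers(x, y)
        if result != 0:
            return result  # stop at the first difference
    # 11.4.4: with all shared identifiers equal, the longer list ranks higher.
    return (len(ids_a) > len(ids_b)) - (len(ids_a) < len(ids_b))


print(compare_prerelease("alpha.1", "1.alpha"))  # 1
```

The comparison stops at the first pair ("alpha" against 1), which matches the reasoning above.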

jwdonahue · 2023-09-14T05:44:21Z

So first we have #9:

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.

Then there's #11:

Precedence for two pre-release versions with the same major, minor, and patch version MUST be determined by comparing each dot separated identifier from left to right until a difference is found as follows:

1. Identifiers consisting of only digits are compared numerically.
2. Identifiers with letters or hyphens are compared lexically in ASCII sort order.
3. Numeric identifiers always have lower precedence than non-numeric identifiers.

Fortunately, the ASCII codes for the digits [0..9] are lower than the codes for [a..z] and [A..Z], so as long as your string consists of ASCII or UTF-8 characters in those character ranges, you can simply compare each character of each field from left to right until you encounter a character that is greater, or the field has more characters. So:

'a' > '1' (i.e., 97 > 49), and you're done.

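So there's no need to ever convert a numeric identifier to a scalar like int or long; it's just wasteful. The one caveat is that two all-digit fields of different lengths need a length check first (with no leading zeroes allowed, the longer one is the larger number), because character order alone would put 10 before 9.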
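
For illustration, a minimal Python sketch of comparing identifiers without any integer conversion. The name `compare_identifiers_no_int` is hypothetical, and the sketch assumes both identifiers are already valid per section 9 (no leading zeroes in numeric identifiers). It checks the numeric versus non-numeric rule and the field length before falling back to character order, since character order alone would misorder pairs such as 10 and 9, or 1a and 2.

```python
def compare_identifiers_no_int(a: str, b: str) -> int:
    """Compare two valid pre-release identifiers; return -1, 0, or 1."""
    a_num = a.isascii() and a.isdigit()
    b_num = b.isascii() and b.isdigit()
    if a_num != b_num:
        # Numeric identifiers always rank below non-numeric ones,
        # even when the non-numeric one starts with a digit ("1a" vs "2").
        return -1 if a_num else 1
    if a_num and len(a) != len(b):
        # Without leading zeroes, the longer digit string is the larger number.
        return -1 if len(a) < len(b) else 1
    # Same-length digit strings, or two non-numeric identifiers:
    # plain character order gives the right answer.
    return (a > b) - (a < b)


print("10" < "9")                             # True: character order alone
print(compare_identifiers_no_int("10", "9"))  # 1: length is checked first
```

Whether skipping the conversion is worth it probably depends on the language; it does avoid overflow on very long numeric identifiers.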

I do always split the string on the dots and make sure the first three fields are pure numeric characters and the remaining fields are in the expected character ranges, before doing any comparisons. While not explicitly stated in the spec, the SemVer rules do not apply to non-SemVer strings. Such comparisons require consideration of implicit and explicit semantics, if any, of the non-SemVer string and may also require attention to cultures.
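
A rough Python sketch of that kind of up-front check, under some assumptions: the helper name `split_semver` is made up, the character rules are the ones quoted from section 9 above, and build metadata after a `+` (which this thread does not discuss) is checked for allowed characters and then discarded, because it plays no part in precedence.

```python
import re

_NUMERIC = re.compile(r"0|[1-9][0-9]*")   # digits only, no leading zeroes
_IDENT = re.compile(r"[0-9A-Za-z-]+")     # non-empty, ASCII alphanumerics and hyphens


def split_semver(version: str):
    """Return (core_fields, prerelease_fields) or raise ValueError."""
    rest, plus, build = version.partition("+")
    core, hyphen, pre = rest.partition("-")

    core_fields = core.split(".")
    if len(core_fields) != 3 or not all(_NUMERIC.fullmatch(f) for f in core_fields):
        raise ValueError(f"bad major.minor.patch in {version!r}")

    pre_fields = pre.split(".") if hyphen else []
    for field in pre_fields:
        if not _IDENT.fullmatch(field):
            raise ValueError(f"bad pre-release identifier in {version!r}")
        if field.isdigit() and not _NUMERIC.fullmatch(field):
            raise ValueError(f"leading zero in numeric identifier in {version!r}")

    if plus and not all(_IDENT.fullmatch(f) for f in build.split(".")):
        raise ValueError(f"bad build metadata in {version!r}")

    return core_fields, pre_fields


print(split_semver("1.0.0-x-y-z.--"))  # (['1', '0', '0'], ['x-y-z', '--'])
```

Note that the split happens on the first hyphen before the dots, since a bare split on dots would leave `0-alpha` as the third field of `1.0.0-alpha.1`.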
