Support for "forked" semver format <semver><symbol><semver> #924

Open
hpe-ykoehler opened this issue Mar 15, 2023 · 21 comments

Comments

@hpe-ykoehler

We often have to build software from multiple components that come from different sources. We also often have to patch or fork the original package so that it fits within the software we are building. Those patches may take time to be accepted upstream, or may not fit the original design direction of the upstream project.

There should be a way to track the original version and the forked version of a given package.

For example, if I apply a breaking change to version 1.2.3 of a package, I want to express that while still keeping 1.2.3 as the root. An example would be 1.2.3-2.0.0, although '-' is certainly not the proper symbol to separate the two. The idea is that both versions are kept, and the "upstream" package version comes first since everything else is based on it.

When you upgrade the upstream package to 2.3.4, for example, you need to reconsider your patches, and they may or may not be breaking changes anymore. You would need to review that starting at 1.0.0 and adjust from there depending on the impact of the patches on the 2.3.4 version.

@ljharb
Contributor

ljharb commented Mar 15, 2023

If it's a fork, the project name should change.

@hpe-ykoehler
Author

Not really. I am not talking about a public fork but an internal one, where your upstream patches don't get accepted but you still need them. You eventually rework them until they are accepted, but in the meantime you still need to track the package separately and keep track of the upstream project. The goal of those forks isn't to start a new project but to adapt one to something else, and sometimes projects aren't flexible enough to allow for that without forking.

For example, in Gentoo there were often cases where a package needed patches (which is basically a fork) to adapt it to the Gentoo environment. Since the package itself didn't offer the required change as a config option, the code had to be changed. In their case, they had to use versioning outside of semver.

I think it would be nice for semver to acknowledge that such a scenario is actually quite frequent. As components get adapted to different environments they often need adjustments, which sometimes can't be pushed upstream yet, and it may take a couple of releases before the upstream project decides to add the required flexibility.

@ljharb
Contributor

ljharb commented Mar 16, 2023

That’s still a distinct project. I’d suggest keeping that metadata out of band, instead of coupling the version numbers of the public package with those of your internal fork.

It’s quite frequent, but that doesn’t mean it’s advisable, nor something that semver should accommodate.

@hpe-ykoehler
Author

hpe-ykoehler commented Mar 16, 2023

Well, it is more the opposite, where the fork keeps track of the upstream project on which it is based (would not make sense for the public package to change its version to include forks). Using a different name is hiding the information away, we then have to use other means to keep track of it, and then everyone goes their way, making semver less relevant.

@ljharb
Contributor

ljharb commented Mar 16, 2023

It doesn't hide it if you just slap a scope/prefix in front of it - an internal fork of foo at acme would be @acme/foo or acme-foo, problem solved.

@hpe-ykoehler
Author

The problem is not solved. If project foo is at 2.3.4, at what version do you create @acme/foo?

Either you keep it, so @acme/foo 2.3.4, or you restart at @acme/foo 1.0.0.

Then you evolve the code to add features. If you reused 2.3.4 you may now be at 4.1.2; if you restarted from 1.0.0 you may be at 2.1.2.

During that time the upstream package has evolved and is now at 3.1.2.

Now it may cause confusion, since upstream had 3.1.2 and @acme/foo also had 3.1.2 at some point, but the features of upstream foo 3.1.2 aren't part of @acme/foo 3.1.2, nor even of 4.1.2, so we lost that data.

Then if you do merge the upstream foo 3.1.2 into @acme/foo, you may bump @acme/foo to 5.0.0, and again you lose the information about which feature set of foo your @acme/foo is based upon. That is a great amount of information, as it basically says whether or not all of the foo 3.1.2 API is part of @acme/foo.

The fork I am talking about is not a fork in the sense that we are changing the design, but rather that we are adapting the component to a specific environment. The component remains the same, and its intent is to follow the upstream project.

Each time I have seen that pattern, people deviated from semver, so all the tools out there built upon semver required customization. That is why I think it would be worthwhile for semver to support such a model, so that tooling can be adjusted to match it and every project that uses multiple components stops going its own way.

The above scenario with 2 semver would be like

foo 2.3.4 -> @acme/foo 2.3.4-2.1.2
foo 3.1.2 -> @acme/foo 3.1.2-1.0.0

It better shows which upstream version is being talked about and how many changes were made to it. Using '-' would not make sense I think, but one could easily see that 3.1.2-1.0.0 is a greater version than 2.3.4-2.1.2, and semver tools could be designed to support this model so that others can use it in their distros.

Pretty sure most distros that reuse community packages have to solve this problem on their own, and likely they break semver and use some other means to express their own changes to the packages they imported/adapted.
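
As a rough illustration of the two-part idea, the Python sketch below parses `<upstream><symbol><fork>` strings and orders them by upstream version first, then by fork version. Everything in it is an assumption made for the example: the names `ForkedVersion`, `parse_core` and `parse_forked` are hypothetical, the `~` separator is an arbitrary placeholder (no symbol has been settled on here), and only plain MAJOR.MINOR.PATCH parts are handled.

```python
import re
from typing import NamedTuple, Tuple

Core = Tuple[int, int, int]

_CORE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)")


class ForkedVersion(NamedTuple):
    upstream: Core  # version of the upstream release the fork is based on
    fork: Core      # version of the local changes on top of that release


def parse_core(text: str) -> Core:
    """Parse a plain MAJOR.MINOR.PATCH string (no prerelease, no metadata)."""
    match = _CORE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def parse_forked(text: str, sep: str = "~") -> ForkedVersion:
    """Split '<upstream><sep><fork>' into its two core versions.

    The separator is a hypothetical placeholder, not part of any spec.
    """
    upstream, found, fork = text.partition(sep)
    if not found:
        raise ValueError(f"missing separator {sep!r} in {text!r}")
    return ForkedVersion(parse_core(upstream), parse_core(fork))


older = parse_forked("2.3.4~2.1.2")
newer = parse_forked("3.1.2~1.0.0")

# Named tuples compare field by field: upstream first, then fork.
print(newer > older)                 # True
print(max(older, newer).upstream)    # (3, 1, 2)
```

Comparing the upstream part first is what makes a fresh fork of 3.1.2 rank above a heavily patched fork of 2.3.4, which is the ordering described above.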

@ljharb
Contributor

ljharb commented Mar 16, 2023

What you're describing is why maintaining a synced fork is highly problematic, regardless of what versioning system you're using, and why it's something that one should strenuously avoid.

@hpe-ykoehler
Author

It is not something one can avoid. It depends on the flexibility offered by the upstream package itself, and also, for example, on how fast they apply patches versus how fast you need to provide them yourself.

Some packages simply do not offer enough config elements (with good reason, as they cannot foresee every environment their software may get used in). For example, not all binaries out there support reloading their config on a SIGHUP. Some require full process restarts, etc.

@ljharb
Contributor

ljharb commented Mar 16, 2023

Then you can submit a PR to make things configurable.

An option that for some reason folks don't like to consider in these cases is "then you simply can't do that yet".

@hpe-ykoehler
Author

I think I already answered why that doesn't work.

@jwdonahue
Contributor

@hpe-ykoehler, I think this has been suggested in earlier proposals. I don't have time to find them right now.

I agree with @ljharb that what you are trying to version is a separate product. Here's the problem with your proposed solution:

  • You fork at P 1.0.0 and subsequently release P 1.0.0@2.0.0 into your production system.
  • I fork at P 1.0.0 and subsequently release P 1.0.0@2.0.0 into my production system.

Now we have a violation of the SemVer clause 3 which basically says "there can be only one". You could argue that our respective production environments are isolated from each other, but my experience with this in large scale systems, is they are not. I've seen multiple forks of various bits of code number in the dozens in some large scale development environments, where branch A of P has to be merged with B of P and the two branches don't have the same version sets of P. Yes, each branch with multiple versions of P, but different sets! Yuk ;(. Fixing build breaks in an environment with hundreds of branches and thousands of machines under these circumstances can be very painful.
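
To illustrate the collision, here is a minimal Python sketch of a package store that enforces the "released content must not change" rule by remembering a content hash per name/version pair. The `Registry` and `VersionConflict` names are hypothetical and exist only for this example; real package registries differ in how, or whether, they enforce this.

```python
import hashlib
from typing import Dict, Tuple


class VersionConflict(Exception):
    """Raised when a released name/version is republished with different bits."""


class Registry:
    """Toy in-memory store keyed by (name, version)."""

    def __init__(self) -> None:
        self._digests: Dict[Tuple[str, str], str] = {}

    def publish(self, name: str, version: str, content: bytes) -> None:
        key = (name, version)
        digest = hashlib.sha256(content).hexdigest()
        known = self._digests.get(key)
        if known is not None and known != digest:
            raise VersionConflict(
                f"{name} {version} was already released with different content"
            )
        self._digests[key] = digest


registry = Registry()
registry.publish("P", "1.0.0@2.0.0", b"your patched build")
try:
    # A second, independent fork reuses the same name and version.
    registry.publish("P", "1.0.0@2.0.0", b"my patched build")
except VersionConflict as exc:
    print(exc)
```

Two isolated environments never see this error, which is exactly why the clash only surfaces later, once the packages meet in a shared build or leak into the wild.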

Packages have an unfortunate tendency to find their way into the wild. If your P v and my P v happen to be the same but with different bits in them, much confusion ensues. Better for us to rename our packages. Whether our forks are ahead, behind or exactly at parity with the base product, our work is done in parallel to that of the base product owners, so there's always a risk of duplicate versions being applied to potentially different bits with the same name.

One solution is for you to always depend on your fork of P, ForkOfP. For record keeping purposes, you can embed the version you forked from in the metadata, so your ForkOfP might have a version history along these lines:

1.0.0+P.2.1.0 // Initial fork without changes.
1.0.1+P.2.1.0 // You added your patch.
1.0.1+P.2.1.1 // Your patch was merged with base.
1.0.2+P.2.1.3 // You merged from base.
1.1.0+P.2.1.3 // You added a feature.
etc.
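
A small Python sketch of how tooling might read that record back out. The function names and the `P.` prefix convention are assumptions taken from the example history above, not from any existing tool, and prerelease tags are left out to keep it short. Note that SemVer ignores build metadata when determining precedence, so the second and third entries above (both 1.0.1) have the same precedence.

```python
import re
from typing import Optional, Tuple

Core = Tuple[int, int, int]

# Core version plus optional build metadata; prerelease tags are not handled.
_PATTERN = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)(?:\+([0-9A-Za-z.-]+))?")


def split_version(text: str) -> Tuple[Core, Optional[str]]:
    """Return ((major, minor, patch), build_metadata_or_None)."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, metadata = match.groups()
    return (int(major), int(minor), int(patch)), metadata


def upstream_of(text: str, prefix: str = "P.") -> Optional[str]:
    """Return the upstream version recorded in the metadata, if any."""
    _, metadata = split_version(text)
    if metadata is not None and metadata.startswith(prefix):
        return metadata[len(prefix):]
    return None


def precedence_key(text: str) -> Core:
    """Build metadata is dropped: it does not take part in ordering."""
    core, _ = split_version(text)
    return core


print(upstream_of("1.0.2+P.2.1.3"))  # 2.1.3
print(precedence_key("1.0.1+P.2.1.0") == precedence_key("1.0.1+P.2.1.1"))  # True
```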

The downside to this is that, if your product is consumed by others that also have a dependency on the base product:

  • There's potential for some duplication of bits.
  • There's some potential for incompatibility issues if the two versions rely on the same resources.
  • There's some potential for namespace or symbol conflicts.

But your proposal already has those issues as well, and depending on the languages/environments involved there may or may not be straightforward solutions to them. Diamond dependency issues have always been a hard problem.

The simplest thing to do is as @ljharb suggests and offer a PR to the base product owners. If they are unable to respond reasonably quickly to your request, offer to pay them or volunteer your time to vet and accept PRs for them.

@hpe-ykoehler
Author

@jwdonahue I see your point that two distinct vendors could release the same package with the same name/version. It can actually happen today as well, if distro one fixes package P and creates a 1.0.0-p1 and distro two also applies a fix to package P and creates a 1.0.0-p1 (if they use the same custom semver model).

It is clear that a distro will not "rename" a package because they add a patch to it. It is the same for vendors using open source: we do not rename a package because we add a patch to it.

So the solution of renaming a package makes no sense in this context, because the idea is still that we want users of the vendor/distro to identify the function they need based on the package name they know.

Yet, to integrate, there is very often a need to apply patches, for all the reasons I already indicated (won't copy them back here).

I somehow feel that using a more explicit versioning system such as - would be more useful than - and would convey better if a patch actually breaks something or not, without fixing the notion that there are now 2 distinct content with the same - because those would not be available together in the same environment, like today.

@ljharb
Contributor

ljharb commented Apr 4, 2023

@hpe-ykoehler note that things like a distro modifying the package without renaming it is also not really compatible with semver, in spirit if not in letter - only the maintainers decide what versions something has.

@hpe-ykoehler
Author

hpe-ykoehler commented Apr 4, 2023

@ljharb That is exactly my point... a distro modifying a package is a common thing, the person that modifies that package with the patch becomes the maintainer for that package in that distro.

Since there is no versioning spec in those scenarios we see people coming up with their own versioning scheme on top of semver. To me, that is an indication that there is a "need" that semver is not answering.

The issue with the way distros do things is that we lose the data about whether the patch actually changes the API or not, whether they added a feature or just applied a security fix, etc. I think semver can provide that.

Hence my request for semver to address such cases and provide a more standard approach, allowing semver to have a wider usage than initially defined and to provide the same benefits in additional contexts.

@ljharb
Contributor

ljharb commented Apr 4, 2023

That something is common doesn’t mean it’s at all advisable - designing a versioning system to accommodate it would be encouraging something bad.

@hpe-ykoehler
Author

"That something is common doesn’t mean it’s at all advisable"

Agree, but what we are talking about is not optional either. Distros simply have no choice, as they cannot delay their releases over upstream package issues and the like.

"designing a versioning system to accommodate it would be encouraging something bad."

I disagree with this. The reason why semver was created is basically identical: everyone was doing their own thing, and there was no convention, no definition. Combining dependencies was problematic.

That same issue exists in distros, and no one is addressing it, so we see many variants where you can't know what Ubuntu changed in package A, nor CentOS, etc.

So I think this is a philosophical question, and if semver doesn't want to address this, fine, but that doesn't make the need disappear.

@ljharb
Contributor

ljharb commented Apr 4, 2023

Distros can and should delay their release, or omit a package, if it’s not compatible.

@hpe-ykoehler
Author

hpe-ykoehler commented Apr 4, 2023

@ljharb I guess we live in a different reality then. Ideally they do, but not all upstream packages are maintained the same way.

@jwdonahue
Contributor

jwdonahue commented Apr 4, 2023

"Since there is no versioning spec in those scenarios we see people coming up with their own versioning scheme on top of semver. To me, that is an indication that there is a "need" that semver is not answering."

There are so many needs that SemVer does not, cannot, will not and the maintainers probably do not want to attempt to fill, that we could not delineate them all here. Not every tool chain has to follow SemVer, in fact, many packaging tools are not limited to SemVer only, precisely because it is not the end all and be all of versioning.

Back in the 90's when I spent most of my time working on embedded systems, it was not uncommon for us to install a base package (usually untar/zip something), then apply targeted patches to that base package for the particular board. Pretty sure that's still a common thing. I designed a packaging tool for the Windows build system about 10 years ago that made that sort of thing very easy to accomplish. We used a manifest that defined what packages or parts thereof were laid down where on the drive(s), where they came from and the order in which they were laid down. A "distro" in this sense was just a manifest that defined where base packages and patch packages come from and how they are laid out in storage. No defiling of anybody's products or pretending to provide X when you're really providing Y, zero risk of content collisions. It also required that manifests be versioned using SemVer, while content packages could use any unique name/version scheme the developers found useful (like build numbers or dates), though we were encouraging adoption of SemVer at the time.

That technique probably doesn't work for all things either, but it is another tool in the kit.


Oh, and the tool did require SemVer if you wanted to use ranges, and those were standard set notation, e.g. [1.0.0, 2-), meaning any version starting with release 1 but less than 2, including any prerelease greater than 1.0.0. We added modifier prefixes (<>) to determine whether it should get the lowest or highest in the range, defaulting to lowest of course. This allowed test builds to get the latest bits and release managers to build a candidate from an explicit list. One goal was to mine test data and construct a manifest that included the latest bits that were all known to pass certain test suites in combination with each other.
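
The range semantics described above could be sketched roughly as follows in Python. This is only an illustration, not the tool itself: the names `precedence_key` and `select`, and the `upper_major` and `prefer` parameters, are hypothetical, and the `2-` upper bound is read here as "anything in major version 2, prereleases included, is out of range". Version strings are assumed to be valid SemVer already.

```python
from typing import Iterable, Optional


def precedence_key(version: str):
    """Sort key following SemVer precedence for valid version strings."""
    version = version.split("+", 1)[0]            # build metadata is ignored
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        return (major, minor, patch, 1, ())       # a release outranks its prereleases
    identifiers = tuple(
        (0, int(item), "") if item.isdigit() else (1, 0, item)
        for item in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


def select(available: Iterable[str], lower: str, upper_major: int,
           prefer: str = "lowest") -> Optional[str]:
    """Pick one version from the half-open range [lower, upper_major-)."""
    floor = precedence_key(lower)
    in_range = [
        v for v in available
        if precedence_key(v) >= floor and precedence_key(v)[0] < upper_major
    ]
    if not in_range:
        return None
    pick = max if prefer == "highest" else min
    return pick(in_range, key=precedence_key)


available = ["1.0.0-rc.1", "1.0.0", "1.2.0", "1.3.0-beta", "2.0.0-alpha", "2.0.0"]
print(select(available, "1.0.0", 2))                    # 1.0.0
print(select(available, "1.0.0", 2, prefer="highest"))  # 1.3.0-beta
```

Here 1.0.0-rc.1 falls below the lower bound and 2.0.0-alpha is excluded by the upper bound, while 1.3.0-beta qualifies as a prerelease greater than 1.0.0.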

@jwdonahue
Contributor

"It actually can happen today as well, if distro one fix package P and create a 1.0.0-p1 and distro2 also apply a fix to package P and create a 1.0.0-p1 (if they use the same custom semver model)."

That is not a SemVer compliant solution. It's a violation of clause 3 and it doesn't make any sense due to the precedence rules.
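
To make the precedence point concrete, here is a minimal Python sketch of SemVer 2.0.0 ordering for already-valid version strings. The function name `precedence_key` is an assumption for illustration and is not taken from any particular library. Because anything after '-' is a prerelease tag, 1.0.0-p1 sorts before the unpatched 1.0.0, and a string such as 2.3.4-2.1.2 sorts before 2.3.4.

```python
def precedence_key(version: str):
    """Sort key following SemVer precedence; input is assumed to be valid."""
    version = version.split("+", 1)[0]            # build metadata is ignored
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A normal release ranks above every prerelease of the same core.
        return (major, minor, patch, 1, ())
    identifiers = tuple(
        # Numeric identifiers compare as numbers and rank below
        # alphanumeric ones, which compare as ASCII strings.
        (0, int(item), "") if item.isdigit() else (1, 0, item)
        for item in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


print(sorted(["1.0.0", "1.0.0-p1", "1.0.1"], key=precedence_key))
# ['1.0.0-p1', '1.0.0', '1.0.1']

print(precedence_key("2.3.4-2.1.2") < precedence_key("2.3.4"))  # True
```

So a tool that follows the spec would treat the "patched" 1.0.0-p1 as older than the release it is supposed to fix.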

@jwdonahue
Contributor

Add to the above that the package name is copyrighted, sometimes trademarked, and often not part of the OSS content. It is usually the intent of OSS publishers to grant rights in the source code and certain build outputs, not the name of the product. Some licenses are quite explicit in that regard.
