Feature Request: Autodiscover should be smarter when it decides which versionfilter to use #977

Closed
olblak opened this issue Nov 12, 2022 · 14 comments

Comments

@olblak
Member

olblak commented Nov 12, 2022

Is your feature request related to a problem?

Right now, we always use the semver versionfilter in the context of autodiscovery.
I would like Updatecli to be smart and try multiple constraints.

Solution you'd like

I would like Updatecli to respect the following versions

1.0.0 -> Should return a semver with three fields
1.0 -> should return a regex with two fields such as ^(\d).(\d)$
1 -> should return a regex with one field such as ^(\d)$

1.0.0-alpha -> Should return a regex such as ^(\d).(\d).(\d)-alpha
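
The mapping above can be sketched in Go as follows. This is only an illustration of the requested behavior, not Updatecli code: the `versionFilter` type, the `guessFilter` function, and the fallback to `latest` are hypothetical names and choices. The sketch also uses `\d+` and escaped dots rather than the exact patterns quoted above, so that multi-digit fields match and a dot only matches a literal dot.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionFilter is a hypothetical stand-in for the filter that would be generated.
type versionFilter struct {
	Kind    string
	Pattern string
}

var (
	threeFields = regexp.MustCompile(`^\d+\.\d+\.\d+$`)
	twoFields   = regexp.MustCompile(`^\d+\.\d+$`)
	oneField    = regexp.MustCompile(`^\d+$`)
	withSuffix  = regexp.MustCompile(`^\d+\.\d+\.\d+-(.+)$`)
)

// guessFilter maps the shape of the currently used version to a filter.
func guessFilter(version string) versionFilter {
	switch {
	case threeFields.MatchString(version):
		return versionFilter{Kind: "semver"}
	case twoFields.MatchString(version):
		return versionFilter{Kind: "regex", Pattern: `^(\d+)\.(\d+)$`}
	case oneField.MatchString(version):
		return versionFilter{Kind: "regex", Pattern: `^(\d+)$`}
	}
	if m := withSuffix.FindStringSubmatch(version); m != nil {
		// Keep the suffix literal so "1.0.0-alpha" only tracks "-alpha" tags.
		pattern := `^(\d+)\.(\d+)\.(\d+)-` + regexp.QuoteMeta(m[1]) + `$`
		return versionFilter{Kind: "regex", Pattern: pattern}
	}
	// Assumption: anything unrecognized falls back to "latest".
	return versionFilter{Kind: "latest"}
}

func main() {
	for _, v := range []string{"1.0.0", "1.0", "1", "1.0.0-alpha"} {
		f := guessFilter(v)
		fmt.Printf("%-12s -> kind=%s pattern=%s\n", v, f.Kind, f.Pattern)
	}
}
```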

Alternatives you've considered

No response

Anything else?

No response

@dduportal
Contributor

dduportal commented Nov 12, 2022

A few notes:

  • 1.0.0-alpha is a semver as per https://semver.org/. They provide, in their FAQ, a cool set of regex: check https://regex101.com/r/vkijKf/1/, scroll in their loooong list of valid and invalid semver strings.

  • You might want to provide "smartness" to the autodiscovery when handling semver-valid versions prefixed by a v (even if it is not semver)

  • What do you think about the following kind of logic?

    1. "If it matches a semver regex" then assume it is a semver
    2. "Otherwise if it has a v prefix and matches (without the v' prefix) a semver regex then assume it is a semver 'ala GitHub'"
    3. Otherwise use a "latest" semver filter
  • For other cases, better to think about an escape hatch, e.g. allow the user to define their versionFilter in an autodiscovery top-level manifest
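
A minimal Go sketch of the three-step logic proposed above. The function name `classify` and the returned labels are hypothetical, and the regex is a simplified approximation of a semver pattern, not the full expression published in the semver.org FAQ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// semverLike is a simplified approximation: three numeric fields, then an
// optional "-prerelease" and an optional "+build" part.
var semverLike = regexp.MustCompile(`^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// classify applies the three steps in order and returns a label for the
// filter that would be used.
func classify(version string) string {
	// 1. Matches a semver regex: assume semver.
	if semverLike.MatchString(version) {
		return "semver"
	}
	// 2. Has a "v" prefix and matches without it: semver "a la GitHub".
	if strings.HasPrefix(version, "v") && semverLike.MatchString(strings.TrimPrefix(version, "v")) {
		return "semver-with-v-prefix"
	}
	// 3. Otherwise fall back to "latest".
	return "latest"
}

func main() {
	for _, v := range []string{"1.0.0-alpha", "v1.2.3", "9-alpine"} {
		fmt.Printf("%s -> %s\n", v, classify(v))
	}
}
```

The escape hatch mentioned in the last bullet would sit outside this function: a user-defined versionFilter would simply bypass the classification.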

@olblak
Member Author

olblak commented Nov 14, 2022

I am adding a bit of context on why I opened this issue.
It's pretty common in the container ecosystem to add information after "-" such as "1.0.0-jdk11" or "1.0.0-jdk17"
From a semantic point of view those are prereleases, and as soon as a release 1.1.0 is shipped, the semver filter will return 1.1.0.
This isn't what I want, especially in the context of autodiscovery where I try to detect common patterns.
I am looking for something like the latest release matching "(\d).(\d).(\d)-jdk11"

I am also facing this problem in the context of updatemonitor where I want to monitor specific prerelease.

What do you think about trying multiple regexes such as

	patterns := []string{
		"^v(\\d).(\\d).(\\d)",
		"^(\\d).(\\d).(\\d)",
		"^v(\\d).(\\d)$",
		"^(\\d).(\\d)$",
		"^v(\\d)$",
		"^(\\d)$",
		"^v(\\d).(\\d).(\\d)-(.*)$",
		"^(\\d).(\\d).(\\d)-(.*)$",
		"^v(\\d).(\\d).(\\d)+(.*)$",
		"^(\\d).(\\d).(\\d)+(.*)$",
	}

and if none matches, then return a versionfilter of type latest or semver

@lemeurherve
Member

lemeurherve commented Nov 14, 2022

This regexp should be enough to match all these cases: ^v?(\\d)(\.(\\d)){0,2}(-(.*))?$

Note: be aware the dots in your regexp patterns match every character, not only .
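
To illustrate the note about dots, here is a small Go check comparing an unescaped pattern with an escaped one. The variable names `loose` and `strict` are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// loose uses an unescaped dot, which matches any single character.
	loose = regexp.MustCompile("^(\\d).(\\d)$")
	// strict escapes the dot so it only matches a literal ".".
	strict = regexp.MustCompile(`^(\d)\.(\d)$`)
)

func main() {
	for _, tag := range []string{"1.0", "1x0", "1-0"} {
		fmt.Printf("%q loose=%v strict=%v\n", tag, loose.MatchString(tag), strict.MatchString(tag))
	}
}
```

Both patterns accept "1.0", but only the loose one also accepts "1x0" and "1-0". Note as well that `(\d)` matches exactly one digit, so a tag such as "19.0" is rejected by both.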

@olblak
Member Author

olblak commented Nov 14, 2022

The more I think about this issue, the less I think it's doable in a generic way... :(

The problem is that, depending on the API, the values are not sorted the same way.
For example, DockerHub returns tags sorted alphabetically, as in the following example:
["19.0.1-buster-slim","19.0.1-slim","4","4-alpine","4-onbuild"]

While ghcr.io will return the list sorted by published time such as
["4","4-alpine","4-onbuild","19.0.1-buster-slim","19.0.1-slim"]

So the way to make this uniform is to use the semantic versioning approach.
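
As a rough sketch of why a version-aware sort removes the dependency on the registry's ordering, the Go snippet below sorts both lists from this comment with the same comparator. It uses only the standard library and a hand-written numeric key instead of a semver library, and `numericKey` and `sortTags` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// numericKey extracts up to three leading numeric fields of a tag,
// e.g. "19.0.1-buster-slim" -> [19 0 1] and "4-alpine" -> [4 0 0].
func numericKey(tag string) [3]int {
	var key [3]int
	core := strings.SplitN(tag, "-", 2)[0]
	for i, field := range strings.SplitN(core, ".", 3) {
		n, err := strconv.Atoi(field)
		if err != nil {
			break // non-numeric field: the remaining fields stay at zero
		}
		key[i] = n
	}
	return key
}

// sortTags returns a sorted copy, ordered by numeric key and then by the
// raw string so the result does not depend on the input order.
func sortTags(tags []string) []string {
	sorted := append([]string(nil), tags...)
	sort.SliceStable(sorted, func(i, j int) bool {
		a, b := numericKey(sorted[i]), numericKey(sorted[j])
		for k := range a {
			if a[k] != b[k] {
				return a[k] < b[k]
			}
		}
		return sorted[i] < sorted[j]
	})
	return sorted
}

func main() {
	dockerHub := []string{"19.0.1-buster-slim", "19.0.1-slim", "4", "4-alpine", "4-onbuild"}
	ghcr := []string{"4", "4-alpine", "4-onbuild", "19.0.1-buster-slim", "19.0.1-slim"}
	fmt.Println(sortTags(dockerHub))
	fmt.Println(sortTags(ghcr))
}
```

Both calls print the same order, with the "19.0.1" tags last, whichever registry the list came from.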

Let's take the docker image tag "node:9-alpine"

We could generate the following manifest

sources:
    node:
        name: Get latest "node" Docker Image Tag 
        kind: dockerimage
        spec:
            image: node
            versionfilter:
                kind: regex
                pattern: ^v?\d*-alpine$

or

sources:
    node:
        name: Get latest "node" Docker Image Tag 
        kind: dockerimage
        spec:
            image: node
            versionfilter:
                kind: semver
        transformers:
            - addpostfix: -alpine
conditions:
    node:
        name: Ensure latest "node" Docker Image Tag exist with tag alpine
        kind: dockerimage
        source: node
        spec:
            image: node

Or we just don't care about semver metadata and only return the latest version using semantic versioning

@olblak
Member Author

olblak commented Nov 15, 2022

Well, I think the root cause of my problem is that the library we use to filter versions using semantic versioning does not allow preserving prerelease information: Masterminds/semver#184

So for example for the docker image jenkins:2.235-jdk11, I would like to retrieve the latest tag using the prerelease -jdk11
My initial attempt was to use a regular expression, but depending on the API, the list of versions is not ordered the same way.

Maybe I could add a parameter to the versionfilter to indicate that we want to preserve some fields

sources:
    node:
        name: Get latest "node" Docker Image Tag 
        kind: dockerimage
        spec:
            image: node
            versionfilter:
                kind: semver
                pattern: ">=1.0.0"
                constraints:
                  - name: prerelease
                    value: jdk11

The goal is to build a list of versions that only contains prerelease information matching "jdk11" and then we apply the pattern ">=1.0.0"
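
The two-step idea (keep only the tags whose prerelease part is "jdk11", then apply ">=1.0.0") can be sketched in Go as below. This is a standard-library illustration, not the Masterminds/semver API: `parse`, `less`, and `latestWithPrerelease` are hypothetical helpers, and the tag list in `main` is made-up sample data.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits a tag such as "2.235-jdk11" into its numeric fields [2 235 0]
// and its suffix "jdk11". Missing minor or patch fields default to 0.
func parse(tag string) ([3]int, string, bool) {
	var nums [3]int
	core, pre, _ := strings.Cut(tag, "-")
	fields := strings.Split(core, ".")
	if len(fields) > 3 {
		return nums, "", false
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nums, "", false
		}
		nums[i] = n
	}
	return nums, pre, true
}

// less reports whether version a is strictly lower than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latestWithPrerelease returns the highest tag whose suffix equals pre and
// whose numeric part is greater than or equal to minimum.
func latestWithPrerelease(tags []string, pre string, minimum [3]int) (string, bool) {
	best, bestNums, found := "", [3]int{}, false
	for _, tag := range tags {
		nums, p, ok := parse(tag)
		if !ok || p != pre || less(nums, minimum) {
			continue
		}
		if !found || less(bestNums, nums) {
			best, bestNums, found = tag, nums, true
		}
	}
	return best, found
}

func main() {
	// Made-up sample tags for illustration only.
	tags := []string{"2.235-jdk11", "2.346-jdk11", "2.375-jdk17", "2.375", "lts-jdk11"}
	tag, ok := latestWithPrerelease(tags, "jdk11", [3]int{1, 0, 0})
	fmt.Println(tag, ok)
}
```

With this sample data the higher "2.375" and "2.375-jdk17" tags are ignored because their suffix is not "jdk11", which is the behavior the proposed `constraints` parameter is after.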

Maybe the solution must come from the upstream library

@olblak
Member Author

olblak commented Nov 15, 2022

I am sharing this document here
https://github.com/docker/metadata-action#typesemver

@olblak
Member Author

olblak commented Nov 15, 2022

I have been scratching my head about how to retrieve a docker image tag using a semver prerelease information.

The problem is that if we have both a version like "19.0.0-alpine" and "19.0.0" and we try to know which one is the latest, then "19.0.0" always wins.

This is not what I am looking for. If I use a docker image tag with "alpine" in the tag then there is probably a reason.

The second thing that bothers me in the context of Docker image tags is that it can be really difficult to tell the difference between "20221115" and "2". In my test case I have many images like "quay.io/calico/node"

The best solution I could think of is to provide a tagfilter to reduce the list of Docker image tags to process, such as

sources:
  default:
    kind: dockerimage
    spec:
      image: node
      tagfilter: '^(\d*)-alpine$'
      versionfilter:
        kind: semver
        pattern: ">9-alpine"

Where the goal is to retrieve the docker image tag "node:19-alpine"

@dduportal
Contributor

The problem is that if we have both a version like "19.0.0-alpine" and "19.0.0" and we try to know which one is the latest, then "19.0.0" always wins.

That is correct as per the semver specification: https://semver.org/#spec-item-11.
It means that the Docker image tag 19.0.0-alpine is not a valid semver and should be treated as a regexp instead:

sources:
  default:
    kind: dockerimage
    spec:
      image: node
      versionfilter:
        kind: regex
        pattern: '^(\d*).(\d*).(\d*)-alpine$'

=> The problem is that you want to use the "semver" library for the "sorting" part when using a regex, because the sorting differs between repositories. I understand the root cause since you explained it to me in 1:1, but trying to use edge cases of semver is still not a solution to that issue.

The best solution I could think of is to provide a tagfilter to reduce the list of Docker image tags to process, such as

sources:
  default:
    kind: dockerimage
    spec:
      image: node
      tagfilter: '^(\d*)-alpine$'
      versionfilter:
        kind: semver
        pattern: ">9-alpine"

That snippet should be:

sources:
  default:
    kind: dockerimage
    spec:
      image: node
      versionfilter:
        kind: regex
        pattern: '^(\d*)-alpine$'

because 19-alpine is not a valid semver.
I would expect the autodiscovery to detect this case because it's an "easy" one from my user point of view.

=> again, the issue would be with the sorting algorithm.

That would be a concern outside the "autodiscovery": it is a concern with the docker image resource and not an easy one;

WDYT about fixing the root issue at the "versionFilter" level when it is a regex? and/or the time based sorting?

I know it is not an easy topic, but it needs to be done correctly so as not to create false expectations for users.

@dduportal
Contributor

Something that could eventually help us around sorting: https://yourbasic.org/golang/how-to-sort-in-go/#sort-with-custom-comparator

@olblak
Member Author

olblak commented Nov 16, 2022

First of all, this problem is indeed difficult, and I have been scratching my head over it for quite a long time now :/

Regarding "19.0.0-alpine": it is a valid semantic version where "-alpine" is the prerelease information.
As per https://semver.org/#spec-item-9

=> The problem is that you want to use the "semver" library for the "sorting" part when using a regex, because the sorting differs between repositories. I understand the root cause since you explained it to me in 1:1, but trying to use edge cases of semver is still not a solution to that issue.

Unfortunately, the following example is not an option as it doesn't return reliable results. That was one of my first attempts to solve this problem.

sources:
  default:
    kind: dockerimage
    spec:
      image: node
      versionfilter:
        kind: regex
        pattern: '^(\d*)-alpine$'

because 19-alpine is not a valid semver.
I agree with you. But there are so many projects publishing only the major or major.minor version that ignoring them makes things a lot more difficult to handle.
Some context on masterminds/semver#3

That being said, using the semver regex (https://github.com/Masterminds/semver/blob/master/version.go#L42), the masterminds/semver library is able to parse it and convert this version to 19.0.0; then we can apply a constraint such as ~19 to retrieve what we are looking for. Where it gets messy is when three versions are published, like 19.0.0, 19.0, and 19: they would all be considered as [19.0.0] and only one will be picked for sorting.
The library we use also has the ability to require strict semantic versioning, in which case it would only consider "19.0.0".
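
The collision described above can be illustrated with a small Go sketch. The `normalize` function is a simplified stand-in for the coercion to three fields, not the masterminds/semver implementation, and `groupByNormalized` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize pads the numeric part of a tag to three fields:
// "19" -> "19.0.0" and "19-alpine" -> "19.0.0-alpine".
func normalize(tag string) string {
	core, suffix, hasSuffix := strings.Cut(tag, "-")
	fields := strings.Split(core, ".")
	for len(fields) < 3 {
		fields = append(fields, "0")
	}
	out := strings.Join(fields, ".")
	if hasSuffix {
		out += "-" + suffix
	}
	return out
}

// groupByNormalized maps each normalized version to the original tags
// that produce it, which makes collisions visible.
func groupByNormalized(tags []string) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range tags {
		key := normalize(t)
		groups[key] = append(groups[key], t)
	}
	return groups
}

func main() {
	groups := groupByNormalized([]string{"19.0.0", "19.0", "19", "19-alpine"})
	fmt.Println(groups["19.0.0"])        // [19.0.0 19.0 19]
	fmt.Println(groups["19.0.0-alpine"]) // [19-alpine]
}
```

Three distinct tags end up under the single key "19.0.0", so a sort on the normalized value alone cannot tell which original tag to return.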

My suggestion out of the following versions:

["17","17-alpine","17-debian","17.0","17.0-alpine","17.0-debian","17.0.1","17.0.1-alpine","17.0.1-debian","2004-15-11",
"18","18-alpine","18-debian","18.0","18.0-alpine","18.0-debian","18.0.1","18.0.1-alpine","18.0.1-debian"]

is to first pre-filter tags using a regex, for example "^(\d*).(\d*).(\d*)-alpine$",
which would return

[ "17.0.0-alpine", "17.0.0-alpine", "17.0.1-alpine", "17.0.1-alpine", "18.0.0-alpine", "18.0.0-alpine", "18.0.1-alpine", "18.0.1-alpine"]

If we apply the same technique to a more restrictive rule such as "^(\d*)-alpine$"

["17-alpine","18-alpine"] the lib parses them and considers them as ["17.0.0-alpine","18.0.0-alpine"], and then behind the scenes we retrieve the original value.
["17-alpine","18-alpine"]

We already use this approach to filter GitHub releases. For example, the Golang project publishes a new major release as {{ Major }}.{{ Minor }}, then all following releases are {{ Major }}.{{ Minor }}.{{ Patch }}.
And using a regex to filter versions does not allow me to specify constraints such as "~18".

I don't think there is any other way than first pre-filtering the tags before passing them to the versionFilter engine.
That being said, instead of a new filterTag on the docker resource, I should have a preFilter parameter in the versionFilter, because I expect the same problem to arise once we implement more versionFilter types such as pep440 or calver.
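
A rough Go sketch of the pre-filter step followed by a version-aware sort that returns the original tag strings. It relies on the standard library only, so it does not show the real versionFilter engine; `key`, `preFilterAndSort`, and the two pattern variables are hypothetical names, and the patterns use `\d+` with escaped dots.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

var (
	alpineMajorOnly = regexp.MustCompile(`^(\d+)-alpine$`)
	alpineFull      = regexp.MustCompile(`^(\d+)\.(\d+)\.(\d+)-alpine$`)
)

// key turns the numeric part of a tag into comparable fields,
// e.g. "17.0-alpine" -> [17 0 0].
func key(tag string) [3]int {
	var k [3]int
	core, _, _ := strings.Cut(tag, "-")
	for i, f := range strings.SplitN(core, ".", 3) {
		n, err := strconv.Atoi(f)
		if err != nil {
			break
		}
		k[i] = n
	}
	return k
}

// preFilterAndSort keeps the tags matching preFilter and returns the
// original tag strings ordered from lowest to highest version.
func preFilterAndSort(tags []string, preFilter *regexp.Regexp) []string {
	var kept []string
	for _, t := range tags {
		if preFilter.MatchString(t) {
			kept = append(kept, t)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		a, b := key(kept[i]), key(kept[j])
		for n := range a {
			if a[n] != b[n] {
				return a[n] < b[n]
			}
		}
		return false
	})
	return kept
}

func main() {
	tags := []string{
		"17", "17-alpine", "17-debian", "17.0", "17.0-alpine", "17.0-debian",
		"17.0.1", "17.0.1-alpine", "17.0.1-debian", "2004-15-11",
		"18", "18-alpine", "18-debian", "18.0", "18.0-alpine", "18.0-debian",
		"18.0.1", "18.0.1-alpine", "18.0.1-debian",
	}
	fmt.Println(preFilterAndSort(tags, alpineMajorOnly)) // [17-alpine 18-alpine]
	fmt.Println(preFilterAndSort(tags, alpineFull))      // [17.0.1-alpine 18.0.1-alpine]
}
```

Because the pre-filter only lets one tag shape through, the "19.0.0" versus "19.0" versus "19" ambiguity cannot occur inside the filtered list, and the last element is the candidate to return.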

@dduportal
Contributor

Regarding "19.0.0-alpine": it is a valid semantic version where "-alpine" is the prerelease information.
As per https://semver.org/#spec-item-9

It is not, because -alpine is NOT a prerelease: 19.0.0-alpine does not indicate that it comes prior to 19.0.0.
"Sem" stands for "semantic" => it's not only a matter of matching a regex or not.

That is the reason why trying to use semver for things that are NOT semvers by exploiting edge cases of the implementation (the semver library used under the hood) is a dangerous path to walk. Particularly in a tool such as updatecli.

If we apply the same technique to a more restrictive rule such as "^(\d*)-alpine$"

["17-alpine","18-alpine"] the lib parses them and considers them as ["17.0.0-alpine","18.0.0-alpine"], and then behind the scenes we retrieve the original value.
["17-alpine","18-alpine"]

Maybe it solves your problem in the short term for your use case. But it is definitely a lie to the end user. At least add a LOT of INFO lines to tell the user how it is sorted. There is a risk of confusion between 19-alpine and 19.0.0-alpine, for instance.

If we apply the same technique to a more restrictive rule such as "^(\d*)-alpine$"

the prefiltering is a good first step. I wonder if it should be exposed to the user or not. WDYT about doing it as an implementation detail (not specified by the end user)?


I wonder if we could not do the sorting ourselves after the prefilter (based on your idea)

@olblak
Member Author

olblak commented Nov 16, 2022

the prefiltering is a good first step. I wonder if it should be exposed to the user or not. WDYT about doing it as an implementation detail (not specified by the end user)?

I could definitely use it as an implementation detail for now, so not exposed via the manifest, to delay the decision until later. But in the end I think it will be useful for other resources too, so it could make sense to move it from Dockerimage to versionfilter.

Maybe it solves your problem in the short term for your use case. But it is definitely a lie to the end user. At least add a LOT of INFO lines to tell the user how it is sorted. There is a risk of confusion between 19-alpine and 19.0.0-alpine, for instance.

Hence the purpose of pref

The purpose of the autodiscovery is to catch as many cases as possible, but it's impossible to cover all of them, hence the importance of being able to opt out depending on rules, such as docker image, docker-compose file, etc...

I wonder if we could not do the sorting ourselves after the prefilter (based on your idea)
I am also wondering if it's worthwhile

@olblak
Member Author

olblak commented Apr 21, 2023

I don't think it will be possible to automatically detect the right versionfilter to use.
Recently I experimented with allowing users to override the default versionfilter in the golang autodiscovery module, and it works quite well

https://www.updatecli.io/docs/plugins/autodiscovery/golang/

I am planning to update the other autodiscovery plugins with that approach.
Depending on the plugin, I'll use a different default versionfilter kind

@olblak
Member Author

olblak commented Aug 4, 2023

Closing for now, the situation has already improved

@olblak olblak closed this as completed Aug 4, 2023
3 participants