Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added relative git URL support #333

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

oddko
Copy link

@oddko oddko commented Jan 26, 2022

Added the ability to use relative URLs when adding git repos, e.g. instead of the main repo being:

https://github.com/org/someproject/repo.git

And its dependencies mentioned in CMake files as:

CPMAddPackage("gh:org/dependencies/dependency1")
CPMAddPackage("gh:org/someproject/dependency2")

Use relative URLs in the CPMAddPackage calls:

CPMAddPackage("rel:../../dependencies/dependency1")
CPMAddPackage("rel:../dependency2")

This can help with several things:

  • CI and private repos. CI runners usually pass their credentials/access tokens inside the URL passed to the git clone command, which makes these private repos inaccessible from the bare URL with no access token.
  • Mirroring your projects on another git hosting service (e.g. GitHub -> GitLab, etc.) can be done seamlessly without changing any of the code.

@OlivierLDff
Copy link
Contributor

Does this create a copy?

@oddko
Copy link
Author

oddko commented Jan 26, 2022

What do you mean by a copy ?
All it does is essentially resolve the absolute URL from the relative one you pass into it, the rest of the process remains identical to using a plain git URL.

@OlivierLDff
Copy link
Contributor

Ok so a git clone is happening that was my question

@iboB
Copy link
Member

iboB commented Jan 27, 2022

Hi. Thanks for the PR. I like this one. I've been thinking about something along these lines and I'm glad other have too.

I'm not sure about this approach though.

My first concern is that calling git for a package on each configuration will slow the config step considerably. I have not tested it though. How does this feature affect configuration times?

My thinking on the same problem involved allowing users to provide a "root" for their project with a separate function. Something in the lines of CPMProject(NAME myproj SOURCE "gh:myorg/myproj") and then, if this is provided, allow relative packages. This function can be extended to automatically get the root as per this PR. It can cache the result, too, and avoid externally calling git.

@TheLartians will have to weigh in here.

@oddko
Copy link
Author

oddko commented Jan 27, 2022

Hi, and thanks for you comment.

Performance-wise, calling git on my system takes about 12-14ms (time git remote). We could reduce the amount of calls to git by doing a regex match on the output of git remote -v if performance is a concern.

As for a configuration variable approach to specifying the root URL, I don't like it very much because I like my relative URLs to be as local as possible.

If we take my previous example and add a common "dependency of dependencies" (e.g. a generic observers library, a testing framework, ...) in the "dependencies" folder, other repos in that folder could simply refer to it via rel:../commondependency rather than specifying a full path all the way from the root of the org. (Not to mention if these relative URLs are supposed to use different git hostings as their root because you pulled from another absolute URL that itself pulls relative ones... ugh).

And theres always the matter of the less configuration to do on a repo the better, big partisan of "I clone and it just works™". That could however be solved by just caching the URL of the root repo (i.e. from CMAKE_SOURCE_DIR) on the first run as you said.

EDIT: Come to think of it the whole CACHE trick could be used to cache all URLs when resolved, even locally. Cache variables akin to "CPM_projectname_resolved_url" that store in a first-come first-served fashion the URL of the relative repo. This makes changing/re-resolving said URLs a matter of clearing the cache. I'll look in that direction.

@oddko
Copy link
Author

oddko commented Jan 27, 2022

Added said cache system, the resolved relative URLs are stored in internal cache variables titled "CPM_PACKAGE_name_RESOLVED_URL". Let me know what you think.

@TheLartians
Copy link
Member

TheLartians commented Jan 27, 2022

This is an interesting idea, I do like the concept of relative repo locations! On the other hand, I'm not really sure I'd recommend this to library authors, as it might be very confusing for users. One example that comes to my head would that forking such a library won't work, unless all other relative dependencies are forked as well (which may even be a good thing, though probably unintuitive).

I currently prefer the @iboB's suggestion of an explicit relative source, as it just seems more intuitive. Perhaps also with the current discussed option to infer the root from the current git remote.

Performance-wise I don't think getting the output of a simple git command is an issue, especially as we currently already use git status to check the state of cached dependencies.

@oddko
Copy link
Author

oddko commented Jan 28, 2022

Thanks for your comment !

One example that comes to my head would that forking such a library won't work, unless all other relative dependencies are forked as well (which may even be a good thing, though probably unintuitive).

EDIT: Now that I re-read your sentence, that makes sense yes. I could allow overriding locally the root URL that is used.

That is however not the case. The beauty of inferring it from the current source dir is that a dependency added from a full URL (e.g. "gh:org/repo") will use that as its root URL for its dependencies (so a call to CPMAddPackage("rel:...") would use https://github.com/org/repo.git as the root URL from that dependency). No need to fork the entire ecosystem of dependencies.

@oddko
Copy link
Author

oddko commented Jan 28, 2022

Added a way to override what is used as a base URL for relative URIs.
CPMSetRelativeUriBaseUrl to specify it explicitely (e.g. CPMSetRelativeUriBaseUrl("https://github.com/someorg"))
CPMSetRelativeUriBaseAuto to auto infer it from the current git repo (default behavior).

These are scope based, and simply define the variable CPM_RELATIVE_URI_BASE_URL (which can be used in its raw form if you wish to pass it in OPTIONSfor example).

Haven't added the way to use shorthands tho (e.g. gh:someorg). Should we perhaps extract it (the shorthand expansion code) from the CPMAddPackage function to be able to use it anywhere ?

@TheLartians
Copy link
Member

TheLartians commented Jan 29, 2022

Thanks for adding the option to explicitly set the base URL! I still have to think a bit about the benefit and potential consequences of this feature.

I'm not sure about the default behaviour being auto-inferring the URL. Say a project A sets the base URL explicitly using CPMSetRelativeUriBaseUrl() and then adds another project B, which itself relies on the default behaviour of automatically determining the base URL (and does not explicitly invoke CPMSetRelativeUriBaseAuto()). As B is now added in the scope of A it now uses A's base URL for its own dependencies.

In any case, if we move forward with this feature it would make sense to add some unit and an integration tests to make sure that it works as expected and we don't accidentally break it in the future.

Haven't added the way to use shorthands tho (e.g. gh:someorg). Should we perhaps extract it (the shorthand expansion code) from the CPMAddPackage function to be able to use it anywhere ?

That's probably a good idea if we intend to use a consistent URL syntax with different functions like this one.

CI and private repos. CI runners usually pass their credentials/access tokens inside the URL passed to the git clone command, which makes these private repos inaccessible from the bare URL with no access token.

Out of interest: does this work for GitHub actions and which privileges does that token hold? I would usually create a .gitconfig to globally override git URLs with an access token provided as a secret to the CI runner, instead of inferring it from the repo's URL.

@oddko
Copy link
Author

oddko commented Jan 30, 2022

I'm not sure about the default behaviour being auto-inferring the URL. Say a project A sets the base URL explicitly using CPMSetRelativeUriBaseUrl() and then adds another project B, which itself relies on the default behaviour of automatically determining the base URL (and does not explicitly invoke CPMSetRelativeUriBaseAuto()). As B is now added in the scope of A it now uses A's base URL for its own dependencies.

I don't know about making such a call mandatory for every scope, that'd make it impossible to override it without modifying/forking the project itself (when auto is used), although I don't know if such a use case would present itself. If you want to let the dependency keep "auto" mode, you can just pass an empty CPM_RELATIVE_URI_BASE_URL in options. When it comes to publicly available librairies I don't think relative URLs are really needed anyway.

Out of interest: does this work for GitHub actions and which privileges does that token hold? I would usually create a .gitconfig to globally override git URLs with an access token provided as a secret to the CI runner, instead of inferring it from the repo's URL.

I don't know about GitHub actions specifically (I'd have to check) but on GitLab runners it really is just that URL used to clone the repo (In a format like https://gitlab-ci-token:${ci-token}@gitlab.com/org/repo.git) which makes bare URLs to this repo non-functional and has come to bite me. I can retrieve the token that is used in the CI scripts, but I don't really want to have my projects bear CI-specific configuration options.

The idea I had to solve it through relative URLs is that it's actually a feature of git submodules as well (e.g. git submodule add ../../subdir/repo.git) that does infer from current remote of the repo that adds the submodule when you initialize it, which makes CI and using either ssh or https to authenticate on the private org a non issue. But obviously for DRY reasons when it comes to common dependencies that has its limits.

@oddko
Copy link
Author

oddko commented Feb 2, 2022

Upon more testing of that feature, turns out you cannot use "go up one directory" (..) via SSH (https://github.com/someorg/somerepo.git/../anotherrepo.git works, git@github.com:someorg/somerepo.git/../anotherrepo.git does not), so this likely means we'll have to do some lightweight URI parsing (Oh no!) if we are to support all protocols.

EDIT: actually fairly easy to do with the recent addition of cmake_path and its path normalizing abilities (https://cmake.org/cmake/help/git-stage/command/cmake_path.html#normal-path), however this would require bumping the minimum cmake version to 3.20, so either set it as the min version or gatekeep the feature behind having that version.

@TheLartians
Copy link
Member

I can retrieve the token that is used in the CI scripts, but I don't really want to have my projects bear CI-specific configuration options.

Sure, I can definitely see the use-case. Just to be clear though, what I suggested can be achieved by adding a single line to your .gitlab-ci.yml to use the runner's access token for any calls to your GitLab. E.g. the step

git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.com/".insteadOf "https://gitlab.com/"

Would result in git rewriting all GitLab URLs to use your CI's job token. Imo this feels nicer than using relative URLs as it's a more general solution and also works with other git based dependency management solutions.

When it comes to publicly available librairies I don't think relative URLs are really needed anyway.

I do wonder if the feature may be too niche then for its inclusion in the main script, especially if it's locked behind a 3.20 version requirement. I'd still be happy to be convinced otherwise though. @iboB what are your thoughts on this?

@oddko
Copy link
Author

oddko commented Feb 9, 2022

Sure, I can definitely see the use-case. Just to be clear though, what I suggested can be achieved by adding a single line to your .gitlab-ci.yml to use the runner's access token for any calls to your GitLab. E.g. the step

git config --global url."https://gitlab-ci-token:${CI_JOB_TOKEN}@gitlab.com/".insteadOf "https://gitlab.com/"

Would result in git rewriting all GitLab URLs to use your CI's job token. Imo this feels nicer than using relative URLs as it's a > more general solution and also works with other git based dependency management solutions.

I'd rather not tamper with the global git config of the machine when these are actual physical machines (Windows/MacOS runner) nor repeat that step in every job when they are not. Nice trick though will definitely keep it in mind.

I do wonder if the feature may be too niche then for its inclusion in the main script, especially if it's locked behind a 3.20 version requirement. I'd still be happy to be convinced otherwise though. @iboB what are your thoughts on this?

Fair enough, this mainly targets private orgs and a private fork of CPM can always be done to enable such feature.

@iboB
Copy link
Member

iboB commented Feb 9, 2022

First of all, apologies for the late response.

I agree that we should be very careful if we ad something like this. The room for error is big and fixing potential errors in a feature like this may lead to very nasty backwards compatibility issues.

Here's my thinking on the subject (and I realize that it's very different from the initial proposal)

  1. Introduce CPMSetRelativeUri
    • An expected value here would be something like https://github.com/myorg/
    • A value of DEFAULT for this field would mean "infer from current git url" and would lead to an URI which is one above the current git url (for example if the git url is https://github.com/myorg/myrepo.git, DEFAULT would lead to https://github.com/myorg/)
    • Not using CPMSetRelativeUri would mean that the value is unset.
    • CPMSetRelativeUri is unique (like CPMAddPackage) subsequent calls won't change an already set value
  2. Allow rel: uris in CPMAddPackage as per this proposal. This uses the value if set or produces an error if not set

This have the following benefits

  • In case of problems and need of tweaks, we can allow additional special arguments to CPMSetRelativeUri to allow better differentiation (say FORCE or REQUIRED...)
  • We can allow other sources for this value (for example hypothetical CPMPackage which can be used for package self-description/definition in a future where we have package repositiories)
  • We conservatively do not introduce any defaults for rel: as they may end-up being something we regret.
  • We may choose to deprecate and remove this relatively isolated functionality or add it to a hypothetical CPM future plugin and out of the main source
  • The defaults we do introduce seem better IMO. With the current proposal every single instance of rel would be followed by ..

@oddko
Copy link
Author

oddko commented Feb 12, 2022

Your idea isn't very different from my proposal (outside of the fact you'd want a mandatory call to a SetUri function to 'unlock' the feature).
As for the reduncancy of having to put "../" in front of every relative URI, while this may seem redundant it makes sense from a URI perspective and is consistent with how git submodules do it, so it wouldn't be very standard to cut it when auto inferring from the current git URL.

@itavero
Copy link

itavero commented Feb 17, 2022

Just my two cents. I'm not that experienced with CPM yet, but do have experience with other package managers and stuff like git submodules.

What we tend to do now on some Git environments (i.e. GitLab), is use relative git submodules, because of the simple fact that some systems authenticate over SSH, others over HTTPS and using relative paths to the other modules makes it work for everyone, regardless of how they access the Git server.

What I would expect is that if I depend on package X (via a relative path or not) and package X depends on Y via a relative path, that the path is resolved against the path to X (not against my own project).
And if Y where to depend on Z via a relative path, then it should resolve against the path of Y.

Keeping it similar to how relative paths for git submodules work, makes a lot of sense to me.

@ghost
Copy link

ghost commented May 12, 2022

The way you've done this, works well for my use case @oddko. Thanks for your work, I've started putting it to use.

I understand that this particular implementation isn't ideal for CPM itself, but I definitely think relative paths should be incorporated. As @itavero says, it's a very useful feature of git submodules (encouraged by GitLab) and makes a lot of sense for CPM to implement it in the same way.

In my opinion it's much cleaner than @TheLartians idea editing the git config file to manually implement a specific authentication method. Because, it's a much more general solution that is already used by git submodules.

@itavero
Copy link

itavero commented May 12, 2022

In my opinion it's much cleaner than @TheLartians idea editing the git config file to manually implement a specific authentication method. Because, it's a much more general solution that is already used by git submodules.

I agree on that part. I'd rather keep things consistent with how we are used to doing them with git submodules (of course, I can imagine that not everyone is used to doing that).

I also wonder how something like CPMSetRelativeUri would work cleanly with dependencies of dependencies. As mentioned I believe that if dependency X is included by dependency Y with a relative URL, it should be resolved against the URL of dependency Y.
As per my understanding from previous comments, it would only be allowed to call it once, so if that's the case, I don't see how that would work properly.

One thing I did think of is that we tend to reconfigure our CMake project multiple times a day (we write software for different microcontrollers and run the automated tests on our PCs, so we switch between different toolchain files).
Caching the URL result in CMake works fine if you don't reconfigure your project, but for us I think it might also help to cache to a file inside the source directory.
Initially I though the package lock file would make sense for this, but that actually does not work, as you probably want to keep that in your Git repository as well and then the paths are no longer determined correctly if you'd clone the repository via a different URL (SSH instead of HTTPS for instance).

@oddko
Copy link
Author

oddko commented May 12, 2022

As per my understanding from previous comments, it would only be allowed to call it once, so if that's the case, I don't see how that would work properly.

In my PR I intended it to be scope-based (calling the function works for the project itself and any that it adds). I'm not extremely fond of it either but it can help to bridge into an "ecosystem" of repositories that reference each other relatively and that live on a different hosting service (be it gitlab -> github, etc.).

As for reconfiguring CMake, do you mean by clearing the cache first ? In which case indeed it will be dropped and resolved by calling git once again, but when it comes to reconfiguring by clearing the cache I don't think that this particular 15ms or so git operation is going to be the performance bottleneck.

@itavero
Copy link

itavero commented Aug 15, 2022

Just curious what exactly is preventing this PR from being merged: is there anything that still needs consensus? If so, can we make those things explicit so they can be discussed?

We are starting a new product platform soon and I'd really like to make CPM.cmake a part of our toolbox for that. My fellow engineers would greatly appreciate this feature.

@oddko
Copy link
Author

oddko commented Aug 15, 2022

Probably some coding-style (according to the CI), tests (would need to look into how they work) and making sure the specs are well defined and not prone to issues later down the line that'd break them. Some improvements could be made also (For example changing the URL of a module requires a CMake cache clearing as I cache the resolved URLs to avoid too many calls to git commands). There is also the fact that it needs a more recent CMake version (3.20+) since I use cmake_pathto normalize URLs (mandatory for ssh URLs). The script can still be used with older CMake versions (I throw a warning when relative paths are used in that case) but I'm not sure how the maintainers feel about introducing a feature-availability grid depending on the version of CMake used.

What we've down at our company is maintain a fork, publish releases in the same way as CPM and using a different URL in the get_cpm.cmake script.

@oddko
Copy link
Author

oddko commented Aug 21, 2022

Fixed style & URLs are re-resolved if the relative URI changes (no need to clear the cache anymore).

I guess we should now write some tests for these. Do you have any preference or guide into how those should be written ?

@TheLartians
Copy link
Member

Thanks for the updates!

Honestly, I'm still hesitant to add this feature as I can see some nasty unexpected issues appearing, e.g. when users fork or attempt to vendor a project with relative URLs. If we could somehow make the solution resistant to such cases I would feel much better (e.g. we could default to a valid URL unless a specific flag is passed to CMake that activates relative URLs for a specific schema). I'm aware that this would also break some of the elegance of the suggested approach.

Obviously there is some demand for such a feature (and git submodules support something similar), but as said above we should consider the implications carefully when adding such a change.

In any case, there are still two more changes that the PR would require:

  • Documentation: the feature and it's use-case or potential pitfalls should be mentioned in the Readme.
  • Tests: an integration test would definitely make sense, and it could also be a good idea to add unit tests for simple routines to guarantee that this feature continues working as expected in the future. For unit tests, simply add a CMake file to the unit tests directory. See here for an example unit-test.

@itavero
Copy link

itavero commented Aug 29, 2022

I think some guidelines on how/when this should be used would also be wise.

For my use case it would exclusively be used in private repositories. We have these on GitLab, with a tree-like structure created using subgroups. Even if we would use the short hand notation (gl:), we would still often have to type a lot of redundant stuff due to how the repositories are organized.

For public repositories, given that the dependencies of that repo must also all be public, I think the "short hand" URLs would be sufficient for that most of the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants