Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteration on parseHostSpec refactor #4954

Merged
merged 15 commits into from Jan 10, 2023

Conversation

KnVerey
Copy link
Contributor

@KnVerey KnVerey commented Jan 4, 2023

@annasong20 this iterates on your great work in #4932. I find this code difficult to wrap my head around without diving in and working with it (hopefully we're now fixing that step by step!). Feel free to review this PR or to cherry-pick my commit onto your branch (in which case we can continue on #4932), whatever is easier for you. I have more ideas for RepoSpec, but I'll make follow-up PRs for them. This attempts to keep the same scope as your PR (additional test cases aside).

Functional changes:

  • Emit a deprecation warning when we strip the git:: prefix. This prefix originally signalled (to go-getter) that the git protocol should be used rather than whatever would be used by default based on the rest of the URL. However, Kustomize has been stripping it unconditionally since v2.0. So Kustomize has long since not being doing what the user intended.
  • Drop support for the gh: prefix. We were unable to find any basis for this (it is not part of the protocol, nor any other standard), nor any usage at all in public Kustomization files. We believe it may have been targeting someone's custom gitconfig shorthand syntax.

Part of #4866
Fixes #4847

/cc @annasong20 @natasha41575

@k8s-ci-robot
Copy link
Contributor

@KnVerey: This PR has multiple commits, and the default merge method is: merge.
You can request commits to be squashed using the label: tide/merge-method-squash

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 4, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KnVerey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 4, 2023
api/internal/git/repospec.go Outdated Show resolved Hide resolved
return scheme, n
}

username, n := parseUsername(n)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not always immediately obvious how n is changing throughout this code, so some communication that parts of n are being removed as we parse through it may be helpful. Wydt about renaming this to trimUsername (and renaming the above parseScheme to trimScheme)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that a lot!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the existing occurrences of Trim[Thing] in this file, I think we have three operations we can distinguish:

  • TrimThing(input string)(rest string) (or TrimThing(input string)(rest string, trimmed bool)) -- "trim" is used for this in stdlib functions too, like strings.TrimPrefix, which also removes a known quantity and returns the result (not the thing being trimmed)
  • ParseThing(input string)(thing string) -- returns the target without reference to the remainder of the input
  • ??Thing(input string)(thing string, rest string) <-- this case. Since "trim" usually removes the thing in question without returning it, I'm leaning towards a third verb here. I'm thinking "extract" but I'm not set on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract sgtm!

if strings.HasPrefix(scheme, "ssh://") || username == gitUsername {
return normalizedSCPGithub
}
return normalizedHTTPGithub
}

func normalizeGitHostSpec(host string) string {
s := strings.ToLower(host)
if strings.Contains(s, "github.com") {
Copy link
Contributor Author

@KnVerey KnVerey Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One changed code path is that gh: could previously end up here, so gh:github.com/org/repo/foo would get normalized... but that's non-sensical based on Anna's comment that gh: is assumed to be a gitconfig shorthand that gets swapped for the github URL, right? (same with file://, which is definitely non-sensical)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote that comment to the best of my knowledge. I couldn't find any official documentation for gh: on the internet, other than Github CLI. I also tried looking at code history to no avail. The only sites it popped up on denoted a .gitconfig shorthand: https://gist.github.com/lauhakari/a48a94b60dc66f6c84e2#file-gitconfig-L143

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. What a mystery! I tried looking at the original PR too, and it doesn't help much... but it definitely shows the "url" is expected to look like gh:org/repo. So I think it is reasonable to consider gh:github.com non-sensical.

Alternatively, maybe we can remove support for this. Your research indicates this is not a standard, and we do not have it documented anywhere. Furthermore, I don't see ANY usage of it in public Kustomizations on Github: https://github.com/search?q=%22gh%22+filename%3AKustomization.yaml&type=Code&ref=advsearch&l=&l=.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be down to remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@annasong20 annasong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code paths are so easy to read! Please feel free to ignore the nits.

api/internal/git/repospec.go Show resolved Hide resolved
}

func findPathSeparator(hostPath string, isSCP bool) int {
if strings.Contains(hostPath, "://") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a small chance that the url will contain ://, but not to denote the scheme. According to RFC 3986, uri paths can contain :. The // could be delimit the path inside the repo or maybe it's an empty path segment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good point. This was previously handled by gating all the path parsing blocks on having already parsed a recognized scheme. I'll try validating the username+scheme combo ahead of this function so that it won't need this weird check.

return normalizedGithubHost(scheme, username), rest
}

sepIndex := findPathSeparator(n, username == gitUsername)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this to scheme == "" && username != ""?

  1. If scheme is ssh:// and username is gitUsername, we'd be wrongly passing true.
  2. If we decide to allow other usernames, we'd be wrongly passing false.

nit: Maybe we could define an isSCP variable in extractHost? Then, we could pass it to functions like normalizedGithubHost, findPathSeparator, and validHostSpecParsed. This would help limit references to gitUsername in helper functions for when we generalize allowed usernames.

func validHostSpecParsed(scheme, username, host string) bool {
isSCP := username == gitUsername
return len(host) > 0 &&
(scheme != "" || isSCP) // scheme may be blank for URLs like git@github.com:org/repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could we change this comment to a non-github example since github urls would have been already been returned by the normalizedGithubHost code path?

if strings.HasPrefix(scheme, "ssh://") || username == gitUsername {
return normalizedSCPGithub
}
return normalizedHTTPGithub
}

func normalizeGitHostSpec(host string) string {
Copy link
Contributor

@annasong20 annasong20 Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concern for future PR: I find normalizeGitHostSpec hard to reason about because it checks for target strings out of context (with strings.Contains). For example, the repo url ssh://git@ssh.github.com:443/YOUR-USERNAME/YOUR-REPOSITORY.git with the following use case: https://docs.github.com/en/authentication/troubleshooting-ssh/using-ssh-over-the-https-port

would be normalized to git@github.com:YOUR-USERNAME/YOUR-REPOSITORY.git, which might come as a surprise to the user.

Maybe we could reuse some of the logic in the existing extractHost to replace the call to normalizeGitHostSpec in parseGitURL.

At the very least, I'm glad we no longer use this function in extractHost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my next PR is getting rid of this entirely in favour of handling all host parsing with extractHost! That's a great example of changed behaviour in this PR though, and I added it as a test.

},
},
{
name: "complex github ssh url from docs",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this on master shows that the current result is even worse than you thought. Here's what passes there:

		{
			name:      "complex github ssh url from docs",
			input:     "ssh://git@ssh.github.com:443/YOUR-USERNAME/YOUR-REPOSITORY.git",
			cloneSpec: "ssh://git@ssh.git",
			absPath:   notCloned.String(),
			repoSpec: RepoSpec{
				Host:      "ssh://git@",
				OrgRepo:   "ssh",
				Path:      "",
				GitSuffix: ".git",
			},
		},

Totally mangled clonespec!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future PR: Maybe we could check that delimiters like .git are succeeded by a /: https://github.com/kubernetes-sigs/kustomize/blob/master/api/internal/git/repospec.go#L136? I believe this test case reveals that we're looking for such delimiters out of context.

},
},
{
name: "supported protocol after username (invalid and will be rejected by git)",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two cases have slightly different parsing results on master. Basically a slash moves. But the clonespec stays the same, and git will reject it anyway, so I don't think this matters a ton.

		{
			name:      "unsupported protocol after username",
			input:     "git@scp://github.com/org/repo.git//path",
			cloneSpec: "git@scp://github.com/org/repo.git",
			absPath:   notCloned.Join("path"),
			repoSpec: RepoSpec{
				Host:      "git@scp:/",
				OrgRepo:   "/github.com/org/repo",
				Path:      "/path",
				GitSuffix: ".git",
			},
		},
		{
			name:      "supported protocol after username",
			input:     "git@ssh://github.com/org/repo.git//path",
			cloneSpec: "git@ssh://github.com/org/repo.git",
			absPath:   notCloned.Join("path"),
			repoSpec: RepoSpec{
				Host:      "git@ssh:/",
				OrgRepo:   "/github.com/org/repo",
				Path:      "/path",
				GitSuffix: ".git",
			},
		},

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not request for change: This test case made me ponder the prioritization of delimiters in parseGitURL. Currently, we always prioritize .git over //. Given that we advertise // as the repo path delimiter, we could alternatively look for .git, only if it isn't preceded by //. Otherwise, we could also document the .git delimiter more clearly in https://github.com/kubernetes-sigs/kustomize/blob/master/examples/remoteBuild.md.

Regardless, this'd be work for a future PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my next PR is tackling parseGitURL, so I'll consider this then.

"gh:org/repo",
"url lacks orgRepo",
},
"username unsupported with http": {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that all three of these would be accepted on master, so the errors here are a behaviour change. I'm not 100% sure about this yet; will ponder more before we merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps, I believe file git urls are technically allowed to have @ in their repo paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated the code and tests to allow it. As commented elsewhere, I also decided against this validation more generally.

@KnVerey
Copy link
Contributor Author

KnVerey commented Jan 7, 2023

Thank you for the excellent feedback @annasong20 . I've made some moderate changes in response. PTAL.

// although the git protocol (which is insecure and unsupported on many platforms, including Github)
// will not actually be used as intended.
func ignoreForcedGitProtocol(n string) string {
n, _ = trimPrefixIgnoreCase(n, "git::")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: we could deprecate this behaviour by printing a warning here now. It's probably not harmful, but technically speaking, the user is requesting that the git protocol be forced, and we're not doing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@KnVerey KnVerey added the release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. label Jan 9, 2023
Copy link
Contributor

@annasong20 annasong20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just 1 request for change. Sorry I didn't catch it the 1st round.

Everything else looks great!

Comment on lines +351 to 353
if colonIndex := strings.Index(hostPath, ":"); colonIndex > 0 && colonIndex < sepIndex {
sepIndex = colonIndex
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return -1 as the index otherwise? SCP-like urls don't accept / as a separator even if there isn't a : (not preceded by a /). git fetch will fail for such a case.

The error message here: https://github.com/kubernetes-sigs/kustomize/pull/4954/files#diff-2a7284bc651dfb0465d49f2270ed590fce191d14bba769751e4d16d9184a198cR97 should be url lacks host.

This change will require some accompanying changes in extractHost for github.com/path/to/repo.git to pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When thinking about this and the question of whether to throw an error on http(s)+username, I came back to this question: what is the point of all this parsing code? In the end, we have a URL that we're going to pass to git, and it will throw an error. Why don't we always let it do its job?

I wasn't here for the original authorship, but my answer would be: because we support some invalid formats (e.g. the git:: prefix we strip, github URLs we "fix") and the URLs we are given ultimately point to a Kustomization root, whereas git needs the repo root. So the point is to identify the repo root, i.e the clonespec, and to fix any oddities we specifically allow for historical reasons.

That led me to conclude that we should only throw errors in this code when the anomaly prevents us from achieving those two goals. When we are able to parse the intended clone spec (valid or not), we should do so, and let git throw the error (if any).

With that in mind, I think it is clear that the http(s)+username case should be allowed to fall through to git, and I've restored that behaviour. I think that is also true for the case you're highlighting here: we've found a schemeless, non-Github url that starts with a username and is missing the colon that should delimit the host. The following test passes on master, and currently on this branch:

		{
			name:      "non-github_scp incorrectly using slash (invalid but currently passed through to git)",
			input:     "git@bitbucket.org/company/project.git//path?ref=branch",
			cloneSpec: "git@bitbucket.org/company/project.git",
			absPath:   notCloned.Join("path"),
			repoSpec: RepoSpec{
				Host:         "git@bitbucket.org/",
				RepoPath:     "company/project",
				KustRootPath: "/path",
				Ref:          "branch",
				GitSuffix:    ".git",
			},
		},

I think we should continue to let it pass though.

After reading the above, do you agree? Are there other cases to consider around SCP-like with missing colons?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes sense! Thank you for writing such a great explanation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially hoped looking for : before / would decrease our chances of mistaking a relative local file path that contains an @ in the first directory for a repo url, but I realized that users can prefix such file paths with . to look like ./path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, and also the "a relative local file path that contains an @ in the first directory" scenario is likely vanishingly rare, so I'm not too worried.

@@ -226,10 +226,6 @@ func TestLocRootPath_URLComponents(t *testing.T) {
urlf: "file:///var/run/repo//%s?ref=value",
path: simpleJoin(t, FileSchemeDir, "var", "run", "repo", "value"),
},
"gh_shorthand": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing this!

"gh:org/repo",
"url lacks orgRepo",
},
"username unsupported with http": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it helps, I believe file git urls are technically allowed to have @ in their repo paths.

Comment on lines 95 to 98
"bad_scp": {
"git@local/path:file/system",
"url lacks repoPath",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this into TestNewRepoSpecFromUrl_Smoke? This test originally checked that extractHost failed on not being able to find a host delimiter because : is behind a /. Now that we use / as the delimiter instead, this test no longer makes sense.

If we still want to document the behavior that we only detect :s not preceded by /, the TestNewRepoSpecFromUrl_Smoke test would look something like this:

		{
			name:      "colon behind slash not scp delimiter",
			input:     "git@gitlab.com/user:name/YOUR-REPOSITORY.git",
			cloneSpec: "git@gitlab.com",
			absPath:   notCloned.String(),
			repoSpec: RepoSpec{
				Host:      "git@gitlab.com/",
				OrgRepo:   "user:name/YOUR-REPOSITORY",
				Path:      "",
				GitSuffix: ".git",
			},
		},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, that's much better! Updated.

@annasong20
Copy link
Contributor

/lgtm

This is awesome, @KnVerey!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2023
@k8s-ci-robot k8s-ci-robot merged commit da5d572 into kubernetes-sigs:master Jan 10, 2023
@KnVerey KnVerey deleted the more_repospec branch January 10, 2023 21:39
@KnVerey
Copy link
Contributor Author

KnVerey commented Jan 10, 2023

🎉 thank you for the diligent reviews @annasong20 !

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Apr 15, 2023
v1.27.1

No CLI change.

v1.27.0

API Change

* Adds feature gate NodeLogQuery which provides cluster administrators with
a streaming view of logs using kubectl without them having to implement a
client side reader or logging into the node.

Feature

* Added "general", "baseline", and "restricted" debugging profiles for
kubectl debug.
* Added "netadmin" debugging profiles for kubectl debug.
* Added --output plaintext-openapiv2 argument to kubectl explain to use old
openapiv2 explain implementation.
* Added e2e tests for kubectl --subresource for beta graduation
* Changed kubectl --subresource flag to beta (#116595, @MadhavJivrajani)
* Enable external plugins can be used as subcommands for kubectl create
command if subcommand does not exist as builtin only when
KUBECTL_ENABLE_CMD_SHADOW environment variable is exported.
* Kubectl will now display SeccompProfile for pods, containers and
ephemeral containers, if values were set.
* Kubectl: added e2e test for default container annotation
* Made kubectl-convert binary linking static (also affects the deb and rpm
packages).
* Promoted whoami kubectl command.
* Since Kubernetes v1.5, kubectl apply has had an alpha-stage --prune flag
to support deleting previously applied objects that have been removed from
the input manifest. This feature has remained in alpha ever since due to
performance and correctness issues inherent in its design. This PR exposes
a second, independent pruning alpha powered by a new standard named
ApplySets. An ApplySet is a server-side object (by default, a Secret;
ConfigMaps are also allowed) that kubectl can use to accurately and
efficiently track set membership across apply operations. The format used
for ApplySet is set out in KEP 3659 as a low-level specification. Other
tools in the ecosystem can also build on this specification for improved
interoperability. To try the ApplySet-based pruning alpha, set
KUBECTL_APPLYSET=true and use the flags --prune --applyset=secret-name with
kubectl apply.
* Switched kubectl explain to use OpenAPIV3 information published by the
server. OpenAPIV2 backend can still be used with the --output
plaintext-openapiv2 argument
* Upgrades functionality of kubectl kustomize as described at
https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv5.0.0
and
https://github.com/kubernetes-sigs/kustomize/releases/tag/kustomize%2Fv5.0.1.
This is a new major release of kustomize, so there are a few
backwards-incompatible changes, most of which are rare use cases, bug fixes
with side effects, or things that have been deprecated for multiple
releases already:

 - kubernetes-sigs/kustomize#4911: Drop support for a very old, legacy
 style of patches. patches used to be allowed to be used as an alias for
 patchesStrategicMerge in kustomize v3. You now have to use
 patchesStrategicMerge explicitly, or update to the new syntax supported by
 patches. See examples in the PR description of
 kubernetes-sigs/kustomize#4911.
 - kubernetes-sigs/kustomize#4731: Remove a potential build-time
 side-effect in ConfigMapGenerator and SecretGenerator, which loaded values
 from the local environment under some circumstances, breaking kustomize
 build's side-effect-free promise. While this behavior was never intended,
 we deprecated it and are announcing it as a breaking change since it
 existed for a long time. See also the Eschewed Features documentation.
 - kubernetes-sigs/kustomize#4985: If you previously included .git in an
 AWS or Azure URL, we will no longer automatically remove that suffix. You
 may need to add an extra / to replace the .git for the URL to properly
 resolve.
 - kubernetes-sigs/kustomize#4954: Drop support for using gh: as a
 host (e.g. gh:kubernetes-sigs/kustomize). We were unable to find any usage
 of or basis for this and believe it may have been targeting a custom
 gitconfig shorthand syntax.

* [alpha: kubectl apply --prune --applyset] Enabled certain custom
resources (CRs) to be used as ApplySet parent objects. To enable this for a
given CR, apply the label applyset.kubernetes.io/is-parent-type: true to
the CustomResourceDefinition (CRD) that defines it.
* kubectl now uses HorizontalPodAutoscaler v2 by default.

Documentation

* kubectl create rolebinding -h

Bug or Regression

* Added (dry run) and (server dry run) suffixes to kubectl scale command
when dry-run is passed
* Changed the error message of kubectl rollout restart when subsequent
kubectl rollout restart commands are executed within a second
* Changed the error message to cannot exec into multiple objects at a time
when file passed to kubectl exec contains multiple resources
* Kubectl: enabled usage of label selector for filtering out resources when
pruning for kubectl diff
* kubectl port-forward now exits with exit code 1 when remote connection is
lost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RepoSpec parses non-github ssh urls incorrectly
4 participants