Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update http.Transport if it already exists in ExecProvider #66395

Merged

Conversation

awly
Copy link
Contributor

@awly awly commented Jul 19, 2018

What this PR does / why we need it:
This unbreaks ExecPlugin. Without the change, we hit this error
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/transport/transport.go#L32

Release note:

Fix kubelet startup failure when using ExecPlugin in kubeconfig

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 19, 2018
@awly
Copy link
Contributor Author

awly commented Jul 19, 2018

/assign @mikedanese
/cc @liggitt

@liggitt
Copy link
Member

liggitt commented Jul 19, 2018

/hold
this reverts the fix in #63492 and removes the ability to tear down kubelet connections on heartbeat failure. it's fine if we want to compose these two things differently, but we cannot reintroduce the kubelet TCP heartbeat hang issue.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 19, 2018
@awly awly force-pushed the fix-kubelet-exec-plugin-startup branch from 5262e8e to 4471282 Compare July 19, 2018 21:50
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 19, 2018
@awly
Copy link
Contributor Author

awly commented Jul 19, 2018

@liggitt good point, not sure how i forgot about that :|

Changed to handle existing Transport in ExecProvider and use stdlib GetCertificate version.
WDYT?

@awly awly changed the title Only call kubeletcertificate.UpdateConfig if cert rotation is enabled Update http.Transport if it already exists in ExecProvider Jul 19, 2018
cert, err := getCert()
if err != nil {
return nil, err
if t, ok := c.Transport.(*http.Transport); ok && t != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this do when run with --v=6? might need a func TransportFor(transport http.RoundTripper) (*http.Transport, error) helper in pkg/util/net to unwrap nested debugging transports (see DialerFor as an example)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested, --v=6 works fine.
But I agree with your point, someone can introduce a wrapper eventually, breaking this.
Added a TransportFor helper.

}
getCert := t.TLSClientConfig.GetClientCertificate
t.TLSClientConfig.GetClientCertificate = func(hi *tls.CertificateRequestInfo) (*tls.Certificate, error) {
// If previous GetCert is present and returns a valid non-nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to think through the implications if you use client cert rotation and an exec plugin... the client cert rotation code would always win. does that make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we checking if the returned cert is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we checking if the returned cert is valid?

refreshCredsLocked does validation. It fetches the actual certificate, validates and caches it. GetClientCertificate returns the cached copy.
I added cert expiry validation too in this PR, just in case.

trying to think through the implications if you use client cert rotation and an exec plugin... the client cert rotation code would always win. does that make sense?

Is cert rotation and exec plugin a valid configuration? Does it mean "fetch initial cert via plugin but then rotate manually"?
Our use case relies on exec plugin to bootstrap a fresh cert whenever needed and cert rotation in kubelet is disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshCredsLocked does validation.

I was referring to what the previous GetCert was returning ("If previous GetCert is present and returns a valid non-nil..."), not what the exec plugin returned

Is cert rotation and exec plugin a valid configuration?

I don't know, but this is allowing it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was referring to what the previous GetCert was returning ("If previous GetCert is present and returns a valid non-nil..."), not what the exec plugin returned

Ah, gotcha. I don't think it's this plugin's job to check that. If something returns a certificate here, it better be valid. Otherwise it's a bug.

I'd rather leave it as is, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sanity of the next person who tries to debug this, I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

that seems like a reasonable starting point. mixing the two gets weird.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, made it an error in both cases

if getCert != nil {
cert, err := getCert(hi)
if err != nil {
return nil, err
Copy link
Member

@liggitt liggitt Jul 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we want to check the exec plugin in an error case? (I know this is pre-existing, just trying to understand the implications and whether they are different if we're dealing with an existing transport vs a GetCert func)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, added.
It's an unlikely configuration, but doesn't hurt.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 20, 2018
@awly
Copy link
Contributor Author

awly commented Jul 20, 2018

@liggitt thanks for the comments, PTAL

@awly
Copy link
Contributor Author

awly commented Jul 20, 2018

FYI, I'd like to cherry-pick this into 1.11.2.

Copy link
Member

@mikedanese mikedanese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest client keeps on giving. What if we added a WrapDialer field to the restclient.Config and used that to add the connection tracker? Would that be simpler?


// TransportFor returns the root *http.Transport behind all wrapper
// RoundTrippers. Wrappers must implement RoundTripperWrapper.
func TransportFor(transport http.RoundTripper) (*http.Transport, error) {
if transport == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when does this happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shifts the nil check from every caller to here. transport.Config.Transport can be nil with valid config, afaik (just not in kubelet)

}
getCert := t.TLSClientConfig.GetClientCertificate
t.TLSClientConfig.GetClientCertificate = func(hi *tls.CertificateRequestInfo) (*tls.Certificate, error) {
// If previous GetCert is present and returns a valid non-nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sanity of the next person who tries to debug this, I think it would be better to return an error if GetClientCertificate is not nil. That would effectively disallow client cert rotation to be used with the exec plugin, right? I'd be happy to avoid this complexity.

if certBlock == nil {
return errors.New("can't parse client certificate returned by exec plugin as PEM block")
}
x509Cert, err := x509.ParseCertificate(certBlock.Bytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the cert on line 402 so you don't have to parse the pem twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean manually parse ClientKeyData and build tls.Certificate via a literal?
It feels like more work and easy to miss some validations that tls.X509KeyPair does

if err != nil {
return fmt.Errorf("failed parsing x509 client certificate returned by exec plugin: %v", err)
}
if time.Now().After(x509Cert.NotAfter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we already checking time client-side elsewhere? this makes us vulnerable to client clock skew

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we're checking it elsewhere. It should be kube-apiserver's responsibility to verify cert validity on incoming requests. Could you clarify how this makes us vulnerable?

@awly awly force-pushed the fix-kubelet-exec-plugin-startup branch from 97ab95f to 527b23b Compare July 25, 2018 23:11
@awly
Copy link
Contributor Author

awly commented Jul 25, 2018

@liggitt @mikedanese PTAL. Turns out we can just set restclient.Config.Dialer instead of Transport. This fixes the issue in my test cluster.

}
return a.cert()
if c.TLS.GetCert != nil {
return errors.New("can't add TLC certificate callback: transport.Config.TLS.GetCert already set")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/TLC/TLS/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@awly awly force-pushed the fix-kubelet-exec-plugin-startup branch 2 times, most recently from e619672 to 7293fab Compare July 25, 2018 23:23
Instead of Transport. This fixes ExecPlugin, which fails if
restclient.Config.Transport is set.
@awly awly force-pushed the fix-kubelet-exec-plugin-startup branch from 7293fab to 3357b5e Compare July 25, 2018 23:24
@mikedanese
Copy link
Member

I'm reasonably happy with this.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2018
@awly
Copy link
Contributor Author

awly commented Jul 25, 2018

Moved cert expiry check out to #66632

@mikedanese
Copy link
Member

/retest

1 similar comment
@mikedanese
Copy link
Member

/retest

@liggitt
Copy link
Member

liggitt commented Jul 26, 2018

/lgtm
/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 26, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awly, liggitt, mikedanese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 26, 2018
@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit cef2d32 into kubernetes:master Jul 26, 2018
@awly awly deleted the fix-kubelet-exec-plugin-startup branch July 26, 2018 17:51
k8s-github-robot pushed a commit that referenced this pull request Jul 31, 2018
…pstream-release-1.11

Automatic merge from submit-queue.

Automated cherry pick of #66395: Set connrotation dialer via restclient.Config.Dialer

Cherry pick of #66395 on release-1.11.

#66395: Set connrotation dialer via restclient.Config.Dialer
@marpaia
Copy link
Contributor

marpaia commented Jul 31, 2018

/sig cli

@k8s-ci-robot k8s-ci-robot added the sig/cli Categorizes an issue or PR as relevant to SIG CLI. label Jul 31, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants