fix: mTLS does not work when kubelet does not listen on 127.0.0.1 #27583

Merged
merged 1 commit into from
Aug 22, 2023


weizhoublue
Contributor

@weizhoublue weizhoublue commented Aug 18, 2023

bug description

With the latest version, v1.14.1, when the kubelet listens on a specific local IP address rather than 0.0.0.0, the SPIRE agent fails to contact the local kubelet at 127.0.0.1:10250, and the mTLS feature does not work at all.

Some logs from the affected environment:

The node has multiple network interfaces, and the kubelet listens on a specific local address, 172.16.1.11, rather than 0.0.0.0:


[root@master1 ~]# cat /etc/kubernetes/kubelet-config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
nodeStatusUpdateFrequency: "10s"
failSwapOn: True
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: True
  x509:
    clientCAFile: /etc/kubernetes/ssl/ca.crt
authorization:
  mode: Webhook
staticPodPath: /etc/kubernetes/manifests
cgroupDriver: systemd
containerLogMaxFiles: 5
containerLogMaxSize: 10Mi
maxPods: 110
podPidsLimit: -1
address: 172.16.1.11
readOnlyPort: 0
healthzPort: 10248
healthzBindAddress: 127.0.0.1
kubeletCgroups: /system.slice/kubelet.service
clusterDomain: cluster.local
protectKernelDefaults: true
rotateCertificates: true
clusterDNS:
- 172.26.0.3
resolvConf: "/run/systemd/resolve/resolv.conf"
eventRecordQPS: 5
shutdownGracePeriod: 60s
shutdownGracePeriodCriticalPods: 20s

[root@master1 ~]# ss -lntp
LISTEN          0               4096                         172.16.1.11:10250                          0.0.0.0:*              users:(("kubelet",pid=1625,fd=22))

The SPIRE agent fails to contact the kubelet on its local node:

[root@master1 ~]# kubectl logs -n cilium-spire   spire-agent-pjpsb
Defaulted container "spire-agent" out of: spire-agent, init (init)
time="2023-08-18T08:49:12Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Internal desc = workloadattestor(k8s): unable to perform request: Get \"https://127.0.0.1:10250/pods\": dial tcp 127.0.0.1:10250: connect: connection refused" pid=8491 subsystem_name=workload_attestor
time="2023-08-18T08:49:13Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Internal desc = workloadattestor(k8s): unable to perform request: Get \"https://127.0.0.1:10250/pods\": dial tcp 127.0.0.1:10250: connect: connection refused" pid=6845 subsystem_name=workload_attestor
time="2023-08-18T08:49:13Z" level=error msg="no identity issued" method=SubscribeToX509SVIDs service=spire.api.agent.delegatedidentity.v1.DelegatedIdentity subsystem_name=debug_api


[root@master1 ~]# kubectl logs -n kube-system    cilium-l5dk4
level=error msg="Error in delegate stream, restarting" error="rpc error: code = PermissionDenied desc = no identity issued" subsys=spire-delegate


No SPIFFE ID can be found with the following command, and mTLS does not work at all:

kubectl exec -n cilium-spire spire-server-0 -c spire-server -- /opt/spire/bin/spire-server entry show -selector cilium:mutual-auth

how to fix

According to the SPIRE documentation and code, SPIRE supports specifying the kubelet address through an environment variable named "MY_NODE_NAME".

Regardless of which local address the kubelet listens on, the SPIRE agent can reach its local kubelet via status.hostIP.

With this fix, it works well.
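
The approach described above can be sketched as a fragment of the SPIRE agent pod spec. This is a minimal sketch, not the exact diff merged in this PR: it assumes the agent container is named `spire-agent` (as in the log output above) and that the Kubernetes downward API is used to expose `status.hostIP` through `MY_NODE_NAME`.

```yaml
# Hypothetical fragment of the SPIRE agent pod spec; only the env entry matters.
containers:
  - name: spire-agent
    env:
      # Expose the node's IP so the agent contacts the kubelet at that
      # address instead of 127.0.0.1.
      - name: MY_NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
```

With an entry like this, the agent would be expected to query the kubelet at the node's host IP on port 10250 rather than at 127.0.0.1:10250, which is the address that was refused in the logs above.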

Fix behavior where SPIRE doesn't work when kubelet does not listen on 127.0.0.1

@weizhoublue weizhoublue requested review from a team as code owners August 18, 2023 10:39
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 18, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Aug 18, 2023
@weizhoublue weizhoublue force-pushed the fix/welan/spire branch 2 times, most recently from 2cc71e6 to 8e60658 Compare August 18, 2023 10:53
@weizhoublue weizhoublue changed the title fix: mTLS does not work when kubelet does not listen on 0.0.0.0 fix: mTLS does not work when kubelet does not listen on 127.0.0.1 Aug 18, 2023
@squeed squeed requested a review from meyskens August 20, 2023 03:02
@squeed
Contributor

squeed commented Aug 20, 2023

Nice fix!

@squeed squeed added the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Aug 20, 2023
@squeed
Contributor

squeed commented Aug 20, 2023

Just marking as do-not-merge until @meyskens has had a chance to review.

@weizhoublue
Contributor Author

@meyskens done

Member

@meyskens meyskens left a comment


LGTM

@meyskens meyskens removed the dont-merge/bad-bot To prevent MLH from marking ready-to-merge. label Aug 21, 2023
@meyskens
Member

/test

@meyskens meyskens added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Aug 21, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 21, 2023
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
@weizhoublue
Contributor Author

It looks like this PR has nothing to do with the 'ConformanceGatewayApi' CI failure?

https://github.com/cilium/cilium/actions/runs/5925798992/job/16065936819

I will try rebasing onto the main branch.

=== CONT  TestConformance/HTTPRouteRequestMirror/0_request_to_'/mirror'_should_go_to_infra-backend-v1
    httproute-request-mirror.go:69: Making GET request to http://172.18.255.200/mirror
    http.go:222: Response expectation failed for request: {URL: {Scheme:http Opaque: User: Host:172.18.255.200 Path:/mirror RawPath: OmitHost:false ForceQuery:false RawQuery: Fragment: RawFragment:}, Host: , Protocol: HTTP, Method: GET, Headers: map[X-Echo-Set-Header:[]], UnfollowRedirect: false, Server: , CertPem: <truncated>, KeyPem: <truncated>}  not ready yet: expected status code to be 200, got 404 (after 3.1µs)
    http.go:228: Request passed
    mirror.go:41: Searching for the mirrored request log
    mirror.go:42: Reading "gateway-conformance-infra/infra-backend-v2" logs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc0 pc=0x2087919]

goroutine 1756 [running]:
k8s.io/client-go/kubernetes.(*Clientset).CoreV1(...)
	/home/runner/work/cilium/cilium/vendor/k8s.io/client-go/kubernetes/clientset.go:309
sigs.k8s.io/gateway-api/conformance/utils/kubernetes.DumpEchoLogs({0x2870b33, 0x19}, {0x285ae52, 0x10}, {0x2c52aa0, 0xc00022eab0}, 0x0)
	/home/runner/work/cilium/cilium/vendor/sigs.k8s.io/gateway-api/conformance/utils/kubernetes/logs.go:50 +0x399
sigs.k8s.io/gateway-api/conformance/utils/http.ExpectMirroredRequest.func1()
	/home/runner/work/cilium/cilium/vendor/sigs.k8s.io/gateway-api/conformance/utils/http/mirror.go:43 +0x1df
github.com/stretchr/testify/assert.Eventually.func1()
	/home/runner/work/cilium/cilium/vendor/github.com/stretchr/testify/assert/assertions.go:1852 +0x22
created by github.com/stretchr/testify/assert.Eventually in goroutine 1747
	/home/runner/work/cilium/cilium/vendor/github.com/stretchr/testify/assert/assertions.go:1852 +0x21e
FAIL	github.com/cilium/cilium/operator/pkg/gateway-api	233.042s

@meyskens
Member

/test

@meyskens
Member

@weizhoublue this was indeed an issue we just fixed on the main branch, thanks for rebasing!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Aug 22, 2023
@borkmann borkmann merged commit 917f625 into cilium:main Aug 22, 2023
60 checks passed
@borkmann borkmann added the needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch label Aug 22, 2023
@tklauser tklauser mentioned this pull request Aug 24, 2023
9 tasks
@tklauser tklauser added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Aug 24, 2023
@joestringer joestringer added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Aug 25, 2023
Labels
backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

6 participants