Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make encryption-related connectivity tests more reliable #2035

Merged
merged 7 commits into from
Oct 19, 2023

Conversation

giorio94
Copy link
Member

Please refer to the individual commit messages for additional details.

@giorio94 giorio94 requested review from a team as code owners October 12, 2023 14:37
@giorio94 giorio94 requested review from brb and nebril October 12, 2023 14:37
@giorio94 giorio94 temporarily deployed to ci October 12, 2023 14:37 — with GitHub Actions Inactive
@giorio94 giorio94 marked this pull request as draft October 12, 2023 14:53
@giorio94 giorio94 temporarily deployed to ci October 12, 2023 16:19 — with GitHub Actions Inactive
@giorio94 giorio94 temporarily deployed to ci October 13, 2023 12:46 — with GitHub Actions Inactive
@giorio94 giorio94 temporarily deployed to ci October 13, 2023 13:27 — with GitHub Actions Inactive
@giorio94 giorio94 temporarily deployed to ci October 13, 2023 16:38 — with GitHub Actions Inactive
@giorio94 giorio94 closed this Oct 13, 2023
@giorio94 giorio94 reopened this Oct 13, 2023
@giorio94 giorio94 temporarily deployed to ci October 13, 2023 17:00 — with GitHub Actions Inactive
@giorio94
Copy link
Member Author

Closed and reopened to retrigger tests.

@giorio94
Copy link
Member Author

EKS (tunnel) failure is addressed by #2045
Marking ready for review again

@giorio94 giorio94 marked this pull request as ready for review October 16, 2023 11:40
@giorio94 giorio94 added the dont-merge/blocked Another PR must be merged before this one. label Oct 16, 2023
@giorio94
Copy link
Member Author

Converting back to draft while investigating one failure surfaced in https://github.com/cilium/cilium/actions/runs/6544264527/job/17770427183

@giorio94 giorio94 temporarily deployed to ci October 17, 2023 10:50 — with GitHub Actions Inactive
@giorio94
Copy link
Member Author

E2E and ipsec tests completed correctly in cilium/cilium#28635

@giorio94
Copy link
Member Author

Back in review state

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Small nit.

connectivity/tests/encryption.go Outdated Show resolved Hide resolved
@giorio94 giorio94 temporarily deployed to ci October 17, 2023 16:39 — with GitHub Actions Inactive
@brb
Copy link
Member

brb commented Oct 18, 2023

@giorio94
Copy link
Member Author

giorio94 commented Oct 18, 2023

The EKS failure looks legit - https://github.com/cilium/cilium-cli/actions/runs/6550230402/job/17788873914?pr=2035.

Yes, it is addressed by #2045 (that's why this PR is still blocked). Essentially node to pod connectivity is broken due to the leftover iptables chains, and apparently we are not testing that path anywhere else.

Currently, we always create the host-netns daemonset in a single cluster
only, even when clustermesh is enabled. Yet, this can cause issues when
connectivity tests depend on its presence, such as node-to-node encryption.
Hence, let's adapt the logic to create it in both clusters.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When running in tunnel mode, we currently validate that pod to pod
connectivity does not leak any unencrypted packet (if either IPSec
or WireGuard are enabled) by constructing a tcpdump filter that uses
as source IP address the one of the interface the traffic is routed
to, instead of the source IP of the client. According to the commit
message of f512df4, this was required to correctly test node to
node connectivity, which leverages the same logic. Yet, this breaks
the check for pod to pod traffic, which does not get masquerated.
Indeed, the current check would not fail when forcefully run on a
cluster with encryption disabled.

Let's instead modify the filter, so that we 'OR' together both the
original client IP and the one of the outgoing interface, accounting
for both scenarios.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When running in ENI mode, the outgoing interface of pod originating traffic
is different from the one that would be used by host originating traffic
towards the same destination. This breaks the current pod-to-pod
encryption validation, as the source interface for the tcpdump filter is
determined based on the routes towards the given destination only.

Let's update the source interface determination to additionally consider
the source address. This approach had been initially suggested by Paul
Chaignon in cilium#1241, but then reverted in [1] because `ip route get` returns
and error in case the `from` address is not assigned to any local interface.
We can work around this by specifying an input interface: let's use lo
as it should be always present.

[1]: 2fc0835 ("connectivity: Fix iface derivation in encrypt tests")

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
When both tunneling and host firewall is enabled, traffic from a pod to
a remote node gets forwarded through the tunnel to preserve the source
identity. This breaks the current node-to-node encryption tests, though,
which expect the packets to be seen on the native device.

Let's adapt the source interface determination to override the detected
device in this situation, and always sniff on the tunnel interface.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, encryption checks are run only when encryption (either
IPSec or WireGuard) is enabled. While this approach is sensible,
it also led to cases in which the tests were broken in certain
circumstances, and we didn't notice [1,2,3,4].

Let's instead always run these tests, asserting that unencrypted
packets are either seen or not based on the feature set. Hence,
we can immediately notice if there's a regression, and no
unencrypted packets are detected even if encryption is not enabled.

[1]: f512df4 ("connectivity: fix node-to-node encryption tests")
[2]: HEAD~3 ("connectivity: fix pod-to-pod encryption validation")
[3]: HEAD~2 ("connectivity: fix encryption validation when running in ENI mode")
[4]: HEAD~1 ("connectivity: fix encryption validation when host firewall is enabled")

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the encryption tests validate that no unencrypted packets are
leaked by capturing them on the source only. Since we can capture only
egressing packets (as the XFRM stack recirculates the ingressing ones
during decryption), this means that we are checking only one direction.

Let's extend this check to validate both directions, starting a tcpdump
capture both on the source and destination hosts. We leverage the
bidirectional validation for the pod-to-pod encryption case though,
as with the node-to-node one it is particularly tricky to construct
the correct filter without additional information (as packets might
be masquerated at the source, and in that case they should be sniffed
on a different interface).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Disable the encryption tests when running the connectivity tests on a
single node cluster, as they require that source and destination are
hosted on different nodes.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 temporarily deployed to ci October 18, 2023 17:38 — with GitHub Actions Inactive
@giorio94 giorio94 removed the dont-merge/blocked Another PR must be merged before this one. label Oct 18, 2023
@giorio94
Copy link
Member Author

Kind / installation-and-connectivity (helm) hit #1894

@brb brb merged commit 75c850c into cilium:main Oct 19, 2023
19 checks passed
@giorio94 giorio94 deleted the mio/connectivity-encryption branch November 16, 2023 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants