
endpoint: Always compute proxy policies after DNS proxy updates #25147

Merged

Conversation

jrajahalme
Member

The planned logic was to update the proxy network policy only if the added policy map entries included proxy redirections. This design did not consider the possibility of a concurrent endpoint policy computation consuming the part of the policy map entries with redirection, which led to the policy update being missed after the remaining FQDN selectors were applied.

For example:

  1. A DNS proxy answer to the query 'www.example.com' applies to two FQDN selectors: 'www.example.com' with an L3-only rule, and '*.example.com' with a rule on port 80 to which an HTTP policy allowing only GET requests is applied.
  2. A new local security ID is allocated for the IP in the DNS answer.
  3. Endpoint regenerations are spawned due to the new ID (which could also match a CIDR-based identity selector in addition to the FQDN selectors).
  4. The DNS proxy adds the new security ID to the '*.example.com' selector associated with the HTTP rule.
  5. An endpoint regeneration consumes the above ID update (from another goroutine). As this update is associated with an HTTP rule, the proxy network policy is recomputed. During this computation the new security ID is only associated with the selector '*.example.com'.
  6. The DNS proxy adds the same new security ID to the selector 'www.example.com', which has no HTTP rules.
  7. The DNS proxy waits for the FQDN selector updates to finish.
  8. The DNS proxy consumes the computed policy map keys, but only sees the ones for the selector 'www.example.com', as the ones for the selector '*.example.com' were already consumed. Since 'www.example.com' has no HTTP rules, the proxy network policy update is skipped.
  9. The DNS proxy responds to the pod with the DNS answer.
  10. The pod sends an HTTP POST request to the resolved IP on port 80.
  11. Traffic to port 80 is redirected to the proxy.
  12. The proxy only has the new ID associated with the destination IP in the rule that allows only GET requests, and so it drops the request.
  13. The proxy policy is later synced by a controller and healed, so it is eventually consistent, but this was too late for the traffic the pod was sending.

Fix this by unconditionally computing proxy policy at (8) above.

A more efficient implementation would make sure that all the map updates due to a specific DNS response get into each endpoint's policy map changes as one transaction, so that it is only possible to consume them as one unit. This would ensure that proxy network policy updates are done for all of the updated selectors if any of them have proxy redirections.
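A rough sketch of how such transactional consumption could work is shown below. The `MapChange` and `changeQueue` names and the overall structure are hypothetical stand-ins for illustration only; they are not Cilium's actual types or APIs.

```go
package sketch

import "sync"

// MapChange is a simplified stand-in for one policy map entry update.
// HasRedirect marks entries that redirect matching traffic to the L7 proxy.
type MapChange struct {
	Selector    string
	SecurityID  uint32
	HasRedirect bool
}

// changeQueue groups all map changes produced by one DNS response into a
// single transaction, so that a consumer can never see only part of them.
type changeQueue struct {
	mu           sync.Mutex
	transactions [][]MapChange
}

// AppendTxn records all changes from one DNS response as a single unit.
func (q *changeQueue) AppendTxn(changes []MapChange) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.transactions = append(q.transactions, changes)
}

// ConsumeTxns drains complete transactions only, and reports whether any of
// the consumed entries carried a proxy redirect. Because redirecting and
// non-redirecting entries from the same DNS response are always consumed
// together, the caller cannot miss a needed proxy network policy update.
func (q *changeQueue) ConsumeTxns() (changes []MapChange, needProxyUpdate bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, txn := range q.transactions {
		for _, c := range txn {
			if c.HasRedirect {
				needProxyUpdate = true
			}
		}
		changes = append(changes, txn...)
	}
	q.transactions = nil
	return changes, needProxyUpdate
}
```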

DNS proxy now always updates the proxy policy to avoid intermittent policy drops.

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Apr 26, 2023
@jrajahalme jrajahalme requested a review from a team as a code owner April 26, 2023 19:08
@jrajahalme jrajahalme added needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 Apr 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 Apr 26, 2023
@jrajahalme
Member Author

/test

@jrajahalme jrajahalme requested review from a team and christarazi and removed request for a team April 26, 2023 19:18
The planned logic was to update the proxy network policy only if the added
policy map entries included proxy redirections. This design did not
consider the possibility of a concurrent endpoint policy computation
consuming the part of the policy map entries with redirection, which led
to the policy update being missed after the remaining FQDN selectors were
applied.

For example:

 1. A DNS proxy answer to the query 'www.example.com' applies to two FQDN
    selectors: 'www.example.com' with an L3-only rule, and '*.example.com'
    with a rule on port 80 to which an HTTP policy allowing only GET
    requests is applied.
 2. A new local security ID is allocated for the IP in the DNS answer.
 3. Endpoint regenerations are spawned due to the new ID (which could
    also match a CIDR-based identity selector in addition to the FQDN
    selectors).
 4. The DNS proxy adds the new security ID to the '*.example.com' selector
    associated with the HTTP rule.
 5. An endpoint regeneration consumes the above ID update (from another
    goroutine). As this update is associated with an HTTP rule, the proxy
    network policy is recomputed. During this computation the new security
    ID is only associated with the selector '*.example.com'.
 6. The DNS proxy adds the same new security ID to the selector
    'www.example.com', which has no HTTP rules.
 7. The DNS proxy waits for the FQDN selector updates to finish.
 8. The DNS proxy consumes the computed policy map keys, but only sees the
    ones for the selector 'www.example.com', as the ones for the selector
    '*.example.com' were already consumed. Since 'www.example.com' has no
    HTTP rules, the proxy network policy update is skipped.
 9. The DNS proxy responds to the pod with the DNS answer.
10. The pod sends an HTTP POST request to the resolved IP on port 80.
11. Traffic to port 80 is redirected to the proxy.
12. The proxy only has the new ID associated with the destination IP in the
    rule that allows only GET requests, and so it drops the request.
13. The proxy policy is later synced by a controller and healed, so it is
    eventually consistent, but this was too late for the traffic the pod
    was sending.

Fix this by unconditionally computing proxy policy at (8) above.

A more efficient implementation would make sure that all the map updates
due to a specific DNS response get into each endpoint's policy map changes
as one transaction, so that it is only possible to consume them as one
unit. This would ensure that proxy network policy updates are done for
all of the updated selectors if any of them have proxy redirections.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the xds-multiple-transactions-single-version branch from 0b5d12a to ec1ad73 Compare April 27, 2023 10:05
@jrajahalme jrajahalme requested a review from a team as a code owner April 27, 2023 10:05
@jrajahalme
Member Author

Removed dead code.

@jrajahalme
Member Author

/test

Comment on lines -1176 to +1170
if proxyChanges {
// Ignoring the revertFunc; keep all successful changes even if some fail.
err, _ = e.updateNetworkPolicy(proxyWaitGroup)
} else {
// Allow caller to wait for the current network policy to be acked
e.useCurrentNetworkPolicy(proxyWaitGroup)
}
// Ignoring the revertFunc; keep all successful changes even if some fail.
err, _ = e.updateNetworkPolicy(proxyWaitGroup)
Member


My understanding is that this is the core of the "fix". It seems like previously this logic was deliberate, so are we just losing an optimization or is there more to it? How can we understand the greater impact of this change?

Contributor


I'd be curious if extra contention on the proxy wait group might have some kind of impact.

Member Author


When the policy update triggered by the new identity managed to consume the MapChange with the proxy redirection, but not the MapChange for the wildcard port rule (no proxy redirection), then this run here would get proxyChanges as false and not update the proxy NetworkPolicy, causing the traffic from the source pod to be dropped until the policy is updated again a bit later; the sketch below illustrates the interleaving.

There definitely is room for performance improvement. For example, we could delay the regeneration trigger caused by new local identities from DNS proxy results until after the FQDN selectors have been updated on this code path (one or more callers up the call chain).
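A simplified sketch of the interleaving, for illustration only; the queue type and the comment-drawn timeline below are hypothetical and do not reflect the actual endpoint or SelectorCache code.

```go
package sketch

import "sync"

// mapChange is a minimal stand-in for a queued policy map update.
type mapChange struct {
	selector    string
	hasRedirect bool
}

// pendingChanges mimics a per-endpoint queue of policy map changes that any
// policy computation may drain at an arbitrary point in time.
type pendingChanges struct {
	mu      sync.Mutex
	changes []mapChange
}

func (q *pendingChanges) add(c mapChange) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.changes = append(q.changes, c)
}

// consume drains whatever happens to be queued right now and reports
// whether any drained entry had an L7 proxy redirect.
func (q *pendingChanges) consume() (proxyChanges bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, c := range q.changes {
		if c.hasRedirect {
			proxyChanges = true
		}
	}
	q.changes = nil
	return proxyChanges
}

// The lost update, step by step:
//
//   DNS proxy goroutine                    Endpoint regeneration goroutine
//   -------------------                    -------------------------------
//   q.add({"*.example.com", true})
//                                          q.consume() == true
//                                          // proxy policy recomputed; the new
//                                          // ID lands only under the GET-only
//                                          // '*.example.com' rule
//   q.add({"www.example.com", false})
//   q.consume() == false
//   // old code: proxy policy update skipped,
//   // so the POST on port 80 is dropped
```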

Member Author


Note that the wildcard port rule is the one needed to allow the traffic in the L7 proxy, so it was misguided to prune out the policy update in this case. If all the updates due to the new DNS identity were applied in the same policy update round, one of them would have a proxy port whenever a proxy network policy update is needed. When the race hit, the map change with the redirection was handled separately from the one without redirection, and we lost the context that the latter change was also significant for the proxy.

We could also fix this by making sure all the map updates due to a DNS result are applied atomically to the proxy network policy, and then we could move back to the optimization of not updating the proxy network policy on this hot path before the DNS response is sent back to the source pod. A sketch of what that restored conditional could look like follows.
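This sketch assumes a hypothetical `consumeAtomicMapChanges` helper that drains one whole per-DNS-response transaction and reports whether it contained any redirect; the endpoint type and simplified signatures below are placeholders, not the real code from this diff.

```go
package sketch

// proxyWaitGroup is a placeholder for the wait group that tracks proxy
// policy acknowledgements in the real code.
type proxyWaitGroup struct{}

// endpoint is an illustrative stand-in for the real Endpoint type.
type endpoint struct{}

// consumeAtomicMapChanges is hypothetical: it drains one complete
// per-DNS-response transaction of policy map changes and reports whether
// any entry in it carried an L7 redirect.
func (e *endpoint) consumeAtomicMapChanges() (proxyChanges bool) { return false }

// Simplified placeholders for the two calls visible in the diff above.
func (e *endpoint) updateNetworkPolicy(wg *proxyWaitGroup) error { return nil }
func (e *endpoint) useCurrentNetworkPolicy(wg *proxyWaitGroup)   {}

// applyPolicyMapChanges shows how the original optimization could return
// once consumption is atomic per DNS response.
func (e *endpoint) applyPolicyMapChanges(wg *proxyWaitGroup) error {
	if e.consumeAtomicMapChanges() {
		// Some selector in this transaction has an L7 redirect, so the
		// proxy network policy must be recomputed before the DNS answer
		// is released to the pod.
		return e.updateNetworkPolicy(wg)
	}
	// No redirects anywhere in the transaction: waiting for the current
	// network policy to be acked is sufficient.
	e.useCurrentNetworkPolicy(wg)
	return nil
}
```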

@jrajahalme
Member Author

/ci-e2e

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 3, 2023
@joestringer joestringer merged commit 45577a9 into cilium:main May 4, 2023
57 checks passed
@YutaroHayakawa YutaroHayakawa mentioned this pull request May 10, 2023
14 tasks
@YutaroHayakawa YutaroHayakawa added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 May 10, 2023
@YutaroHayakawa YutaroHayakawa mentioned this pull request May 10, 2023
7 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 May 10, 2023
@YutaroHayakawa YutaroHayakawa added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels May 15, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 May 15, 2023
@christarazi christarazi mentioned this pull request Jun 6, 2023
2 tasks
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jul 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jul 10, 2023
Labels
backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13

6 participants