
endpoint: Always compute proxy policies after DNS proxy updates #25147

Merged

Conversation

jrajahalme
Member

The planned logic was to update the proxy network policy only if the added policy map entries included proxy redirections. This design did not consider the possibility of a concurrent endpoint policy computation consuming the part of the policy map entries with redirection, which led to the policy update being missed after the remaining FQDN selectors were applied.

For example:

  1. A DNS proxy answer to the query 'www.example.com' applies to two FQDN selectors: 'www.example.com' with an L3-only rule, and '*.example.com' with a rule on port 80 to which an HTTP policy allowing only GET requests is applied.
  2. A new local security ID is allocated for the IP in the DNS answer.
  3. Endpoint regenerations are spawned due to the new ID (which could also match a CIDR-based identity selector in addition to the FQDN selectors).
  4. The DNS proxy adds the new security ID to the '*.example.com' selector associated with the HTTP rule.
  5. An endpoint regeneration consumes the above ID update (from another goroutine). As this update is associated with an HTTP rule, the proxy network policy is recomputed. During this computation the new security ID is only associated with the selector '*.example.com'.
  6. The DNS proxy adds the same new security ID to the selector 'www.example.com', which has no HTTP rules.
  7. The DNS proxy waits for the FQDN selector updates to finish.
  8. The DNS proxy consumes the computed policy map keys, but only sees the ones for the selector 'www.example.com', as the ones for the selector '*.example.com' were already consumed. Since 'www.example.com' has no HTTP rules, the proxy network policy update is skipped.
  9. The DNS proxy responds to the pod with the DNS answer.
  10. The pod sends an HTTP POST request to the resolved IP on port 80.
  11. Traffic to port 80 is redirected to the proxy.
  12. The proxy only has the new ID associated with the destination IP in the rule that allows only GET requests, and so it drops the request.
  13. The proxy policy is later synced by a controller and healed, so it is eventually consistent, but this was too late for the traffic the pod was sending.

Fix this by unconditionally computing proxy policy at (8) above.

A more efficient implementation would make sure that all the map updates due to a specific DNS response get into each endpoint's policy map changes as one transaction, so that it is only possible to consume them as one unit. This would ensure that proxy network policy updates are done for all of the updated selectors if any of them have proxy redirections.
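A rough sketch of how such transactional consumption could work is shown below. The `MapChange` and `changeQueue` names and the overall structure are hypothetical stand-ins for illustration only; they are not Cilium's actual types or APIs.

```go
package sketch

import "sync"

// MapChange is a simplified stand-in for one policy map entry update.
// HasRedirect marks entries that redirect matching traffic to the L7 proxy.
type MapChange struct {
	Selector    string
	SecurityID  uint32
	HasRedirect bool
}

// changeQueue groups all map changes produced by one DNS response into a
// single transaction, so that a consumer can never see only part of them.
type changeQueue struct {
	mu           sync.Mutex
	transactions [][]MapChange
}

// AppendTxn records all changes from one DNS response as a single unit.
func (q *changeQueue) AppendTxn(changes []MapChange) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.transactions = append(q.transactions, changes)
}

// ConsumeTxns drains complete transactions only, and reports whether any of
// the consumed entries carried a proxy redirect. Because redirecting and
// non-redirecting entries from the same DNS response are always consumed
// together, the caller cannot miss a needed proxy network policy update.
func (q *changeQueue) ConsumeTxns() (changes []MapChange, needProxyUpdate bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, txn := range q.transactions {
		for _, c := range txn {
			if c.HasRedirect {
				needProxyUpdate = true
			}
		}
		changes = append(changes, txn...)
	}
	q.transactions = nil
	return changes, needProxyUpdate
}
```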

DNS proxy now always updates the proxy policy to avoid intermittent policy drops.

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Apr 26, 2023
@jrajahalme jrajahalme requested a review from a team as a code owner April 26, 2023 19:08
@jrajahalme jrajahalme added needs-backport/1.12 needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Apr 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.3 Apr 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.10 Apr 26, 2023
@jrajahalme
Member Author

/test

@jrajahalme jrajahalme requested review from a team and christarazi and removed request for a team April 26, 2023 19:18
The planned logic was to update the proxy network policy only if the added
policy map entries included proxy redirections. This design did not
consider the possibility of a concurrent endpoint policy computation
consuming the part of the policy map entries with redirection, which led
to the policy update being missed after the remaining FQDN selectors were
applied.

For example:

 1. A DNS proxy answer to the query 'www.example.com' applies to two FQDN
    selectors: 'www.example.com' with an L3-only rule, and '*.example.com'
    with a rule on port 80 to which an HTTP policy allowing only GET
    requests is applied.
 2. A new local security ID is allocated for the IP in the DNS answer.
 3. Endpoint regenerations are spawned due to the new ID (which could
    also match a CIDR-based identity selector in addition to the FQDN
    selectors).
 4. The DNS proxy adds the new security ID to the '*.example.com' selector
    associated with the HTTP rule.
 5. An endpoint regeneration consumes the above ID update (from another
    goroutine). As this update is associated with an HTTP rule, the proxy
    network policy is recomputed. During this computation the new security
    ID is only associated with the selector '*.example.com'.
 6. The DNS proxy adds the same new security ID to the selector
    'www.example.com', which has no HTTP rules.
 7. The DNS proxy waits for the FQDN selector updates to finish.
 8. The DNS proxy consumes the computed policy map keys, but only sees the
    ones for the selector 'www.example.com', as the ones for the selector
    '*.example.com' were already consumed. Since 'www.example.com' has no
    HTTP rules, the proxy network policy update is skipped.
 9. The DNS proxy responds to the pod with the DNS answer.
10. The pod sends an HTTP POST request to the resolved IP on port 80.
11. Traffic to port 80 is redirected to the proxy.
12. The proxy only has the new ID associated with the destination IP in the
    rule that allows only GET requests, and so it drops the request.
13. The proxy policy is later synced by a controller and healed, so it is
    eventually consistent, but this was too late for the traffic the pod
    was sending.

Fix this by unconditionally computing proxy policy at (8) above.

A more efficient implementation would make sure that all the map updates
due to a specific DNS response get into each endpoint's policy map changes
as one transaction, so that it is only possible to consume them as one
unit. This would ensure that proxy network policy updates are done for
all of the updated selectors if any of them have proxy redirections.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme jrajahalme force-pushed the xds-multiple-transactions-single-version branch from 0b5d12a to ec1ad73 Compare April 27, 2023 10:05
@jrajahalme jrajahalme requested a review from a team as a code owner April 27, 2023 10:05
@jrajahalme
Member Author

Removed dead code.

@jrajahalme
Member Author

/test

Comment on lines -1176 to +1170
if proxyChanges {
// Ignoring the revertFunc; keep all successful changes even if some fail.
err, _ = e.updateNetworkPolicy(proxyWaitGroup)
} else {
// Allow caller to wait for the current network policy to be acked
e.useCurrentNetworkPolicy(proxyWaitGroup)
}
// Ignoring the revertFunc; keep all successful changes even if some fail.
err, _ = e.updateNetworkPolicy(proxyWaitGroup)
Member


My understanding is that this is the core of the "fix". It seems like previously this logic was deliberate, so are we just losing an optimization or is there more to it? How can we understand the greater impact of this change?

Contributor


I'd be curious if extra contention on the proxy wait group might have some kind of impact.

Member Author


When the policy update triggered by the new identity managed to consume the MapChange with the proxy redirection, but not the MapChange for the wildcard port rule (no proxy redirection), then this run here would get proxyChanges as false and not update the proxy NetworkPolicy, causing the traffic from the source pod to be dropped until the policy is updated again a bit later; the sketch below illustrates the interleaving.

There definitely is room for performance improvement. For example, we could delay the regeneration trigger caused by new local identities from DNS proxy results until after the FQDN selectors have been updated on this code path (one or more callers up the call chain).
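A simplified sketch of the interleaving, for illustration only; the queue type and the comment-drawn timeline below are hypothetical and do not reflect the actual endpoint or SelectorCache code.

```go
package sketch

import "sync"

// mapChange is a minimal stand-in for a queued policy map update.
type mapChange struct {
	selector    string
	hasRedirect bool
}

// pendingChanges mimics a per-endpoint queue of policy map changes that any
// policy computation may drain at an arbitrary point in time.
type pendingChanges struct {
	mu      sync.Mutex
	changes []mapChange
}

func (q *pendingChanges) add(c mapChange) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.changes = append(q.changes, c)
}

// consume drains whatever happens to be queued right now and reports
// whether any drained entry had an L7 proxy redirect.
func (q *pendingChanges) consume() (proxyChanges bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, c := range q.changes {
		if c.hasRedirect {
			proxyChanges = true
		}
	}
	q.changes = nil
	return proxyChanges
}

// The lost update, step by step:
//
//   DNS proxy goroutine                    Endpoint regeneration goroutine
//   -------------------                    -------------------------------
//   q.add({"*.example.com", true})
//                                          q.consume() == true
//                                          // proxy policy recomputed; the new
//                                          // ID lands only under the GET-only
//                                          // '*.example.com' rule
//   q.add({"www.example.com", false})
//   q.consume() == false
//   // old code: proxy policy update skipped,
//   // so the POST on port 80 is dropped
```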

Member Author


Note that the wildcard port rule is the one needed to allow the traffic in the L7 proxy, so it was misguided to prune out the policy update in this case. If all the updates due to the new DNS identity were applied in the same policy update round, one of them would have a proxy port whenever a proxy network policy update is needed. When the race hit, the map change with the redirection was handled separately from the one without redirection, and we lost the context that the latter change was also significant for the proxy.

We could also fix this by making sure all the map updates due to a DNS result are applied atomically to the proxy network policy, and then we could move back to the optimization of not updating the proxy network policy on this hot path before the DNS response is sent back to the source pod. A sketch of what that restored conditional could look like follows.
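This sketch assumes a hypothetical `consumeAtomicMapChanges` helper that drains one whole per-DNS-response transaction and reports whether it contained any redirect; the endpoint type and simplified signatures below are placeholders, not the real code from this diff.

```go
package sketch

// proxyWaitGroup is a placeholder for the wait group that tracks proxy
// policy acknowledgements in the real code.
type proxyWaitGroup struct{}

// endpoint is an illustrative stand-in for the real Endpoint type.
type endpoint struct{}

// consumeAtomicMapChanges is hypothetical: it drains one complete
// per-DNS-response transaction of policy map changes and reports whether
// any entry in it carried an L7 redirect.
func (e *endpoint) consumeAtomicMapChanges() (proxyChanges bool) { return false }

// Simplified placeholders for the two calls visible in the diff above.
func (e *endpoint) updateNetworkPolicy(wg *proxyWaitGroup) error { return nil }
func (e *endpoint) useCurrentNetworkPolicy(wg *proxyWaitGroup)   {}

// applyPolicyMapChanges shows how the original optimization could return
// once consumption is atomic per DNS response.
func (e *endpoint) applyPolicyMapChanges(wg *proxyWaitGroup) error {
	if e.consumeAtomicMapChanges() {
		// Some selector in this transaction has an L7 redirect, so the
		// proxy network policy must be recomputed before the DNS answer
		// is released to the pod.
		return e.updateNetworkPolicy(wg)
	}
	// No redirects anywhere in the transaction: waiting for the current
	// network policy to be acked is sufficient.
	e.useCurrentNetworkPolicy(wg)
	return nil
}
```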

@jrajahalme
Member Author

/ci-e2e

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 3, 2023
@joestringer joestringer merged commit 45577a9 into cilium:main May 4, 2023
57 checks passed
@YutaroHayakawa YutaroHayakawa mentioned this pull request May 10, 2023
14 tasks
@YutaroHayakawa YutaroHayakawa added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels May 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.3 May 10, 2023
@YutaroHayakawa YutaroHayakawa mentioned this pull request May 10, 2023
7 tasks
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.10 May 10, 2023
@YutaroHayakawa YutaroHayakawa added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels May 15, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.10 May 15, 2023
@christarazi christarazi mentioned this pull request Jun 6, 2023
2 tasks
@julianwiedmann julianwiedmann added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Jul 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.3 Jul 10, 2023
Labels
backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.12.10
Backport done to v1.12
1.13.3
Backport done to v1.13

6 participants