Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.15] Backport FQDN: fix incorrect reaping of newly-expired IPs #31318

Merged
merged 2 commits into from Mar 11, 2024

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Mar 11, 2024

Once this PR is merged, a GitHub action will update the labels of these PRs:

$ for pr in 31205; do contrib/backporting/set-labels.py $pr done 1.15; done

[ upstream commit 03f8c85 ]

The FQDN GC subsystem waits before a successful CT GC run before marking
IPs as stale. However, we were erroneously marking CT GC as successful
even on failure, or when only run for a single family.

So, only mark notify FQDN  when we've done a successful GC pass for all
configured families.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
[ upstream commit b284170 ]

A bug was found where a low-TTL name was incorrectly reaped despite
being part of an active connection. After looking at logs, and
reproducing locally, it was determined that there is an unfortunate
interleaving between the DNS and CT GC loops.

The code attempts to prevent this issue by ensuring that names inserted
after CT GC has started are exempt from reaping. However, we don't
actually track the insertion time, we track the DNS TTL expiration time,
which is strictly in the past. In fact, it can be up to a minute in the
past. We shouldn't rely on timestamps anyways, as the scheduler can
always play tricks on us.

So, if a CT GC run has started and finished in the time between name
expiration and insertion in to zombies, the IP address is immediately
considered dead and unnecessarily reaped.

Timeline:

T1. name expires
T2. CT GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()

At T5, zombies.GC will remove IPs associated with name, because T2 > T1.

The solution is to use an explicit serial number to ensure that CTGC has
completed a full run before we are allowed to delete an IP. We actually
need to let CT GC run twice, as it may have started before this zombie
was added and thus not marked it alive.

Additionally, we already have a grace period, the idle connection
timeout, that gives applications a chance to re-use an expired IP.
However, we did not respect this grace period if the IP in question did
not have an entry in conntrack. So, pad deletion time by this grace
period as well, just to be sure this grace period applies to all
possible deletions.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed added kind/backports This PR provides functionality previously merged into master. backport/v1.15 labels Mar 11, 2024
@squeed squeed requested a review from a team as a code owner March 11, 2024 17:19
@maintainer-s-little-helper maintainer-s-little-helper bot added the backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. label Mar 11, 2024
@squeed
Copy link
Contributor Author

squeed commented Mar 11, 2024

/test-backport-1.15

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 11, 2024
@squeed squeed merged commit ec3e2a6 into v1.15 Mar 11, 2024
219 checks passed
@squeed squeed deleted the pr/v1.15-backport-2024-03-11 branch March 11, 2024 20:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants