New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FQDN: fix incorrect reaping of newly-expired IPs #31205
Conversation
b42d2e6
to
a766b95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. I was sort of wondering about alternatives that avoid add extra fields that are not exported to DNSZombieMapping
and ignored by the marshaller. Given the timeline you've detailed:
T1. name expires
T2. GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()
Have you considered modifying step T4 to pass T4
instead of T1
to the Upsert
call? It seems a bit suspicious that the time value passed in is subject to a race.
To clarify, is T2 CT GC or FQDN GC?
a766b95
to
b955cbe
Compare
I initially didn't do this, because the expiration time is relevant for determining which name to reap in the event of an overflow. However, I did wind up taking this suggestion with a bit of extra logic to handle the expiration-in-the-future case (i.e. over limit).
Be careful, lest the ghost of Leslie Lamport come down and cause race conditions in all of your code that relies on clocks :-). Remember, the scheduler can and will break all your assumptions. Besides, I am uncomfortable with the fiddliness of exactly how the CT GC timestamp is interpreted. If we want to ensure CT GC runs twice, I'd rather use a counter. |
/test |
Nevermind, I'm wrong. We can insert a time of T4, but not a time of T4 + grace. Complicated. |
b955cbe
to
10c6275
Compare
/test |
Here's one way it could happen:
So, we really need the explicit conntrack generation -- it ensures a full CT GC pass has run, giving us a chance to mark the IP as alive. We also need the timestamps, so we can correctly compute grace period and prioritize newer connections in the event of overflow. |
This is probably not useful as a drive-by comment, so feel free to ignore me, but don't forget to think about time going backwards too. |
Would this impact v1.14.2 and above? I'm pretty sure I'm seeing this in my clusters running v1.14.2. Excited to see this fixed, either way. Thanks for your efforts @squeed |
I’m not sure of the exact point version, but I’ve seen this in v1.14.5 clusters for sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks! 💯
Just a non-blocking nit left inline.
A bug was found where a low-TTL name was incorrectly reaped despite being part of an active connection. After looking at logs, and reproducing locally, it was determined that there is an unfortunate interleaving between the DNS and CT GC loops. The code attempts to prevent this issue by ensuring that names inserted after CT GC has started are exempt from reaping. However, we don't actually track the insertion time, we track the DNS TTL expiration time, which is strictly in the past. In fact, it can be up to a minute in the past. We shouldn't rely on timestamps anyways, as the scheduler can always play tricks on us. So, if a CT GC run has started and finished in the time between name expiration and insertion in to zombies, the IP address is immediately considered dead and unnecessarily reaped. Timeline: T1. name expires T2. CT GC starts and finishes T3. Zombies.SetCTGCTime(T2) T4. Zombies.Upsert(name, T1) T5. Zombies.GC() At T5, zombies.GC will remove IPs associated with name, because T2 > T1. The solution is to use an explicit serial number to ensure that CTGC has completed a full run before we are allowed to delete an IP. We actually need to let CT GC run twice, as it may have started before this zombie was added and thus not marked it alive. Additionally, we already have a grace period, the idle connection timeout, that gives applications a chance to re-use an expired IP. However, we did not respect this grace period if the IP in question did not have an entry in conntrack. So, pad deletion time by this grace period as well, just to be sure this grace period applies to all possible deletions. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
8c40694
to
3bb47aa
Compare
/test |
@squeed just wanted to thank you for this! 1.14.8 works great! |
A bug was found where a low-TTL name was incorrectly reaped despite
being part of an active connection. After looking at logs, and
reproducing locally, it was determined that there is an unfortunate
interleaving between the DNS and CT GC loops.
The code attempts to prevent this issue by ensuring that names inserted
after CT GC has started are exempt from reaping. However, we don't
actually track the insertion time, we track the DNS TTL expiration time,
which is strictly in the past. In fact, it can be up to a minute in the
past. We shouldn't rely on timestamps anyways, as the scheduler can
always play tricks on us.
So, if a CT GC run has started and finished in the time between name
expiration and insertion in to zombies, the IP address is immediately
considered dead and unnecessarily reaped.
Timeline:
T1. name expires
T2. CT GC starts and finishes
T3. Zombies.SetCTGCTime(T2)
T4. Zombies.Upsert(name, T1)
T5. Zombies.GC()
At T5, zombies.GC will remove IPs associated with name, because T2 > T1.
The solution is to use an explicit serial number to ensure that CTGC has
completed a full run before we are allowed to delete an IP. We actually
need to let CT GC run twice, as it may have started before this zombie
was added and thus not marked it alive.
Additionally, we already have a grace period, the idle connection
timeout, that gives applications a chance to re-use an expired IP.
However, we did not respect this grace period if the IP in question did
not have an entry in conntrack. So, pad deletion time by this grace
period as well, just to be sure this grace period applies to all
possible deletions.
Separately, don't tell FQDN that CT GC is complete if it encountered an error, or didn't do a full pass.