
Confusion about the "resync" of informers #1315

Open
stillya opened this issue Nov 6, 2023 · 4 comments
stillya commented Nov 6, 2023

This is a request to reopen the existing issue for further discussion. The issue pertains to the need for periodic resync in the work process of informers.

The work process of informers, as I understand it:

  1. List all the resources according to the given options, then initialize the indexer (local cache).
  2. A watch loop watches for ADD, UPDATE, and DELETE events and puts each (obj, event) pair into the delta FIFO.
  3. Pop the (obj, event) from the delta FIFO, sync the indexer (local cache) according to the event, and distribute the event to the listeners interested in those resources (see the sketch after this list).
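
For context, here's a minimal sketch of wiring up this pipeline with client-go's shared informer factory (the 30-second resync period and the pod resource are arbitrary choices for illustration):

```go
package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func runPodInformer(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	// The second argument is the resync period this whole issue is about:
	// every 30s the indexer's contents are replayed as Sync deltas.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// step 3: deltas popped from the FIFO are distributed here
		AddFunc: func(obj interface{}) { _ = obj.(*corev1.Pod) },
		UpdateFunc: func(oldObj, newObj interface{}) {
			// resyncs also arrive here, as updates where old and new
			// are the same cached object
			_, _ = oldObj.(*corev1.Pod), newObj.(*corev1.Pod)
		},
		DeleteFunc: func(obj interface{}) {},
	})

	factory.Start(stopCh)            // step 2: starts the list+watch loop
	factory.WaitForCacheSync(stopCh) // step 1: the initial List fills the indexer
}
```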

But I also see the delta type Sync:
when a resync happens, the delta FIFO gets all objects from the indexer (local cache), re-puts them into the delta FIFO (if the object is not already in the FIFO), and then triggers an update event to the listeners (modeled in the toy example below).
If there is a risk that the client will lose some events, why not just sync them from the API server? Since the data source of the indexer (local cache) is the delta FIFO itself, what do we gain from the periodic resync?
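
To make that concrete, here is a toy model of the resync behavior described above (my own simplified types, not client-go's real DeltaFIFO):

```go
package main

import "fmt"

// toyFIFO models DeltaFIFO's resync: re-queue every object the local
// cache knows about as a "Sync" delta, skipping keys that already have
// pending deltas (a real watch event is always fresher than a resync).
type toyFIFO struct {
	pending map[string][]string // key -> queued delta types
	cache   map[string]string   // key -> object, i.e. the "indexer"
}

func (f *toyFIFO) resync() {
	for key := range f.cache {
		if len(f.pending[key]) > 0 {
			continue
		}
		f.pending[key] = append(f.pending[key], "Sync")
	}
}

func main() {
	f := &toyFIFO{
		pending: map[string][]string{"pod-b": {"Updated"}},
		cache:   map[string]string{"pod-a": "v1", "pod-b": "v2"},
	}
	f.resync()
	fmt.Println(f.pending) // map[pod-a:[Sync] pod-b:[Updated]]
}
```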

This becomes even more interesting when we examine a typical informer's handlers, where events are filtered by resource version. As a result, resync may become redundant (e.g. in traefik); the sketch below shows the pattern I mean.
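
A hedged sketch of that common handler pattern (not traefik's actual code; `corev1` is the k8s.io/api/core/v1 import):

```go
package example

import corev1 "k8s.io/api/core/v1"

// Plugged in as cache.ResourceEventHandlerFuncs{UpdateFunc: onUpdate}.
// Because a resync replays the cached object, oldObj and newObj carry
// the same ResourceVersion, so this filter silently drops every resync.
func onUpdate(oldObj, newObj interface{}) {
	oldPod := oldObj.(*corev1.Pod)
	newPod := newObj.(*corev1.Pod)
	if oldPod.ResourceVersion == newPod.ResourceVersion {
		return // resync notification: nothing actually changed
	}
	// ... handle the real update ...
}
```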

@soulless-viewer

Hi @stillya,

From your issue I see that the main questions you're trying to answer are (correct me if I'm wrong):

  1. Why not just sync events from the API server?
  2. What are the benefits of periodic resyncing?
  3. Resync may become redundant* (not a question, but it's still here)

All these questions come down to the conceptual difference between caching and direct access:

  1. Why not just sync events from the API server?

    While it's possible to obtain the events via the Kube API (in fact, from etcd), there are a number of reasons to avoid this:

    • the cache contains already deserialized and decompressed data
    • requesting from etcd can be very time-consuming, especially in large clusters or under high event frequency
    • it increases the load on the Kube API, which may affect the overall health of the cluster
  2. What are the benefits of periodic resyncing?

    • the cache is a redundancy* layer: if the API is unreachable, or there are other network issues, the cache ensures you can still operate on the cached data
    • the cache reduces the load on the API server, allowing you to avoid costly operations (as sketched below)

* The thing is that redundancy, in the case of caching, is a good thing: it effectively lets you avoid data loss and repeated requests, and it improves overall performance.
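
To illustrate the second point: controllers typically read from the informer's lister instead of calling the API server. A minimal sketch (reusing the `factory` from the earlier example, with `"default"` as a placeholder namespace):

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
)

// Served entirely from the local indexer: no API call, no etcd round-trip,
// and it keeps working even if the API server is briefly unreachable.
func listCachedPods(factory informers.SharedInformerFactory) ([]*corev1.Pod, error) {
	podLister := factory.Core().V1().Pods().Lister()
	return podLister.Pods("default").List(labels.Everything())
}
```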


Let me rephrase Brown's quote:
«It's better to be prepared for an issue and not have one, than to have an issue and not be prepared.»


stillya commented Dec 7, 2023

Thank you, @soulless-viewer.
It's becoming clearer, but do you have any best practices for using resync? All the informers I've seen so far look like this, so when a resync happens the event carries the same resource version, making the resync operation useless. Should we consider storing events that failed to be handled, or something like that?
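
For context, one pattern I've seen suggested is to enqueue the object's key unconditionally and let a reconcile loop retry failures, so each resync acts as a periodic retry (a hedged sketch of the common workqueue approach, not something prescribed by client-go):

```go
package example

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// Every notification, including resyncs, just enqueues the object's key;
// a separate reconcile loop reads current state from the cache and re-adds
// keys whose handling failed. A resync then re-delivers keys that would
// otherwise stay lost after a failed handler run.
func newEnqueueHandler(queue workqueue.RateLimitingInterface) cache.ResourceEventHandlerFuncs {
	enqueue := func(obj interface{}) {
		if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
			queue.Add(key)
		}
	}
	return cache.ResourceEventHandlerFuncs{
		AddFunc: enqueue,
		// no ResourceVersion filter: resyncs re-enqueue the key
		UpdateFunc: func(_, newObj interface{}) { enqueue(newObj) },
		DeleteFunc: enqueue,
	}
}
```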

@sarathmekala

@stillya I think your question is more about "Relist" vs. "Resync". If so, this link should provide more clarity: https://hex108.gitbook.io/kubernetes-notes/fu-lu-rtfsc/informer

@oldwang12

Resync puts all the data from the indexer back into the FIFO and triggers an update.
My question is: if this logic is designed to guard against errors during the first handling of an update, then ADD and DELETE handlers can fail in exactly the same way. Why are ADD and DELETE not replayed as well? (See the paraphrase below for where this asymmetry comes from.)
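
For reference, the asymmetry seems to come from how the shared informer consumes deltas; roughly paraphrased (simplified names and signature, not the exact client-go source):

```go
package example

// handleDelta is a simplified paraphrase of the delta-to-notification
// mapping inside the shared informer. A Sync delta is only produced for
// objects that already exist in the indexer, so it can only ever surface
// as OnUpdate; OnAdd and OnDelete fire solely for real watch events,
// which is why a resync never replays adds or deletes.
func handleDelta(deltaType string, inIndexer bool) string {
	switch deltaType {
	case "Sync", "Added", "Updated":
		if inIndexer {
			return "OnUpdate(cachedObj, newObj)"
		}
		return "OnAdd(newObj)"
	case "Deleted":
		return "OnDelete(obj)"
	}
	return ""
}
```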
