Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨: pkg/cache: add options for cache miss policy #2406

Merged

Conversation

stevekuznetsov
Copy link
Contributor

This commit allows users to opt out of the "start informers in the background" behavior that the current cache implementation uses. Additionally, when opting out of this behavior, the client can be configured to do a live lookup on a cache miss. The default behaviors are:

pkg/cache: backfill data on a miss (today's default, unchanged)
pkg/client: live lookup when cache is configured to miss

Fixes #2397

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 13, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 13, 2023
pkg/cache/cache.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/cache/cache.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
@stevekuznetsov stevekuznetsov force-pushed the skuznets/cache-miss-options branch 2 times, most recently from 9cf53b5 to a993837 Compare July 14, 2023 14:56
@stevekuznetsov
Copy link
Contributor Author

Seeing two failures in test that must be unrelated, kind of confused. Would appreciate knowing if these are known flakes?

@stevekuznetsov
Copy link
Contributor Author

/retest

pkg/cache/informer_cache.go Outdated Show resolved Hide resolved
pkg/cache/informer_cache.go Outdated Show resolved Hide resolved
pkg/cache/informer_cache.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
@stevekuznetsov stevekuznetsov force-pushed the skuznets/cache-miss-options branch 3 times, most recently from 5c064f3 to a4f7424 Compare July 14, 2023 17:56
pkg/client/client.go Outdated Show resolved Hide resolved
pkg/client/client.go Outdated Show resolved Hide resolved
@joelanford
Copy link
Member

Am I reading correctly that the default policy in cache.go is Backfill and the default policy in client.go is Fail?

If so, a few questions:

  1. Why are they different?
  2. Why do we need to set the default in two different places?

If we're changing the default to Fail, we should update the PR emoji to ⚠ to reflect that this is a breaking change.

@alvaroaleman
Copy link
Member

If we're changing the default to Fail, we should update the PR emoji to ⚠ to reflect that this is a breaking change.

This is confusing: The setting in the client is for when the cache tells it that it doesn't have this resource and won't set up an informer. If that client setting is not fail, then it will read from the api directly instead of the cache. The overall default behavior should be unchanged.

Do you think there is a way we could better clarify the above?

@stevekuznetsov
Copy link
Contributor Author

@joelanford this PR adds configuration to both the client and the cache. Regardless of what you configure the cache to do, the client acts independently. We're not "defining the default in two places" since we're configuring two different bits entirely.

The matrix looks like:

  Client UncachedResourcePolicy
Cache UnknownResourcePolicy "fail" "live-lookup"
"backfill" 1 2
"fail" 3 4

The states are:

  1. This is the default behavior. The cache always spins up new informers (like today) and the client, while configured to fail in case of an uncached resource, will never encounter that.
  2. The cache always spins up new informers and the client's configuration is similarly uninteresting.
  3. The cache does not spin up new informers, and the client passes along the error. This is equivalent to using an upstream Lister.
  4. The cache does not spin up new informers, but the client does a live lookup. Might be more ergonomic.

My intent here is to allow using option 3 and APIReader() to get the equivalent of upstream's Lister + Clientset.

@joelanford
Copy link
Member

@stevekuznetsov Thanks. I see the difference now. Another follow-up:

It looks like the behavior of client.CacheOptions.DisableFor before and after this change is to go to the direct client, regardless of the UncachedResourcePolicy. Is that right? If so, that's sort of an implicit "do a live lookup of uncached resources" behavior.

On the other hand, it doesn't look like UncachedResourcePolicy applies to DisableFor'd objects, so choosing Fail as the default wouldn't break existing uses of DisableFor. But this setup seems confusing to me. Intuitively, I'd expect UncachedResourcePolicy to apply to DisableFor, which is literally a list of GVKs that should not be cached (i.e. they are uncached).

Thoughts?

@joelanford
Copy link
Member

Oh and a second question. In the case that cache.UnknownResourcePolicy is backfill, that makes client.UncachedResourcePolicy moot (unless it also applies to DisableFor objects; see previous comment). IMO, its a poor UX for two configurations to interact in a way where one configuration causes the other configuration to be ignored. Is there a way to avoid that?

@stevekuznetsov
Copy link
Contributor Author

@joelanford

It looks like the behavior of client.CacheOptions.DisableFor before and after this change is to go to the direct client,

Good catch. I can update those and add to the docs for DisableFor. Seems like we need the live lookup client policy to be default.

IMO, its a poor UX for two configurations to interact in a way where one configuration causes the other configuration to be ignored. Is there a way to avoid that?

I'm open to suggestions here.

@vincepri
Copy link
Member

IMO, its a poor UX for two configurations to interact in a way where one configuration causes the other configuration to be ignored. Is there a way to avoid that?

In which case the configuration would be ignored? Maybe it's a matter of documentation, but DisableFor is very specific and should always take precedence.

@stevekuznetsov
Copy link
Contributor Author

@vincepri I understood @joelanford to mean that if you configure client.Options.Cache.UncachedResourcePolicy but do not set cache.Options.UnknownResourcePolicy=fail then the client's configuration is obviated since it's configuring for a case that never occurs.

@vincepri
Copy link
Member

🤔 Yeah that needs to be fixed probably, the client's cache configuration portion should inform the internal cache and those option need to be related to each other.

@stevekuznetsov
Copy link
Contributor Author

@vincepri is that a hard requirement? I don't really see this being that much of a UX problem to be honest and I don't know that the amount of work it might take to reconfigure these options into a hierarchy whereby the client can peer into the cache or vice versa would be worth it. The older configuration options that take a constructor closure are opaque, as well, so there's no trivial way to ensure folks using those (eg manager.Options.NewCache) are validated.

@stevekuznetsov
Copy link
Contributor Author

@joelanford updated the client options to affect client.Options.Cache.DisableFor
@vincepri added some best-effort validation to the manager.New() code

Copy link
Contributor Author

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny comments, but LGTM

// errors.NotFound.
//
// Defaults to false, which means that the cache will start a new informer
// for every new requested resource.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth it to note that by default this creates a cluster-wide cache to help folks understand why they might choose to set it to true if they don't have the background we do

pkg/cache/informer_cache.go Outdated Show resolved Hide resolved
pkg/cache/cache.go Outdated Show resolved Hide resolved
pkg/cache/cache_test.go Outdated Show resolved Hide resolved
pkg/cache/cache_test.go Outdated Show resolved Hide resolved
pkg/cache/cache_test.go Show resolved Hide resolved
pkg/client/client_test.go Outdated Show resolved Hide resolved
@sbueringer
Copy link
Member

In general lgtm from my side +/- the open nits and squash before merge (my nit is not a blocker)

@alvaroaleman
Copy link
Member

/retitle ✨: pkg/cache: add options for cache miss policy

@k8s-ci-robot k8s-ci-robot changed the title ✨: pkg/{cache,client}: add options for cache miss policy ✨: pkg/cache: add options for cache miss policy Jul 24, 2023
Copy link
Member

@alvaroaleman alvaroaleman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, other than that LGTM

"k8s.io/apimachinery/pkg/runtime/schema"
)

// ErrResourceNotCached indicates that the resource type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I told you something else over Slack, but I realized there is another case of a cache error which is not in a subpackage:

type ErrCacheNotStarted struct{}

Lets maybe move this into the cache package, too?

@alvaroaleman alvaroaleman added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jul 24, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 25, 2023
@stevekuznetsov
Copy link
Contributor Author

@vincepri did we want to rebase this?

@sbueringer
Copy link
Member

I think apart from a rebase and the nit above we're ready to merge

@sbueringer sbueringer added this to the v0.16.x milestone Aug 4, 2023
@sbueringer
Copy link
Member

sbueringer commented Aug 10, 2023

Hey folks,
we're getting closer to the Kubernetes 1.28 / controller-runtime v0.16 releases.
Would be really nice to get this merged before :)

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 16, 2023
@sbueringer sbueringer force-pushed the skuznets/cache-miss-options branch 3 times, most recently from 82691f3 to 8beaa5c Compare August 16, 2023 05:26
@sbueringer
Copy link
Member

@alvaroaleman @vincepri Rebased, should be ready now

//
// Defaults to false, which means that the cache will start a new informer
// for every new requested resource.
FailOnUnknownResource bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it's not directly clear what an UnknownResource is.
Could we maybe rename this to FailReadWithoutRegisteredInformer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ReaderFailOnMissingInformer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Will update

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the GoDoc here clear it up sufficiently? We've gone back and forth on the names here a couple of times and I'm not sure we will find something perfect. If the doc makes it sufficiently clear I think it's probably a good stopping place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(disregard, thanks @sbueringer )

This commit allows users to opt out of the "start informers in the
background" behavior that the current cache implementation uses.
Additionally, when opting out of this behavior, the client can be
configured to do a live lookup on a cache miss. The default behaviors
are:

  pkg/cache: backfill data on a miss (today's default, unchanged)
  pkg/client: live lookup when cache is configured to miss

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 16, 2023
@k8s-ci-robot k8s-ci-robot merged commit f30e11d into kubernetes-sigs:main Aug 16, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Allow Setting Cache Miss Policy in Cache Options
9 participants