Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix cache sync timeout functionality #1428

Merged
merged 1 commit into from Mar 16, 2021

Conversation

alvaroaleman
Copy link
Member

@alvaroaleman alvaroaleman commented Mar 14, 2021

So far, timing out the cache sync in most realistic scenarios didn't work,
bceause source.Kind gets an already started cache from the Manager. A
cache that is already started will block forever on GetInformer which we
called in source.Kinds start and not its WaitForSync. The context passed
to Start however defines the entire lifecycle of the Source, not the
Start timeout.

This change makes source.Kind call GetInformer in Start but in a new
go routine and WaitForSync just wait for that to finish or for its
context to be cancelled. This preserves the existing semantic of
starting in Start but waint for ready in WaitForSync.

/cc @varshaprasad96 @estroz @vincepri

Fixes #1346

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 14, 2021
@@ -104,40 +106,21 @@ var _ = Describe("controller", func() {
close(done)
})

It("should wait for each informer to sync", func(done Done) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing this test ever tested was that controller.Start with a cancelled context doesn't error. With the changes in this PR however it will always error because the context passed from the Controller to WaitForSync is cancelled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case do you think it's worth having a test case that checks that an error is always returned if the context is cancelled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the only thing someone who starts a Controller with a cancelled context can realistically expect is that it ends quickly, if it errors or not along the way ends up being an implementation detail. It is not going to result in anything useful.

@@ -90,6 +90,7 @@ var _ = Describe("Source", func() {
},
}, q)
Expect(err).NotTo(HaveOccurred())
Expect(instance.WaitForSync(context.Background())).NotTo(HaveOccurred())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to call WaitForSync here because before that finished, the Informer won't have the EventHandler attached. Not waiting makes the test racy.

@alvaroaleman
Copy link
Member Author

apidiff fails because:

  Incompatible changes:
  - Kind: old is comparable, new is not

I don't think Kind being comparable is something we consider part of the API, WDYT?

pkg/source/source.go Outdated Show resolved Hide resolved
@alvaroaleman alvaroaleman changed the title 🐛 Fix wait for cache sync functionality 🐛 Fix cache sync timeout functionality Mar 16, 2021
So far, timing out the cache sync in most realistic scenarios didn't work,
bceause source.Kind gets an already started cache from the Manager. A
cache that is already started will block forever on GetInformer which we
called in source.Kinds start and not its WaitForSync. The context passed
to Start however defines the entire lifecycle of the Source, not the
Start timeout.

This change makes source.Kind call GetInformer in Start but in a new
go routine and WaitForSync just wait for that to finish or for its
context to be cancelled. This preserves the existing semantic of
starting in Start but waint for ready in WaitForSync.
@alvaroaleman
Copy link
Member Author

@estroz addressed your comment, ptal. I've also renamed the pull as the issue wasn't with the cache sync per se but with the cache sync timeout.

Just to mention that, another possible approach to this would have been to make the GetInformer call not wait for the given informer to be synced after the cache is started and only do that in Get/List, that would allow folks like ourselves to have one non-blocking and one explicitly blocking call. The reason I didn't do that is that it might break others that rely on GetInformer blocking. WDYT?

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alvaroaleman,vincepri]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vincepri
Copy link
Member

/hold

for @estroz

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 16, 2021
@estroz
Copy link
Contributor

estroz commented Mar 16, 2021

Just to mention that, another possible approach to this would have been to make the GetInformer call not wait for the given informer to be synced after the cache is started and only do that in Get/List, that would allow folks like ourselves to have one non-blocking and one explicitly blocking call.

@alvaroaleman that depends, can you think of any reason GetInformer() returning an unsync'd informer would be beneficial?

The current approach is fine with me.

/lgtm

@alvaroaleman
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 16, 2021
@alvaroaleman
Copy link
Member Author

/retest

1 similar comment
@alvaroaleman
Copy link
Member Author

/retest

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 16, 2021

@alvaroaleman: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-controller-runtime-apidiff-master 4d059e8 link /test pull-controller-runtime-apidiff-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 2a448a7 into kubernetes-sigs:master Mar 16, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.9.x milestone Mar 16, 2021
@alvaroaleman alvaroaleman deleted the fix-cachesync branch November 25, 2022 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

source.Kind and source.Channel interpret the ctx arg to their Start method differently
4 participants