fix: Eventually() missing Should() statement and sync error #11101
Conversation
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: xpivarc
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Required labels detected, running phase 2 presubmits:
/hold
Force-pushed from 690e4e8 to ab9a255
Force-pushed from 12ec7ae to 788b5b6
Changed the title from "Eventually() missing Should() statement" to "Eventually() missing Should() statement and sync error"
Force-pushed from 788b5b6 to 29018dd
tests/vm_test.go (Outdated)
// FIXME: this is just a test to see if the flakiness is reduced
migrationBandwidth := resource.MustParse("1Mi")
kv := util.GetCurrentKv(virtClient)
kv.Spec.Configuration.MigrationConfiguration = &v1.MigrationConfiguration{
	BandwidthPerMigration: &migrationBandwidth,
}
kv = testsuite.UpdateKubeVirtConfigValue(kv.Spec.Configuration)
tests.WaitForConfigToBePropagatedToComponent("kubevirt.io=virt-handler", kv.ResourceVersion, tests.ExpectResourceVersionToBeLessEqualThanConfigVersion, 120*time.Second)
@jcanocan Please remove this.
As far as I was able to test, this has been the key piece to stabilize the test. I've taken it from this other test:
kubevirt/tests/virt_control_plane_test.go
Lines 271 to 279 in 7cfc625
By("changing a setting and ensuring that the config update watcher eventually resumes and picks it up") | |
migrationBandwidth := resource.MustParse("1Mi") | |
kv := util.GetCurrentKv(virtCli) | |
kv.Spec.Configuration.MigrationConfiguration = &k6sv1.MigrationConfiguration{ | |
BandwidthPerMigration: &migrationBandwidth, | |
} | |
kv = testsuite.UpdateKubeVirtConfigValue(kv.Spec.Configuration) | |
tests.WaitForConfigToBePropagatedToComponent("kubevirt.io=virt-handler", kv.ResourceVersion, tests.ExpectResourceVersionToBeLessEqualThanConfigVersion, 60*time.Second) | |
}) |
Dropped the FIXME comment.
I will give the flakes lane a couple of extra runs to be sure.
tests/vm_test.go (Outdated)
DeferCleanup(func() {
	defer func() {
		libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
These kinds of hacks are causing the whole test suite to collapse on itself.
There is nothing safe that can be done to fix something like this and I hope to see this whole test removed.
But this is not really related to this fix. I just think the fix is to remove it.
Dropped the usage of defer for this purpose. I agree, it's not elegant. Thanks for the input.
I was actually referring to hacking the virt-handler network.
It is indeed nicer without the defer, but what is actually done here is very risky and IMO does not fit our tests.
If for some reason the pod state is not fixed (e.g. the route is not deleted), we are left with a broken cluster and following tests will flake in a hard to understand manner.
IMO we should not have such tests around; we cannot cover every scenario in a reasonably safe manner.
Our resources are limited, tests are long and flaky enough not to risk them further.
If someone thinks it is worth the resources and risk, it would be better to allocate a dedicated job for such tests. I think reality dictates that we need to manage our e2e tests, focus on the main functionality, and leave the edge scenarios to the community to detect (reporting bugs, running more tests on D/S projects and contributing back).
There is precedent for destructive tests; if we want to run them on separate lanes or not have them at all, the decision should be made with wider consensus + maintainers.
I am not sure what is better about the AfterEach, other than that it is wrong afaik.
Note: I don't agree. This is not a conformance test and if anything this does fail loudly!
Well, I disagree with the presence of cluster-level destructive tests and I reasoned about it in the prev message. The fact that they exist is not a sign of them being good for the overall project.
This is not a conformance test and if anything this does fail loudly!
If tests like this fail, they either mess up all following tests or destabilize the cluster in ways we may not predict.
There is precedence for destructive tests, if we want to run them on separate lanes or not have them at all, the decision should be made in wider consensus + maintainers.
Well, I disagree with the presence of cluster-level destructive tests and I reasoned about it in the prev message. The fact that they exist is not a sign of them being good for the overall project.
This is not a conformance test and if anything this does fail loudly!
If tests like this fail, they either mess up all following tests or destabilize the cluster in ways we may not predict.
We have no rush to get this PR merged, and from my point of view these observations are worth discussing, as @xpivarc proposed. What would be a good way of shifting this discussion away from the PR to a wider forum?
I will be drafting a proposal for the wider kubevirt community to move these kinds of destructive and edge-case tests to their own separate test lane and out of the gating lanes. We are currently testing everything in the gating lanes, including severe edge cases, which does not make sense and makes CI very costly.
We will look at how to separate out these tests and where we want to run them.
@@ -1969,12 +1969,25 @@ status:
By("Blocking virt-handler from reconciling the VMI")
libpod.AddKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
Eventually(getHandlerNodePod(virtClient, nodeName).Items[0], 120*time.Second, time.Second, HaveConditionFalse(k8sv1.PodReady))
How about?
Eventually(func(g Gomega) {
	g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionFalse(k8sv1.PodReady))
}, 120*time.Second, time.Second).Should(Succeed())

DeferCleanup(func() {
	libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
	Eventually(func(g Gomega) {
		g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionTrue(k8sv1.PodReady))
	}, 120*time.Second, time.Second).Should(Succeed())
})
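(Side note on the suggestion above, based on general Gomega behaviour rather than anything stated in this thread: passing getHandlerNodePod(...).Items[0] directly hands Eventually a pod that was fetched once, while the func(g Gomega) form re-runs the lookup on every polling interval, so each check sees the pod's current state.)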
The issue with DeferCleanup is that it is executed after the resetToDefaultConfig() inside the global AfterEach. So when the AfterEach tries to reset kubevirt, the virt-handler is offline (unable to communicate with the kubernetes API). That creates this error: "resource & config versions (5548 and 4736 respectively) are not as expected. component: \"virt-handler\", pod: \"virt-handler-zdv7f\"".
I like the Eventually proposal, looks elegant. Also implemented in the blackhole deleting check.
Why did we revert the defer or JustAfterEach?
I don't get it.
The defer is something that we can avoid as a solution for cleanup. The JustAfterEach is generally used for different scopes.
Here, the problem was the DeferCleanup. After some searches I found this: onsi/ginkgo#1284 (comment)
"In reality Ginkgo runs the After* family followed by the Defer* family."
which explains the source of the issue.
(We should check our other DeferCleanup usages IMHO.)
Can you explain your doubts?
Thanks
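A minimal sketch of that ordering (assuming the usual ginkgo/v2 dot-imports; names and messages are only illustrative, not taken from the suite):

var _ = Describe("cleanup ordering", func() {
	AfterEach(func() {
		// After* family runs first, e.g. a global AfterEach calling resetToDefaultConfig()
		GinkgoWriter.Println("AfterEach")
	})

	It("registers a cleanup", func() {
		DeferCleanup(func() {
			// Defer* family runs afterwards, e.g. removing the API blackhole
			GinkgoWriter.Println("DeferCleanup")
		})
	})
})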
@jcanocan also observed that an AfterEach is only run after the global AfterEach afaik.
Thanks @fossedihelm for this nice piece of information. This explains all the issues faced.
As far as I was able to test, the nearest AfterEach (the one inside the Context) is always executed before the global AfterEach. However, we need a way to wait until the handler and kubevirt versions are in sync. That's why I've observed in the past the error mentioned in the description. Sorry for the confusion.
Force-pushed from 29018dd to beea79a
tests/vm_test.go (Outdated)
@@ -1957,24 +1957,37 @@ status:
Context("[Serial] when node becomes unhealthy", Serial, func() {
	const componentName = "virt-handler"
	var nodeName string

	JustAfterEach(func() {
AfterEach is enough here. Inner AfterEach blocks are executed before the outer one.
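For completeness, a tiny sketch of that nesting (same ginkgo/v2 dot-import assumption, illustrative names):

var _ = Describe("outer", func() {
	AfterEach(func() {
		GinkgoWriter.Println("outer AfterEach") // runs second
	})

	Context("inner", func() {
		AfterEach(func() {
			GinkgoWriter.Println("inner AfterEach") // runs first, closest to the spec
		})

		It("does something", func() {})
	})
})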
Done.
The linter enforces the usage of `Should()` statements when an `Eventually` check is used.

Also, `DeferCleanup` has been dropped in favor of `AfterEach`. The [`resetToDefaultConfig()`](https://github.com/kubevirt/kubevirt/blob/cb1b6e53540189d6664c4a8c126ab6e0a84ff8c4/tests/utils.go#L1842) is called before the test `DeferCleanup`, creating the following error: "resource & config versions (5548 and 4736 respectively) are not as expected. component: \"virt-handler\", pod: \"virt-handler-zdv7f\"". This is because the `virt-handler` will not be ready (intentionally, for the purposes of the test), and the `resetToDefaultConfig()` will force the `virt-handler` to reconcile, which will fail, but the `virt-handler` `resourceVersion` will be updated. Therefore, the KubeVirt object will be out of sync.

Signed-off-by: Javier Cano Cano <jcanocan@redhat.com>
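For illustration, a minimal sketch of the pattern the linter flags (isHandlerReady is a made-up polling helper, not part of this PR):

// Flagged: this only builds the async assertion; nothing is ever polled or asserted.
Eventually(isHandlerReady, 120*time.Second, time.Second)

// Compliant: polls isHandlerReady until it returns true or the 120s timeout expires.
Eventually(isHandlerReady, 120*time.Second, time.Second).Should(BeTrue())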
Force-pushed from beea79a to e43402d
Dropped the configuration update in the
/lgtm
Required labels detected, running phase 2 presubmits:
AfterEach(func() {
	libpod.DeleteKubernetesApiBlackhole(getHandlerNodePod(virtClient, nodeName), componentName)
	Eventually(func(g Gomega) {
		g.Expect(getHandlerNodePod(virtClient, nodeName).Items[0]).To(HaveConditionTrue(k8sv1.PodReady))
	}, 120*time.Second, time.Second).Should(Succeed())

	tests.WaitForConfigToBePropagatedToComponent("kubevirt.io=virt-handler", util.GetCurrentKv(virtClient).ResourceVersion,
		tests.ExpectResourceVersionToBeLessEqualThanConfigVersion, 120*time.Second)
})
This is not flaky?
I'm trying to find out if this may lead to flakiness. The purpose is to wait until the configuration versions are in sync. I ran out of ideas in this regard, so I think we should rerun the pull-kubevirt-check-tests-for-flakes lane a couple of times to find out. Sorry for the brute force approach 😞
/retest pull-kubevirt-check-tests-for-flakes
@xpivarc: The
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-kubevirt-check-tests-for-flakes
2 similar comments
/test pull-kubevirt-check-tests-for-flakes
/test pull-kubevirt-check-tests-for-flakes
/retest-required
/test pull-kubevirt-check-tests-for-flakes
@xpivarc the flakes lane failed, but not because of this test. Please retest it when you have taken a look.
/hold cancel
Thank you!
Required labels detected, running phase 2 presubmits:
/retest-required
@jcanocan: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest-required
1 similar comment
/retest-required
What this PR does / why we need it:
The linter enforces the usage of Should() statements when an Eventually check is used. It adds the missing Should() checks to the Eventually() statements.
Also, DeferCleanup has been dropped in favor of AfterEach. The resetToDefaultConfig() is called before the test DeferCleanup, creating the following error: "resource & config versions (5548 and 4736 respectively) are not as expected. component: \"virt-handler\", pod: \"virt-handler-zdv7f\"".
This is because the virt-handler will not be ready (intentionally, for the purposes of the test), and the resetToDefaultConfig() will force the virt-handler to reconcile, which will fail, but the virt-handler resourceVersion will be updated. Therefore, the KubeVirt object will be out of sync.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: