connectivity tests: wait functions improvements and refactoring #1758
Conversation
Converting back to draft to check failures.

All checks are green (it appears that the previous failures were unrelated to these changes).
@giorio94 Nice work Marco! A few nitpicks and perhaps another potential refactor.
```go
// WaitForDeployment waits until the specified deployment becomes ready.
func WaitForDeployment(ctx context.Context, log Logger, client *k8s.Client, namespace string, name string) error {
	log.Logf("⌛ [%s] Waiting for deployment %s/%s to become ready...", client.ClusterName(), namespace, name)
```
This is a common pattern across all subsequent waitFor calls. It seems like a prime candidate for abstracting out these calls.
Also, do these need to be exported?
> This is a common pattern across all subsequent waitFor calls. It seems like a prime candidate for abstracting out these calls.

I agree that the `WaitFor*` functions look pretty similar to each other, but I'm not seeing many possible improvements. Extracting the common parts of the logic (I was thinking mainly about the `for` loop) would mean either dropping the customized error messages or just passing them as parameters, which wouldn't be much better. WDYT?
> Also, do these need to be exported?

Yes, to enable them to be reused when the cilium-cli packages are used externally.
Thanks Marco, I see. I figured this is common logic and best to abstract with a closure.
Personally, I am not keen on making logging decisions deep in the stack, as it should be the responsibility of the call site to determine whether to handle the error, log it, etc. If multiple errors can potentially be surfaced, then we should return them accordingly. If this is outside the scope of this review, I am good with that, but figured I'd flag it.
I appreciate the clarification, and I see your point. I'd personally not change it further in this PR, to avoid modifying the pre-existing behavior, especially considering the logs, given that the CLI used to output a debug log for every retry in case of errors.
```go
	defer cancel()

	for {
		err := validateIPCache(ctx, agent, pods)
```
nit: this could be a one-liner check.
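For readers unfamiliar with the idiom the nit refers to: Go lets you scope the error to the `if` statement itself. A minimal sketch, where `validate` is a hypothetical stand-in for the `validateIPCache` call quoted above:

```go
package main

import (
	"errors"
	"fmt"
)

// validate is a hypothetical stand-in for the validateIPCache call above.
func validate() error { return errors.New("ipcache not yet synchronized") }

func main() {
	// One-liner check: err is scoped to the if statement.
	if err := validate(); err != nil {
		fmt.Println("retrying:", err) // Prints: retrying: ipcache not yet synchronized
	}
}
```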
Is part of this PR an attempt to quiet down the error message that we have been seeing in the multicluster workflow? Fine if not; we can address that elsewhere.
Nope, we'll need another PR for that.
This commit introduces a logger interface which abstracts the logging functionalities implemented by the test suite, individual tests and actions. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
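A minimal sketch of what such a logger interface might look like, inferred from the `log.Logf` call in the snippet quoted earlier; the concrete interface in the commit may carry more methods, and `stdoutLogger` is purely illustrative:

```go
package main

import "fmt"

// Logger abstracts the logging functionality shared by the test suite,
// individual tests, and actions. The single Logf method is inferred from
// the log.Logf call quoted earlier; the real interface may differ.
type Logger interface {
	Logf(format string, a ...interface{})
}

// stdoutLogger is a hypothetical implementation that writes to stdout.
type stdoutLogger struct{}

func (stdoutLogger) Logf(format string, a ...interface{}) {
	fmt.Printf(format+"\n", a...)
}

func main() {
	var log Logger = stdoutLogger{}
	log.Logf("⌛ [%s] Waiting for deployment %s/%s to become ready...", "cluster", "default", "echo")
}
```

With an interface like this, helpers such as `WaitForDeployment` can accept any logger rather than depending on a concrete test-suite type.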
This commit extracts, unifies, and generalizes the waitFor* functions used by the connectivity checks, for better separation and to allow them to be reused outside of this package. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces an additional check to wait until all test services have been completely synchronized by the Cilium agents on the nodes hosting the clients before running the tests, to prevent spurious failures. This applies in particular to the multicluster tests, as they are affected by additional propagation latency. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Rebased onto main to make CI happy.
All the feedback comments have been addressed, either in this pull request or via offline discussions. Time to 🚢
Please refer to the individual commit messages for additional details.