Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connectivity: add retry to external targets #1540

Merged
merged 1 commit into from
May 3, 2023

Conversation

brlbil
Copy link
Contributor

@brlbil brlbil commented Apr 26, 2023

Adds --retry and --retry-delay flags to the connectivity test.
They are used in external pod-to-world and po-to-cidr tests to try to reconnect
on failures thus making flaky tests less likely.

@brlbil brlbil requested a review from a team as a code owner April 26, 2023 10:18
@brlbil brlbil requested a review from squeed April 26, 2023 10:18
@brlbil brlbil temporarily deployed to ci April 26, 2023 10:18 — with GitHub Actions Inactive
@squeed
Copy link
Contributor

squeed commented Apr 26, 2023

Hmm, it seems like the retries for cases where failure is expected are causing the tests to take too long.

Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an easy way to disable retries when the test is expected to fail?

@brlbil
Copy link
Contributor Author

brlbil commented Apr 26, 2023

The default retry policy is 3 times and 3s delay. We can reduce this number, but first I will check the test cases for expected fails and see if we approach them differently.

@pchaigno pchaigno marked this pull request as draft April 26, 2023 16:23
@brlbil brlbil temporarily deployed to ci April 27, 2023 12:55 — with GitHub Actions Inactive
@brlbil brlbil temporarily deployed to ci April 27, 2023 13:42 — with GitHub Actions Inactive
@brlbil
Copy link
Contributor Author

brlbil commented Apr 27, 2023

Is there an easy way to disable retries when the test is expected to fail?

@squeed I have made changes that retry options are added for commands in which successful result is expected.
Also, the second commit fixes a couple of comments.

@brlbil brlbil marked this pull request as ready for review April 27, 2023 13:45
Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the second commit fixes a couple of comments.

Could you please squash the two commits? It looks like the 2nd commit only changes comments that got introduced/changed in the 1st commit.

connectivity/tests/common.go Outdated Show resolved Hide resolved
connectivity/tests/to-cidr.go Outdated Show resolved Hide resolved
Comment on lines 86 to 96
// all conditions is set return early
if rc.all {
return opts
}
// destIP is checked with pod-to-cidr cases
if peer.Address(ipFam) == rc.destIP {
return opts
}
// destPort and podLabel are checked with pod-to-world case
if peer.Port() == rc.destPort {
for n, v := range rc.podLabels {
if !pod.HasLabel(n, v) {
return []string{}
}
}
return opts
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit strange this this is essentially all OR destIP matches OR (desPort matches AND label matches). Though it currently doesn't have a practical consequence, I'd say it's more straight forward to apply the options if either all is true OR all specified options are true, i.e. the AND of all specified options. Something like:

func (rc *retryCondition) CurlOptions(peer check.TestPeer, ipFam check.IPFamily, pod check.Pod, params check.Parameters) []string {                                                                                                                
        if params.Retry == 0 { 
                return []string{}
        } 

        opts := []string{
                "--retry", strconv.FormatInt(int64(connectRetry), 10),
                "--retry-all-errors", // add --retry-all-errors to retry on all possible errors
        } 

        if retryDelay := params.RetryDelay.Seconds(); retryDelay > 0.0 { 
                opts = append(opts, "--retry-delay", strconv.FormatFloat(retryDelay, 'f', -1, 64))
        } 
        if rc.all { 
                return opts
        } 
        if rc.destIP != "" && peer.Address(ipFam) != rc.destIP { 
                return []string{}
        } 
        if rc.destPort != 0 && peer.Port() != rc.destPort { 
                return []string{}
        } 
        for n, v := range rc.podLabels { 
                if !pod.HasLabel(n, v) { 
                        return []string{}
                } 
        } 

        return opts
}

That way it's independent of the test cases these options are currently used with and allows for easier extensibility in the future.

What do you think?

Copy link
Contributor Author

@brlbil brlbil Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is much more elegant 👍 but it is missing one thing if no options are set it still sets the retry options. So I added a check earlier to return in this case

	if !rc.all && rc.destIP == "" && rc.destPort == 0 {
		return []string{}
	}

I am assuming this change is okay with you and pushing the changes 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it is missing one thing if no options are set it still sets the retry options. So I added a check earlier to return in this case

Ah, good point. I missed that. Shouldn't we also consider pod labels in the options not being set (ref. my inline comment)?

connectivity/tests/common.go Outdated Show resolved Hide resolved
connectivity/tests/world.go Outdated Show resolved Hide resolved
This commit adds --retry and --retry-delay
flags to connectivity test. They are used in external
pod-to-world and po-to-cidr tests to try to reconnect
on failures thus making flaky tests less likely.

Fixes comments of PodToWorld and PodToCIDR functions.

Signed-off-by: Birol Bilgin <birol@cilium.io>
@brlbil brlbil temporarily deployed to ci April 28, 2023 16:59 — with GitHub Actions Inactive
@brlbil
Copy link
Contributor Author

brlbil commented Apr 28, 2023

I am guessing the failing test is related to #1527

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 3, 2023
@tklauser tklauser merged commit bfd1983 into cilium:main May 3, 2023
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants