Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore Cilium endpoints not found error for the status command. #2320

Merged
merged 2 commits into from
Feb 27, 2024

Conversation

viktor-kurchenko
Copy link
Contributor

@viktor-kurchenko viktor-kurchenko commented Feb 20, 2024

The fix is required for the Customers that have disableEndpointCRD: "true" in the Cilium configuration.

Status output example when disableEndpointCRD: "true":
Screenshot 2024-02-20 at 11 29 00

The logic was stolen from: https://github.com/cilium/cilium-cli/blob/main/install/install.go#L582-L587

Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
Copy link
Member

@asauber asauber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your screenshot with this fix or without? Is the intention to show 0 pods managed?

@viktor-kurchenko
Copy link
Contributor Author

viktor-kurchenko commented Feb 22, 2024

Is your screenshot with this fix or without? Is the intention to show 0 pods managed?

This is with fix. I think 0 pods is a side-effect of disableEndpointCRD: "true".

That's how it looks like without fix:
Screenshot 2024-02-22 at 07 43 27

@margamanterola
Copy link
Member

Some additional context on why this PR is needed:
#2233 changed the exit code of the status subcommand so that if there was an error in any of the calls triggered to gather the status, the command finishes with error. However, when the cluster is configured with disableEndpointCRD: "true", the "error" is expected and exiting with a non-zero value is actually incorrect.

A better solution would be for the CLI to support this option, but until we have that, this change allows the status subcommand to exit with zero when there aren't any actual errors.

Copy link
Member

@asauber asauber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I do think the overall status command should be changed to be made aware of disableEndpointCRD.

It seems like the correct behavior would be to hide the Cluster Pods: section entirely. Is it true that Cilium is managing "0/0" Pods if Endpoints are disabled? I think yes.

status/k8s.go Outdated
return err
// When the CEP has not been registered yet, it's impossible
// for any pods to be managed by Cilium.
if err.Error() != "the server could not find the requested resource (get ciliumendpoints.cilium.io)" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors in Go should not be asserted by string matching, this is likely to lead to bugs when error strings change. Instead the errors.As (https://pkg.go.dev/errors#As) method can be used to search for a given error type in the wrapped error tree.

In this case http.StatusNotFound is the error type that you want to assert with errors.As.

https://github.com/kubernetes/apimachinery/blob/master/pkg/api/errors/errors.go#L446C7-L446C26

You can also move this predicate up into line 174 to avoid an extra level of indentation and simplify things. That is:

if err != nil and !errors.As(err, http.StatusNotFound) {
	return err
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree that error string matching is a bad approach.
But I don't think we can use errors.As because we don't have error wrapping here.

So, I used type assertion and status code check.

status/k8s.go Outdated Show resolved Hide resolved
@margamanterola
Copy link
Member

I do think the overall status command should be changed to be made aware of disableEndpointCRD

Yes, but this is outside the scope of this PR, the goal of which is to make the status subcommand not return an error when nothing is wrong.

It seems like the correct behavior would be to hide the Cluster Pods: section entirely. Is it true that Cilium is managing "0/0" Pods if Endpoints are disabled? I think yes.

I don't think so. I guess it depends on what is the expected meaning of "managed by Cilium". In setups that use kvstore mode it's common to disable the endpoint CRD to reduce unnecessary strain on the API server. From my point of view, this doesn't mean that the pods are not managed by Cilium, but only that there's no CiliumEndpoint CRD for those pods.

Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 26, 2024
@asauber
Copy link
Member

asauber commented Feb 26, 2024

I don't think so. I guess it depends on what is the expected meaning of "managed by Cilium". In setups that use kvstore mode it's common to disable the endpoint CRD to reduce unnecessary strain on the API server. From my point of view, this doesn't mean that the pods are not managed by Cilium, but only that there's no CiliumEndpoint CRD for those pods.

Understood, here's an Issue to track making this status output more accurate: #2334

@michi-covalent michi-covalent merged commit fd298d6 into main Feb 27, 2024
13 checks passed
@michi-covalent michi-covalent deleted the pr/vk/status/ignore/cilium/endpoints/error branch February 27, 2024 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants