New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: always return nil clusterInfo on failure #9347
Conversation
pkg/operator/ceph/cluster/cluster.go
Outdated
@@ -168,7 +168,7 @@ func (c *ClusterController) initializeCluster(cluster *cluster) error { | |||
|
|||
clusterInfo, _, _, err := mon.LoadClusterInfo(c.context, c.OpManagerCtx, cluster.Namespace) | |||
if err != nil { | |||
logger.Infof("clusterInfo not yet found, must be a new cluster") | |||
logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err) | |
logger.Errorf("clusterInfo not yet found, must be a new cluster. %v", err) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an error perse, so info is fine since it's likely a new cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this seems like a problem that we are always printing the error and continuing. If we know 100% that the failure is because it's a new cluster, then we can continue. But if it's a transient k8s api error, we don't want to continue and we need to fail the reconcile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check for IsNotFound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this help with #9314? Will it log when the cluster info fails to load, but the operator would still crash?
I think it's it should crash anymore since we always return nil on errors now. Logging the error seems useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking as changed requested until IsNotFound is checked for the err
We should always return a nil pointer of clusterInfo if CreateOrLoadClusterInfo() returns an error. Closes: rook#9314 Signed-off-by: Sébastien Han <seb@redhat.com>
Let's have a more accurate and correct error message that reflects what the failing function tried to do. Signed-off-by: Sébastien Han <seb@redhat.com>
Let's catch the correct error when no cluster info exists yet. If we have another error, we fail the orchestration and try again. This could help up catching small API hiccups for example. Signed-off-by: Sébastien Han <seb@redhat.com>
core: always return nil clusterInfo on failure (backport #9347)
core: always return nil clusterInfo on failure (backport #9347)
Description of your changes:
core: print error even if skipped
Previously, the underlying error was ignored, now we print it.
core: always return nil on error
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.
Closes: #9314
Signed-off-by: Sébastien Han seb@redhat.com
Which issue is resolved by this Pull Request:
Resolves #9314
Checklist:
make codegen
) has been run to update object specifications, if necessary.