core: always return nil clusterInfo on failure #9347

Merged
merged 3 commits into from Dec 8, 2021

Conversation

leseb
Member

@leseb leseb commented Dec 8, 2021

Description of your changes:

core: print error even if skipped
Previously, the underlying error was ignored, now we print it.

core: always return nil on error
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: #9314
Signed-off-by: Sébastien Han <seb@redhat.com>
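
The "always return nil on error" rule from the second commit can be sketched as follows. This is a minimal illustration, not the Rook code: the `clusterInfo` struct, the `loadSecret` helper, and the `fail` parameter are hypothetical stand-ins, and only the function name `CreateOrLoadClusterInfo` comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterInfo is a hypothetical stand-in for the real cluster info type.
type clusterInfo struct {
	FSID string
}

// loadSecret is a hypothetical helper that simulates a lookup which may fail.
func loadSecret(fail bool) (string, error) {
	if fail {
		return "", errors.New("secret lookup failed")
	}
	return "fsid-1234", nil
}

// createOrLoadClusterInfo returns a nil pointer whenever it returns an error,
// so callers never receive a partially populated struct alongside a failure.
func createOrLoadClusterInfo(fail bool) (*clusterInfo, error) {
	info := &clusterInfo{}

	fsid, err := loadSecret(fail)
	if err != nil {
		// Return nil rather than the half-initialized info.
		return nil, fmt.Errorf("failed to load cluster info: %w", err)
	}

	info.FSID = fsid
	return info, nil
}

func main() {
	info, err := createOrLoadClusterInfo(true)
	fmt.Println(info == nil, err)

	info, err = createOrLoadClusterInfo(false)
	fmt.Println(info.FSID, err)
}
```

With this shape, a caller that checks `err` first can rely on the pointer being nil on every failure path.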

Which issue is resolved by this Pull Request:
Resolves #9314

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Skip Tests for Docs: Add the flag for skipping the build if this is only a documentation change. See here for the flag.
  • Skip Unrelated Tests: Add a flag to run tests for a specific storage provider. See test options.
  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.

@@ -168,7 +168,7 @@ func (c *ClusterController) initializeCluster(cluster *cluster) error {

 	clusterInfo, _, _, err := mon.LoadClusterInfo(c.context, c.OpManagerCtx, cluster.Namespace)
 	if err != nil {
-		logger.Infof("clusterInfo not yet found, must be a new cluster")
+		logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err)
Contributor

Suggested change
-logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err)
+logger.Errorf("clusterInfo not yet found, must be a new cluster. %v", err)

?

Member Author

It's not an error per se, so info is fine since it's likely a new cluster.

Member

Actually, it seems like a problem that we always print the error and continue. If we know 100% that the failure is because it's a new cluster, then we can continue. But if it's a transient k8s API error, we don't want to continue and we need to fail the reconcile.

Member Author

Let's check for IsNotFound.

Contributor

@subhamkrai subhamkrai left a comment

lgtm

Member

@travisn travisn left a comment

How does this help with #9314? Will it log when the cluster info fails to load, but the operator would still crash?

@leseb
Member Author

leseb commented Dec 8, 2021

> How does this help with #9314? Will it log when the cluster info fails to load, but the operator would still crash?

I think it shouldn't crash anymore since we always return nil on errors now. Logging the error seems useful.

@leseb leseb added this to In progress in v1.8 via automation Dec 8, 2021
@leseb leseb added this to In progress in v1.7 via automation Dec 8, 2021
v1.7 automation moved this from In progress to Review in progress Dec 8, 2021
v1.8 automation moved this from In progress to Review in progress Dec 8, 2021
Member

@travisn travisn left a comment

Marking as changes requested until the err is checked with IsNotFound.

We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: rook#9314
Signed-off-by: Sébastien Han <seb@redhat.com>

Let's have a more accurate and correct error message that reflects what
the failing function tried to do.

Signed-off-by: Sébastien Han <seb@redhat.com>

Let's catch the correct error when no cluster info exists yet. If we
have another error, we fail the orchestration and try again. This could
help us catch small API hiccups, for example.

Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb leseb merged commit 6a10adc into rook:master Dec 8, 2021
v1.7 automation moved this from Review in progress to Done Dec 8, 2021
v1.8 automation moved this from Review in progress to Done Dec 8, 2021
@leseb leseb deleted the fix-9314 branch December 8, 2021 17:08
travisn added a commit that referenced this pull request Dec 8, 2021
core: always return nil clusterInfo on failure (backport #9347)
mergify bot added a commit that referenced this pull request Dec 9, 2021
core: always return nil clusterInfo on failure (backport #9347)
Labels
None yet
Projects
No open projects
v1.7
Done
v1.8
Done
Development

Successfully merging this pull request may close these issues.

Operator error after K8s master token changed
3 participants