core: always return nil clusterInfo on failure #9347

leseb · 2021-12-08T10:26:23Z

Description of your changes:

core: print error even if skipped
Previously, the underlying error was ignored, now we print it.

core: always return nil on error
We should always return a nil pointer of clusterInfo if
CreateOrLoadClusterInfo() returns an error.

Closes: #9314
Signed-off-by: Sébastien Han <seb@redhat.com>

Which issue is resolved by this Pull Request:
Resolves #9314
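As a rough illustration of the "always return nil on error" commit, here is a minimal, self-contained sketch. The names `clusterInfo`, `createOrLoad`, and `describe` are hypothetical stand-ins, not the actual Rook code; the point is only that a caller guarding on a nil pointer never sees a half-initialized value when the loader fails.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterInfo is a simplified, hypothetical placeholder for the operator's type.
type clusterInfo struct {
	FSID string
}

// createOrLoad is a hypothetical loader. When it fails, it discards the
// partially built value and returns a nil pointer together with the error.
func createOrLoad(fail bool) (*clusterInfo, error) {
	info := &clusterInfo{}
	if fail {
		// Return nil rather than the half-initialized info.
		return nil, errors.New("failed to load cluster secrets")
	}
	info.FSID = "example-fsid"
	return info, nil
}

// describe shows a caller that guards on nil before using the value.
func describe(info *clusterInfo) string {
	if info == nil {
		return "no cluster info"
	}
	return "fsid=" + info.FSID
}

func main() {
	info, err := createOrLoad(true)
	fmt.Println(describe(info), err)

	info, err = createOrLoad(false)
	fmt.Println(describe(info), err)
}
```

With this shape, any later `info != nil` check is a reliable signal that loading succeeded.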

Checklist:

subhamkrai · 2021-12-08T12:01:58Z

pkg/operator/ceph/cluster/cluster.go

@@ -168,7 +168,7 @@ func (c *ClusterController) initializeCluster(cluster *cluster) error {

 	clusterInfo, _, _, err := mon.LoadClusterInfo(c.context, c.OpManagerCtx, cluster.Namespace)
 	if err != nil {
-		logger.Infof("clusterInfo not yet found, must be a new cluster")
+		logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err)


Suggested change

-		logger.Infof("clusterInfo not yet found, must be a new cluster. %v", err)
+		logger.Errorf("clusterInfo not yet found, must be a new cluster. %v", err)

?

It's not an error per se, so Info is fine since it's likely a new cluster.

Actually, it seems like a problem that we always print the error and continue. If we know 100% that the failure is because it's a new cluster, then we can continue. But if it's a transient k8s API error, we don't want to continue and we need to fail the reconcile.

Let's check for IsNotFound.

subhamkrai

lgtm

travisn

How does this help with #9314? Will it log when the cluster info fails to load, but the operator would still crash?

leseb · 2021-12-08T14:55:47Z

How does this help with #9314? Will it log when the cluster info fails to load, but the operator would still crash?

I think it shouldn't crash anymore since we always return nil on errors now. Logging the error seems useful.

travisn

Marking as changes requested until IsNotFound is checked for the err.

Let's catch the correct error when no cluster info exists yet. If we have another error, we fail the orchestration and try again. This could help us catch small API hiccups, for example. Signed-off-by: Sébastien Han <seb@redhat.com>

leseb mentioned this pull request Dec 8, 2021

Operator error after K8s master token changed #9314

Closed

subhamkrai reviewed Dec 8, 2021

subhamkrai approved these changes Dec 8, 2021

travisn reviewed Dec 8, 2021

leseb added this to In progress in v1.8 via automation Dec 8, 2021

leseb added this to In progress in v1.7 via automation Dec 8, 2021

v1.7 automation moved this from In progress to Review in progress Dec 8, 2021

v1.8 automation moved this from In progress to Review in progress Dec 8, 2021

travisn requested changes Dec 8, 2021

leseb added 3 commits December 8, 2021 17:30

core: always return nil on error

fdd243d

We should always return a nil pointer of clusterInfo if CreateOrLoadClusterInfo() returns an error. Closes: rook#9314 Signed-off-by: Sébastien Han <seb@redhat.com>

core: fix error message

4f1a2d4

Let's have a more accurate and correct error message that reflects what the failing function tried to do. Signed-off-by: Sébastien Han <seb@redhat.com>

leseb force-pushed the fix-9314 branch from b45887f to e5ce6ab Compare December 8, 2021 16:33

leseb requested a review from travisn December 8, 2021 16:33

travisn approved these changes Dec 8, 2021

travisn added backport-release-1.7 labels Dec 8, 2021

leseb merged commit 6a10adc into rook:master Dec 8, 2021

v1.7 automation moved this from Review in progress to Done Dec 8, 2021

v1.8 automation moved this from Review in progress to Done Dec 8, 2021

leseb deleted the fix-9314 branch December 8, 2021 17:08

This was referenced Dec 8, 2021

core: always return nil clusterInfo on failure (backport #9347) #9353

Merged

core: always return nil clusterInfo on failure (backport #9347) #9354

Merged

travisn added a commit that referenced this pull request Dec 8, 2021

Merge pull request #9354 from rook/mergify/bp/release-1.8/pr-9347

bf7a6d6

core: always return nil clusterInfo on failure (backport #9347)

mergify bot added a commit that referenced this pull request Dec 9, 2021

Merge pull request #9353 from rook/mergify/bp/release-1.7/pr-9347

8bbfb94

core: always return nil clusterInfo on failure (backport #9347)
