
mds downgrade from higher version to lower version #8576

Closed
fengjiankui121 opened this issue Aug 23, 2021 · 13 comments

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:

Expected behavior:
The mds downgrade should not be allowed.

How to reproduce it (minimal and precise):

Step 1: Install Ceph at the higher version.
Step 2: Update the cluster to the lower Ceph version.
Step 3: Wait for the OSD deployment to complete, then restart the operator.
Step 4: The mds is downgraded from the higher version to the lower version.

File(s) to submit:

  • Cluster CR (custom resource), typically called cluster.yaml, if necessary
  • Operator's logs, if necessary
  • Crashing pod(s) logs, if necessary

To get logs, use kubectl -n <namespace> logs <pod name>
When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI.
Read the GitHub documentation if you need help.

Environment:

  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Cloud provider or hardware configuration:
  • Rook version (use rook version inside of a Rook Pod):
  • Storage backend version (e.g. for ceph do ceph -v):
  • Kubernetes version (use kubectl version):
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
@travisn (Member) commented Aug 23, 2021

@fengjiankui121 If you don't restart the operator, do you see the same behavior? The operator intends to prevent Ceph downgrades, but there may be an issue when the operator is restarted.

@fengjiankui121 (Contributor, Author) commented:

@travisn The same behavior occurs if other operations cause the operator to reconcile again, for example a CephCluster configuration change.

@fengjiankui121 (Contributor, Author) commented:

@travisn This problem is mainly caused by inconsistent versions among the Ceph components, which triggers the mds update. The relevant code is as follows:

rook/pkg/operator/ceph/cluster/version.go

    if numberOfCephVersions > 1 {
        // let's return immediately
        logger.Warningf("it looks like we have more than one ceph version running. triggering upgrade. %+v:", runningVersions.Overall)
        return true, nil
    }

@fengjiankui121 (Contributor, Author) commented:

@travisn The OSDs, monitors, etc. are allowed to move to the lower version, but the mds is not, which leaves the Ceph components at inconsistent versions, and that inconsistency is what then triggers the mds update.
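
For illustration, here is a minimal, self-contained sketch of how the two checks interact (the cephVersion type and the isInferior/shouldReconcile functions are simplified stand-ins, not the actual Rook code): once the daemons report more than one running version, the early return fires before the downgrade guard is ever evaluated, so the reconcile proceeds and the mds is moved to the lower image version.

    package main

    import "fmt"

    // cephVersion is a simplified stand-in for Rook's cephver.CephVersion.
    type cephVersion struct{ Major, Minor, Extra int }

    // isInferior mirrors the idea behind cephver.IsInferior: true if a is older than b.
    func isInferior(a, b cephVersion) bool {
        if a.Major != b.Major {
            return a.Major < b.Major
        }
        if a.Minor != b.Minor {
            return a.Minor < b.Minor
        }
        return a.Extra < b.Extra
    }

    // shouldReconcile sketches the decision in version.go: runningVersions is the
    // set of versions the daemons currently report, imageSpec is the version of
    // the image in the CephCluster CR, clusterRunning is the version the cluster
    // is considered to be running.
    func shouldReconcile(runningVersions map[cephVersion]int, imageSpec, clusterRunning cephVersion) (bool, error) {
        if len(runningVersions) > 1 {
            // Mixed versions (e.g. the mds still on 16.2.5 while everything else
            // is on 16.2.4): return immediately and reconcile; the downgrade
            // guard below is never reached.
            return true, nil
        }
        if isInferior(imageSpec, clusterRunning) {
            // Mirrors the quoted code: the error is what blocks the downgrade.
            return true, fmt.Errorf("image spec version %v is lower than the running cluster version %v, downgrading is not supported", imageSpec, clusterRunning)
        }
        return true, nil
    }

    func main() {
        v4, v5 := cephVersion{16, 2, 4}, cephVersion{16, 2, 5}
        // State after the operator restart: mon/mgr/osd already on 16.2.4, the mds still on 16.2.5.
        running := map[cephVersion]int{v4: 5, v5: 2}
        ok, err := shouldReconcile(running, v4, v5)
        fmt.Println(ok, err) // true <nil> -> the reconcile runs and the mds is downgraded to 16.2.4
    }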

@travisn (Member) commented Sep 3, 2021

@fengjiankui121 To be clear, are you seeing that the mds is not updated when the Ceph version is downgraded, but that if you restart the operator, the mds is updated to the downgraded version? And the bug is that you expect the mds to downgrade without the operator restart?

@fengjiankui121 (Contributor, Author) commented:

@travisn yes, it is

@travisn (Member) commented Sep 14, 2021

Ok, sounds like we just need to relax the check and allow the mds downgrade. Downgrading isn't really supported, but the reality is that sometimes it is better to risk the downgrade than to stay in a broken state after an upgrade.
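
A purely illustrative sketch of what relaxing the check could look like (relaxedCheck and the allowDowngrade flag are hypothetical names, not actual Rook settings, and this is not necessarily how the issue was eventually fixed): warn and proceed when the image version is inferior, instead of returning an error, so that every daemon, including the mds, converges on the lower version.

    package main

    import "log"

    // relaxedCheck decides whether a reconcile should run. Versions are reduced
    // to comparable integers (e.g. 160205 for 16.2.5) just for this sketch.
    func relaxedCheck(imageSpecVersion, clusterRunningVersion int, allowDowngrade bool) (bool, error) {
        if imageSpecVersion < clusterRunningVersion {
            if !allowDowngrade {
                log.Printf("image version %d is lower than running version %d; refusing to downgrade", imageSpecVersion, clusterRunningVersion)
                return false, nil
            }
            // Instead of returning an error, warn loudly and let the reconcile
            // continue so every daemon, including the mds, moves to the lower version.
            log.Printf("downgrading from %d to %d; downgrades are not supported, proceed at your own risk", clusterRunningVersion, imageSpecVersion)
        }
        return true, nil
    }

    func main() {
        reconcile, _ := relaxedCheck(160204, 160205, true)
        log.Println("reconcile:", reconcile) // reconcile: true
    }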

sp98 self-assigned this Sep 15, 2021
@sp98 (Contributor) commented Sep 15, 2021

@fengjiankui121 I need help reproducing this behavior. The steps I followed are below:

  • Deploy rook-ceph (master branch) with ceph 16.2.5.
  • Create cluster/examples/kubernetes/ceph/filesystem-test.yaml
  • Downgrade ceph version to image: quay.io/ceph/ceph:v16.2.4 in the cephCluster yaml
  • Wait for downgrade to complete.

Before downgrade:

 versions:
      mds:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 2
      mgr:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 1
      mon:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 3
      osd:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 1
      overall:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 7

After downgrade:

    versions:
      mds:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 2
      mgr:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 1
      mon:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 3
      osd:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 1
      overall:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 7

The mds got downgraded to 16.2.4 just by downgrading the Ceph version from 16.2.5 to 16.2.4 in the CephCluster yaml.

Let me know if I'm missing something.

@fengjiankui121 (Contributor, Author) commented:

@sp98 From the Rook code you can see that the mds is not allowed to move to the lower version unless the Ceph components are already running inconsistent versions. The relevant code is as follows:

    if numberOfCephVersions > 1 {
        // let's return immediately
        logger.Warningf("it looks like we have more than one ceph version running. triggering upgrade. %+v:", runningVersions.Overall)
        return true, nil
    }

    ......

    if cephver.IsInferior(imageSpecVersion, clusterRunningVersion) {
        return true, errors.Errorf("image spec version %s is lower than the running cluster version %s, downgrading is not supported", imageSpecVersion.String(), clusterRunningVersion.String())
    }

@sp98 (Contributor) commented Sep 16, 2021

Able to reproduce this issue.

  versions:
      mds:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 2
      mgr:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 1
      mon:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 3
      osd:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 1
      overall:
        ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable): 5
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable): 2

We discussed this issue in the huddle. The general consensus is to allow users to downgrade for scenarios where an upgrade could leave the cluster in a broken state. See the comment above.

Rook (and also Ceph) does not support downgrades.

Waiting for a major refactoring PR. I'll test this issue again after that PR is merged.

@github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@travisn (Member) commented Nov 15, 2021

This was actually fixed by #9098 in v1.7.7

travisn closed this as completed Nov 15, 2021