
Why should we set max_mds to 1 in upgradeMDS? #14039

Open
hhstu opened this issue Apr 7, 2024 · 1 comment
hhstu commented Apr 7, 2024

I just want to upgrade MDS from ceph-v1.18 -> v1.18.2.
Environment:

  • Rook version (use rook version inside of a Rook Pod):
    v1.12.9 -> v1.14.*
func (c *Cluster) upgradeMDS() error {

	logger.Infof("upgrading MDS cluster for filesystem %q", c.fs.Name)

	// 1. set allow_standby_replay to false
	if err := cephclient.AllowStandbyReplay(c.context, c.clusterInfo, c.fs.Name, false); err != nil {
		return errors.Wrap(err, "failed to set allow_standby_replay to false")
	}

	// Standby-replay daemons are stopped automatically.
	if err := cephclient.FailAllStandbyReplayMDS(c.context, c.clusterInfo, c.fs.Name); err != nil {
		return errors.Wrap(err, "failed to fail mds agent in up:standby-replay state")
	}

	// 2. set max_mds to 1
	logger.Debug("start setting active mds count to 1")
	if err := cephclient.SetNumMDSRanks(c.context, c.clusterInfo, c.fs.Name, 1); err != nil {
		return errors.Wrapf(err, "failed setting active mds count to %d", 1)
	}

	// 3. wait for active ranks to be 1
	if err := cephclient.WaitForActiveRanks(c.context, c.clusterInfo, c.fs.Name, 1, false, fsWaitForActiveTimeout); err != nil {
		return errors.Wrap(err, "failed waiting for active ranks to be 1")
	}

	// 4. stop standby daemons
	daemonName, err := cephclient.GetMdsIdByRank(c.context, c.clusterInfo, c.fs.Name, 0)
	if err != nil {
		return errors.Wrap(err, "failed to get mds id from rank 0")
	}
	daemonNameTokens := strings.Split(daemonName, "-")
	daemonLetterID := daemonNameTokens[len(daemonNameTokens)-1]
	desiredDeployments := map[string]bool{
		fmt.Sprintf("%s-%s-%s", AppName, c.fs.Name, daemonLetterID): true,
	}
	logger.Debugf("stop mds other than %s", daemonName)
	err = c.scaleDownDeployments(1, 1, desiredDeployments, false)
	if err != nil {
		return errors.Wrap(err, "failed to scale down deployments during upgrade")
	}
	logger.Debugf("waiting for all standbys to be gone")
	if err := cephclient.WaitForNoStandbys(c.context, c.clusterInfo, 120*time.Second); err != nil {
		return errors.Wrap(err, "failed to wait for stopping all standbys")
	}

	// 5. upgrade the current active deployment and wait for it to come back
	_, err = c.startDeployment(c.clusterInfo.Context, daemonLetterID)
	if err != nil {
		return errors.Wrapf(err, "failed to upgrade mds %q", daemonName)
	}
	logger.Debugf("successfully started daemon %q", daemonName)

	// 6. all other MDS daemons will be updated and restarted by main MDS code path

	// 7. max_mds & allow_standby_replay will be reset in deferred function finishedWithDaemonUpgrade

	return nil
}
@hhstu hhstu added the bug label Apr 7, 2024
hhstu commented Apr 7, 2024

I have reviewed the code. Can we skip the upgradeMDS function when doing a minor Ceph version update?
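If such a skip were ever added, one way to detect a patch-only update would be a simple version comparison. This is a hypothetical sketch only — `isPatchOnlyUpdate` and its parser are made-up helpers for illustration, not Rook's API (Rook has its own version types in its cephver package):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isPatchOnlyUpdate reports whether two "vMAJOR.MINOR.PATCH" strings differ
// only in the patch component. Hypothetical helper for illustration; it does
// not reflect how Rook actually decides on upgrade paths.
func isPatchOnlyUpdate(from, to string) bool {
	pf, okF := parse(from)
	pt, okT := parse(to)
	return okF && okT && pf[0] == pt[0] && pf[1] == pt[1]
}

// parse splits a "vMAJOR.MINOR.PATCH" string into its numeric components.
func parse(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

func main() {
	fmt.Println(isPatchOnlyUpdate("v1.18.0", "v1.18.2")) // true
	fmt.Println(isPatchOnlyUpdate("v1.12.9", "v1.14.0")) // false
}
```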
