osd: Legacy LVM-based OSDs on PVCs crash on resize init container (backport #14100) #14104

Merged: 1 commit, Apr 22, 2024

@mergify mergify bot commented Apr 22, 2024

OSDs on LVM-mode PVCs are failing to come up and crashing in the expand-bluefs init container. To avoid the crash and allow the OSDs to start, a workaround was found to simply remove that init container. Now we disable the OSD resize for this case to avoid others hitting this during upgrade as well.

I am not able to reproduce this issue with the currently available types of OSDs. All new OSDs on PVCs are created in raw mode, even when they are encrypted or have a metadata device. But this could affect old OSDs that were created long ago and have been upgraded since (as far back as Rook v1.1).

An error is first seen in the "osd init" init container, where an argument is missing:

Error: ceph-username is required for osd
rook error: ceph-username is required for osd
Usage:
  rook ceph osd init [flags]

However, this does not fail the container, and the remaining containers continue starting. The expand container then fails with the error below, because the Ceph config was never initialized due to the failure in the previous init container:

inferring bluefs devices from bluestore path
unable to read label for /var/lib/ceph/osd/ceph-1: (2) No such file or directory
2024-04-04T13:22:38.461+0000 7f41cddbf900 -1 bluestore(/var/lib/ceph/osd/ceph-1/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-1/block: (2) No such file or directory

This seems related to the removal of some variables that were thought to be obsolete in #11331. However, since we can't reproduce the issue and confirm that adding those variables back actually fixes it, the most reliable and lowest-risk solution seems to be to remove the resize init container completely for these OSDs, and then encourage users to replace these legacy OSDs.
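
As a rough illustration of the approach described above, the following Go sketch shows the kind of check that skips the resize init container for legacy LVM-mode OSDs on PVCs. It is not the code from this PR: the `osdInfo` type, its fields, and the `needsExpandInitContainer` helper are hypothetical names invented for this example, and the sketch assumes that resize only applies to PVC-backed OSDs.

```go
package main

import "fmt"

// osdInfo is a hypothetical, simplified description of an OSD.
// It is not Rook's actual type; the fields exist only for this sketch.
type osdInfo struct {
	ID     int
	OnPVC  bool   // true if the OSD is backed by a PVC
	CVMode string // assumed values: "raw" or "lvm"
}

// needsExpandInitContainer reports whether the resize (expand-bluefs)
// init container should be added to the OSD pod.
// Assumption: only PVC-backed OSDs are ever resized, and legacy
// LVM-mode OSDs on PVCs are skipped to avoid the crash described above.
func needsExpandInitContainer(osd osdInfo) bool {
	if !osd.OnPVC {
		return false
	}
	if osd.CVMode == "lvm" {
		// Legacy OSD: skip resize so the pod can start.
		return false
	}
	return true
}

func main() {
	osds := []osdInfo{
		{ID: 1, OnPVC: true, CVMode: "lvm"},
		{ID: 2, OnPVC: true, CVMode: "raw"},
	}
	for _, osd := range osds {
		fmt.Printf("osd.%d resize init container: %t\n", osd.ID, needsExpandInitContainer(osd))
	}
}
```

With a check of this shape, raw-mode OSDs keep their resize behavior, while legacy LVM-mode OSDs lose the ability to be expanded in place until they are replaced.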

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

This is an automatic backport of pull request #14100 done by [Mergify](https://mergify.com).

OSDs on LVM-mode PVCs are failing to come up and crashing
in the expand-bluefs init container. To avoid the crash
and allow the OSDs to start, a workaround was found to
simply remove that init container. Now we disable the
OSD resize for this case to avoid others hitting this
during upgrade as well.

Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
(cherry picked from commit acd7b4f)
@satoru-takeuchi satoru-takeuchi merged commit fe845c0 into release-1.13 Apr 22, 2024
52 checks passed
@mergify mergify bot deleted the mergify/bp/release-1.13/pr-14100 branch April 22, 2024 06:58