stratis-fstab-setup@.service might use the network but does not wait for it #3348

Open
mvollmer opened this issue May 22, 2023 · 3 comments

@mvollmer
Collaborator

For a pool with NBDE, stratis-fstab-setup will run clevis during boot, as it should. For clevis to have a chance to work, the network needs to be up enough for the Tang server to be reachable, but the stratis-fstab-setup@.service units don't have any dependency on network-online.target or similar.

Putting "_netdev" into the fstab entry doesn't help either: it delays the actual mounting of the filesystem, but it doesn't delay starting of stratis-fstab-setup@.service, which is started as early as allowed by its own dependencies.

I think stratis-fstab-setup (the script) should probably just sit in a loop and retry clevis on every change to the network status, whenever there might be a chance for it to work. (But it's not clear when to give up, hmm.)

Just waiting for network-online.target and then running clevis once might be enough for most cases, but please see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

@jbaublitz
Member

Just for reference, we use network-online.target for the root fs. CoreOS wanted to write their own handling of Stratis in the root filesystem as a result of this. We could potentially add an exponential backoff retry mechanism instead for both as suggested by CoreOS, but a stopgap that would probably be workable sooner would be to add a requirement for network-online.target. What would be your preference?
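
To make the stopgap concrete, here is a minimal sketch of a systemd drop-in that pulls in and orders the template unit after network-online.target. The function name `write_network_dropin`, its optional root-directory argument, and the file name `10-network-online.conf` are assumptions made for illustration and are not part of Stratis. Whether this ordering is free of dependency cycles outside the initrd has not been verified here.

```shell
#!/bin/sh
# Sketch only: write a drop-in for stratis-fstab-setup@.service.
# $1 is a hypothetical override of the systemd configuration root so the
# function can be tried in a scratch directory; it defaults to
# /etc/systemd/system, the standard location for administrator drop-ins.
write_network_dropin() {
	dropin_dir="${1:-/etc/systemd/system}/stratis-fstab-setup@.service.d"
	mkdir -p "$dropin_dir" || return 1
	# Wants= pulls network-online.target into the transaction;
	# After= delays the unit until that target has been reached.
	printf '%s\n' \
		'[Unit]' \
		'Wants=network-online.target' \
		'After=network-online.target' \
		> "$dropin_dir/10-network-online.conf"
}
```

After writing the file for real, `systemctl daemon-reload` would be needed for systemd to pick it up. A drop-in directory named after the template applies to every instance of stratis-fstab-setup@.service.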

@mvollmer
Collaborator Author

Just for reference, we use network-online.target for the root fs.

I don't immediately see how this is possible... Doesn't network-online.target indirectly depend on -.mount? Or does this all happen inside the initrd?

What would be your preference?

Something like this in stratis-fstab-setup, maybe:

if $(stratis-min pool is-stopped "$POOL_UUID"); then
	if $(stratis-min pool is-bound "$POOL_UUID"); then
		# Poll until systemd reports network-online.target as active.
		while ! systemctl is-active network-online.target; do
			echo Waiting for network
			sleep 1
		done
		if ! stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"; then
			echo "Failed to start pool with UUID $POOL_UUID using Clevis." >&2
			exit 1
		fi
	else ...

(Note that by the time network-online.target has been reached, stratis-min has stopped working...)

@jbaublitz
Member

jbaublitz commented May 25, 2023

Just for reference, we use network-online.target for the root fs.

I don't immediately see how this is possible... Doesn't network-online.target indirectly depend on -.mount? Or does this all happen inside the initrd?

We actually worked with systemd on this. Previously it did not work: NetworkManager was using legacy dracut functionality, so we couldn't both wait on network-online.target and have NetworkManager set up the network. I see no indication anywhere that -.mount is required by network-online.target, and we get no warning messages in the initrd about cyclical dependencies around that. Perhaps the reason we can do this is that it is indeed all in the initrd, but I don't see any indication that it wouldn't work for stratis-fstab-setup too.

What would be your preference?

Something like this in stratis-fstab-setup, maybe:

if $(stratis-min pool is-stopped "$POOL_UUID"); then
	if $(stratis-min pool is-bound "$POOL_UUID"); then
		# Poll until systemd reports network-online.target as active.
		while ! systemctl is-active network-online.target; do
			echo Waiting for network
			sleep 1
		done
		if ! stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"; then
			echo "Failed to start pool with UUID $POOL_UUID using Clevis." >&2
			exit 1
		fi
	else ...

(Note that by the time network-online.target has been reached, stratis-min has stopped working...)

Given the warning in the documentation, it may be time to build a more robust out-of-the-box solution with exponential backoff retry logic, but this might be a quicker fix in the meantime if requiring network-online.target doesn't work outside of the initrd.
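
For discussion, a minimal sketch of what exponential backoff could look like in a POSIX shell script such as stratis-fstab-setup. The helper name `retry_with_backoff` and its argument order are hypothetical, and the attempt limit and initial delay are placeholders rather than agreed values; the open question of when to give up is simply expressed as the maximum attempt count.

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, doubling the sleep
# (in seconds) between attempts. Returns 0 on the first success and 1
# once MAX_ATTEMPTS attempts have all failed.
retry_with_backoff() {
	max_attempts=$1
	delay=$2
	shift 2
	attempt=1
	while :; do
		if "$@"; then
			return 0
		fi
		# Give up without sleeping again after the final attempt.
		if [ "$attempt" -ge "$max_attempts" ]; then
			return 1
		fi
		echo "Attempt $attempt failed, retrying in ${delay}s" >&2
		sleep "$delay"
		delay=$((delay * 2))
		attempt=$((attempt + 1))
	done
}
```

In the script this could wrap the existing start command, for example `retry_with_backoff 6 1 stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"`, which would sleep for 1, 2, 4, 8 and 16 seconds between the six attempts. Unlike polling network-online.target, this retries the unlock itself, so it does not depend on what "online" means for a given network setup.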

@jbaublitz jbaublitz self-assigned this May 25, 2023