
disk: Recreate the bootc disk image when passed certain parameters #33

Open · wants to merge 3 commits into base: main

Conversation

@ckyrouac (Collaborator)

Sometimes the disk image needs to be recreated even when reusing the same container image, for example when passing --disk-size to create a bootc disk with a larger disk size.


disk: Refactor image pull and cache dir creation

This creates two new structs, image and cache, and removes the code to pull the image and create the cache directory from bootc_disk. This is done so we can check whether the VM is running before creating the disk and fail early; we need the pulled container image ID before we can perform that check.

This also gets us closer to separately managing the cache dir, which should simplify some of the locking code.

This is in preparation for the code that clears the cached disk image when the disk image size or filesystem type is modified. Prior to these changes, the disk could be recreated before checking whether the VM was running.

@ckyrouac (Collaborator, Author)

I just realized the behavior of this is still a little odd.

e.g. the following will continue to use the 20G disk size.

podman-bootc run --disk-size 20G <image>
podman-bootc stop <image>
podman-bootc run <image>

Also, running podman-bootc run --disk-size 20G <image> a second time after running podman-bootc stop will recreate the disk image. Avoiding that recreation would only be an optimization, but it's still not ideal.

I think the general weirdness stems from the fact that, after the initial run command, modifying the disk-size, etc. behaves more like an edit operation than a run. I'm not sure we want to add an explicit edit operation like podman-bootc edit <image> --disk-size 20G though, since the cache is meant to be transparent.

Anyways, this PR gets us closer to the correct behavior, so I'd prefer to get this in and refine the behavior in the future. The fixes for the above known issues will require us to load the existing cache/config earlier, which likely means more refactoring.

cmd/run.go (resolved)
cmd/run.go Outdated
Comment on lines 139 to 164
// create the disk image
bootcDisk := bootc.NewBootcDisk(containerImage, ctx, user, cache)
err = bootcDisk.Install(vmConfig.Quiet, diskImageConfigInstance)

@germag (Collaborator) commented May 16, 2024

(The very useful and great GitHub UI /s doesn't allow me to add a comment that includes unchanged lines, so this comment also covers line 120: bootcVM, err := vm.NewVM(vm.NewVMParameters{)

This is incorrect locking. The locks from bootcDisk.Install() are now in cache.Create(), and that part is fine. But we are holding the "shared" lock from bootcVM while changing the content of the cache directory, so we must take an "exclusive" lock. We cannot ask for an exclusive lock in NewVM(), though, because then we would not be allowed to ssh into the VM.
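For reference, a minimal sketch of the shared vs. exclusive semantics under discussion, using flock(2) directly. This is illustrative only and is not the project's actual lock implementation (the PR uses its own lock/utils helpers):

package cache

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// tryLock takes a non-blocking flock on a lock file. Readers that only need
// to ssh into or inspect a running VM take a shared lock; anything that
// mutates the cache contents (creating or removing the disk image) needs an
// exclusive lock, which is the point being made above.
func tryLock(path string, exclusive bool) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	how := unix.LOCK_SH
	if exclusive {
		how = unix.LOCK_EX
	}
	if err := unix.Flock(int(f.Fd()), how|unix.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("cache is busy: %w", err)
	}
	return f, nil
}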

@ckyrouac (Collaborator, Author)

cache.Create() takes an exclusive lock and frees it after creating the cache directory. vm.NewVM() in the run command takes a shared lock, the same as in main, so this is not introducing a new bug. This will be much simpler to fix with the cache/config loading refactor I am working on for a separate MR. I'd rather not try to fix everything in this MR and instead keep the scope to what is needed for the disk size fix.

Comment on lines -156 to +194
-	if err = bootcVM.WriteConfig(*bootcDisk); err != nil {
+	if err = bootcVM.WriteConfig(*bootcDisk, containerImage); err != nil {
@germag (Collaborator)

This is also changing the cache while holding a shared lock instead of an exclusive one; however, this is currently also incorrect in the main branch.

@ckyrouac (Collaborator, Author)

similar to the previous comment, I'd prefer to fix what's in main in a separate MR and keep this to what's needed for the disk size fix.

pkg/bootc/bootc_disk.go (resolved)
@germag (Collaborator) left a comment

I think we need to design a proper cache API first, and split the information that persists on the disk from the info that only makes sense while the VM runs (like the ssh port).

So, I think it is better to do that in a different PR, so I'd prefer if in this PR you drop this commit and only make the changes for the disk size (if that is possible).

@germag (Collaborator) commented May 16, 2024

I just realized the behavior of this is still a little odd.

e.g. the following will continue to use the 20G disk size.

podman-bootc run --disk-size 20G <image>
podman-bootc stop <image>
podman-bootc run <image>

Also, running podman-bootc run --disk-size 20G <image> a second time after running podman-bootc stop will recreate the disk image. Avoiding that recreation would only be an optimization, but it's still not ideal.

Hmmm, we should not do that; we need to store the disk size in the xattr or the json file and only re-create the disk if it is different.

I think the general weirdness stems from the fact that, after the initial run command, modifying the disk-size, etc. behaves more like an edit operation than a run. I'm not sure we want to add an explicit edit operation like podman-bootc edit <image> --disk-size 20G though, since the cache is meant to be transparent.

Anyways, this PR gets us closer to the correct behavior, so I'd prefer to get this in and refine the behavior in the future. The fixes for the above known issues will require us to load the existing cache/config earlier, which likely means more refactoring.

@ckyrouac (Collaborator, Author)

I think we need to design a proper cache API first, and split the information that persists on the disk from the info that only makes sense while the VM runs (like the ssh port).

So, I think it is better to do that in a different PR, so I'd prefer if in this PR you drop this commit and only make the changes for the disk size (if that is possible).

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

Hmmm, we should not do that; we need to store the disk size in the xattr or the json file and only re-create the disk if it is different.

Agreed. This will require loading the cache/config before the disk creation which requires more refactoring. I'm currently working through that. The refactor in this PR along with the cache/config refactor will get us most of the way to the simplified locking/cache loading.

I still think doing these separately makes sense to avoid a giant MR, and this fixes an existing bug. I'll fix the issues from the other suggestions and let you know when I'm done.

@ckyrouac (Collaborator, Author)

Actually, I'm not sure there are any code changes needed, so this is ready for another look.

@germag (Collaborator) commented May 17, 2024

I still think we can make this much simpler, without a partial refactor that makes the locking non-obviously worse.
Like, if we add a DiskSize field to diskFromContainerMeta:

type diskFromContainerMeta struct {
	// imageDigest is the digested sha256 of the container that was used to build this disk
	ImageDigest string `json:"imageDigest"`
	DiskSize    string `json:"diskSize"`
}

and in bootcInstallImageToDisk():

bootcInstallImageToDisk() {
	...
	serializedMeta := diskFromContainerMeta{
		ImageDigest: p.ImageId,
		DiskSize:    diskConfig.DiskSize,
	}
	...
}

and finally checking the size in getOrInstallImageToDisk():

getOrInstallImageToDisk() {
	...
	if serializedMeta.ImageDigest == p.ImageId &&
		(diskConfig.DiskSize == "" ||
			serializedMeta.DiskSize == diskConfig.DiskSize) {
		return nil
	}
	...
}

I think that will be enough, or am I missing something?

@ckyrouac (Collaborator, Author)

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

If we don't add the code to check whether the VM is running before doing the disk creation, then the behavior is weird: if the VM is running, podman-bootc run --disk-size will recreate the disk image and only then error out because the VM is running.

that makes the locking non-obviously worse.

I don't see how this makes it worse. Could you give an example use case that would break with this?

@germag (Collaborator) commented May 17, 2024

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

If we don't add the code to check whether the VM is running before doing the disk creation, then the behavior is weird: if the VM is running, podman-bootc run --disk-size will recreate the disk image and only then error out because the VM is running.

This is a problem right now in the main branch, so I'm OK with checking if the VM is running, and with checking the error of os.Remove(), but I think those 2 changes should be in different commits. Sadly, we cornered ourselves into requiring a call to NewVM() in order to call .isRunning(), so let me think about it a bit more; probably we can just extract isRunning() to be a function instead of a method.

My comment was only about how to rebuild the image if the requested disk size is different, without the need to refactor.

that makes the locking non-obviously worse.

I don't see how this makes it worse. Could you give an example use case that would break with this?

with this PR we install the disk holding a read-only lock instead of a write lock
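As a side note on the idea above of extracting isRunning() into a standalone function, so callers don't have to construct a full VM (and take its lock) first, here is a rough sketch using the libvirt Go bindings; the function name, parameters, and not-found handling are assumptions, not the project's actual API:

package vm

import (
	"fmt"

	"libvirt.org/go/libvirt"
)

// IsVMRunning reports whether the libvirt domain for the given name is active,
// without going through vm.NewVM() and its locking.
func IsVMRunning(libvirtUri, domainName string) (bool, error) {
	conn, err := libvirt.NewConnect(libvirtUri)
	if err != nil {
		return false, fmt.Errorf("connecting to libvirt: %w", err)
	}
	defer conn.Close()

	dom, err := conn.LookupDomainByName(domainName)
	if err != nil {
		// A missing domain most likely means the VM was never created,
		// which callers would treat as "not running".
		if lverr, ok := err.(libvirt.Error); ok && lverr.Code == libvirt.ERR_NO_DOMAIN {
			return false, nil
		}
		return false, err
	}
	defer dom.Free()

	return dom.IsActive()
}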

@ckyrouac (Collaborator, Author)

This is a problem right now in the main branch

If it already exists, the disk is never recreated in the main branch (which is what this PR fixes). So while the disk is created before checking if the VM is running in main, it doesn't result in a bug.

with this PR we install the disk holding a read-only lock instead of a write lock

Ah, I got it. I can add the exclusive lock back to the bootc.Install(). Would that be sufficient?

@germag (Collaborator) commented May 20, 2024

This is a problem right now in the main branch

If it already exists, the disk is never recreated in the main branch (which is what this PR fixes). So while the disk is created before checking if the VM is running in main, it doesn't result in a bug.

with this PR we install the disk holding a read-only lock instead of a write lock

Ah, I got it. I can add the exclusive lock back to the bootc.Install(). Would that be sufficient?

Nope, in that case you will be unable to ssh into the VM, because the ssh command will try to acquire a read-only lock.
An ugly solution would be to create a new VM inside of BootcDisk.Install(), so instead of:

locked, err := lock.TryLock(utils.Exclusive)

we do

bootcVM, err := vm.NewVM(vm.NewVMParameters{
	...
	Locking: utils.Exclusive,
	...
})

so we can run

isRunning, err := bootcVM.IsRunning()

inside Install()

We can also move bootcVM.WriteConfig(*bootcDisk) inside Install() so we do that while holding a read/write lock. We still have a race on the selection of the ssh port, but that is something we cannot fix right now.

@ckyrouac (Collaborator, Author)

I realized that with this refactor we are already able to decouple the lock from the VM/disk and take the lock once per command instead. This is because the container image ID and cache directory are now initialized first. Pushed a PR showing how this would work. All the e2e tests passed.

ckyrouac force-pushed the disk-cache2 branch 3 times, most recently from 3b85830 to e1681d8 on May 20, 2024 at 18:53
@ckyrouac (Collaborator, Author)

@germag this is ready for another look

Comment on lines 85 to 112
	LibvirtUri: libvirtUri,
	Locking:    utils.Shared,
})

if err != nil {
	return nil, err
}

// Let's be explicit instead of relying on the defer exec order
defer func() {
	bootcVM.CloseConnection()
	if err := bootcVM.Unlock(); err != nil {
		logrus.Warningf("unable to unlock VM %s: %v", imageId, err)
	}
}()

cfg, err := bootcVM.GetConfig()
@germag (Collaborator)

bootcVM.GetConfig() is not holding any lock while reading the cache.

disk: Refactor image pull and cache dir creation
Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>

disk: Recreate the bootc disk image when passed certain parameters
Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>
cmd/rm.go Outdated
Comment on lines 62 to 74
cacheDir, err := cache.NewCache(id, user)
if err != nil {
	return err
}
err = cacheDir.Create()
if err != nil {
	return err
}
err = cacheDir.Lock(cache.Exclusive)
if err != nil {
	return err
}
@germag (Collaborator)

NewCache() calls FullImageIdFromPartial(), which can potentially read the cache directory without a lock. Also, cacheDir.Create() creates a directory in the cache without a lock (the lock is requested later, by cacheDir.Lock()).

@ckyrouac (Collaborator, Author)

So this is a bit of a chicken-or-the-egg scenario. The source of truth for the fullImageId is currently the list of directories in the cache dir, so the top-level cache directory needs to be read before creating the lock. This is the same behavior as main. There are some options to avoid this, but I think they are out of the scope of this PR; e.g. we could refactor the directories to be named by the shortImageId.

https://github.com/containers/podman-bootc/blob/main/pkg/vm/vm_linux.go#L42-L50

files, err := os.ReadDir(user.CacheDir())

@germag (Collaborator)

Yes, for reading we have a race between reading the dir and acquiring the lock. We can barely work around it by checking the dir again after taking the lock. But the real fix will be to use sqlite to store the list of VMs instead of reading the cache dir.
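A small sketch of that check-again-after-the-lock workaround, reusing the illustrative tryLock helper sketched earlier in this thread; the directory layout and function name are simplified assumptions rather than the actual cache code:

package cache

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveFullImageID resolves a partial image ID against the cache directory.
// It scans the directory without a lock, then takes the per-image lock and
// re-checks that the entry still exists, narrowing (but not eliminating) the
// race between the scan and the lock acquisition.
func resolveFullImageID(cacheRoot, partialID string) (string, *os.File, error) {
	entries, err := os.ReadDir(cacheRoot)
	if err != nil {
		return "", nil, err
	}
	for _, e := range entries {
		if !e.IsDir() || !strings.HasPrefix(e.Name(), partialID) {
			continue
		}
		fullID := e.Name()
		lock, err := tryLock(filepath.Join(cacheRoot, fullID, "lock"), true)
		if err != nil {
			return "", nil, err
		}
		// Re-check after taking the lock: the entry may have been removed
		// by a concurrent rm between the unlocked scan and the lock.
		if _, err := os.Stat(filepath.Join(cacheRoot, fullID)); err != nil {
			lock.Close()
			return "", nil, fmt.Errorf("cache entry %s disappeared: %w", fullID, err)
		}
		return fullID, lock, nil
	}
	return "", nil, fmt.Errorf("no cached VM matches %q", partialID)
}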

cmd/run.go Outdated
Comment on lines 116 to 117
}
err = cacheDir.Create()
if err != nil {
return fmt.Errorf("unable to create cache: %w", err)
}

//start the VM
println("Booting the VM...")
sshPort, err := utils.GetFreeLocalTcpPort()
err = cacheDir.Lock(cache.Exclusive)
@germag (Collaborator)

Same here: we are creating the dir before holding the lock.

cmd/ssh.go Outdated
Comment on lines 37 to 46
cacheDir, err := cache.NewCache(id, user)
if err != nil {
return err
}
err = cacheDir.Create()
if err != nil {
return err
}
err = cacheDir.Lock(cache.Shared)
if err != nil {
@germag (Collaborator)

Why does the ssh command need to create a new dir? (also before taking the lock)

cmd/stop.go Outdated
Comment on lines 34 to 44
cacheDir, err := cache.NewCache(id, user)
if err != nil {
return err
}
err = cacheDir.Create()
if err != nil {
return err
}
err = cacheDir.Lock(cache.Exclusive)
if err != nil {
@germag (Collaborator)

Ditto ssh command

Comment on lines -155 to +171
-	os.Remove(diskPath)
+	err = os.Remove(diskPath)
+	if err != nil {
+		return err
+	}
@germag (Collaborator)

nit: this should be in its own commit

@ckyrouac (Collaborator, Author)

@germag I force-pushed fixes:

  • extracted the FullImageIdFromPartial call out of NewCache to enable locking before creating the cache
  • added lock to the list command
  • removed the unnecessary Create calls

Previously, due to the coupling of the container image pull to the bootc
disk code, the cache directory couldn't be locked until the image id was
obtained. Now that the image ID is retrieved first in the run function,
the locks can be bound to each command.

Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>
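A rough sketch of the per-command locking pattern this commit message describes, again reusing the illustrative tryLock helper from earlier; withExclusiveLock and the lock file layout are assumptions for illustration, not the project's cache API:

package cache

import (
	"path/filepath"
)

// withExclusiveLock binds the cache lock to a single command invocation: the
// lock is taken after the full image ID is known and released when the command
// finishes, instead of living inside the VM or disk objects.
func withExclusiveLock(cacheRoot, fullImageID string, cmd func() error) error {
	lock, err := tryLock(filepath.Join(cacheRoot, fullImageID, "lock"), true)
	if err != nil {
		return err
	}
	// Closing the file descriptor releases the flock when the command is done.
	defer lock.Close()

	return cmd()
}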
@ckyrouac (Collaborator, Author)

@germag did you get a chance to look at this again?

2 participants