
disk: Recreate the bootc disk image when passed certain parameters #33

Open · wants to merge 3 commits into base: main

Conversation

@ckyrouac (Collaborator)

Sometimes the disk image needs to be recreated even when reusing the same container image, for example when passing --disk-size to create a bootc disk with a larger disk size.


disk: Refactor image pull and cache dir creation

This creates two new structs, image and cache, and removes the code to pull the image and create the cache directory from bootc_disk. This is done so we can check whether the VM is running before creating the disk and fail early; we need the pulled container image ID before we can perform that check.

This also gets us closer to separately managing the cache dir, which should simplify some of the locking code.

This is in preparation for the code that clears the cached disk image when the disk image size or filesystem type is modified. Prior to these changes, the disk could be recreated before checking whether the VM was running.

@ckyrouac (Collaborator, Author)

I just realized the behavior of this is still a little odd.

e.g. the following will continue to use the 20G disk size.

podman-bootc run --disk-size 20G <image>
podman-bootc stop <image>
podman-bootc run <image>

Also, running podman-bootc run --disk-size 20G <image> a second time after running podman-bootc stop will recreate the disk image. Avoiding that recreation would only be an optimization, but it's still not ideal.

I think the general weirdness stems from the fact that, after the initial run command, modifying the disk-size, etc. behaves more like an edit operation than a run. I'm not sure we want to add an explicit edit operation like podman-bootc edit <image> --disk-size 20G though, since the cache is meant to be transparent.

Anyways, this PR gets us closer to the correct behavior, so I'd prefer to get this in and refine the behavior in the future. The fixes for the above known issues will require us to load the existing cache/config earlier, which likely means more refactoring.

cmd/run.go (resolved)
cmd/run.go Outdated
Comment on lines 139 to 164
// create the disk image
bootcDisk := bootc.NewBootcDisk(containerImage, ctx, user, cache)
err = bootcDisk.Install(vmConfig.Quiet, diskImageConfigInstance)

@germag (Collaborator) commented May 16, 2024

(The very useful and great GitHub UI /s doesn't allow me to add a comment that includes unchanged lines, so this comment also covers line 120: bootcVM, err := vm.NewVM(vm.NewVMParameters{)

This is incorrect locking. The locks from bootcDisk.Install() are now in cache.Create(), and that part is fine. But we are holding the "shared" lock from bootcVM while changing the content of the cache directory, so we must take an "exclusive" lock. We cannot ask for an exclusive lock in NewVM(), though, because then we would not be allowed to ssh into the VM.
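For reference, a minimal sketch of the shared vs. exclusive semantics under discussion, using flock(2) directly. This is illustrative only and is not the project's actual lock implementation (the PR uses its own lock/utils helpers):

package cache

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// tryLock takes a non-blocking flock on a lock file. Readers that only need
// to ssh into or inspect a running VM take a shared lock; anything that
// mutates the cache contents (creating or removing the disk image) needs an
// exclusive lock, which is the point being made above.
func tryLock(path string, exclusive bool) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	how := unix.LOCK_SH
	if exclusive {
		how = unix.LOCK_EX
	}
	if err := unix.Flock(int(f.Fd()), how|unix.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("cache is busy: %w", err)
	}
	return f, nil
}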

@ckyrouac (Collaborator, Author)

cache.Create() takes an exclusive lock and frees it after creating the cache directory. vm.NewVM() in the run command takes a shared lock, the same as in main, so this is not introducing a new bug. This will be much simpler to fix with the cache/config loading refactor I am working on for a separate MR. I'd rather not try to fix everything in this MR and instead keep the scope to what is needed for the disk size fix.

Comment on lines -156 to +194
-	if err = bootcVM.WriteConfig(*bootcDisk); err != nil {
+	if err = bootcVM.WriteConfig(*bootcDisk, containerImage); err != nil {
@germag (Collaborator)

This is also changing the cache while holding a shared lock instead of an exclusive one; however, this is currently also incorrect in the main branch.

@ckyrouac (Collaborator, Author)

similar to the previous comment, I'd prefer to fix what's in main in a separate MR and keep this to what's needed for the disk size fix.

pkg/bootc/bootc_disk.go (resolved)
@germag (Collaborator) left a comment

I think we need to design a proper cache API first, and split the information that persists on the disk from the info that only makes sense while the VM runs (like the ssh port).

So, I think it is better to do that in a different PR, so I'd prefer if in this PR you drop this commit and only make the changes for the disk size (if that is possible).

@germag (Collaborator) commented May 16, 2024

I just realized the behavior of this is still a little odd.

e.g. the following will continue to use the 20G disk size.

podman-bootc run --disk-size 20G <image>
podman-bootc stop <image>
podman-bootc run <image>

Also, running podman-bootc run --disk-size 20G <image> a second time after running podman-bootc stop will recreate the disk image. Avoiding that recreation would only be an optimization, but it's still not ideal.

Hmmm, we should not do that; we need to store the disk size in the xattr or the json file and only re-create the disk if it is different.

I think the general weirdness stems from the fact that, after the initial run command, modifying the disk-size, etc. behaves more like an edit operation than a run. I'm not sure we want to add an explicit edit operation like podman-bootc edit <image> --disk-size 20G though, since the cache is meant to be transparent.

Anyways, this PR gets us closer to the correct behavior, so I'd prefer to get this in and refine the behavior in the future. The fixes for the above known issues will require us to load the existing cache/config earlier, which likely means more refactoring.

@ckyrouac (Collaborator, Author)

I think we need to design a proper cache API first, and split the information that persists on the disk from the info that only makes sense while the VM runs (like the ssh port).

So, I think it is better to do that in a different PR, so I'd prefer if in this PR you drop this commit and only make the changes for the disk size (if that is possible).

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

Hmmm, we should not do that; we need to store the disk size in the xattr or the json file and only re-create the disk if it is different.

Agreed. This will require loading the cache/config before the disk creation which requires more refactoring. I'm currently working through that. The refactor in this PR along with the cache/config refactor will get us most of the way to the simplified locking/cache loading.

I still think doing these separately makes sense to avoid a giant MR, and this fixes an existing bug. I'll fix the issues from the other suggestions and let you know when I'm done.

@ckyrouac (Collaborator, Author)

Actually, I'm not sure there are any code changes needed, so this is ready for another look.

@germag (Collaborator) commented May 17, 2024

I still think we can make this much simpler, without a partial refactor that makes the locking non-obviously worse.
Like, if we add a DiskSize field to diskFromContainerMeta:

type diskFromContainerMeta struct {
	// imageDigest is the digested sha256 of the container that was used to build this disk
	ImageDigest string `json:"imageDigest"`
	DiskSize    string `json:"diskSize"`
}

and in bootcInstallImageToDisk():

bootcInstallImageToDisk() {
	...
	serializedMeta := diskFromContainerMeta{
		ImageDigest: p.ImageId,
		DiskSize:    diskConfig.DiskSize,
	}
	...
}

and finally checking the size in getOrInstallImageToDisk():

getOrInstallImageToDisk() {
	...
	if serializedMeta.ImageDigest == p.ImageId &&
		(diskConfig.DiskSize == "" ||
			serializedMeta.DiskSize == diskConfig.DiskSize) {
		return nil
	}
	...
}

I think that will be enough, or am I missing something?

@ckyrouac (Collaborator, Author)

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

If we don't add the code to check whether the VM is running before doing the disk creation, then the behavior is weird: if the VM is running, podman-bootc run --disk-size will recreate the disk image and only then error out because the VM is running.

that makes the locking non-obviously worse.

I don't see how this makes it worse. Could you give an example use case that would break with this?

@germag (Collaborator) commented May 17, 2024

Unfortunately the refactor commit is required. Otherwise, the disk will be recreated even if the VM is running because currently the VM.isRunning gate happens after the disk creation.

If we don't add the code to check whether the VM is running before doing the disk creation, then the behavior is weird: if the VM is running, podman-bootc run --disk-size will recreate the disk image and only then error out because the VM is running.

This is a problem right now in the main branch, so I'm OK with checking if the VM is running, and with checking the error of os.Remove(), but I think those 2 changes should be in different commits. Sadly, we cornered ourselves into requiring a call to NewVM() in order to call .isRunning(), so let me think about it a bit more; probably we can just extract isRunning() to be a function instead of a method.

My comment was only about how to rebuild the image if the requested disk size is different, without the need to refactor.

that makes the locking non-obviously worse.

I don't see how this makes it worse. Could you give an example use case that would break with this?

with this PR we install the disk holding a read-only lock instead of a write lock
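As a side note on the idea above of extracting isRunning() into a standalone function, so callers don't have to construct a full VM (and take its lock) first, here is a rough sketch using the libvirt Go bindings; the function name, parameters, and not-found handling are assumptions, not the project's actual API:

package vm

import (
	"fmt"

	"libvirt.org/go/libvirt"
)

// IsVMRunning reports whether the libvirt domain for the given name is active,
// without going through vm.NewVM() and its locking.
func IsVMRunning(libvirtUri, domainName string) (bool, error) {
	conn, err := libvirt.NewConnect(libvirtUri)
	if err != nil {
		return false, fmt.Errorf("connecting to libvirt: %w", err)
	}
	defer conn.Close()

	dom, err := conn.LookupDomainByName(domainName)
	if err != nil {
		// A missing domain most likely means the VM was never created,
		// which callers would treat as "not running".
		if lverr, ok := err.(libvirt.Error); ok && lverr.Code == libvirt.ERR_NO_DOMAIN {
			return false, nil
		}
		return false, err
	}
	defer dom.Free()

	return dom.IsActive()
}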

@ckyrouac (Collaborator, Author)

This is a problem right now in the main branch

If it already exists, the disk is never recreated in the main branch (which is what this PR fixes). So while the disk is created before checking if the VM is running in main, it doesn't result in a bug.

with this PR we install the disk holding a read-only lock instead of a write lock

Ah, I got it. I can add the exclusive lock back to the bootc.Install(). Would that be sufficient?

@germag (Collaborator) commented May 20, 2024

This is a problem right now in the main branch

If it already exists, the disk is never recreated in the main branch (which is what this PR fixes). So while the disk is created before checking if the VM is running in main, it doesn't result in a bug.

with this PR we install the disk holding a read-only lock instead of a write lock

Ah, I got it. I can add the exclusive lock back to the bootc.Install(). Would that be sufficient?

Nope, in that case you will be unable to ssh into the VM, because the ssh command will try to acquire a read-only lock.
An ugly solution would be to create a new VM inside of BootcDisk.Install(), so instead of:

locked, err := lock.TryLock(utils.Exclusive)

we do

bootcVM, err := vm.NewVM(vm.NewVMParameters{
	...
	Locking: utils.Exclusive,
	...
})

so we can run

isRunning, err := bootcVM.IsRunning()

inside Install()

We can also move bootcVM.WriteConfig(*bootcDisk) inside Install() so we do that while holding a read/write lock. We still have a race on the selection of the ssh port, but that is something we cannot fix right now.

@ckyrouac (Collaborator, Author)

I realized that with this refactor we are already able to decouple the lock from the VM/disk and take the lock once per command instead. This is because the container image ID and cache directory are now initialized first. Pushed a PR showing how this would work. All the e2e tests passed.

ckyrouac force-pushed the disk-cache2 branch 3 times, most recently from 3b85830 to e1681d8 on May 20, 2024 at 18:53
@ckyrouac (Collaborator, Author)

@germag this is ready for another look

Comment on lines 85 to 112
	LibvirtUri: libvirtUri,
	Locking:    utils.Shared,
})

if err != nil {
	return nil, err
}

// Let's be explicit instead of relying on the defer exec order
defer func() {
	bootcVM.CloseConnection()
	if err := bootcVM.Unlock(); err != nil {
		logrus.Warningf("unable to unlock VM %s: %v", imageId, err)
	}
}()

cfg, err := bootcVM.GetConfig()
@germag (Collaborator)

bootcVM.GetConfig() is not holding any lock while reading the cache.

disk: Refactor image pull and cache dir creation
Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>

disk: Recreate the bootc disk image when passed certain parameters
Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>
cmd/rm.go Outdated
Comment on lines 62 to 74
cacheDir, err := cache.NewCache(id, user)
if err != nil {
	return err
}
err = cacheDir.Create()
if err != nil {
	return err
}
err = cacheDir.Lock(cache.Exclusive)
if err != nil {
	return err
}
@germag (Collaborator)

NewCache() calls FullImageIdFromPartial(), which can potentially read the cache directory without a lock. Also, cacheDir.Create() creates a directory in the cache without a lock (the lock is requested later, by cacheDir.Lock()).

@ckyrouac (Collaborator, Author)

So this is a bit of a chicken-or-the-egg scenario. The source of truth for the fullImageId is currently the list of directories in the cache dir, so the top-level cache directory needs to be read before creating the lock. This is the same behavior as main. There are some options to avoid this, but I think they are out of the scope of this PR; e.g. we could refactor the directories to be named by the shortImageId.

https://github.com/containers/podman-bootc/blob/main/pkg/vm/vm_linux.go#L42-L50

files, err := os.ReadDir(user.CacheDir())

@germag (Collaborator)

Yes, for reading we have a race between reading the dir and acquiring the lock. We can barely work around it by checking the dir again after taking the lock. But the real fix will be to use sqlite to store the list of VMs instead of reading the cache dir.
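A small sketch of that check-again-after-the-lock workaround, reusing the illustrative tryLock helper sketched earlier in this thread; the directory layout and function name are simplified assumptions rather than the actual cache code:

package cache

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveFullImageID resolves a partial image ID against the cache directory.
// It scans the directory without a lock, then takes the per-image lock and
// re-checks that the entry still exists, narrowing (but not eliminating) the
// race between the scan and the lock acquisition.
func resolveFullImageID(cacheRoot, partialID string) (string, *os.File, error) {
	entries, err := os.ReadDir(cacheRoot)
	if err != nil {
		return "", nil, err
	}
	for _, e := range entries {
		if !e.IsDir() || !strings.HasPrefix(e.Name(), partialID) {
			continue
		}
		fullID := e.Name()
		lock, err := tryLock(filepath.Join(cacheRoot, fullID, "lock"), true)
		if err != nil {
			return "", nil, err
		}
		// Re-check after taking the lock: the entry may have been removed
		// by a concurrent rm between the unlocked scan and the lock.
		if _, err := os.Stat(filepath.Join(cacheRoot, fullID)); err != nil {
			lock.Close()
			return "", nil, fmt.Errorf("cache entry %s disappeared: %w", fullID, err)
		}
		return fullID, lock, nil
	}
	return "", nil, fmt.Errorf("no cached VM matches %q", partialID)
}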

cmd/run.go Outdated
Comment on lines 116 to 117
}
err = cacheDir.Create()
if err != nil {
return fmt.Errorf("unable to create cache: %w", err)
}

//start the VM
println("Booting the VM...")
sshPort, err := utils.GetFreeLocalTcpPort()
err = cacheDir.Lock(cache.Exclusive)
@germag (Collaborator)

Same here: we are creating the dir before holding the lock.

cmd/ssh.go Outdated
Comment on lines 37 to 46
cacheDir, err := cache.NewCache(id, user)
if err != nil {
return err
}
err = cacheDir.Create()
if err != nil {
return err
}
err = cacheDir.Lock(cache.Shared)
if err != nil {
@germag (Collaborator)

Why does the ssh command need to create a new dir? (also before taking the lock)

cmd/stop.go Outdated
Comment on lines 34 to 44
cacheDir, err := cache.NewCache(id, user)
if err != nil {
return err
}
err = cacheDir.Create()
if err != nil {
return err
}
err = cacheDir.Lock(cache.Exclusive)
if err != nil {
@germag (Collaborator)

Ditto ssh command

Comment on lines -155 to +171
-	os.Remove(diskPath)
+	err = os.Remove(diskPath)
+	if err != nil {
+		return err
+	}
@germag (Collaborator)

nit: this should be in its own commit

@ckyrouac (Collaborator, Author)

@germag I force-pushed fixes:

  • extracted the FullImageIdFromPartial call out of NewCache to enable locking before creating the cache
  • added lock to the list command
  • removed the unnecessary Create calls

Previously, due to the coupling of the container image pull to the bootc
disk code, the cache directory couldn't be locked until the image id was
obtained. Now that the image ID is retrieved first in the run function,
the locks can be bound to each command.

Signed-off-by: Chris Kyrouac <ckyrouac@redhat.com>
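A rough sketch of the per-command locking pattern this commit message describes, again reusing the illustrative tryLock helper from earlier; withExclusiveLock and the lock file layout are assumptions for illustration, not the project's cache API:

package cache

import (
	"path/filepath"
)

// withExclusiveLock binds the cache lock to a single command invocation: the
// lock is taken after the full image ID is known and released when the command
// finishes, instead of living inside the VM or disk objects.
func withExclusiveLock(cacheRoot, fullImageID string, cmd func() error) error {
	lock, err := tryLock(filepath.Join(cacheRoot, fullImageID, "lock"), true)
	if err != nil {
		return err
	}
	// Closing the file descriptor releases the flock when the command is done.
	defer lock.Close()

	return cmd()
}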
@ckyrouac (Collaborator, Author)

@germag did you get a chance to look at this again?

2 participants