fail to update the domain.memory on existing running vm 'Call to virDomainSetMemory failed' #1812
On the first attempt, it printed errors and shut down the VM.
On the second attempt, with no changes, no error was displayed and the VM started.
Serial logs from /tmp/worker-1.log:
I found a very dirty workaround:
It shows the new memory size.
Hi, I can't reproduce this. The following versions were used:
I tested with the following steps using a standard generic/debian12 box:
Your logging output suggests that something is already wrong at the point you first try to
-> The graceful shutdown is suspicious. I don't think this is a bug in vagrant-libvirt, but rather a special situation with the virtual machine image you are using.
What I think happens:
According to the Vagrant documentation (https://developer.hashicorp.com/vagrant/docs/cli/reload), the "reload" action is a "halt" followed by an "up". In your case I guess "vagrant halt" is not working and as such
There may be a bug with https://github.com/vagrant-libvirt/vagrant-libvirt/blob/main/lib/vagrant-libvirt/action/halt_domain.rb not waiting 1-2 seconds for the power-off to take effect and not ensuring the machine has stopped before continuing. It could be reaching the memory set call before the domain is powered off, so that the memory change is attempted on a running machine rather than a halted one.
By passing a flag to libvirt's API so that the memory size change is applied to the domain's configuration file, the following issue was solved.
- fail to update the domain.memory on existing running vm 'Call to virDomainSetMemory failed' vagrant-libvirt#1812
The problem was caused by passing only the memory size to the #memory= method in start_domain.rb. Specifying a flag in addition to the memory size causes libvirt's API, virDomainSetMemoryFlags, to be called with that flag, explicitly choosing whether to apply the size setting to the running domain or to the config file for that domain. The missing flag caused the above issue by attempting to change the memory size of a running domain that did not exist. Therefore, a list of possible flags is introduced, and a flag that makes changes to the domain's configuration file is selected and used.
Below are links to some related documents.
- https://ruby.libvirt.org/api/Libvirt/Domain.html#method-i-memory-3D
- https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainSetMemoryFlags
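To illustrate the flag choice described above, here is a minimal sketch. `FakeDomain` and `memory_kib` are hypothetical stand-ins invented for this example so that it runs without libvirt; the three constants mirror the values of libvirt's virDomainModificationImpact enum. It only shows how Ruby packs `size, flag` into an array for a setter, and it is not the plugin's actual code.

```ruby
# Values of libvirt's virDomainModificationImpact enum.
VIR_DOMAIN_AFFECT_CURRENT = 0 # whichever state the domain is in right now
VIR_DOMAIN_AFFECT_LIVE    = 1 # the running domain
VIR_DOMAIN_AFFECT_CONFIG  = 2 # the persistent configuration

# Hypothetical stand-in for a libvirt domain object. It only records
# what the setter receives, so the example needs no libvirt connection.
class FakeDomain
  attr_reader :last_memory_args

  def memory=(value)
    # `dom.memory = size, flag` reaches the setter as one Array argument;
    # `dom.memory = size` reaches it as a bare Integer.
    @last_memory_args = Array(value)
  end
end

# Hypothetical helper: Vagrantfile memory is given in MiB, libvirt expects KiB.
def memory_kib(memory_mib)
  memory_mib.to_i * 1024
end

domain = FakeDomain.new
domain.memory = memory_kib(2048), VIR_DOMAIN_AFFECT_CONFIG
p domain.last_memory_args
```

With a named constant in place of the literal 2, the intent (change the stored configuration, not a live domain) is visible at the call site.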
Hi, I experienced the same thing that happened to @rubber-ant, and came across this page. Below is a supplement to the pull request.
The problem seems to be solved by changing a line of
if config.numa_nodes == nil
  if config.memory.to_i * 1024 != libvirt_domain.max_memory
    libvirt_domain.max_memory = config.memory.to_i * 1024
-   libvirt_domain.memory = libvirt_domain.max_memory
+   # 2 corresponds to VIR_DOMAIN_AFFECT_CONFIG of the following flags
+   # https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainModificationImpact
+   libvirt_domain.memory = libvirt_domain.max_memory, 2
  end
end
I think what was happening was the following.
Since the domain is not actually running at the time the memory size change is made, the change only needs to be made to the configuration file.
The following, which happened to @rubber-ant, also happened to me.
I don't believe this is the correct fix. The start domain action assumes it is starting from a halted domain, and I'd expect other issues to show up if other config settings are changed and it reaches this point while the domain is still running. I suggest performing a short wait after the forced halt, checking that the domain is powered off before continuing. That will ensure start calls the memory update only when the domain is stopped.
Thank you for your reply!
Yes. Therefore, there is no running domain. Nevertheless, in
Sorry, I am not clear enough on this, what do you mean?
A |
This assumes that the power-off, which had not taken effect before the memory change was called, will complete before the start domain action attempts to start the domain. Essentially this addresses only the symptom of a race condition. The failure to apply the memory change indicates that when the forced halt is needed (neither SSH nor ACPI shutdown works), the power-off (pulling the plug) does not wait for the state change to take effect before continuing to the next step. In this case it reaches the memory set before libvirt has fully powered off the machine. On other systems, even with this fix, it could go on to update the domain XML with other config settings and then power the domain on before it has fully shut down. Essentially there is a race bug here.
The behaviour above indicates that the domain is still running at that time, and that with this fix the power-off manages to complete at some point between the memory change and the subsequent request to power on again. This is a race and is likely to cause further issues in the future.
It would be more reliable to fix the shutdown so that the domain is actually fully powered off. Most of the paths through the halt action called by reload appear to do this; however, given the log output above, there is one path that does not handle this correctly.
The issue is caused because the halt domain that is called as part of the reload vagrant-libvirt/lib/vagrant-libvirt/action.rb Lines 230 to 234 in a94ce0d
Should wait for the domain to finish halting within the vagrant-libvirt/lib/vagrant-libvirt/action.rb Lines 203 to 212 in a94ce0d
If the code goes through vagrant-libvirt/lib/vagrant-libvirt/action/halt_domain.rb Lines 16 to 20 in a94ce0d
What it should probably do is (unless vagrant-libvirt/lib/vagrant-libvirt/action/shutdown_domain.rb Lines 60 to 64 in a94ce0d
Sorry, I didn't mean it should be done in the shell. I meant that, within the plugin, the halt action should be updated to wait for the halt to complete (unless the force flag is set) before continuing or returning.
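As a rough illustration of "wait for the halt to complete", the sketch below polls a state source until it reports the wanted state or a timeout expires. `wait_for_state` and the `current_state` lambda are hypothetical names made up for this example, and the lambda merely simulates a domain that takes a few polls to power off. It is not the plugin's implementation, only the general shape such a wait could take.

```ruby
# Hypothetical helper: call the block repeatedly until it returns `wanted`.
# Returns true on success and false if `timeout` seconds pass first.
def wait_for_state(wanted, timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield == wanted
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Simulated domain: reports :running twice, then :shutoff from then on.
states = [:running, :running, :shutoff]
current_state = -> { states.size > 1 ? states.shift : states.first }

if wait_for_state(:shutoff, timeout: 2, interval: 0.01, &current_state)
  puts "domain is shut off; safe to update memory and start"
else
  puts "domain still running; abort instead of changing config"
end
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.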
Thanks for taking the time to explain it to me. My bad, I did not put up my logs in the first place, so here they are.
Describe the bug
To Reproduce
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "fedora/39-cloud-base"
  config.vm.define "example" do |server|
    server.vm.provider :libvirt do |domain|
      domain.driver = "kvm"
      #domain.memory = 2048
    end
  end
end
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "fedora/39-cloud-base"
  config.vm.define "example" do |server|
    server.vm.provider :libvirt do |domain|
      domain.driver = "kvm"
      domain.memory = 2048
    end
  end
end
The log is pasted at the bottom of this post.
Checking with virsh, the change is applied only to the maximum memory size.
This time the virtual machine starts up.
However, only the size of the swap is changed.
Checking with virsh, only the maximum memory size is still changed.
Expected behavior
Versions (please complete the following information):
Debug Log
vagrant.log
A Vagrantfile to reproduce the issue:
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "fedora/39-cloud-base"
  config.vm.define "example" do |server|
    server.vm.provider :libvirt do |domain|
      domain.driver = "kvm"
      domain.memory = 2048
    end
  end
end
Sorry to bother you, but I still don't see how this problem can be attributed to a race condition caused by a series of
In my case, the log attached to the above post appears to show that the shutdown is being done gracefully.
If the problem is really caused by moving from vagrant-libvirt/lib/vagrant-libvirt/action.rb Lines 203 to 212 in a94ce0d
b3.use StartShutdownTimer
b3.use Call, GracefulHalt, :shutoff, :running do |env3, b4|
  if !env3[:result]
    b4.use Call, ShutdownDomain, :shutoff, :running do |env4, b5|
      if !env4[:result]
        b5.use HaltDomain
      end
    end
  end
> require "date"
> puts "sleep start"
> puts DateTime.now
> sleep 300
> puts "sleep end"
> puts DateTime.now
end
Doing exactly the same thing as in the above post would create the same problem at
For this reason I don't see the cause on the
By the way, I did the exact same thing as posted above with the following
vagrant_debian.log
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "generic/debian12"
  config.vm.define "example" do |server|
    server.vm.provider :libvirt do |domain|
      domain.driver = "kvm"
      domain.memory = 4096
    end
  end
end
Thanks. My original assumption was that if it worked in one case and not in another, it had to do with what was different between the two runs: the need for a power-off rather than a graceful shutdown. I need to look at how this ever worked, though, and see if there is anything else that needs attention (e.g. did libvirt's behaviour change over time).
It's entirely possible this never worked, and that the outer begin/rescue that I removed was previously masking the failure here.
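The masking effect mentioned above can be sketched in isolation. All names here (`set_memory!`, `start_with_outer_rescue`, `start_without_rescue`) are hypothetical and are not taken from the plugin; the example only shows how a broad outer rescue can hide a failing call so that the surrounding action appears to succeed.

```ruby
# Hypothetical stand-in for the libvirt call that fails.
def set_memory!
  raise "Call to virDomainSetMemory failed"
end

# With a broad outer rescue, the failure is swallowed and the
# action carries on as if the memory had been updated.
def start_with_outer_rescue(log)
  begin
    set_memory!
    log << "memory updated"
  rescue StandardError
    # Swallowed: nothing is reported to the caller.
  end
  log << "domain started"
end

# Without the outer rescue, the same failure reaches the caller.
def start_without_rescue(log)
  set_memory!
  log << "memory updated"
  log << "domain started"
end

masked = start_with_outer_rescue([])
p masked

begin
  start_without_rescue([])
rescue RuntimeError => e
  puts "surfaced: #{e.message}"
end
```

In the first variant the log never records "memory updated", yet no error is raised, which matches the symptom of a VM that starts without the new memory size taking effect.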
Describe the bug
After changing the Vagrantfile from
domain.memory = 10240
to
domain.memory = 15240
and running
sudo vagrant reload worker-1 --provision
an error message is displayed (see below).
To Reproduce
Steps to reproduce the behavior:
vagrant up --provider=libvirt
sudo vagrant reload worker-1 --provision
Expected behavior
Reload the VM with new memory size
Versions (please complete the following information):
vagrant version: Using Ubuntu 22.04
vagrant plugin list:
Debug Log
Attach Output of
VAGRANT_LOG=debug vagrant ... --provider=libvirt >vagrant.log 2>&1
A Vagrantfile to reproduce the issue: