
vt acceleration options for qemu #94

Open
rowlandwatkins opened this issue Feb 6, 2018 · 35 comments

@rowlandwatkins

Hi folks,

Is there any way I can add the vmx option to qemu's commandline arguments? Running under qemu at present is rather sloooow....

Cheers,

Rowland

@miha-plesko

Hi @rowlandwatkins, there are two things to check:

  • is KVM acceleration enabled?
  • is QEMU aio type set to native?
$ capstan config print
--- global configuration
CAPSTAN_ROOT: /home/miha/.capstan
CAPSTAN_REPO_URL: https://mikelangelo-capstan.s3.amazonaws.com/
CAPSTAN_DISABLE_KVM: false      # <--------------------------------
CAPSTAN_QEMU_AIO_TYPE: native   # <--------------------------------

Please see this document about how to configure these parameters, but basically you just need to create a $HOME/.capstan/config.yaml file with the following content:

disable_kvm: false
qemu_aio_type: native
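If it helps, the file can be created in one go from a shell (a sketch; the path assumes the default CAPSTAN_ROOT of $HOME/.capstan shown above):

```shell
# Create the Capstan config with KVM enabled and native aio.
mkdir -p "$HOME/.capstan"
cat > "$HOME/.capstan/config.yaml" <<'EOF'
disable_kvm: false
qemu_aio_type: native
EOF
```

Run capstan config print afterwards to confirm the values were picked up.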

However, please make sure that KVM acceleration (which is basically vmx that you've mentioned) is really supported:

$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

and that your QEMU supports native aio type (see this PR).

What step is slow for you, capstan package compose?

@rowlandwatkins
Author

rowlandwatkins commented Feb 7, 2018

Hi @miha-plesko yeah I'm noticing a slowdown compared to the old Capstan. Here are the command line arguments:

Old Capstan:

/usr/bin/qemu-system-x86_64 [-nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/rowland/.capstan/instances/qemu/blah/disk.qcow2,if=none,id=hd0,aio=native,cache.direct=off,cache=writeback -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/rowland/.capstan/instances/qemu/blah/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -enable-kvm -cpu host,+x2apic]

New Capstan:

/usr/bin/qemu-system-x86_64 -nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/rowland/.capstan/instances/qemu/bobunikernel/disk.qcow2,if=none,id=hd0,aio=native,cache=none -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/rowland/.capstan/instances/qemu/bobunikernel/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control

One other thing worth mentioning is that I'm running all this in nested KVM. This might be why -enable-kvm is in the old capstan. When we build images on Jenkins it's also dog slow, but we know that AWS doesn't support KVM so we disable it.

Cheers,

Rowland

@miha-plesko

Hey @rowlandwatkins, as you say, the most notable difference between the two commands above is the -enable-kvm switch. I can confirm that QEMU runs horribly slowly when KVM acceleration is not enabled - so could you please try to turn it on?

When I run an image with the new Capstan, I can see that the -enable-kvm switch is turned on as well:

/usr/bin/qemu-system-x86_64 -nographic -m 1024 -smp 2 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0 -drive file=/home/miha/.capstan/instances/qemu/demo/disk.qcow2,if=none,id=hd0,aio=native,cache=none -device virtio-rng-pci -chardev stdio,mux=on,id=stdio,signal=off -device isa-serial,chardev=stdio -netdev user,id=un0,net=192.168.122.0/24,host=192.168.122.1 -device virtio-net-pci,netdev=un0 -chardev socket,id=charmonitor,path=/home/miha/.capstan/instances/qemu/demo/osv.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -enable-kvm -cpu host,+x2apic

So could you please try to either provide $HOME/.capstan/config.yaml (see my comment above) or set the following environment variable:

$ export DISABLE_KVM=false
$ capstan run myunikernel

With this, the -enable-kvm switch should appear and things should run horse fast 😄 Naturally, this won't work on AWS if nested KVM is not supported there. But still - I see no reason why the old Capstan would be any faster than the new one, because they literally use the same command underneath.

@miha-plesko

@rowlandwatkins in case you are interested, there is a way to speed up your Jenkins build on AWS, but it's not really tested yet (I've only tested it locally, see this PR and this thread). In short, you can run OSv unikernels natively on AWS instead of inside the Jenkins VM - this way KVM acceleration will be present.

"capstan compose-remote" approach could be used like this:

  • upload base unikernel to the AWS as an AMI image (needed only once)
  • run EC2 instance out of it (when building new unikernel)
  • inside the Jenkins VM, replace your current capstan package compose myunikernel command with capstan package compose-remote IP, where IP is the IP of the OSv EC2 instance

And voilà! You have your OSv unikernel composed natively on AWS. You will need to restart it in order for the new bootcmd to run.

Now the odd thing with this approach is that AWS interaction is not yet integrated into Capstan (upload the AMI image, boot an EC2 instance from it, reboot the EC2 instance...), so you'd need to automate this yourself (e.g. with Jenkins). I'd be glad to support you, but please bear in mind I'm doing it in my spare time.
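A rough shell sketch of what that manual glue might look like, assuming the standard aws CLI; the AMI id, instance type, and function name are placeholders, and none of this is part of Capstan itself:

```shell
# Hypothetical glue for the compose-remote flow; not part of Capstan.
# All identifiers (AMI id, instance type) are placeholders.
compose_remote_on_aws() {
    ami_id=$1                                       # base OSv unikernel AMI (uploaded once)
    iid=$(aws ec2 run-instances --image-id "$ami_id" --instance-type t2.micro \
          --query 'Instances[0].InstanceId' --output text)
    ip=$(aws ec2 describe-instances --instance-ids "$iid" \
         --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)
    capstan package compose-remote "$ip"            # contextualize the running instance
    aws ec2 reboot-instances --instance-ids "$iid"  # reboot so the new bootcmd runs
}
# compose_remote_on_aws ami-xxxxxxxx   # example invocation (needs AWS credentials)
```

The function is only defined here, not invoked; it just shows the order of the steps.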

BTW: Here is some documentation for preparing base unikernel: compose-remote-base, and for contextualizing it remotely: compose-remote.

@wkozaczuk - did you happen to test compose-remote approach on AWS? I only tested it locally and it worked.

@rowlandwatkins
Author

Hi @miha-plesko strange, I modified $HOME/.capstan/config.yaml, perhaps I need to restart my shell. I'll also try setting the env variable as you suggest.

Regarding your idea on AWS - we currently do something similar - we have a horrific Groovy script which manages the AMI lifecycle on EC2. We build a custom OSv image, copy it into an EBS volume, snapshot it, then create an AMI, then start the AMI, set DNS, etc. (including all the code to poll for readiness). One reason for doing this is that I'm not keen on modifying a running image - I'd like to have Jenkins tag image versions so we can handle changes and effectively make each image immutable. We can replace an image in a little under 10 minutes, which isn't bad. I'll take a look at the remote approach; if anything, we may like to snapshot after capstan package compose-remote IP, then create another AMI from that.

@miha-plesko

You can use capstan config print to print the current Capstan configuration to the console.

@wkozaczuk

wkozaczuk commented Feb 7, 2018 via email

@rowlandwatkins
Author

rowlandwatkins commented Feb 7, 2018

@wkozaczuk very cool, I'll try this approach then. Currently, snapshots take over a minute, so it will be nice to try this route. In particular, this will remove the need to copy an entire base image to EBS before snapshotting, saving the several minutes needed to create, copy and then delete the EBS volume.

@rowlandwatkins
Author

@miha-plesko Found the issue with KVM acceleration detection - you assume kvm-ok exists; Ubuntu has it, but Arch derivatives don't. I just nabbed the bash script from Launchpad and it worked. You might want to update the docs to reflect the need for kvm-ok (cpu-checker in Ubuntu), but I'm not sure how many other distros keep it in their repos.

Cheers,

Rowland


@rowlandwatkins
Author

hehe, my problem now is that the VM doesn't boot ;) Just hangs on "Booting from disk..."

@miha-plesko

Hm, that's strange. I didn't know we had this kvm-ok dependency in the code - I've opened an issue for that.

Unfortunately I have no idea what prevents your unikernel from starting, but is it possible that KVM actually isn't enabled? Could you try to verify that by means other than kvm-ok, e.g.:

$ cat /proc/cpuinfo | grep vmx
# should yield at least one CPU with vmx flag

See this article. Looking forward to fixing this problem!
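A slightly fuller kvm-ok-free check along those lines might look like this (a sketch; it assumes a Linux /proc layout, and the function name is made up):

```shell
# Minimal stand-in for kvm-ok on distros without the cpu-checker package:
# check the CPU flags and the /dev/kvm device node.
kvm_status() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -Eq '\b(vmx|svm)\b' "$cpuinfo"; then
        echo "cpu: virtualization flags present"
    else
        echo "cpu: no vmx/svm flag (not passed through to this VM?)"
    fi
    if [ -e /dev/kvm ]; then
        echo "dev: /dev/kvm exists"
    else
        echo "dev: /dev/kvm missing (kvm_intel/kvm_amd module not loaded?)"
    fi
}
kvm_status
```

Both conditions should pass for -enable-kvm to work; the flag alone is not enough if /dev/kvm is absent.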

@rowlandwatkins
Author

Ok, so digging further, KVM appears to be fragged - not sure if it's my kernel or my distro (Artix). If I turn off CPU counters it gets upset about MSRs, but otherwise it crashes on pretty much everything, which is frustrating. Cpuinfo has vmx listed so the capabilities are there, but it's also possible qemu has changed a lot too (previously using 2.5.0, now using 2.11.x).

@miha-plesko

@rowlandwatkins can you please try setting aio to "threads"? I remember we had some problems with aio "native" even on QEMU 2.5.0, which is why we now set it to "threads" by default. Just put this into your $HOME/.capstan/config.yaml:

qemu_aio_type: threads

@rowlandwatkins
Author

Hi @miha-plesko alas no improvement, all diagnostics suggest that KVM acceleration should be available, but it's clearly not working. I'll try a clean VM tomorrow to test nested KVM again.

@rowlandwatkins
Author

Soooo, I've torn through several Artix, Devuan and Ubuntu installs to come to the conclusion that something is seriously wrong with qemu/kvm on recent kernels.

My current symptom is that running capstan package compose somekernel --pull-missing just freezes on "setting cmdline:: --norandom "

KVM claims to work and it did prior to updating to Ubuntu 17.04 - capstan run somekernel now just freezes qemu, requiring a killall.

Do you folks know any way to find out where qemu is failing?

@rowlandwatkins
Author

current configuration:
VMware workstation 14.1.1 build-7528167
Ubuntu 17.04
Qemu 2.10.1
Linux kernel 4.13.0-36-generic

@miha-plesko

Does adding the -v flag (capstan package compose ... -v) yield any more logs?

@rowlandwatkins
Author

@miha-plesko I see the list of .capstanignore folders but that's it - qemu stops working whether I tell it to use kvm or not

@rowlandwatkins
Author

OK, so a fresh 16.04 with qemu 2.5.0 works fine....

@rowlandwatkins
Author

super bizarre

@miha-plesko

@gasper-vrhovsek I think you managed to run Capstan on Ubuntu 17.04, right?
@rowlandwatkins can you please show us your capstan config print?

@rowlandwatkins
Author

Alas, no, it seems my earlier successes were actually on 16.04 LTS. I then upgraded to 17.04 for bug fixes, Spectre, etc.

The bigger issue, it seems, in the qemu 2.10.x line revolves around the need to set CPU counters when doing nested virtualization. Qemu 2.5.0 doesn't throw an error complaining about failing to set MSRs - only by passing through the CPU counters in qemu 2.10.x does it stop failing, but then it silently fails forever on doing anything useful. This happens on Artix, Devuan and Ubuntu, so it's perhaps less a kernel issue and more a combinatorial packaging issue.

I’m currently upgrading my 16.04 test vm to 17.04 to validate my assumption regarding qemu 2.10.x. If it does fail in a reliable fashion, I’ll try upgrading again to 17.10 to see if qemu 2.11.x acts any differently.

@rowlandwatkins
Author

Apologies, misunderstood who you were addressing regarding qemu under 17.04! I’m curious if Gasper has a similar setup under VMware...

@gasper-vrhovsek

Hi @rowlandwatkins @miha-plesko I think I was still on 16.10 at the time, and for me the problem was solved with the qemu_aio_type fix @miha-plesko suggested earlier (#77). I hadn't yet tried on 17.04.

@rowlandwatkins
Author

Hi @gasper-vrhovsek, thanks for the tip, I’ll also try changing the qemu_aio_type on 17.04 and see what happens.

@miha-plesko

@rowlandwatkins if you run

$ capstan config print

you will see in the console whether you have this enabled or not.

@rowlandwatkins
Author

@miha-plesko, yeah I think I do have "native" activated. I'll try removing it from the config first, and test again in the morning. Not sure if this will remove the CPU counters issue, but at least we can rule out whether aio is misbehaving.

@rowlandwatkins
Author

Right, the plot thickens...

Running capstan 0.3.0 on Ubuntu 17.10 causes the msr failure:

kvm.c:1797:kvm_put_msrs: Assertion ret == cpu->kvm_msr_buf->nmsrs failed

Qemu version: Debian 1:2.10+dfsg-0ubuntu3.5

According to Ubuntu package lists, 2.11 is slated for Bionic, which I think is 18.04.

Now, activating "virtualise CPU performance counters" removes the above assertion error, but leaves qemu hanging forever.

So, a 17.10 upgrade won't help; this looks more like a qemu issue - perhaps I'll even run a custom version because of this...

@rowlandwatkins
Author

rowlandwatkins commented Mar 16, 2018

OK, I think we have a winner!

See: https://bugs.launchpad.net/qemu/+bug/1636217

Environment:

  1. nested virtualisation on VMWare (could be VMWare specific)
  2. qemu 2.7+
  3. Virtio PCI

There appears to be a qemu bug when using a virtio disk drive. The above link gives a rundown; the bug specifically affects KVM acceleration under nested virtualisation. It looks like this was partially caused by changes to SeaBIOS, but that doesn't always help. The following solutions exist:

a) don't use KVM
b) add "-machine type=pc-i440fx-x" where x <= 2.6 (I guess this then uses a different bios?)
c) set "disable-modern=on" on the virtio PCI device

Is there any way with the new capstan to add arbitrary qemu options?

I'm currently using option c (qemu 2.10.1) and capstan works just like using qemu 2.5.0.
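For reference, options (b) and (c) might look like this on a bare qemu command line (sketches only, not the exact Capstan invocation; disk.qcow2 is a placeholder path):

```shell
# Sketches of workarounds (b) and (c); neither function is invoked here,
# they only show where the flags go.
workaround_b() {  # (b) pin the machine type to a pre-2.7 i440fx model
    qemu-system-x86_64 -enable-kvm -machine type=pc-i440fx-2.6 \
        -drive file=disk.qcow2,if=none,id=hd0 \
        -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0
}
workaround_c() {  # (c) keep the default machine type, force legacy virtio
    qemu-system-x86_64 -enable-kvm \
        -drive file=disk.qcow2,if=none,id=hd0 \
        -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,disable-modern=on
}
```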

EDIT: the reason qemu was hanging on boot was that the BIOS was not behaving with virtio - the result is no boot device, so SeaBIOS just sits there wondering what to do next...

Cheers!

@rowlandwatkins
Author

I've modified my local capstan to use the "-machine" switch and now vm building and running work correctly!

@miha-plesko

miha-plesko commented Mar 16, 2018

We want PullRequest or it didn't happen! 😄

(But thanks for diving into this, I'm so happy you managed to get it to work!)

@rowlandwatkins
Author

@miha-plesko Well this is an interesting question: how do we want to patch it? I don't know how many others use nested virtualisation with KVM on newer versions of qemu. Do you guys want it as a conditional? I really don't know the extent of the issue, or whether setting -machine on qemu < 2.7 would be a problem.

All I've done is modify hypervisor/qemu/qemu.go:322
args = append(args, "-machine", "type=pc-i440fx-2.6")

Cheers

@miha-plesko

Thanks for the exact diff, now I can easily integrate it myself (unless you want to have an official contribution on the Capstan repository ;). I think introducing a new option qemu_machine_type in .capstan/config.yaml:

qemu_machine_type: pc-i440fx-2.6

would be best. And if the user specifies nothing, then no -machine flag would be added. This way we have all systems covered, even if not automatically.

@rowlandwatkins
Author

My pleasure, capstan and osv are great projects, and have helped me a lot. This particular issue has been plaguing me for some time, so it's nice to have a solution that stops me fire fighting. Cool, qemu_machine_type sounds good :)

Maybe also add some blurb to the wiki to help others in a similar position:

For those using nested virtualisation in VMWare, be aware that for qemu versions > 2.6.0
there are some virtio Virtual Disk issues when run with KVM acceleration:

  1. In Virtual Machine Settings | Processors, set Virtualize Intel VT-x/EPT, and Virtualize CPU Performance Counters
  2. In .capstan/config.yaml set qemu_machine_type: pc-i440fx-2.6

With the above modifications, generating and running an OSv image should be painless and fast.
