* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)

From: Kashyap Chamarthy @ 2018-02-07 15:31 UTC
To: Florian Haas; +Cc: libvirt-users, kvm

[Cc: KVM upstream list.]

On Tue, Feb 06, 2018 at 04:11:46PM +0100, Florian Haas wrote:
> Hi everyone,
>
> I hope this is the correct list to discuss this issue; please feel
> free to redirect me otherwise.
>
> I have a nested virtualization setup that looks as follows:
>
> - Host: Ubuntu 16.04, kernel 4.4.0 (an OpenStack Nova compute node)
> - L0 guest: openSUSE Leap 42.3, kernel 4.4.104-39-default
> - Nested guest: SLES 12, kernel 3.12.28-4-default
>
> The nested guest is configured with "<type arch='x86_64'
> machine='pc-i440fx-1.4'>hvm</type>".
>
> This is working just beautifully, except when the L0 guest wakes up
> from managed save (openstack server resume in OpenStack parlance).
> Then, in the L0 guest we immediately see this:

[...] # Snip the call trace from Florian. It is here:
https://www.redhat.com/archives/libvirt-users/2018-February/msg00014.html

> What does fix things, of course, is to switch the nested guest
> from KVM to QEMU, but that also makes things significantly slower.
>
> So I'm wondering: is there someone reading this who does run nested
> KVM and has managed to successfully live-migrate or managed-save? If
> so, would you be able to share a working host kernel / L0 guest kernel
> / nested guest kernel combination, or any other hints for tuning the
> L0 guest to support managed save and live migration?

Following up from our IRC discussion (on #kvm, Freenode).
Re-posting my comment here:

So I just did a test of 'managedsave' (which is just "save the state of
the running VM to a file" in libvirt parlance) of L1, _while_ L2 is
running, and I seem to have reproduced your case (see the call trace
attached).

    # Ensure L2 (the nested guest) is running on L1. Then, from L0, do
    # the following:
    [L0] $ virsh managedsave L1
    [L0] $ virsh start L1 --console

Result: See the call trace attached to this bug. But L1 goes on to
start "fine", and L2 keeps running, too. But things start to seem
weird. As in: I try to safely, read-only mount the L2 disk image via
libguestfs (by setting export LIBGUESTFS_BACKEND=direct, which uses
QEMU directly): `guestfish --ro -a ./cirros.qcow2 -i`. It throws the
call trace again on the L1 serial console, and the `guestfish` command
just sits there forever.

  - L0 (bare metal) kernel: 4.13.13-300.fc27.x86_64+debug
  - L1 (guest hypervisor) kernel: 4.11.10-300.fc26.x86_64
  - L2 is a CirrOS 0.3.5 image

I have reproduced this at least 3 times with the above versions.

I'm using libvirt 'host-passthrough' for CPU (meaning: '-cpu host' in
QEMU parlance) for both L1 and L2.

My L0 CPU is: Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz.

Thoughts?

---
[/me wonders if I'll be asked to reproduce this with newest upstream kernels.]

[...]

--
/kashyap

[-- Attachment #2: L1-call-trace-on-start-from-managed-save.txt --]

$> virsh start f26-devstack --console
Domain f26-devstack started
Connected to domain f26-devstack
Escape character is ^]
[ 1323.605321] ------------[ cut here ]------------
[ 1323.608653] kernel BUG at arch/x86/kvm/x86.c:336!
[ 1323.611661] invalid opcode: 0000 [#1] SMP
[ 1323.614221] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sb_edac edac_core kvm_intel openvswitch nf_conntrack_ipv6 kvm nf_nat_ipv6 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack irqbypass crct10dif_pclmul sunrpc crc32_pclmul ppdev ghash_clmulni_intel parport_pc joydev virtio_net virtio_balloon parport tpm_tis i2c_piix4 tpm_tis_core tpm xfs libcrc32c virtio_blk virtio_console virtio_rng crc32c_intel serio_raw virtio_pci ata_generic virtio_ring virtio pata_acpi qemu_fw_cfg
[ 1323.645674] CPU: 0 PID: 18587 Comm: CPU 0/KVM Not tainted 4.11.10-300.fc26.x86_64 #1
[ 1323.649592] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-1.fc27 04/01/2014
[ 1323.653935] task: ffff8b5be13ca580 task.stack: ffffa8b78147c000
[ 1323.656783] RIP: 0010:kvm_spurious_fault+0x9/0x10 [kvm]
[ 1323.659317] RSP: 0018:ffffa8b78147fc78 EFLAGS: 00010246
[ 1323.661808] RAX: 0000000000000000 RBX: ffff8b5be13c0000 RCX: 0000000000000000
[ 1323.665077] RDX: 0000000000006820 RSI: 0000000000000292 RDI: ffff8b5be13c0000
[ 1323.668287] RBP: ffffa8b78147fc78 R08: ffff8b5be13c0090 R09: 0000000000000000
[ 1323.671515] R10: ffffa8b78147fbf8 R11: 0000000000000000 R12: ffff8b5be13c0088
[ 1323.674598] R13: 0000000000000001 R14: 00000131e2372ee6 R15: ffff8b5be1360040
[ 1323.677643] FS: 00007fd602aff700(0000) GS:ffff8b5bffc00000(0000) knlGS:0000000000000000
[ 1323.681130] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1323.683628] CR2: 000055d650532c20 CR3: 0000000221260000 CR4: 00000000001426f0
[ 1323.686697] Call Trace:
[ 1323.687817] intel_pmu_get_msr+0xd23/0x3f44 [kvm_intel]
[ 1323.690151] ? vmx_interrupt_allowed+0x19/0x40 [kvm_intel]
[ 1323.692583] kvm_arch_vcpu_runnable+0xa5/0xe0 [kvm]
[ 1323.694767] kvm_vcpu_check_block+0x12/0x50 [kvm]
[ 1323.696858] kvm_vcpu_block+0xa3/0x2f0 [kvm]
[ 1323.698762] kvm_arch_vcpu_ioctl_run+0x165/0x16a0 [kvm]
[ 1323.701079] ? kvm_arch_vcpu_load+0x6d/0x290 [kvm]
[ 1323.703175] ? __check_object_size+0xbb/0x1b3
[ 1323.705109] kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.707021] ? kvm_vcpu_ioctl+0x2a6/0x620 [kvm]
[ 1323.709006] do_vfs_ioctl+0xa5/0x600
[ 1323.710570] SyS_ioctl+0x79/0x90
[ 1323.712011] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 1323.714033] RIP: 0033:0x7fd610fb35e7
[ 1323.715601] RSP: 002b:00007fd602afe7c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1323.718869] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fd610fb35e7
[ 1323.721972] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000013
[ 1323.725044] RBP: 0000563dab190300 R08: 0000563dab1ab7d0 R09: 01fc2de3f821e99c
[ 1323.728124] R10: 000000003b9aca00 R11: 0000000000000246 R12: 0000563dadce20a6
[ 1323.731195] R13: 0000000000000000 R14: 00007fd61a84c000 R15: 0000563dadce2000
[ 1323.734268] Code: 8d 00 00 01 c7 05 1c e6 05 00 01 00 00 00 41 bd 01 00 00 00 44 8b 25 2f e6 05 00 e9 db fe ff ff 66 90 0f 1f 44 00 00 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 89 ff 48 89 e5 41 54 53
[ 1323.742385] RIP: kvm_spurious_fault+0x9/0x10 [kvm] RSP: ffffa8b78147fc78
[ 1323.745438] ---[ end trace 92fa23c974db8b7e ]---
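
When re-running the managedsave/start reproducer above several times, it can help to check the L1 serial console log mechanically instead of by eye. The following is a minimal sketch, not part of libvirt or any kernel tooling; the function name find_kvm_bug is hypothetical, and the patterns assume the exact line formats visible in the attached trace ("kernel BUG at <file>:<line>!" and a kernel-mode "RIP: 0010:<symbol>+off/len [module]" line). Other kernel versions may format these lines differently.

```python
import re
from typing import Optional

# Patterns modelled on the 4.11 console output attached above (an assumption).
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Kernel-mode RIP line, e.g. "RIP: 0010:kvm_spurious_fault+0x9/0x10 [kvm]".
# User-space RIP lines ("RIP: 0033:0x7fd6...") have no "+off/len [module]" part.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def find_kvm_bug(console_log: str) -> Optional[dict]:
    """Return the BUG location and the faulting kernel symbol, or None."""
    bug = BUG_RE.search(console_log)
    if bug is None:
        return None
    # Only look for the RIP line that follows the BUG marker.
    rip = RIP_RE.search(console_log, bug.end())
    return {
        "file": bug.group("file"),
        "line": int(bug.group("line")),
        "symbol": rip.group("symbol") if rip else None,
        "module": rip.group("module") if rip else None,
    }
```

Fed the attachment above, this should report arch/x86/kvm/x86.c, line 336, with kvm_spurious_fault in the kvm module; a log without the BUG marker yields None.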

From: David Hildenbrand @ 2018-02-07 22:26 UTC
To: Kashyap Chamarthy, Florian Haas; +Cc: libvirt-users, kvm

On 07.02.2018 16:31, Kashyap Chamarthy wrote:
> [Cc: KVM upstream list.]
>
> [...]
> Following up from our IRC discussion (on #kvm, Freenode). Re-posting my
> comment here:
>
> [...]
>
> Thoughts?

Sounds like a similar problem as in
https://bugzilla.kernel.org/show_bug.cgi?id=198621

In short: there is no (live) migration support for nested VMX yet. So as
soon as your guest is using VMX itself ("nVMX"), this is not expected to
work.

--
Thanks,

David / dhildenb
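
David's condition hinges on whether the guest is using VMX itself. A first, necessary check, run from inside L1, is whether VMX (Intel) or SVM (AMD) is exposed to it at all. The sketch below parses /proc/cpuinfo for those two flags; the function names are hypothetical. Note the limitation: the flag being present only means L1 *can* act as a hypervisor (for example because it was started with host-passthrough), not that it is currently running L2 guests.

```python
from typing import Set

VIRT_FLAGS = ("vmx", "svm")  # Intel VT-x and AMD-V CPU feature flags


def virt_extensions(cpuinfo_text: str) -> Set[str]:
    """Return the hardware-virtualization flags found in /proc/cpuinfo text."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list CPU features on lines like "flags\t\t: fpu vme ...".
        if sep and key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in VIRT_FLAGS)
    return found


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the CPU information file of the machine this runs on."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Inside L1, virt_extensions(read_cpuinfo()) returning {"vmx"} means VMX is visible to that guest, so it is a candidate for the nVMX migration problem as soon as it starts an L2.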

From: Florian Haas @ 2018-02-08 8:19 UTC
To: David Hildenbrand; +Cc: Kashyap Chamarthy, libvirt-users, kvm

On Wed, Feb 7, 2018 at 11:26 PM, David Hildenbrand <david@redhat.com> wrote:
> On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>> [...]
>> Thoughts?
>
> Sounds like a similar problem as in
> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>
> In short: there is no (live) migration support for nested VMX yet. So as
> soon as your guest is using VMX itself ("nVMX"), this is not expected to
> work.

Hi David, thanks for getting back to us on this.

I see your point, except the issue Kashyap and I are describing does
not occur with live migration; it occurs with savevm/loadvm (virsh
managedsave/virsh start in libvirt terms, nova suspend/resume in
OpenStack lingo). And it's not immediately self-evident that the
limitations for the former also apply to the latter.
Even for the live migration limitation, I've been unsuccessful at
finding documentation that warns users not to attempt live migration
when using nesting, and this discussion sounds like a good opportunity
for me to help fix that.

Just to give an example,
https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
from just last September talks explicitly about how "guests can be
snapshot/resumed, migrated to other hypervisors and much more" in the
opening paragraph, and then talks at length about nested guests, without
ever pointing out that those very features aren't expected to work for
them. :)

So to clarify things, could you enumerate the currently known
limitations when enabling nesting? I'd be happy to summarize those and
add them to the linux-kvm.org FAQ so others are less likely to hit
their head on this issue. In particular:

- Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
  still accurate in that -cpu host (libvirt "host-passthrough") is the
  strongly recommended configuration for the L2 guest?

- If so, are there any recommendations for how to configure the L1
  guest with regard to CPU model?

- Is live migration with nested guests _always_ expected to break on
  all architectures, and if not, which are safe?

- Idem, for savevm/loadvm?

- With regard to the problem that Kashyap and I (and Dennis, the
  kernel.org bugzilla reporter) are describing, is this expected to work
  any better on AMD CPUs? (All reports are on Intel.)

- Do you expect nested virtualization functionality to be adversely
  affected by KPTI and/or other Meltdown/Spectre mitigation patches?

Kashyap, can you think of any other limitations that would benefit
from improved documentation?

Cheers,
Florian

From: David Hildenbrand @ 2018-02-08 12:07 UTC
To: Florian Haas; +Cc: Kashyap Chamarthy, libvirt-users, kvm

>> In short: there is no (live) migration support for nested VMX yet. So as
>> soon as your guest is using VMX itself ("nVMX"), this is not expected to
>> work.
>
> Hi David, thanks for getting back to us on this.

Hi Florian,

(somebody please correct me if I'm wrong)

>
> I see your point, except the issue Kashyap and I are describing does
> not occur with live migration; it occurs with savevm/loadvm (virsh
> managedsave/virsh start in libvirt terms, nova suspend/resume in
> OpenStack lingo). And it's not immediately self-evident that the
> limitations for the former also apply to the latter. Even for the live
> migration limitation, I've been unsuccessful at finding documentation
> that warns users not to attempt live migration when using nesting, and
> this discussion sounds like a good opportunity for me to help fix
> that.
>
> Just to give an example,
> https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
> from just last September talks explicitly about how "guests can be
> snapshot/resumed, migrated to other hypervisors and much more" in the
> opening paragraph, and then talks at length about nested guests,
> without ever pointing out that those very features aren't expected to
> work for them. :)

Well, it still is a kernel parameter "nested" that is disabled by
default, so things should be expected to be shaky. :) While running
nested guests usually works fine, migrating a nested hypervisor is the
problem.

Especially see e.g.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/nested_virt

"However, note that nested virtualization is not supported or
recommended in production user environments, and is primarily intended
for development and testing."

>
> So to clarify things, could you enumerate the currently known
> limitations when enabling nesting? I'd be happy to summarize those and
> add them to the linux-kvm.org FAQ so others are less likely to hit
> their head on this issue. In particular:

The general problem is that migration of an L1 will not work when it is
running L2, i.e. when L1 is using VMX ("nVMX").

Migrating an L2 should work as before.

The problem is, in order for L1 to make use of VMX to run L2, we have to
run L2 in L0, simulating VMX for L1 (nested VMX, a.k.a. nVMX). This
requires additional state information about L1 ("nVMX" state), which is
not properly migrated when migrating L1. Therefore, the CPU state of L1
might be screwed up after migration, resulting in L1 crashes.

In addition, certain VMX features might be missing on the target, which
also still has to be handled via the CPU model in the future.

L0 should hopefully not crash; I hope that you are not seeing that.

>
> - Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
> still accurate in that -cpu host (libvirt "host-passthrough") is the
> strongly recommended configuration for the L2 guest?
>
> - If so, are there any recommendations for how to configure the L1
> guest with regard to CPU model?

You have to expose the VMX feature to your L1 ("nested hypervisor");
that is usually done automatically by using the "host-passthrough" or
"host-model" value. If you're using a custom CPU model, you have to
enable it explicitly.

>
> - Is live migration with nested guests _always_ expected to break on
> all architectures, and if not, which are safe?

x86 VMX: running nested guests works, migrating nested hypervisors does
not work

x86 SVM: running nested guests works, migrating nested hypervisors does
not work (somebody correct me if I'm wrong)

s390x: running nested guests works, migrating nested hypervisors works

power: running nested guests works only via KVM-PR ("trap and emulate"),
so migrating nested hypervisors works. But we are not using hardware
virtualization for L1->L2. (my latest status)

arm: running nested guests is in the works (my latest status), so
migration is also not possible.

>
> - Idem, for savevm/loadvm?
>

savevm/loadvm is not expected to work correctly on an L1 if it is
running L2 guests. It should work on L2, however.

> - With regard to the problem that Kashyap and I (and Dennis, the
> kernel.org bugzilla reporter) are describing, is this expected to work
> any better on AMD CPUs? (All reports are on Intel.)

No; remember that migration support for the nested SVM state is also
still missing.

>
> - Do you expect nested virtualization functionality to be adversely
> affected by KPTI and/or other Meltdown/Spectre mitigation patches?

Not an expert on this. I think it should be affected in a similar way
as ordinary guests. :)

>
> Kashyap, can you think of any other limitations that would benefit
> from improved documentation?

We should certainly document what I have summarized here properly in a
central place!

>
> Cheers,
> Florian
>

--
Thanks,

David / dhildenb
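
To make the summary above easier to act on, here is a small sketch that (a) reads the KVM module's "nested" parameter on an x86 host, and (b) encodes David's per-architecture answer as a lookup table. The sysfs paths are the usual kvm_intel/kvm_amd module-parameter locations; the table keys ("x86-vmx", "power-kvm-pr", ...) and all function names are hypothetical labels chosen for illustration, and the table reflects only the status described in this thread (February 2018).

```python
from pathlib import Path

# Module parameter that controls nested virtualization on an x86 KVM host.
NESTED_PARAM = {
    "intel": Path("/sys/module/kvm_intel/parameters/nested"),
    "amd": Path("/sys/module/kvm_amd/parameters/nested"),
}


def parse_nested_param(raw: str) -> bool:
    """Interpret the sysfs value: bool parameters read "Y"/"N", int ones "1"/"0"."""
    return raw.strip().upper() in ("Y", "1")


def nesting_enabled(vendor: str) -> bool:
    """True if the vendor's KVM module is loaded with nesting enabled."""
    path = NESTED_PARAM[vendor]
    return path.exists() and parse_nested_param(path.read_text())


# Can an L1 that is *running* L2 guests be live-migrated or savevm/loadvm'd?
# Status as summarized in this thread; keys are hypothetical labels.
L1_SAVE_OR_MIGRATE_WORKS = {
    "x86-vmx": False,      # nVMX state is not part of the migration stream
    "x86-svm": False,      # nested SVM state is not migrated either
    "s390x": True,
    "power-kvm-pr": True,  # trap-and-emulate; no hardware virt for L1->L2
    "arm": False,          # nested virtualization itself still in the works
}


def l1_save_or_migrate_expected_to_work(platform: str) -> bool:
    """Covers live migration and savevm/loadvm alike; unknown platforms: assume no."""
    return L1_SAVE_OR_MIGRATE_WORKS.get(platform, False)
```

Migrating or saving an L2 guest itself is deliberately not covered by the table; per the summary above, that should work as before.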

From: Florian Haas @ 2018-02-08 13:29 UTC
To: David Hildenbrand; +Cc: Kashyap Chamarthy, libvirt-users, kvm

Hi David,

thanks for the added input! I'm taking the liberty to snip a few
paragraphs to trim this email down a bit.

On Thu, Feb 8, 2018 at 1:07 PM, David Hildenbrand <david@redhat.com> wrote:
>> Just to give an example,
>> https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
>> from just last September talks explicitly about how "guests can be
>> snapshot/resumed, migrated to other hypervisors and much more" in the
>> opening paragraph, and then talks at length about nested guests,
>> without ever pointing out that those very features aren't expected to
>> work for them. :)
>
> Well, it still is a kernel parameter "nested" that is disabled by
> default, so things should be expected to be shaky. :) While running
> nested guests usually works fine, migrating a nested hypervisor is the
> problem.
>
> Especially see e.g.
> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/nested_virt
>
> "However, note that nested virtualization is not supported or
> recommended in production user environments, and is primarily intended
> for development and testing."

Sure, I do understand that Red Hat (or any other vendor) is taking no
support responsibility for this. At this point I'd just like to
contribute to a better understanding of what's expected to definitely
_not_ work, so that people don't bloody their noses on that.
:)

>> So to clarify things, could you enumerate the currently known
>> limitations when enabling nesting? I'd be happy to summarize those and
>> add them to the linux-kvm.org FAQ so others are less likely to hit
>> their head on this issue. In particular:
>
> The general problem is that migration of an L1 will not work when it is
> running L2, i.e. when L1 is using VMX ("nVMX").
>
> Migrating an L2 should work as before.
>
> The problem is, in order for L1 to make use of VMX to run L2, we have to
> run L2 in L0, simulating VMX for L1 (nested VMX, a.k.a. nVMX). This
> requires additional state information about L1 ("nVMX" state), which is
> not properly migrated when migrating L1. Therefore, the CPU state of L1
> might be screwed up after migration, resulting in L1 crashes.
>
> In addition, certain VMX features might be missing on the target, which
> also still has to be handled via the CPU model in the future.

Thanks a bunch for the added detail. Now, I got a primer today from
Kashyap on IRC on how savevm/loadvm is very similar to migration, but
I'm still struggling to wrap my head around it. What you say makes
perfect sense to me in that _migration_ might blow up in subtle ways,
but can you try to explain to me why the same considerations would
apply with savevm/loadvm?

> L0 should hopefully not crash; I hope that you are not seeing that.

No, I am not; we're good there. :)

>> - Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
>> still accurate in that -cpu host (libvirt "host-passthrough") is the
>> strongly recommended configuration for the L2 guest?
>>
>> - If so, are there any recommendations for how to configure the L1
>> guest with regard to CPU model?
>
> You have to expose the VMX feature to your L1 ("nested hypervisor");
> that is usually done automatically by using the "host-passthrough" or
> "host-model" value. If you're using a custom CPU model, you have to
> enable it explicitly.

Roger.
Without that we can't do nesting at all.

>> - Is live migration with nested guests _always_ expected to break on
>> all architectures, and if not, which are safe?
>
> x86 VMX: running nested guests works, migrating nested hypervisors does
> not work
>
> x86 SVM: running nested guests works, migrating nested hypervisors does
> not work (somebody correct me if I'm wrong)
>
> s390x: running nested guests works, migrating nested hypervisors works
>
> power: running nested guests works only via KVM-PR ("trap and emulate"),
> so migrating nested hypervisors works. But we are not using hardware
> virtualization for L1->L2. (my latest status)
>
> arm: running nested guests is in the works (my latest status), so
> migration is also not possible.

Great summary, thanks!

>> - Idem, for savevm/loadvm?
>>
>
> savevm/loadvm is not expected to work correctly on an L1 if it is
> running L2 guests. It should work on L2, however.

Again, I'm somewhat struggling to understand this vs. live migration,
but it's entirely possible that I'm sorely lacking in my knowledge of
kernel and CPU internals.

>> - With regard to the problem that Kashyap and I (and Dennis, the
>> kernel.org bugzilla reporter) are describing, is this expected to work
>> any better on AMD CPUs? (All reports are on Intel.)
>
> No; remember that migration support for the nested SVM state is also
> still missing.

Understood, thanks.

>> - Do you expect nested virtualization functionality to be adversely
>> affected by KPTI and/or other Meltdown/Spectre mitigation patches?
>
> Not an expert on this. I think it should be affected in a similar way
> as ordinary guests. :)

Fair enough. :)

>> Kashyap, can you think of any other limitations that would benefit
>> from improved documentation?
>
> We should certainly document what I have summarized here properly in a
> central place!

I tried getting registered on the linux-kvm.org wiki to do exactly
that, and ran into an SMTP/DNS configuration issue with the
verification email. Kashyap said he was going to poke the site admin
about that.

Now, here's a bit more information on my continued testing. As I
mentioned on IRC, one of the things that struck me as odd was that if
I ran into the issue previously described, the L1 guest would enter a
reboot loop if configured with kernel.panic_on_oops=1. In other words,
I would savevm the L1 guest (with a running L2), then loadvm it, and
then the L1 would stack-trace, reboot, and then keep doing that
indefinitely. I found that weird because on the second reboot, I would
expect the system to come up cleanly.

I've now changed my L2 guest's CPU configuration so that libvirt (in
L1) starts the L2 guest with the following settings:

<cpu>
  <model fallback='forbid'>Haswell-noTSX</model>
  <vendor>Intel</vendor>
  <feature policy='disable' name='vme'/>
  <feature policy='disable' name='ss'/>
  <feature policy='disable' name='f16c'/>
  <feature policy='disable' name='rdrand'/>
  <feature policy='disable' name='hypervisor'/>
  <feature policy='disable' name='arat'/>
  <feature policy='disable' name='tsc_adjust'/>
  <feature policy='disable' name='xsaveopt'/>
  <feature policy='disable' name='abm'/>
  <feature policy='disable' name='aes'/>
  <feature policy='disable' name='invpcid'/>
</cpu>

Basically, I am disabling every single feature that my L1's "virsh
capabilities" reports. Now, this does not make my L1 come up happily
from loadvm. But it does seem to initiate a clean reboot after loadvm,
and after that clean reboot it lives happily.

If this is as good as it gets (for now), then I can totally live with
that. It certainly beats running the L2 guest with QEMU (without KVM
acceleration). But I would still love to understand the issue a little
bit better.

Cheers,
Florian
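
Florian built the <cpu> block above by hand from the host CPU features that "virsh capabilities" reports inside L1. A sketch of automating that step follows, assuming the capabilities XML has its usual <capabilities><host><cpu> layout with <model>, <vendor> and <feature name='...'/> children. The function name is hypothetical, it needs Python 3.9+ for ElementTree.indent, and, per Florian's own result, this configuration only turned the post-loadvm crash into a clean reboot; it does not fix the missing nVMX state.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def cpu_xml_disabling_host_features(capabilities_xml: str,
                                    model: Optional[str] = None,
                                    vendor: Optional[str] = None) -> str:
    """Return a libvirt <cpu> element that disables every host CPU feature
    listed in `virsh capabilities` output."""
    host_cpu = ET.fromstring(capabilities_xml).find("./host/cpu")
    if host_cpu is None:
        raise ValueError("capabilities XML has no <host><cpu> element")

    cpu = ET.Element("cpu")
    # fallback='forbid': fail rather than fall back to a different CPU model.
    model_el = ET.SubElement(cpu, "model", fallback="forbid")
    model_el.text = model or host_cpu.findtext("model")
    ET.SubElement(cpu, "vendor").text = vendor or host_cpu.findtext("vendor")

    # One <feature policy='disable'/> per feature the host CPU reports.
    for feature in host_cpu.findall("feature"):
        ET.SubElement(cpu, "feature", policy="disable", name=feature.get("name"))

    ET.indent(cpu, space="  ")  # pretty-print; Python 3.9+
    return ET.tostring(cpu, encoding="unicode")
```

In L1, one would save the output of "virsh capabilities" to a file, pass its contents to the function, and paste the result into the L2 domain XML (e.g. via "virsh edit").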

From: David Hildenbrand @ 2018-02-08 13:47 UTC
To: Florian Haas; +Cc: Kashyap Chamarthy, libvirt-users, kvm

> Sure, I do understand that Red Hat (or any other vendor) is taking no
> support responsibility for this. At this point I'd just like to
> contribute to a better understanding of what's expected to definitely
> _not_ work, so that people don't bloody their noses on that. :)

Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
just doesn't work when trying to migrate a nested hypervisor (on x86).
That's what most people don't realize, as it works "just fine" for 99%
of all use cases.

[...]

>>
>> savevm/loadvm is not expected to work correctly on an L1 if it is
>> running L2 guests. It should work on L2, however.
>
> Again, I'm somewhat struggling to understand this vs. live migration,
> but it's entirely possible that I'm sorely lacking in my knowledge of
> kernel and CPU internals.

(savevm/loadvm is also called "migration to file")

When we migrate to a file, it really is the same migration stream. You
"dump" the VM state into a file, instead of sending it over to another
(running) target.

Once you load your VM state from that file, it is a completely fresh
VM/KVM environment, so you have to restore all the state. Now, as the
nVMX state is not contained in the migration stream, you cannot restore
that state. The L1 state is therefore "damaged" or incomplete.

[...]

>>> Kashyap, can you think of any other limitations that would benefit
>>> from improved documentation?
>>
>> We should certainly document what I have summarized here properly in a
>> central place!
>
> I tried getting registered on the linux-kvm.org wiki to do exactly
> that, and ran into an SMTP/DNS configuration issue with the
> verification email. Kashyap said he was going to poke the site admin
> about that.
>
> Now, here's a bit more information on my continued testing. As I
> mentioned on IRC, one of the things that struck me as odd was that if
> I ran into the issue previously described, the L1 guest would enter a
> reboot loop if configured with kernel.panic_on_oops=1. In other words,
> I would savevm the L1 guest (with a running L2), then loadvm it, and
> then the L1 would stack-trace, reboot, and then keep doing that
> indefinitely. I found that weird because on the second reboot, I would
> expect the system to come up cleanly.

I guess the L1 state (in the kernel) is broken so badly that even a
reset cannot fix it.

>
> I've now changed my L2 guest's CPU configuration so that libvirt (in
> L1) starts the L2 guest with the following settings:
>
> <cpu>
>   <model fallback='forbid'>Haswell-noTSX</model>
>   <vendor>Intel</vendor>
>   <feature policy='disable' name='vme'/>
>   <feature policy='disable' name='ss'/>
>   <feature policy='disable' name='f16c'/>
>   <feature policy='disable' name='rdrand'/>
>   <feature policy='disable' name='hypervisor'/>
>   <feature policy='disable' name='arat'/>
>   <feature policy='disable' name='tsc_adjust'/>
>   <feature policy='disable' name='xsaveopt'/>
>   <feature policy='disable' name='abm'/>
>   <feature policy='disable' name='aes'/>
>   <feature policy='disable' name='invpcid'/>
> </cpu>

Maybe one of these features is the root cause of the "messed up" state
in KVM, so disabling it also makes the L1 state "less broken".

>
> Basically, I am disabling every single feature that my L1's "virsh
> capabilities" reports. Now, this does not make my L1 come up happily
> from loadvm. But it does seem to initiate a clean reboot after loadvm,
> and after that clean reboot it lives happily.
>
> If this is as good as it gets (for now), then I can totally live with
> that. It certainly beats running the L2 guest with Qemu (without KVM
> acceleration). But I would still love to understand the issue a little
> bit better.

I mean, the real solution to the problem is of course restoring the L1
state correctly (migrating nVMX state, which people are working on right
now). So what you are seeing is a bad "side effect" of that.

For now, nested=true should never be used along with savevm/loadvm/live
migration.

>
> Cheers,
> Florian
>

-- 

Thanks,

David / dhildenb
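To make the "migration to file" explanation concrete, here is a toy Python model. It is an illustration under stated assumptions, not QEMU's actual stream format: the field names, including the hypothetical `nested_state` that stands in for the nVMX state, are invented. The same serializer writes identical bytes whether the sink is a network-like buffer or a file, and because `nested_state` is not part of the stream, the freshly restored VM comes back without it.

```python
import io
import json
import tempfile

# Toy model only: the field names are illustrative assumptions, not QEMU's
# real migration sections. "nested_state" stands in for the nVMX state that
# the thread says is not (yet) part of the migration stream.
STREAM_FIELDS = ("regs", "memory", "devices")


def write_stream(vm: dict, sink) -> None:
    """Serialize only the fields the stream knows about into a binary sink."""
    payload = {name: vm[name] for name in STREAM_FIELDS}
    sink.write(json.dumps(payload, sort_keys=True).encode())


def read_stream(source) -> dict:
    """Build a completely fresh VM from the stream; anything else is lost."""
    vm = json.loads(source.read().decode())
    vm["nested_state"] = None  # nothing in the stream to restore it from
    return vm


vm = {
    "regs": {"rip": 4096},
    "memory": "guest RAM pages",
    "devices": ["virtio-net"],
    "nested_state": {"vmcs12": "L1's view of its L2 guest"},
}

# Live migration: the bytes go to a socket-like buffer.
wire = io.BytesIO()
write_stream(vm, wire)

# Migration to file (savevm/managedsave): the same bytes go to a file.
with tempfile.TemporaryFile() as f:
    write_stream(vm, f)
    f.seek(0)
    restored = read_stream(f)
    f.seek(0)
    same_bytes = f.read() == wire.getvalue()

print(same_bytes)                # True
print(restored["nested_state"])  # None: the restored L1 is incomplete
```

The point of the model is only that "file" and "network" differ in the sink, not in the content; whatever the stream omits is missing on restore in both cases.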
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Florian Haas @ 2018-02-08 13:57 UTC
To: David Hildenbrand; +Cc: Kashyap Chamarthy, libvirt-users, kvm

On Thu, Feb 8, 2018 at 2:47 PM, David Hildenbrand <david@redhat.com> wrote:
>> Again, I'm somewhat struggling to understand this vs. live migration —
>> but it's entirely possible that I'm sorely lacking in my knowledge of
>> kernel and CPU internals.
>
> (savevm/loadvm is also called "migration to file")
>
> When we migrate to a file, it really is the same migration stream. You
> "dump" the VM state into a file, instead of sending it over to another
> (running) target.
>
> Once you load your VM state from that file, it is a completely fresh
> VM/KVM environment. So you have to restore all the state. Now, as nVMX
> state is not contained in the migration stream, you cannot restore that
> state. The L1 state is therefore "damaged" or incomplete.

*lightbulb* Thanks a lot, that's a perfectly logical explanation. :)

>> Now, here's a bit more information on my continued testing. As I
>> mentioned on IRC, one of the things that struck me as odd was that if
>> I ran into the issue previously described, the L1 guest would enter a
>> reboot loop if configured with kernel.panic_on_oops=1. In other words,
>> I would savevm the L1 guest (with a running L2), then loadvm it, and
>> then the L1 would stack-trace, reboot, and then keep doing that
>> indefinitely. I found that weird because on the second reboot, I would
>> expect the system to come up cleanly.
>
> I guess the L1 state (in the kernel) is broken so badly that even a
> reset cannot fix it.

...
which would also explain why, in contrast, a virsh destroy/virsh start
cycle does fix things.

>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>> L1) starts the L2 guest with the following settings:
>>
>> <cpu>
>>   <model fallback='forbid'>Haswell-noTSX</model>
>>   <vendor>Intel</vendor>
>>   <feature policy='disable' name='vme'/>
>>   <feature policy='disable' name='ss'/>
>>   <feature policy='disable' name='f16c'/>
>>   <feature policy='disable' name='rdrand'/>
>>   <feature policy='disable' name='hypervisor'/>
>>   <feature policy='disable' name='arat'/>
>>   <feature policy='disable' name='tsc_adjust'/>
>>   <feature policy='disable' name='xsaveopt'/>
>>   <feature policy='disable' name='abm'/>
>>   <feature policy='disable' name='aes'/>
>>   <feature policy='disable' name='invpcid'/>
>> </cpu>
>
> Maybe one of these features is the root cause of the "messed up" state
> in KVM. So disabling it also makes the L1 state "less broken".

Would you hazard a guess as to which of the above features is a likely
culprit?

>> Basically, I am disabling every single feature that my L1's "virsh
>> capabilities" reports. Now this does not make my L1 come up happily
>> from loadvm. But it does seem to initiate a clean reboot after loadvm,
>> and after that clean reboot it lives happily.
>>
>> If this is as good as it gets (for now), then I can totally live with
>> that. It certainly beats running the L2 guest with Qemu (without KVM
>> acceleration). But I would still love to understand the issue a little
>> bit better.
>
> I mean, the real solution to the problem is of course restoring the L1
> state correctly (migrating nVMX state, which people are working on right
> now). So what you are seeing is a bad "side effect" of that.
>
> For now, nested=true should never be used along with savevm/loadvm/live
> migration.

Yes, I gathered as much. :)

Thanks again!

Cheers,
Florian
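David's rule of thumb (no nested=true together with savevm/loadvm/live migration) could be turned into a coarse pre-flight warning on the L0 host. This is a minimal sketch, assuming the usual sysfs locations of the `nested` parameter of the `kvm_intel` and `kvm_amd` modules, which report `Y`/`N` or `1`/`0` depending on module and kernel version. The helper names are hypothetical and not part of libvirt or QEMU. Note that nesting being enabled only means an L1 *could* be running L2 guests, so this is a warning, not proof that a save will break.

```python
from pathlib import Path

# Assumed sysfs locations of the "nested" module parameter on an L0 host.
NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def parse_nested_flag(raw: str) -> bool:
    """Interpret a module parameter value such as 'Y', 'N', '1' or '0'."""
    return raw.strip().upper() in ("Y", "1")


def nested_enabled(paths=NESTED_PARAMS) -> bool:
    """True if any loaded KVM vendor module reports nesting as enabled."""
    for path in paths:
        try:
            if parse_nested_flag(path.read_text()):
                return True
        except OSError:
            continue  # module not loaded (or not readable) on this host
    return False


def check_save_allowed(action: str, paths=NESTED_PARAMS) -> None:
    """Hypothetical guard to run before saving or migrating an L1 guest."""
    if nested_enabled(paths):
        raise RuntimeError(
            f"{action}: nested=true on this host; saving or migrating an L1 "
            "that is running L2 guests is not expected to work")
```

A wrapper script could call `check_save_allowed("managedsave")` before issuing the actual `virsh managedsave`, and let an operator override it when they know no L2 guests exist.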
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Paolo Bonzini @ 2018-02-08 14:51 UTC
To: Florian Haas, qemu-devel, KVM list

On 08/02/2018 14:57, Florian Haas wrote:
>>>   <feature policy='disable' name='vme'/>
>>>   <feature policy='disable' name='ss'/>
>>>   <feature policy='disable' name='f16c'/>
>>>   <feature policy='disable' name='rdrand'/>
>>>   <feature policy='disable' name='hypervisor'/>
>>>   <feature policy='disable' name='arat'/>
>>>   <feature policy='disable' name='tsc_adjust'/>
>>>   <feature policy='disable' name='xsaveopt'/>
>>>   <feature policy='disable' name='abm'/>
>>>   <feature policy='disable' name='aes'/>
>>>   <feature policy='disable' name='invpcid'/>
>>> </cpu>
>> Maybe one of these features is the root cause of the "messed up" state
>> in KVM. So disabling it also makes the L1 state "less broken".
>
> Would you hazard a guess as to which of the above features is a likely
> culprit?

You're just being lucky. :) In fact, if you ever migrate or save a VM
that's running in L2, you would get an unholy mixture of source L1 and
source L2 state running on the destination *as L1* (because the
destination doesn't know it's running a nested guest!).

It just cannot work yet---sorry about that!

Paolo
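Paolo's "unholy mixture" can be illustrated with a deliberately simplified model. Assume, as a simplification that is not how KVM actually lays out its state, that a vCPU has one set of live registers plus separately parked L1 registers while it executes L2, and that the stream carries only the live registers. All class and function names are hypothetical. A save taken while the vCPU is in L2 then resumes L2's registers as if they were L1's, while a save taken with no vCPU in L2 happens to survive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VCpu:
    live_regs: dict                       # registers currently loaded on the CPU
    in_l2: bool = False                   # True while L1 is running its L2 guest
    saved_l1_regs: Optional[dict] = None  # L1 state parked while L2 runs (not migrated)


def enter_l2(vcpu: VCpu, l2_regs: dict) -> None:
    """L1 launches L2: park L1's registers and load L2's."""
    vcpu.saved_l1_regs, vcpu.live_regs, vcpu.in_l2 = vcpu.live_regs, l2_regs, True


def migrate(vcpu: VCpu) -> VCpu:
    """Toy stream: only the live registers travel; nested bookkeeping does not."""
    # The destination does not know a nested guest was running, so it
    # treats whatever registers arrive as plain L1 state.
    return VCpu(live_regs=dict(vcpu.live_regs))


l1 = {"owner": "L1", "rip": 0x1000}
l2 = {"owner": "L2", "rip": 0x2000}

idle = VCpu(live_regs=dict(l1))
print(migrate(idle).live_regs["owner"])     # L1: survives, by luck

busy = VCpu(live_regs=dict(l1))
enter_l2(busy, dict(l2))
dest = migrate(busy)
print(dest.live_regs["owner"], dest.in_l2)  # L2 False: L2 state now runs "as L1"
```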
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: David Hildenbrand @ 2018-02-08 14:55 UTC
To: Florian Haas; +Cc: Kashyap Chamarthy, libvirt-users, kvm

>>> I've now changed my L2 guest's CPU configuration so that libvirt (in
>>> L1) starts the L2 guest with the following settings:
>>>
>>> <cpu>
>>>   <model fallback='forbid'>Haswell-noTSX</model>
>>>   <vendor>Intel</vendor>
>>>   <feature policy='disable' name='vme'/>
>>>   <feature policy='disable' name='ss'/>
>>>   <feature policy='disable' name='f16c'/>
>>>   <feature policy='disable' name='rdrand'/>
>>>   <feature policy='disable' name='hypervisor'/>
>>>   <feature policy='disable' name='arat'/>
>>>   <feature policy='disable' name='tsc_adjust'/>
>>>   <feature policy='disable' name='xsaveopt'/>
>>>   <feature policy='disable' name='abm'/>
>>>   <feature policy='disable' name='aes'/>
>>>   <feature policy='disable' name='invpcid'/>
>>> </cpu>
>>
>> Maybe one of these features is the root cause of the "messed up" state
>> in KVM. So disabling it also makes the L1 state "less broken".
>
> Would you hazard a guess as to which of the above features is a likely
> culprit?
>

Hmm, actually no idea, but you can bisect :)

(But watch out, it could also just be "coincidence". Especially if you
migrate while none of L1's VCPUs are currently executing L2, chances
might be better for L1 to survive a migration. L2 will still fail hard,
and L1 certainly will too when trying to run L2 again.)

-- 

Thanks,

David / dhildenb
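The bisection David suggests can be sketched in Python. Everything here is hypothetical: `is_fixed(disabled)` stands in for a manual experiment (restart L2 with only those features disabled via the generated `<cpu>` element, then save and restore L1) and returns True when L1 comes back cleanly. The sketch also assumes exactly one responsible feature and a deterministic outcome; given Paolo's "you're just being lucky" and David's own coincidence warning, those assumptions may well not hold, so each probe would need repeating.

```python
from typing import Callable, Sequence

# The features Florian disabled in his L2 <cpu> definition.
FEATURES = ["vme", "ss", "f16c", "rdrand", "hypervisor", "arat",
            "tsc_adjust", "xsaveopt", "abm", "aes", "invpcid"]


def cpu_xml(disabled: Sequence[str], model: str = "Haswell-noTSX") -> str:
    """Render a libvirt <cpu> element that disables only the given features."""
    lines = ["<cpu>",
             f"  <model fallback='forbid'>{model}</model>",
             "  <vendor>Intel</vendor>"]
    lines += [f"  <feature policy='disable' name='{name}'/>" for name in disabled]
    lines.append("</cpu>")
    return "\n".join(lines)


def bisect_culprit(features: Sequence[str],
                   is_fixed: Callable[[Sequence[str]], bool]) -> str:
    """Binary search for the single feature whose disabling avoids the bug."""
    candidates = list(features)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Disable only the first half: if that alone helps, the culprit is
        # in it; otherwise it must be in the remaining half.
        candidates = half if is_fixed(half) else candidates[len(half):]
    return candidates[0]


# Simulated run: pretend that disabling "xsaveopt" is what helps.
print(bisect_culprit(FEATURES, lambda disabled: "xsaveopt" in disabled))  # xsaveopt
```

With 11 features this needs about four save/restore experiments instead of eleven, which is the main appeal of bisecting over testing features one at a time.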
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Daniel P. Berrangé @ 2018-02-08 14:59 UTC
To: David Hildenbrand; +Cc: Florian Haas, libvirt-users, kvm

On Thu, Feb 08, 2018 at 02:47:26PM +0100, David Hildenbrand wrote:
> > Sure, I do understand that Red Hat (or any other vendor) is taking no
> > support responsibility for this. At this point I'd just like to
> > contribute to a better understanding of what's expected to definitely
> > _not_ work, so that people don't bloody their noses on that. :)
>
> Indeed. Nesting is nice to enable, as it works in 99% of all cases. It
> just doesn't work when trying to migrate a nested hypervisor (on x86).

Hmm, if migration of the L1 is going to cause things to crash and burn,
then ideally libvirt on L0 would block the migration from being done.

Naively we could do that if the guest has vmx or svm features in its
CPU, except that's probably way too conservative, as many guests with
those features won't actually run any nested VMs. It would also be
desirable to still be able to migrate the L1 if no L2s are currently
running.

Is there any way QEMU can expose to libvirt whether any L2s are
activated, so we can prevent migration in that case? Or should QEMU
itself perhaps refuse to start migration?

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
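The naive check Daniel describes might look roughly like this in Python: inspect a libvirt domain XML for a CPU definition that may carry `vmx` or `svm`. This is a hypothetical helper, not something libvirt implements. It assumes that `host-passthrough` and `host-model` can pass those flags through, and that a `<feature>` without a `policy` attribute means `require` (libvirt's documented default). It does not expand named CPU models, some of which may themselves include `vmx`, and, as Daniel notes, it cannot tell whether any L2 is actually running.

```python
import xml.etree.ElementTree as ET

NESTED_FLAGS = {"vmx", "svm"}  # Intel VT-x / AMD-V CPU feature names


def may_expose_nested_virt(domain_xml: str) -> bool:
    """Naive, conservative check: could this guest's CPU carry vmx or svm?"""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        # No <cpu> element: assume a default model without vmx/svm.
        return False
    if cpu.get("mode") in ("host-passthrough", "host-model"):
        # Host CPU features are copied, which may include vmx/svm.
        return True
    for feature in cpu.findall("feature"):
        # A <feature> without a policy attribute defaults to "require".
        policy = feature.get("policy", "require")
        if (feature.get("name") in NESTED_FLAGS
                and policy in ("require", "force", "optional")):
            return True
    return False


print(may_expose_nested_virt("<domain><cpu mode='host-model'/></domain>"))  # True
print(may_expose_nested_virt(
    "<domain><cpu><model>Haswell-noTSX</model>"
    "<feature policy='disable' name='vmx'/></cpu></domain>"))              # False
```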
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-08 14:45 UTC
To: David Hildenbrand; +Cc: Florian Haas, libvirt-users, kvm

On Thu, Feb 08, 2018 at 01:07:33PM +0100, David Hildenbrand wrote:

[...]

> > So to clarify things, could you enumerate the currently known
> > limitations when enabling nesting? I'd be happy to summarize those and
> > add them to the linux-kvm.org FAQ so others are less likely to hit
> > their head on this issue. In particular:
>
> [...] # Snip description of what works in context of migration
>
> > - Is https://fedoraproject.org/wiki/How_to_enable_nested_virtualization_in_KVM
> >   still accurate in that -cpu host (libvirt "host-passthrough") is the
> >   strongly recommended configuration for the L2 guest?

That wiki is a bit outdated. And it is not accurate — if we can just
expose the Intel 'vmx' (or AMD 'svm') CPU feature flag to the L2 guest,
that should be sufficient. No need for a full passthrough.

The above document should definitely be modified to add more verbiage
comparing 'host-passthrough' vs. 'host-model' vs. custom CPU.

> > - If so, are there any recommendations for how to configure the L1
> >   guest with regard to CPU model?
>
> You have to indicate the VMX feature to your L1 ("nested hypervisor"),
> that is usually automatically done by using the "host-passthrough" or
> "host-model" value. If you're using a custom CPU model, you have to
> enable it explicitly.
>
> >
> > - Is live migration with nested guests _always_ expected to break on
> >   all architectures, and if not, which are safe?
>
> x86 VMX: running nested guests works, migrating nested hypervisors does
> not work
>
> x86 SVM: running nested guests works, migrating nested hypervisor does
> not work (somebody correct me if I'm wrong)
>
> s390x: running nested guests works, migrating nested hypervisors works
>
> power: running nested guests works only via KVM-PR ("trap and emulate").
> Migrating nested hypervisors therefore works. But we are not using
> hardware virtualization for L1->L2. (my latest status)
>
> arm: running nested guests is in the works (my latest status), migration
> is therefore also not possible.

That's a great summary.

> >
> > - Idem, for savevm/loadvm?
> >
>
> savevm/loadvm is not expected to work correctly on an L1 if it is
> running L2 guests. It should work on L2 however.

Yes, that works as intended.

> > - With regard to the problem that Kashyap and I (and Dennis, the
> >   kernel.org bugzilla reporter) are describing, is this expected to work
> >   any better on AMD CPUs? (All reports are on Intel)
>
> No, remember that they are also still missing migration support of the
> nested SVM state.

Right. I partly mixed up migration of L1-running-L2 (which doesn't fly
for reasons David already explained) vs. migrating L2 (which works).

> > - Do you expect nested virtualization functionality to be adversely
> >   affected by KPTI and/or other Meltdown/Spectre mitigation patches?
>
> Not an expert on this. I think it should be affected in a similar way as
> ordinary guests :)
>
> >
> > Kashyap, can you think of any other limitations that would benefit
> > from improved documentation?
>
> We should certainly document what I have summarized here properly at a
> central place!

Yeah, agreed.

Also, when documenting this in the context of nesting, it'd be useful to
explicitly spell out what works or doesn't work at each level, e.g. L2
can be migrated to a destination L1 just fine; migrating an
L1-running-L2 to a destination L0 will be in dodgy waters for reasons X,
etc.

[...]

-- 
/kashyap
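Kashyap's suggestion to spell out what works at each level can be captured as a small lookup table. The sketch below encodes David's per-architecture summary as of February 2018, plus the level distinction drawn in this thread: migrating an L2 (or an L1 with no L2 guests) behaves like any ordinary guest, while saving or migrating an L1 that is running L2 guests depends on nested-state migration support. The table and the `migration_expected_to_work` helper are hypothetical names, and the values will age as nVMX/nSVM state migration lands.

```python
# Status as summarized on this thread in February 2018; not authoritative.
NESTED_SUPPORT = {
    # arch:      (running nested guests, migrating a nested hypervisor)
    "x86-vmx": (True, False),
    "x86-svm": (True, False),   # "somebody correct me if I'm wrong"
    "s390x":   (True, True),
    "power":   (True, True),    # only via KVM-PR ("trap and emulate")
    "arm":     (False, False),  # nested support still in the works
}


def migration_expected_to_work(arch: str, level: str, running_nested: bool) -> bool:
    """Can a VM at `level` ("L1" or "L2") be saved or migrated?

    An L2, or an L1 with no L2 guests, migrates like an ordinary guest;
    an L1 that currently runs L2 guests needs nested-state migration.
    """
    if level == "L2" or not running_nested:
        return True
    _, can_migrate_hypervisor = NESTED_SUPPORT[arch]
    return can_migrate_hypervisor


print(migration_expected_to_work("x86-vmx", "L2", False))  # True
print(migration_expected_to_work("x86-vmx", "L1", True))   # False
```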
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Florian Haas @ 2018-02-08 17:44 UTC
To: David Hildenbrand; +Cc: Kashyap Chamarthy, libvirt-users, kvm

On Thu, Feb 8, 2018 at 1:07 PM, David Hildenbrand <david@redhat.com> wrote:
> We should certainly document what I have summarized here properly at a
> central place!

Please review the three edits I've submitted to the wiki:
https://www.linux-kvm.org/page/Special:Contributions/Fghaas

Feel free to ruthlessly edit/roll back anything that is inaccurate.
Thanks!

Cheers,
Florian
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-09 10:48 UTC
To: Florian Haas; +Cc: David Hildenbrand, libvirt-users, kvm

On Thu, Feb 08, 2018 at 06:44:43PM +0100, Florian Haas wrote:
> On Thu, Feb 8, 2018 at 1:07 PM, David Hildenbrand <david@redhat.com> wrote:
> > We should certainly document what I have summarized here properly at a
> > central place!
>
> Please review the three edits I've submitted to the wiki:
> https://www.linux-kvm.org/page/Special:Contributions/Fghaas
>
> Feel free to ruthlessly edit/roll back anything that is inaccurate.
> Thanks!

I've made some minor edits to clarify a bunch of bits, and added a link
to the kernel doc about Intel nVMX. (Hope that looks fine.)

You wrote: "L2...which does no further virtualization". Not quite true:
"under the right circumstances" (read: a sufficiently huge machine with
tons of RAM), L2 _can_ in turn run L3. :-)

Last time I checked (this morning), Rich W.M. Jones had 4 levels of
nesting tested with the 'supernested' program[1] he wrote. (Related
aside: this program is packaged as part of the 2016 QEMU Advent
Calendar[2] -- if you want to play around on a powerful test machine
with tons of free memory.)

[1] http://git.annexia.org/?p=supernested.git;a=blob;f=README
[2] http://www.qemu-advent-calendar.org/2016/#day-13

-- 
/kashyap
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Florian Haas @ 2018-02-09 11:02 UTC
To: Kashyap Chamarthy; +Cc: David Hildenbrand, libvirt-users, kvm

On Fri, Feb 9, 2018 at 11:48 AM, Kashyap Chamarthy <kchamart@redhat.com> wrote:
> On Thu, Feb 08, 2018 at 06:44:43PM +0100, Florian Haas wrote:
>> On Thu, Feb 8, 2018 at 1:07 PM, David Hildenbrand <david@redhat.com> wrote:
>> > We should certainly document what I have summarized here properly at a
>> > central place!
>>
>> Please review the three edits I've submitted to the wiki:
>> https://www.linux-kvm.org/page/Special:Contributions/Fghaas
>>
>> Feel free to ruthlessly edit/roll back anything that is inaccurate.
>> Thanks!
>
> I've made some minor edits to clarify a bunch of bits, and added a link
> to the kernel doc about Intel nVMX. (Hope that looks fine.)

I'm sure it does, but just so you know, I currently don't see any
edits from you on the Nested Guests page. Are you sure you
saved/published your changes?

> You wrote: "L2...which does no further virtualization". Not quite true:
> "under the right circumstances" (read: a sufficiently huge machine with
> tons of RAM), L2 _can_ in turn run L3. :-)

Insert "normally" between "which" and "does", then. :)

> Last time I checked (this morning), Rich W.M. Jones had 4 levels of
> nesting tested with the 'supernested' program[1] he wrote. (Related
> aside: this program is packaged as part of the 2016 QEMU Advent
> Calendar[2] -- if you want to play around on a powerful test machine
> with tons of free memory.)
>
> [1] http://git.annexia.org/?p=supernested.git;a=blob;f=README
> [2] http://www.qemu-advent-calendar.org/2016/#day-13

Interesting, thanks for the pointer!

Cheers,
Florian
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-12 9:27 UTC
To: Florian Haas; +Cc: David Hildenbrand, libvirt-users, kvm

On Fri, Feb 09, 2018 at 12:02:25PM +0100, Florian Haas wrote:
> On Fri, Feb 9, 2018 at 11:48 AM, Kashyap Chamarthy <kchamart@redhat.com> wrote:

[...]

> > I've made some minor edits to clarify a bunch of bits, and added a link
> > to the kernel doc about Intel nVMX. (Hope that looks fine.)
>
> I'm sure it does, but just so you know, I currently don't see any
> edits from you on the Nested Guests page. Are you sure you
> saved/published your changes?

Thanks for catching that. _Now_ it's updated.

    https://www.linux-kvm.org/page/Nested_Guests

(I also didn't have permissions to add external links; had to get that
sorted out with the admin.)

> > You wrote: "L2...which does no further virtualization". Not quite true:
> > "under the right circumstances" (read: a sufficiently huge machine with
> > tons of RAM), L2 _can_ in turn run L3. :-)
>
> Insert "normally" between "which" and "does", then. :)

:-)

> > Last time I checked (this morning), Rich W.M. Jones had 4 levels of
> > nesting tested with the 'supernested' program[1] he wrote. (Related
> > aside: this program is packaged as part of the 2016 QEMU Advent
> > Calendar[2] -- if you want to play around on a powerful test machine
> > with tons of free memory.)
> >
> > [1] http://git.annexia.org/?p=supernested.git;a=blob;f=README
> > [2] http://www.qemu-advent-calendar.org/2016/#day-13
>
> Interesting, thanks for the pointer!
>
> Cheers,
> Florian

-- 
/kashyap
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Florian Haas @ 2018-02-12 14:07 UTC
To: Kashyap Chamarthy; +Cc: David Hildenbrand, libvirt-users, kvm

On Mon, Feb 12, 2018 at 10:27 AM, Kashyap Chamarthy <kchamart@redhat.com> wrote:
> On Fri, Feb 09, 2018 at 12:02:25PM +0100, Florian Haas wrote:
>> On Fri, Feb 9, 2018 at 11:48 AM, Kashyap Chamarthy <kchamart@redhat.com> wrote:
>
> [...]
>
>> > I've made some minor edits to clarify a bunch of bits, and added a link
>> > to the kernel doc about Intel nVMX. (Hope that looks fine.)
>>
>> I'm sure it does, but just so you know, I currently don't see any
>> edits from you on the Nested Guests page. Are you sure you
>> saved/published your changes?
>
> Thanks for catching that. _Now_ it's updated.
>
>     https://www.linux-kvm.org/page/Nested_Guests

Got it. Thanks for those additions!

>
> (I also didn't have permissions to add external links; had to get that
> sorted out with the admin.)

Right, I saw that on my first edit attempt too.

I took the liberty of back-referencing this wiki page from those two
bugzilla entries too (the kernel.org one and the Red Hat one). Since, I
must confess, I don't follow KVM development on a day-to-day basis, I'm
hopeful that at least one of those bugs will get updated, triggering a
notification, so that I can update that page once migration in
combination with nVMX does work.

I've also added a link to https://bugzilla.kernel.org/show_bug.cgi?id=53851
to the wiki — 5 years old just this week, as it happens. :)

Thanks for everyone's help explaining this issue to me!

Cheers,
Florian
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-08 10:46 UTC
To: David Hildenbrand; +Cc: Florian Haas, libvirt-users, kvm

On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> On 07.02.2018 16:31, Kashyap Chamarthy wrote:

[...]

> Sounds like a similar problem as in
> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>
> In short: there is no (live) migration support for nested VMX yet. So as
> soon as your guest is using VMX itself ("nVMX"), this is not expected to
> work.

Actually, live migration with nVMX _does_ work insofar as you have
_identical_ CPUs on both source and destination — i.e. use the QEMU
'-cpu host' for the L1 guests. At least that's been the case in my
experience. FWIW, I frequently use that setup in my test environments.

Just to be quadruple sure, I did the test: Migrate an L2 guest (with
non-shared storage), and it worked just fine. (No 'oops'es, no stack
traces, no "kernel BUG" in `dmesg` or serial consoles on L1s. And I can
log in to the L2 guest on the destination L1 just fine.)

With password-less SSH between source and destination, and a bit of
libvirt config setup, I ran the migrate command as follows:

    $ virsh migrate --verbose --copy-storage-all \
        --live cvm1 qemu+tcp://root@f26-vm2/system
    Migration: [100 %]
    $ echo $?
    0

Full details:
https://kashyapc.fedorapeople.org/virt/Migrate-a-nested-guest-08Feb2018.txt

(At the end of the document above, I also posted the libvirt config and
the version details across L0, L1 and L2. So this is a fully repeatable
test.)

-- 
/kashyap
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-08 11:34 UTC
To: David Hildenbrand; +Cc: libvirt-users, kvm

On Thu, Feb 08, 2018 at 11:46:24AM +0100, Kashyap Chamarthy wrote:
> On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> > On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>
> [...]
>
> > Sounds like a similar problem as in
> > https://bugzilla.kernel.org/show_bug.cgi?id=198621
> >
> > In short: there is no (live) migration support for nested VMX yet. So as
> > soon as your guest is using VMX itself ("nVMX"), this is not expected to
> > work.
>
> Actually, live migration with nVMX _does_ work insofar as you have
> _identical_ CPUs on both source and destination — i.e. use the QEMU
> '-cpu host' for the L1 guests. At least that's been the case in my
> experience. FWIW, I frequently use that setup in my test environments.

Correcting my erroneous statement above: For live migration to work in a
nested KVM setup, it is _not_ mandatory to use "-cpu host".

I just did another test. Here I used libvirt's 'host-model' for both
source and destination L1 guests, _and_ for the L2 guest. I migrated the
L2 to the destination L1, and it worked great.

In my setup, both my L1 guests received the following CPU configuration
(on the QEMU command line):

    [...]
    -cpu Haswell-noTSX,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,\
    hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off
    [...]

And the L2 guest received this:

    [...]
    -cpu Haswell-noTSX,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,\
    arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off,invpcid=off
    [...]

-- 
/kashyap
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Daniel P. Berrangé @ 2018-02-08 11:40 UTC
To: Kashyap Chamarthy; +Cc: David Hildenbrand, libvirt-users, kvm

On Thu, Feb 08, 2018 at 12:34:24PM +0100, Kashyap Chamarthy wrote:
> On Thu, Feb 08, 2018 at 11:46:24AM +0100, Kashyap Chamarthy wrote:
> > On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> > > On 07.02.2018 16:31, Kashyap Chamarthy wrote:
> >
> > [...]
> >
> > > Sounds like a similar problem as in
> > > https://bugzilla.kernel.org/show_bug.cgi?id=198621
> > >
> > > In short: there is no (live) migration support for nested VMX yet. So as
> > > soon as your guest is using VMX itself ("nVMX"), this is not expected to
> > > work.
> >
> > Actually, live migration with nVMX _does_ work insofar as you have
> > _identical_ CPUs on both source and destination — i.e. use the QEMU
> > '-cpu host' for the L1 guests. At least that's been the case in my
> > experience. FWIW, I frequently use that setup in my test environments.
>
> Correcting my erroneous statement above: For live migration to work in a
> nested KVM setup, it is _not_ mandatory to use "-cpu host".

Yes, assuming the L1 guests both get given the same CPU model, then you
can use any CPU model at all for the L2 guests and still be migrate safe,
since your L1 guests provide homogeneous hardware to host L2, regardless
of whether the L0 host is homogeneous.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
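Daniel's point, that L2 migration only needs the two L1 guests to present the same CPU, can be checked mechanically. The hypothetical helper below parses a QEMU `-cpu` argument of the `Model,feat=on,feat=off` form, as in Kashyap's command lines above, and compares two of them. It is only a sketch: it handles just the `feat=on`/`feat=off` syntax and ignores the other spellings QEMU accepts. The example strings are the L1 and L2 `-cpu` values Kashyap posted, joined onto one line.

```python
def parse_cpu_arg(arg: str):
    """Split 'Model,feat=on,feat=off' into (model, {feat: enabled})."""
    model, *props = arg.split(",")
    features = {}
    for prop in props:
        name, _, value = prop.partition("=")
        features[name] = value == "on"
    return model, features


def same_cpu(a: str, b: str) -> bool:
    """True if two -cpu arguments describe the same model and feature set."""
    return parse_cpu_arg(a) == parse_cpu_arg(b)


L1_CPU = ("Haswell-noTSX,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,"
          "hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off")
L2_CPU = ("Haswell-noTSX,vme=on,ss=on,f16c=on,rdrand=on,hypervisor=on,"
          "arat=on,tsc_adjust=on,xsaveopt=on,pdpe1gb=on,abm=on,aes=off,invpcid=off")

# Both L1 guests received the same -cpu line, so an L2 can move between them.
print(same_cpu(L1_CPU, L1_CPU))             # True
# Only the L1 CPU exposes vmx, which is what lets it act as a hypervisor.
print(parse_cpu_arg(L1_CPU)[1].get("vmx"))  # True
print(parse_cpu_arg(L2_CPU)[1].get("vmx"))  # None
```

Comparing parsed dictionaries rather than raw strings makes the check insensitive to feature order on the command line.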
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: David Hildenbrand @ 2018-02-08 11:48 UTC
To: Kashyap Chamarthy; +Cc: Florian Haas, libvirt-users, kvm

On 08.02.2018 11:46, Kashyap Chamarthy wrote:
> On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
>> On 07.02.2018 16:31, Kashyap Chamarthy wrote:
>
> [...]
>
>> Sounds like a similar problem as in
>> https://bugzilla.kernel.org/show_bug.cgi?id=198621
>>
>> In short: there is no (live) migration support for nested VMX yet. So as
>> soon as your guest is using VMX itself ("nVMX"), this is not expected to
>> work.
>
> Actually, live migration with nVMX _does_ work insofar as you have
> _identical_ CPUs on both source and destination — i.e. use the QEMU
> '-cpu host' for the L1 guests. At least that's been the case in my
> experience. FWIW, I frequently use that setup in my test environments.
>

You're mixing use cases. While you talk about migrating an L2, this is
about migrating an L1 that is running an L2.

Migrating an L2 is expected to work just like migrating an L1 that is
not running an L2. (Of course, there is the usual trouble with CPU
models, but upper layers should check and handle that.)

> Just to be quadruple sure, I did the test: Migrate an L2 guest (with
> non-shared storage), and it worked just fine. (No 'oops'es, no stack
> traces, no "kernel BUG" in `dmesg` or serial consoles on L1s. And I can
> log in to the L2 guest on the destination L1 just fine.)
>
> With password-less SSH between source and destination, and a bit of
> libvirt config setup, I ran the migrate command as follows:
>
>     $ virsh migrate --verbose --copy-storage-all \
>         --live cvm1 qemu+tcp://root@f26-vm2/system
>     Migration: [100 %]
>     $ echo $?
>     0
>
> Full details:
> https://kashyapc.fedorapeople.org/virt/Migrate-a-nested-guest-08Feb2018.txt
>
> (At the end of the document above, I also posted the libvirt config and
> the version details across L0, L1 and L2. So this is a fully repeatable
> test.)
>
>

-- 

Thanks,

David / dhildenb
* Re: [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running)
From: Kashyap Chamarthy @ 2018-02-08 15:23 UTC
To: David Hildenbrand; +Cc: Florian Haas, libvirt-users, kvm

On Thu, Feb 08, 2018 at 12:48:46PM +0100, David Hildenbrand wrote:
> On 08.02.2018 11:46, Kashyap Chamarthy wrote:
> > On Wed, Feb 07, 2018 at 11:26:14PM +0100, David Hildenbrand wrote:
> >> On 07.02.2018 16:31, Kashyap Chamarthy wrote:
> >
> > [...]
> >
> >> Sounds like a similar problem as in
> >> https://bugzilla.kernel.org/show_bug.cgi?id=198621
> >>
> >> In short: there is no (live) migration support for nested VMX yet. So as
> >> soon as your guest is using VMX itself ("nVMX"), this is not expected to
> >> work.
> >
> > Actually, live migration with nVMX _does_ work insofar as you have
> > _identical_ CPUs on both source and destination — i.e. use the QEMU
> > '-cpu host' for the L1 guests. At least that's been the case in my
> > experience. FWIW, I frequently use that setup in my test environments.
> >
>
> You're mixing use cases. While you talk about migrating an L2, this is
> about migrating an L1 that is running an L2.

Yes, you're right. I mixed things up briefly, and corrected myself in
the other email. We're on the same page.

> Migrating an L2 is expected to work just like migrating an L1 that is
> not running an L2. (Of course, there is the usual trouble with CPU
> models, but upper layers should check and handle that.)

Yep.

---

Aside: I also remember seeing Vitaly's nice talk[*] at FOSDEM last
weekend ("A slightly different kind of nesting"), where he talks about
how nVMX actually works in the context of Intel's hardware feature "VMCS
Shadowing", which reduces the number of VMEXITs and VMENTRYs.
(Particularly look at his slides 8, 9 and 10.)
I reproduced the diagram from his slide 10 ("How nested virtualization
really works on Intel") in ASCII here:

    .---------------------------------------.
    |                   | VMCS L1->L2 |     |
    |                   '-------------'     |
    |                   |                   |
    |   L1 (guest       |   L2 (nested      |
    |   hypervisor)     |   guest)          |
    |                   |                   |
    |                   |                   |
    .---------------------------------------.
    |   VMCS L0->L1     |   VMCS L0->L2     |
    .---------------------------------------.
    |                                       |
    |             L0 hypervisor             |
    '---------------------------------------'
    |                                       |
    |               Hardware                |
    '---------------------------------------'

[*] https://fosdem.org/2018/schedule/event/vai_kvm_on_hyperv/attachments/slides/2200/export/events/attachments/vai_kvm_on_hyperv/slides/2200/slides_fosdem2018_vkuznets.pdf

[...]

--
/kashyap
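The three boxes in the diagram correspond to the names used in KVM's nested VMX code: vmcs01 (L0->L1), vmcs12 (L1->L2, written by L1) and vmcs02 (L0->L2, which L0 builds to actually run L2 on the hardware). The toy Python model below is only a hedged illustration of that relationship, not KVM's implementation: the reduced fields, the event names and the "union of intercepts" merge rule are simplifying assumptions, and `build_vmcs02` / `exit_goes_to_l1` are hypothetical helpers.

```python
"""Toy model of the three VMCS structures in nested VMX.

Simplified illustration only: the fields, event names and merge rule
are assumptions, not KVM's real data layout or algorithm.
"""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vmcs:
    """A drastically reduced VMCS: whose state runs, and which events exit."""
    guest_state: str   # the VM this VMCS runs
    host_state: str    # where control returns on a VM exit
    exit_on: frozenset = field(default_factory=frozenset)


def build_vmcs02(vmcs01: Vmcs, vmcs12: Vmcs) -> Vmcs:
    """Build the VMCS that L0 uses to run L2 directly on the hardware.

    - guest state comes from vmcs12 (what L1 configured for L2),
    - host state comes from vmcs01 (hardware exits must land in L0),
    - the exit set is the union, so L0 sees every event that either
      L0 itself or L1 wants to intercept.
    """
    return Vmcs(
        guest_state=vmcs12.guest_state,
        host_state=vmcs01.host_state,
        exit_on=vmcs01.exit_on | vmcs12.exit_on,
    )


def exit_goes_to_l1(event: str, vmcs12: Vmcs) -> bool:
    """After L0 takes an exit from L2, decide whether to reflect it to L1."""
    return event in vmcs12.exit_on


vmcs01 = Vmcs("L1", "L0", frozenset({"EPT_VIOLATION", "EXTERNAL_INTERRUPT"}))
vmcs12 = Vmcs("L2", "L1", frozenset({"CPUID", "HLT"}))
vmcs02 = build_vmcs02(vmcs01, vmcs12)

print(vmcs02.guest_state, vmcs02.host_state, sorted(vmcs02.exit_on))
print(exit_goes_to_l1("CPUID", vmcs12), exit_goes_to_l1("EPT_VIOLATION", vmcs12))
```

Running it shows vmcs02 carrying L2's guest state with L0 as the exit target, and an exit set covering both L0's and L1's intercepts; a CPUID exit from L2 is then reflected to L1, while an EPT violation stays with L0. Real KVM makes these decisions per exit reason with much more nuance.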
Thread overview: 22+ messages
[not found] <CAPUexz9mg8wtAkWKfQLqoFgTQ6i+2pC4bGSkTwCEq-nQZin1hg@mail.gmail.com>
2018-02-07 15:31 ` [libvirt-users] Nested KVM: L0 guest produces kernel BUG on wakeup from managed save (while a nested VM is running) Kashyap Chamarthy
2018-02-07 22:26 ` David Hildenbrand
2018-02-08 8:19 ` Florian Haas
2018-02-08 12:07 ` David Hildenbrand
2018-02-08 13:29 ` Florian Haas
2018-02-08 13:47 ` David Hildenbrand
2018-02-08 13:57 ` Florian Haas
2018-02-08 14:51 ` Paolo Bonzini
2018-02-08 14:51 ` [Qemu-devel] " Paolo Bonzini
2018-02-08 14:55 ` David Hildenbrand
2018-02-08 14:59 ` Daniel P. Berrangé
2018-02-08 14:45 ` Kashyap Chamarthy
2018-02-08 17:44 ` Florian Haas
2018-02-09 10:48 ` Kashyap Chamarthy
2018-02-09 11:02 ` Florian Haas
2018-02-12 9:27 ` Kashyap Chamarthy
2018-02-12 14:07 ` Florian Haas
2018-02-08 10:46 ` Kashyap Chamarthy
2018-02-08 11:34 ` Kashyap Chamarthy
2018-02-08 11:40 ` Daniel P. Berrangé
2018-02-08 11:48 ` David Hildenbrand
2018-02-08 15:23 ` Kashyap Chamarthy