* [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
@ 2019-01-18  5:32 Mark Mielke
  2019-01-18  6:18 ` Christian Ehrhardt
  2019-01-18 10:02 ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 15+ messages in thread

From: Mark Mielke @ 2019-01-18 5:32 UTC (permalink / raw)
To: qemu-devel

Thank you for the work on nested virtualization. Having had live migrations fail in the past when nested virtualization has been active, it is great to see that clever people have been working on this problem!

My question is about whether a migration path has been considered to allow live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?

Qemu 2.12 doesn't know about the new nested state available from newer Linux kernels, and it might be used on a machine with an older kernel that doesn't make the nested state available. If Qemu 3.2 is on an up-to-date host with an up-to-date kernel that does support the nested state, I'd like to ensure we have the ability to try the migrations.

In the past, I've found that:

1) If the guest had used nested virtualization before, the migration often fails. However, if we reboot the guest and do not use nested virtualization, this simplifies to...
2) If the guest has never used nested virtualization before, the migration succeeds.

I would like to leverage 2) as much as possible to migrate forwards to Qemu 3.2 hosts (once it is available). I can normally enter the guest to see if 1) is likely or not, and handle these ones specially. If only 20% of the guests have ever used nested virtualization, then I would like the option to safely migrate 80% of the guests using live migration, and handle the 20% as exceptions.

This is the 3.1 change log that got my attention:

- x86 machines cannot be live-migrated if nested Intel virtualization is enabled. The next version of QEMU will be able to do live migration when nested virtualization is enabled, if supported by the kernel.
I believe this is the change it refers to:

commit d98f26073bebddcd3da0ba1b86c3a34e840c0fb8
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Nov 14 10:38:13 2018 +0100

    target/i386: kvm: add VMX migration blocker

    Nested VMX does not support live migration yet.  Add a blocker
    until that is worked out.

    Nested SVM only does not support it, but unfortunately it is
    enabled by default for -cpu host so we cannot really disable it.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This particular check seems very simplistic:

+    if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) {
+        error_setg(&vmx_mig_blocker,
+                   "Nested VMX virtualization does not support live migration yet");
+        r = migrate_add_blocker(vmx_mig_blocker, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            error_free(vmx_mig_blocker);
+            return r;
+        }
+    }
+

It fails if the flag is set, rather than if any nested virtualization has been used before.

I'm concerned I will end up with a requirement for *all* guests to be restarted in order to migrate them to the new hosts, rather than just the ones that would have a problem.

Thoughts?

Thanks!

--
Mark Mielke <mark.mielke@gmail.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread
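The shape of the check being discussed can be sketched in isolation. The following is a self-contained illustration, not QEMU's actual code: the struct, the blocker variable, and the helper are simplified stand-ins for QEMU's CPUX86State, vmx_mig_blocker, and migrate_add_blocker(); only the CPUID_EXT_VMX bit position is the architectural one.

```c
#include <assert.h>

/* Self-contained sketch of the 3.1 blocker logic; the types and helper
 * below are invented stand-ins, not QEMU's real API. */
#define CPUID_EXT_VMX (1u << 5)   /* CPUID.01H:ECX bit 5 = VMX (architectural) */

typedef struct {
    unsigned feat_1_ecx;          /* stands in for env->features[FEAT_1_ECX] */
} FakeCPUX86State;

static const char *vmx_mig_blocker;   /* non-NULL once a blocker is registered */

/* Returns 1 if a migration blocker was registered, 0 otherwise.  Note that
 * the decision is based purely on the CPUID bit being exposed to the guest,
 * not on whether the guest ever executed VMXON or ran an L2 guest. */
static int maybe_add_vmx_blocker(const FakeCPUX86State *env)
{
    if ((env->feat_1_ecx & CPUID_EXT_VMX) && !vmx_mig_blocker) {
        vmx_mig_blocker = "Nested VMX virtualization does not support live migration yet";
        return 1;
    }
    return 0;
}
```

Because the condition consults only CPUID, a guest that merely has VMX advertised — the 80% case described above — is blocked exactly like one actively running L2 guests.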
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 5:32 [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? Mark Mielke @ 2019-01-18 6:18 ` Christian Ehrhardt 2019-01-22 7:20 ` Like Xu 2019-01-18 10:02 ` Dr. David Alan Gilbert 1 sibling, 1 reply; 15+ messages in thread From: Christian Ehrhardt @ 2019-01-18 6:18 UTC (permalink / raw) To: Mark Mielke; +Cc: qemu-devel On Fri, Jan 18, 2019 at 7:33 AM Mark Mielke <mark.mielke@gmail.com> wrote: > > Thank you for the work on nested virtualization. Having had live migrations > fail in the past when nested virtualization has been active, it is great to > see that clever people have been working on this problem! > > My question is about whether a migration path has been considered to allow > live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag > enabled in the guest? > > Qemu 2.12 doesn't know about the new nested state available from newer > Linux kernels, and it might be used on a machine with an older kernel that > doesn't make the nested state available. If Qemu 3.2 is on an up-to-date > host with an up-to-date kernel that does support the nested state, I'd like > to ensure we have the ability to try the migrations. > > In the past, I've found that: > > 1) If the guest had used nested virtualization before, the migration often > fails. However, if we reboot the guest and do not use nested > virtualization, this simplifies to... > 2) If the guest has never used nested virtualization before, the migration > succeeds. > > I would like to leverage 2) as much as possible to migrate forwards to Qemu > 3.2 hosts (once it is available). I can normally enter the guest to see if > 1) is likely or not, and handle these ones specially. 
If only 20% of the > guests have ever used nested virtualization, then I would like the option > to safely migrate 80% of the guests using live migration, and handle the > 20% as exceptions. > > This is the 3.1 change log that got my attention: > > > - x86 machines cannot be live-migrated if nested Intel virtualization is > enabled. The next version of QEMU will be able to do live migration when > nested virtualization is enabled, if supported by the kernel. > > > I believe this is the change it refers to: > > commit d98f26073bebddcd3da0ba1b86c3a34e840c0fb8 > Author: Paolo Bonzini <pbonzini@redhat.com> > Date: Wed Nov 14 10:38:13 2018 +0100 > > target/i386: kvm: add VMX migration blocker > > Nested VMX does not support live migration yet. Add a blocker > until that is worked out. > > Nested SVM only does not support it, but unfortunately it is > enabled by default for -cpu host so we cannot really disable it. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > > This particular check seems very simplistic: > > + if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) { > + error_setg(&vmx_mig_blocker, > + "Nested VMX virtualization does not support live > migration yet"); > + r = migrate_add_blocker(vmx_mig_blocker, &local_err); > + if (local_err) { > + error_report_err(local_err); > + error_free(vmx_mig_blocker); > + return r; > + } > + } > + > > It fails if the flag is set, rather than if any nested virtualization has > been used before. Hi Mark, I was facing the same question just recently - thanks for bringing it up. Even more emphasized as Ubuntu (for ease of use of nested virtualization) will enable the VMX flag by default. That made me end up with no guest being able to migrate at all, which as you point out clearly was not the case - they would migrate fine. In almost all use cases it would be just the VMX flag that was set, but never used. 
I haven't thought about it before your mail, but if there were a way to differentiate between "VMX available" and "VMX actually used", that would be a much better check for setting the blocker.

For now I reverted the above patch with the migration blocker in Ubuntu to get the situation temporarily resolved. I considered it a downstream thing, as it is mostly triggered by our decision to make VMX available by default, which was made years ago - that is the reason I didn't bring it up here, but now that you brought it up it is worth the discussion for sure.

Mid-term I expect that migration will work for nested guests as well, which will let me drop that delta then.

> I'm concerned I will end up with a requirement for *all* guests to be
> restarted in order to migrate them to the new hosts, rather than just the
> ones that would have a problem.
>
> Thoughts?
>
> Thanks!
>
> --
> Mark Mielke <mark.mielke@gmail.com>

--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 6:18 ` Christian Ehrhardt @ 2019-01-22 7:20 ` Like Xu 0 siblings, 0 replies; 15+ messages in thread From: Like Xu @ 2019-01-22 7:20 UTC (permalink / raw) To: qemu-devel On 2019/1/18 14:18, Christian Ehrhardt wrote: > On Fri, Jan 18, 2019 at 7:33 AM Mark Mielke <mark.mielke@gmail.com> wrote: >> >> Thank you for the work on nested virtualization. Having had live migrations >> fail in the past when nested virtualization has been active, it is great to >> see that clever people have been working on this problem! >> >> My question is about whether a migration path has been considered to allow >> live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag >> enabled in the guest? >> >> Qemu 2.12 doesn't know about the new nested state available from newer >> Linux kernels, and it might be used on a machine with an older kernel that >> doesn't make the nested state available. If Qemu 3.2 is on an up-to-date >> host with an up-to-date kernel that does support the nested state, I'd like >> to ensure we have the ability to try the migrations. >> >> In the past, I've found that: >> >> 1) If the guest had used nested virtualization before, the migration often >> fails. However, if we reboot the guest and do not use nested >> virtualization, this simplifies to... >> 2) If the guest has never used nested virtualization before, the migration >> succeeds. >> >> I would like to leverage 2) as much as possible to migrate forwards to Qemu >> 3.2 hosts (once it is available). I can normally enter the guest to see if >> 1) is likely or not, and handle these ones specially. If only 20% of the >> guests have ever used nested virtualization, then I would like the option >> to safely migrate 80% of the guests using live migration, and handle the >> 20% as exceptions. 
>> >> This is the 3.1 change log that got my attention: >> >> >> - x86 machines cannot be live-migrated if nested Intel virtualization is >> enabled. The next version of QEMU will be able to do live migration when >> nested virtualization is enabled, if supported by the kernel. >> >> >> I believe this is the change it refers to: >> >> commit d98f26073bebddcd3da0ba1b86c3a34e840c0fb8 >> Author: Paolo Bonzini <pbonzini@redhat.com> >> Date: Wed Nov 14 10:38:13 2018 +0100 >> >> target/i386: kvm: add VMX migration blocker >> >> Nested VMX does not support live migration yet. Add a blocker >> until that is worked out. >> >> Nested SVM only does not support it, but unfortunately it is >> enabled by default for -cpu host so we cannot really disable it. >> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> >> >> This particular check seems very simplistic: >> >> + if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) { >> + error_setg(&vmx_mig_blocker, >> + "Nested VMX virtualization does not support live >> migration yet"); >> + r = migrate_add_blocker(vmx_mig_blocker, &local_err); >> + if (local_err) { >> + error_report_err(local_err); >> + error_free(vmx_mig_blocker); >> + return r; >> + } >> + } >> + >> >> It fails if the flag is set, rather than if any nested virtualization has >> been used before. > > Hi Mark, > I was facing the same question just recently - thanks for bringing it up. > > Even more emphasized as Ubuntu (for ease of use of nested > virtualization) will enable the VMX flag by default. > That made me end up with no guest being able to migrate at all, which > as you point out clearly was not the case - they would migrate fine. > In almost all use cases it would be just the VMX flag that was set, > but never used. > > I haven't thought about it before your mail, but if there would be a > way to differentiate between "VMX available" and "VMX actually used" > that would be a much better check to set the blocker. 
My concern is how could we understand or define "VMX actually used" for nested migration support? > > For now I reverted above patch with the migration blocker in Ubuntu to > get the situation temporarily resolved. > I considered it a downstream thing as it is mostly triggered by our > decision to make VMX available by default which was made years ago - > that is the reason I didn't bring it up here, but now that you brought > it up it is worth the discussion for sure. > > Mid term I expect that migration will work for nested guests as well > which makes me able to drop that delta then. > >> I'm concerned I will end up with a requirement for *all* guests to be >> restarted in order to migrate them to the new hosts, rather than just the >> ones that would have a problem. >> >> Thoughts? >> >> Thanks! >> >> -- >> Mark Mielke <mark.mielke@gmail.com> > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 5:32 [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? Mark Mielke 2019-01-18 6:18 ` Christian Ehrhardt @ 2019-01-18 10:02 ` Dr. David Alan Gilbert 2019-01-18 10:11 ` Paolo Bonzini 1 sibling, 1 reply; 15+ messages in thread From: Dr. David Alan Gilbert @ 2019-01-18 10:02 UTC (permalink / raw) To: Mark Mielke, pbonzini; +Cc: qemu-devel, christian.ehrhardt * Mark Mielke (mark.mielke@gmail.com) wrote: > Thank you for the work on nested virtualization. Having had live migrations > fail in the past when nested virtualization has been active, it is great to > see that clever people have been working on this problem! > > My question is about whether a migration path has been considered to allow > live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag > enabled in the guest? > > Qemu 2.12 doesn't know about the new nested state available from newer > Linux kernels, and it might be used on a machine with an older kernel that > doesn't make the nested state available. If Qemu 3.2 is on an up-to-date > host with an up-to-date kernel that does support the nested state, I'd like > to ensure we have the ability to try the migrations. > > In the past, I've found that: > > 1) If the guest had used nested virtualization before, the migration often > fails. However, if we reboot the guest and do not use nested > virtualization, this simplifies to... > 2) If the guest has never used nested virtualization before, the migration > succeeds. > > I would like to leverage 2) as much as possible to migrate forwards to Qemu > 3.2 hosts (once it is available). I can normally enter the guest to see if > 1) is likely or not, and handle these ones specially. 
If only 20% of the > guests have ever used nested virtualization, then I would like the option > to safely migrate 80% of the guests using live migration, and handle the > 20% as exceptions. > > This is the 3.1 change log that got my attention: > > > - x86 machines cannot be live-migrated if nested Intel virtualization is > enabled. The next version of QEMU will be able to do live migration when > nested virtualization is enabled, if supported by the kernel. > > > I believe this is the change it refers to: > > commit d98f26073bebddcd3da0ba1b86c3a34e840c0fb8 > Author: Paolo Bonzini <pbonzini@redhat.com> > Date: Wed Nov 14 10:38:13 2018 +0100 > > target/i386: kvm: add VMX migration blocker > > Nested VMX does not support live migration yet. Add a blocker > until that is worked out. > > Nested SVM only does not support it, but unfortunately it is > enabled by default for -cpu host so we cannot really disable it. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > > > This particular check seems very simplistic: > > + if ((env->features[FEAT_1_ECX] & CPUID_EXT_VMX) && !vmx_mig_blocker) { > + error_setg(&vmx_mig_blocker, > + "Nested VMX virtualization does not support live > migration yet"); > + r = migrate_add_blocker(vmx_mig_blocker, &local_err); > + if (local_err) { > + error_report_err(local_err); > + error_free(vmx_mig_blocker); > + return r; > + } > + } > + > > It fails if the flag is set, rather than if any nested virtualization has > been used before. > > I'm concerned I will end up with a requirement for *all* guests to be > restarted in order to migrate them to the new hosts, rather than just the > ones that would have a problem. I think you should be able to migrate from 2.12->3.1 like this, but you'd hit the problem when you then try and migrate again between your new QEMUs. I guess we could modify it to wire it to machine type, so that older machine types didn't block. Dave > Thoughts? > > Thanks! > > -- > Mark Mielke <mark.mielke@gmail.com> -- Dr. 
David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
  2019-01-18 10:02 ` Dr. David Alan Gilbert
@ 2019-01-18 10:11 ` Paolo Bonzini
  2019-01-18 10:16 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 15+ messages in thread

From: Paolo Bonzini @ 2019-01-18 10:11 UTC (permalink / raw)
To: Dr. David Alan Gilbert, Mark Mielke; +Cc: qemu-devel, christian.ehrhardt

On 18/01/19 11:02, Dr. David Alan Gilbert wrote:
>>
>> It fails if the flag is set, rather than if any nested virtualization has
>> been used before.
>>
>> I'm concerned I will end up with a requirement for *all* guests to be
>> restarted in order to migrate them to the new hosts, rather than just the
>> ones that would have a problem.
> I think you should be able to migrate from 2.12->3.1 like this, but
> you'd hit the problem when you then try and migrate again between your
> new QEMUs.
>
> I guess we could modify it to wire it to machine type, so that
> older machine types didn't block.

That would also be wrong. The old machine types _could_ be using KVM and they have no way to block the live migration.

The solution is to restart the VM using "-cpu host,-vmx".

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread
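For concreteness, the restart workaround described here looks roughly like the following when launching the replacement guest by hand. Everything except the "-cpu host,-vmx" part (machine type, memory size, disk path) is a placeholder to adapt to the guest's real configuration, and the snippet only assembles and prints the command rather than executing it:

```shell
# Placeholder guest configuration -- only "-cpu host,-vmx" is the point here.
CPU_OPTS="host,-vmx"          # host-passthrough CPU with the VMX bit masked off

CMD="qemu-system-x86_64 -enable-kvm -machine pc-i440fx-2.12 \
-cpu ${CPU_OPTS} -m 4096 -drive file=guest.qcow2,if=virtio"

# Print the assembled command; on a real host you would exec it instead.
echo "$CMD"
```

Under libvirt, the equivalent (check your libvirt version's documentation) would be a `<cpu mode='host-passthrough'>` element containing `<feature policy='disable' name='vmx'/>` in the domain XML.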
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 10:11 ` Paolo Bonzini @ 2019-01-18 10:16 ` Dr. David Alan Gilbert 2019-01-18 10:21 ` Daniel P. Berrangé 0 siblings, 1 reply; 15+ messages in thread From: Dr. David Alan Gilbert @ 2019-01-18 10:16 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Mark Mielke, qemu-devel, christian.ehrhardt * Paolo Bonzini (pbonzini@redhat.com) wrote: > On 18/01/19 11:02, Dr. David Alan Gilbert wrote: > >> > >> It fails if the flag is set, rather than if any nested virtualization has > >> been used before. > >> > >> I'm concerned I will end up with a requirement for *all* guests to be > >> restarted in order to migrate them to the new hosts, rather than just the > >> ones that would have a problem. > > I think you should be able to migrate from 2.12->3.1 like this, but > > you'd hit the problem when you then try and migrate again between your > > new QEMUs. > > > > I guess we could modify it to wire it to machine type, so that > > older machine types didn't block. > > That would also be wrong. The old machine types _could_ be using KVM > and they have no way to block the live migration. > > The solution is to restart the VM using "-cpu host,-vmx". The problem as Christian explained in that thread is that it was common for them to start VMs with vmx enabled but for people not to use it on most of the VMs, so we break migration for most VMs even though most don't use it. It might not be robust, but it worked for a lot of people most of the time. Dave > Paolo -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 10:16 ` Dr. David Alan Gilbert @ 2019-01-18 10:21 ` Daniel P. Berrangé 2019-01-18 12:57 ` Paolo Bonzini 0 siblings, 1 reply; 15+ messages in thread From: Daniel P. Berrangé @ 2019-01-18 10:21 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Paolo Bonzini, Mark Mielke, qemu-devel, christian.ehrhardt On Fri, Jan 18, 2019 at 10:16:34AM +0000, Dr. David Alan Gilbert wrote: > * Paolo Bonzini (pbonzini@redhat.com) wrote: > > On 18/01/19 11:02, Dr. David Alan Gilbert wrote: > > >> > > >> It fails if the flag is set, rather than if any nested virtualization has > > >> been used before. > > >> > > >> I'm concerned I will end up with a requirement for *all* guests to be > > >> restarted in order to migrate them to the new hosts, rather than just the > > >> ones that would have a problem. > > > I think you should be able to migrate from 2.12->3.1 like this, but > > > you'd hit the problem when you then try and migrate again between your > > > new QEMUs. > > > > > > I guess we could modify it to wire it to machine type, so that > > > older machine types didn't block. > > > > That would also be wrong. The old machine types _could_ be using KVM > > and they have no way to block the live migration. > > > > The solution is to restart the VM using "-cpu host,-vmx". > > The problem as Christian explained in that thread is that it was common > for them to start VMs with vmx enabled but for people not to use it > on most of the VMs, so we break migration for most VMs even though most > don't use it. > > It might not be robust, but it worked for a lot of people most of the > time. Yes, this is exactly why I said we should make the migration blocker be conditional on any L2 guest having been started. I vaguely recall someone saying there wasn't any way to detect this situation from QEMU though ? 
Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 10:21 ` Daniel P. Berrangé @ 2019-01-18 12:57 ` Paolo Bonzini 2019-01-18 13:41 ` Mark Mielke 2019-01-18 13:44 ` Daniel P. Berrangé 0 siblings, 2 replies; 15+ messages in thread From: Paolo Bonzini @ 2019-01-18 12:57 UTC (permalink / raw) To: Daniel P. Berrangé, Dr. David Alan Gilbert Cc: Mark Mielke, qemu-devel, christian.ehrhardt On 18/01/19 11:21, Daniel P. Berrangé wrote: > On Fri, Jan 18, 2019 at 10:16:34AM +0000, Dr. David Alan Gilbert wrote: >> * Paolo Bonzini (pbonzini@redhat.com) wrote: >>> On 18/01/19 11:02, Dr. David Alan Gilbert wrote: >>>>> >>>>> It fails if the flag is set, rather than if any nested virtualization has >>>>> been used before. >>>>> >>>>> I'm concerned I will end up with a requirement for *all* guests to be >>>>> restarted in order to migrate them to the new hosts, rather than just the >>>>> ones that would have a problem. >>>> I think you should be able to migrate from 2.12->3.1 like this, but >>>> you'd hit the problem when you then try and migrate again between your >>>> new QEMUs. >>>> >>>> I guess we could modify it to wire it to machine type, so that >>>> older machine types didn't block. >>> >>> That would also be wrong. The old machine types _could_ be using KVM >>> and they have no way to block the live migration. >>> >>> The solution is to restart the VM using "-cpu host,-vmx". >> >> The problem as Christian explained in that thread is that it was common >> for them to start VMs with vmx enabled but for people not to use it >> on most of the VMs, so we break migration for most VMs even though most >> don't use it. >> >> It might not be robust, but it worked for a lot of people most of the >> time. It's not "not robust" (like, it usually works but sometimes fails mysteriously). It's entirely broken, you just don't notice that it is if you're not using the feature. 
> Yes, this is exactly why I said we should make the migration blocker
> be conditional on any L2 guest having been started. I vaguely recall
> someone saying there wasn't any way to detect this situation from
> QEMU though ?

You can check that and give a warning (check that CR4.VMXE=1 but no other live migration state was transferred). However, without live migration support in the kernel and in QEMU you cannot start VMs *for the entire future life of the VM* after a live migration. So even if we implemented that kind of blocker, it would fail even if no VM has been started, as long as the kvm_intel module is loaded on migration. That would be no different in practice from what we have now.

It might work to unload the kvm_intel module and run live migration with the CPU configured differently ("-cpu host,-vmx") on the destination.

Paolo

^ permalink raw reply	[flat|nested] 15+ messages in thread
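One way to picture the warning heuristic outlined here is as a predicate over the destination's view of a vCPU: the guest enabled VMX (CR4.VMXE=1, an architectural bit), yet the incoming stream carried no nested-virtualization state. All type and field names below are invented for illustration; QEMU's migration code is structured differently.

```c
#include <assert.h>
#include <stddef.h>

#define CR4_VMXE (1u << 13)       /* CR4 bit 13 = VMXE (architectural) */

/* Invented summary of what a destination host could inspect after an
 * incoming migration; not a real QEMU structure. */
typedef struct {
    unsigned cr4;                 /* guest CR4 as transferred */
    size_t nested_state_len;      /* bytes of nested (VMX) state received */
} IncomingVcpuSummary;

/* Warn-worthy situation: the guest has turned VMX on, but no nested state
 * accompanied it -- so any L2 guest run before (or started later) cannot
 * be reproduced on the destination. */
static int vmx_armed_without_state(const IncomingVcpuSummary *v)
{
    return (v->cr4 & CR4_VMXE) && v->nested_state_len == 0;
}
```

As the reply notes, even this is only a warning-grade signal: a guest that has VMXE set but never executed VMXON still cannot safely start L2 guests after migrating, so long as kernel-side nested-state support is absent.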
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 12:57 ` Paolo Bonzini @ 2019-01-18 13:41 ` Mark Mielke 2019-01-18 15:25 ` Paolo Bonzini 2019-01-18 13:44 ` Daniel P. Berrangé 1 sibling, 1 reply; 15+ messages in thread From: Mark Mielke @ 2019-01-18 13:41 UTC (permalink / raw) To: Paolo Bonzini Cc: Daniel P. Berrangé, Dr. David Alan Gilbert, qemu-devel, christian.ehrhardt On Fri, Jan 18, 2019 at 7:57 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > On 18/01/19 11:21, Daniel P. Berrangé wrote: > > On Fri, Jan 18, 2019 at 10:16:34AM +0000, Dr. David Alan Gilbert wrote: > >> * Paolo Bonzini (pbonzini@redhat.com) wrote: > >>> The solution is to restart the VM using "-cpu host,-vmx". > >> The problem as Christian explained in that thread is that it was common > >> for them to start VMs with vmx enabled but for people not to use it > >> on most of the VMs, so we break migration for most VMs even though most > >> don't use it. > >> It might not be robust, but it worked for a lot of people most of the > >> time. > It's not "not robust" (like, it usually works but sometimes fails > mysteriously). It's entirely broken, you just don't notice that it is > if you're not using the feature. > It is useful to understand the risk. However, this is the same risk we have been successfully living with for several years now, and it seems abrupt to declare 3.1 and 3.2 as the Qemu version beyond which migration requires a whole cluster restart whether or not a L2 guest had been, or will ever be started on any of the guests. I would like to see the risk clearly communicated, and have the option of proceeding anyways (as we have every day since first deploying the solution). I think I am not alone here, otherwise I would have quietly implemented a naive patch myself without raising this for discussion. 
:-) Given the known risk, I'm happy to restart all machines that have or will likely use an L2 guest, and leverage this capability for the 80%+ of machines that will never launch an L2 guest. Although, detecting it and using this to block live migration in case any mistakes in detection were made would be very cool as well. Is this something that will already work with the pending 3.2 code? Or is any change required to achieve this? Is it best to upgrade to 3.0 before proceeding to 3.2 (once it is released), or will it be acceptable to migrate from 2.12 directly to 3.2 in this manner? > Yes, this is exactly why I said we should make the migration blocker > > be conditional on any L2 guest having been started. I vaguely recall > > someone saying there wasn't any way to detect this situation from > > QEMU though ? > You can check that and give a warning (check that CR4.VMXE=1 but no > other live migration state was transferred). However, without live > migration support in the kernel and in QEMU you cannot start VMs *for > the entire future life of the VM* after a live migration. So even if we > implemented that kind of blocker, it would fail even if no VM has been > started, as long as the kvm_intel module is loaded on migration. That > would be no different in practice from what we have now. > > It might work to unload the kvm_intel module and run live migration with > the CPU configured differently ("-cpu host,-vmx") on the destination. > For machines that will not use L2 guest, would it be a good precaution to unload kvm_intel pre-emptively before live migration just in case? In particular, I'm curious if doing anything at all increases the risk of failure, or if it should be left alone entirely and never used as the lowest risk option (and what we have traditionally been doing anyways). I do appreciate the warnings and details. Just not the enforcement piece. Thanks! -- Mark Mielke <mark.mielke@gmail.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 13:41 ` Mark Mielke @ 2019-01-18 15:25 ` Paolo Bonzini 2019-01-18 19:31 ` Dr. David Alan Gilbert 2019-01-22 22:58 ` Mark Mielke 0 siblings, 2 replies; 15+ messages in thread From: Paolo Bonzini @ 2019-01-18 15:25 UTC (permalink / raw) To: Mark Mielke Cc: Daniel P. Berrangé, Dr. David Alan Gilbert, qemu-devel, christian.ehrhardt On 18/01/19 14:41, Mark Mielke wrote: > It is useful to understand the risk. However, this is the same risk we > have been successfully living with for several years now, and it seems > abrupt to declare 3.1 and 3.2 as the Qemu version beyond which migration > requires a whole cluster restart whether or not a L2 guest had been, or > will ever be started on any of the guests. Only if nested was enabled for the kvm_intel module. If you didn't enable it, you didn't see any change with 3.1. Nested was enabled for kvm_amd years ago. It was a mistake, but that's why we didn't add such a blocker for AMD. Paolo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? 2019-01-18 15:25 ` Paolo Bonzini @ 2019-01-18 19:31 ` Dr. David Alan Gilbert 2019-01-22 22:58 ` Mark Mielke 1 sibling, 0 replies; 15+ messages in thread From: Dr. David Alan Gilbert @ 2019-01-18 19:31 UTC (permalink / raw) To: Paolo Bonzini Cc: Mark Mielke, Daniel P. Berrangé, qemu-devel, christian.ehrhardt * Paolo Bonzini (pbonzini@redhat.com) wrote: > On 18/01/19 14:41, Mark Mielke wrote: > > It is useful to understand the risk. However, this is the same risk we > > have been successfully living with for several years now, and it seems > > abrupt to declare 3.1 and 3.2 as the Qemu version beyond which migration > > requires a whole cluster restart whether or not a L2 guest had been, or > > will ever be started on any of the guests. > > Only if nested was enabled for the kvm_intel module. If you didn't > enable it, you didn't see any change with 3.1. > > Nested was enabled for kvm_amd years ago. It was a mistake, but that's > why we didn't add such a blocker for AMD. Still, it should have probably been machine-type tied; there's lots and lots of broken things we've done in the past which we've kept compatibility with on old machine types. Dave > Paolo -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
  2019-01-18 15:25 ` Paolo Bonzini
  2019-01-18 19:31 ` Dr. David Alan Gilbert
@ 2019-01-22 22:58 ` Mark Mielke
  0 siblings, 0 replies; 15+ messages in thread

From: Mark Mielke @ 2019-01-22 22:58 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Daniel P. Berrangé, Dr. David Alan Gilbert, qemu-devel, christian.ehrhardt

On Fri, Jan 18, 2019 at 10:25 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 18/01/19 14:41, Mark Mielke wrote:
> > It is useful to understand the risk. However, this is the same risk we
> > have been successfully living with for several years now, and it seems
> > abrupt to declare 3.1 and 3.2 as the Qemu version beyond which migration
> > requires a whole cluster restart whether or not a L2 guest had been, or
> > will ever be started on any of the guests.
> Only if nested was enabled for the kvm_intel module. If you didn't
> enable it, you didn't see any change with 3.1.

We enable it, because a number of the machines require it, and we want to use the same cluster for both use cases.

> Nested was enabled for kvm_amd years ago. It was a mistake, but that's
> why we didn't add such a blocker for AMD.

I can see how there are users out there that might have it enabled by mistake. But, in our case we explicitly enabled it because one of the key use cases we are addressing is a move from physical workstations to virtual workstations, where product teams have libvirt/qemu/kvm based simulation targets that they are required to run to develop, debug, and test.

Before this was resolved - I knew live migration with nested KVM was flaky. I didn't know exactly why or how (although I did suspect), but it works very well for our use cases, and we only rarely use live migration except to upgrade hypervisors.
I can also usually detect if KVM is/was/will be used or not, and treat these machines specially (by shutting them down, migrating them, and starting them back up, although in the past before I recognized the severity of this problem I have done live migration and then recommended a restart on their schedule). With the new information from this thread and the release notes that lead me to starting this thread, I will definitely ensure that these machines are properly shut down before they are migrated and started back up. But, I still need to deal with the issue of hundreds of machines which are on the same cluster, which happen to be having the VMX bit passed through, but will never use nested KVM. These machines can't be restarted, but because they are shared by multiple tenants (all internal to our company - different product teams, different owners), it will be incredibly difficult to get buy-in for a system wide restart. It will be much easier for me to live migrate a majority of the machines with the same level of safety as I have today with Qemu 2.12, and then deal with the exceptions one at a time in co-ordination with the owners. For example, if a physical machine has 20 guests on it, and 2 of those guests are using the nested KVM feature (or will use it in future or past), then I would like to live migrate the 18 to new machines and then contact the owners of the two machines to schedule down time to safely move them to fully evacuate the machine and upgrade it. We know Qemu 2.12 is broken with this configuration. That's what I am on today. I think it verges on "ivory tower" / "purist" to say that I absolutely should not expect to be able to live migrate to from Qemu 2.12 to Qemu 4.0 and inherit the same risk that I already have with Qemu 2.12 to Qemu 2.12, and that a system wide restart is the only correct option. 
However, I can accept that you don't want to accept responsibility for people in this scenario, and you want them to face the problem head-on, and not allow for blame to come back to the Qemu team, where they say "but Qemu 4.0 fixed this, right? why is it still broken after i live migrated from Qemu 2.12?" I think this is where you are coming from? I can appreciate that. If so, I'd like to know whether I can locally patch Qemu 4.0 to remove the live migration check, and whether in theory, and with my own testing, and with me taking responsibility for my own systems and not blaming you for anything that goes wrong, if you think with your best educated guess, that it should probably work as well as it did with a Qemu 2.12 live migration with the same VMX bit set, and the guests in the same state. I think I saw somebody on this thread is already doing this for Ubuntu with Qemu 3.2? Thanks for any insight you can provide! :-) I do appreciate it greatly! P.S. I will ensure that every system is restarted properly. The problem is that I need to stagger this, and not require the entire environment, or entire hypervisors worth of hosts with multiple tenants to go down simultaneously. I'd rather track the machines left to do, and tackle them in groups over several months and as opportunity is available. It is more work for me, but when it comes to choosing between interrupting product release cycles and me spending a little more time, and us accepting approximately the same risk we already have today - the correct business decision needs to be made. -- Mark Mielke <mark.mielke@gmail.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
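[Editor's note] The detection Mark describes (deciding whether a guest is, or has been, using nested KVM before choosing how to migrate it) can be approximated from inside the guest. A minimal sketch, not from the thread itself: the helper name and the idea of feeding it `/proc/modules` content collected over SSH are assumptions for illustration. A loaded `kvm_intel`/`kvm_amd` module is a strong hint the guest runs L2 VMs.

```python
def guest_uses_kvm(proc_modules_text):
    """Return True if kvm_intel or kvm_amd appears as a loaded module.

    proc_modules_text is the contents of /proc/modules as read inside
    the guest (for example over SSH). A loaded vendor kvm module is a
    strong hint that the guest runs, or has run, L2 VMs. A guest that
    loaded the module earlier and unloaded it would be missed, so this
    is a heuristic, not proof."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("kvm_intel", "kvm_amd"):
            return True
    return False
```

Guests flagged by a check like this would go into the shutdown-migrate-restart bucket; the rest are candidates for live migration.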
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
  2019-01-18 12:57 ` Paolo Bonzini
  2019-01-18 13:41   ` Mark Mielke
@ 2019-01-18 13:44   ` Daniel P. Berrangé
  2019-01-18 14:09     ` Mark Mielke
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2019-01-18 13:44 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Dr. David Alan Gilbert, Mark Mielke, qemu-devel, christian.ehrhardt

On Fri, Jan 18, 2019 at 01:57:31PM +0100, Paolo Bonzini wrote:
> On 18/01/19 11:21, Daniel P. Berrangé wrote:
> > On Fri, Jan 18, 2019 at 10:16:34AM +0000, Dr. David Alan Gilbert wrote:
> > > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > > > On 18/01/19 11:02, Dr. David Alan Gilbert wrote:
> > > > > > It fails if the flag is set, rather than if any nested
> > > > > > virtualization has been used before.
> > > > > >
> > > > > > I'm concerned I will end up with a requirement for *all*
> > > > > > guests to be restarted in order to migrate them to the new
> > > > > > hosts, rather than just the ones that would have a problem.
> > > > > I think you should be able to migrate from 2.12->3.1 like this,
> > > > > but you'd hit the problem when you then try and migrate again
> > > > > between your new QEMUs.
> > > > >
> > > > > I guess we could modify it to wire it to machine type, so that
> > > > > older machine types didn't block.
> > > >
> > > > That would also be wrong. The old machine types _could_ be using
> > > > KVM and they have no way to block the live migration.
> > > >
> > > > The solution is to restart the VM using "-cpu host,-vmx".
> > >
> > > The problem, as Christian explained in that thread, is that it was
> > > common for them to start VMs with vmx enabled but for people not to
> > > use it on most of the VMs, so we break migration for most VMs even
> > > though most don't use it.
> > >
> > > It might not be robust, but it worked for a lot of people most of
> > > the time.
>
> It's not "not robust" (like, it usually works but sometimes fails
> mysteriously). It's entirely broken; you just don't notice that it is
> if you're not using the feature.
>
> > Yes, this is exactly why I said we should make the migration blocker
> > be conditional on any L2 guest having been started. I vaguely recall
> > someone saying there wasn't any way to detect this situation from
> > QEMU though?
>
> You can check that and give a warning (check that CR4.VMXE=1 but no
> other live migration state was transferred). However, without live
> migration support in the kernel and in QEMU you cannot start VMs *for
> the entire future life of the VM* after a live migration. So even if we
> implemented that kind of blocker, it would fail even if no VM has been
> started, as long as the kvm_intel module is loaded on migration. That
> would be no different in practice from what we have now.

Ahh, I was misunderstanding it as only applying to L2 VMs that existed
at the time the migration is performed. Given that it breaks all future
possibility of launching an L2 VM, this strict blocker does make more
sense.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
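[Editor's note] Paolo's suggested warning heuristic above (CR4.VMXE set, but no nested VMX state transferred in the migration stream) can be expressed as a simple predicate. This is only an illustrative sketch: the function and parameter names are invented, and QEMU would evaluate this in C on the incoming side. CR4.VMXE is bit 13, per the Intel SDM.

```python
CR4_VMXE = 1 << 13  # CR4.VMXE, bit 13 of control register 4 (Intel SDM)

def vmxe_but_no_nested_state(cr4, nested_state_len):
    """The condition Paolo describes as at most warning-worthy: the vCPU
    has VMX enabled (CR4.VMXE=1) yet the migration stream carried no
    nested VMX state. As he notes, this cannot be a reliable blocker,
    because without kernel/QEMU nested-state support the guest also
    cannot start L2 VMs at any point after the migration."""
    return bool(cr4 & CR4_VMXE) and nested_state_len == 0
```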
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
  2019-01-18 13:44 ` Daniel P. Berrangé
@ 2019-01-18 14:09   ` Mark Mielke
  2019-01-18 14:48     ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Mielke @ 2019-01-18 14:09 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel, christian.ehrhardt

On Fri, Jan 18, 2019 at 8:44 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Fri, Jan 18, 2019 at 01:57:31PM +0100, Paolo Bonzini wrote:
> > On 18/01/19 11:21, Daniel P. Berrangé wrote:
> > > Yes, this is exactly why I said we should make the migration blocker
> > > be conditional on any L2 guest having been started. I vaguely recall
> > > someone saying there wasn't any way to detect this situation from
> > > QEMU though?
> >
> > You can check that and give a warning (check that CR4.VMXE=1 but no
> > other live migration state was transferred). However, without live
> > migration support in the kernel and in QEMU you cannot start VMs *for
> > the entire future life of the VM* after a live migration. So even if
> > we implemented that kind of blocker, it would fail even if no VM has
> > been started, as long as the kvm_intel module is loaded on migration.
> > That would be no different in practice from what we have now.
>
> Ahh, I was misunderstanding it as only applying to L2 VMs that existed
> at the time the migration is performed. Given that it breaks all future
> possibility of launching an L2 VM, this strict blocker does make more
> sense.

To explain my use case more fully:

Right now all guests are on Linux 4.14.79+ hypervisors, with Qemu
2.12.1+. I understand the value of this feature, and I need to get all
the guests to Linux 4.19.16+ hypervisors, with Qemu 3.2 (once it is
available).

As documented, and as best as I can read from the source code and this
mailing list, the recommended solution would be for me to upgrade Linux
and Qemu on the existing hypervisors, and then restart the entire
environment. After it comes up with the new kernel and the new qemu,
everything will be "correct". This model will not be well received by
users. We have long-running operations of all sorts, and restarting them
all at the same time is a very big problem. I would like to heal this
system over time.

The first stage will include introducing new Linux 4.19.16+ hypervisors,
and migrating the guests to these machines carefully and
opportunistically. Carefully means that machines that will use L2 guests
will need to be restarted in discussion with the users (or their
hypervisors excluded from this exercise), but most (80%+) of machines
that will never launch an L2 guest can migrate live with low risk (at
least according to our experience to date). This will allow existing
hypervisors to be freed up so that they too can be upgraded to Linux
4.19.16+.

The second stage will include upgrading to Qemu 3.2 once it is available
and demonstrated to be stable for our use cases. However, I will need to
be able to live migrate most (80%+) systems from Qemu 2.12.1+ to Qemu
3.2. We would again handle the machines with L2 guests with care.

If Qemu 3.2 will be ready sooner (weeks?), I would wait before migrating,
and combine the above two steps so that the new hypervisors would have
both Linux 4.19.16+ and Qemu 3.2. But if Qemu 3.2 is months away, I would
keep it as two steps. A large number of systems would be shut down and
replaced as part of normal processes, which would automatically heal
them. The long-running machines would not be healed, but they would be no
worse off than they are today.

To achieve this, I need a path to live migrate from Qemu 2.12.1+ with the
VMX bit set in the guest, to Qemu 3.2. Further complicating things is
that we are using OpenStack, so options for tweaking flags on a
case-by-case basis would be limited or non-existent.

I'm totally fine with the understanding that any machine not restarted is
still broken under Qemu 3.2, just as it was broken under Qemu 2.12. New
machines will be correct, and the broken machines can be fixed
opportunistically and in discussion with the users. And we don't need a
system-wide restart of the whole cluster to deploy Qemu 3.2.

--
Mark Mielke <mark.mielke@gmail.com>
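[Editor's note] Whether a host hands out nested VMX at all is controlled by the `kvm_intel` module parameter mentioned earlier in the thread, exposed at `/sys/module/kvm_intel/parameters/nested`. A small illustrative helper (the sysfs path is real; the function is an assumption for this sketch) to interpret it when auditing a fleet of hypervisors:

```python
def nested_enabled(param_text):
    """Interpret /sys/module/kvm_intel/parameters/nested.

    Depending on kernel version the file reads '0'/'1' or 'N'/'Y'."""
    return param_text.strip() in ("1", "Y", "y")

# Typical use on a hypervisor:
# with open("/sys/module/kvm_intel/parameters/nested") as f:
#     print(nested_enabled(f.read()))
```

Per Paolo's earlier point, hosts where this reads false never saw any behavioural change from the 3.1 blocker in the first place.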
* Re: [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest?
  2019-01-18 14:09 ` Mark Mielke
@ 2019-01-18 14:48   ` Daniel P. Berrangé
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel P. Berrangé @ 2019-01-18 14:48 UTC (permalink / raw)
To: Mark Mielke
Cc: Paolo Bonzini, Dr. David Alan Gilbert, qemu-devel, christian.ehrhardt

On Fri, Jan 18, 2019 at 09:09:31AM -0500, Mark Mielke wrote:
> To explain my use case more fully:
>
> Right now all guests are on Linux 4.14.79+ hypervisors, with Qemu
> 2.12.1+. I understand the value of this feature, and I need to get all
> the guests to Linux 4.19.16+ hypervisors, with Qemu 3.2 (once it is
> available).
>
> As documented, and as best as I can read from the source code and this
> mailing list, the recommended solution would be for me to upgrade Linux
> and Qemu on the existing hypervisors, and then restart the entire
> environment. After it comes up with the new kernel and the new qemu,
> everything will be "correct".

The recommendation depends on whether you actually need to run L2 guests
or not. For people who don't need L2 guests, the recommended solution is
to simply disable the vmx flag in the guest CPU model and reboot the
affected L1 guests. Only people who need to run L2 guests would need to
upgrade the software stack to get live migration working. That said,
there is a workaround I'll mention below...

> The first stage will include introducing new Linux 4.19.16+
> hypervisors, and migrating the guests to these machines carefully and
> opportunistically. Carefully means that machines that will use L2
> guests will need to be restarted in discussion with the users (or their
> hypervisors excluded from this exercise), but most (80%+) of machines
> that will never launch an L2 guest can migrate live with low risk (at
> least according to our experience to date). This will allow existing
> hypervisors to be freed up so that they too can be upgraded to Linux
> 4.19.16+.
>
> The second stage will include upgrading to Qemu 3.2 once it is
> available and demonstrated to be stable for our use cases. However, I
> will need to be able to live migrate most (80%+) systems from Qemu
> 2.12.1+ to Qemu 3.2. We would again handle the machines with L2 guests
> with care.

L1 guests with running L2 guests cannot be live migrated from 2.12 no
matter what, as the running L2 guests will fail, and the L1 guest will
also be unable to launch any new guests. You would need to boot a new L1
guest on a different host, then live migrate all the L2 guests to this
new L1 guest. Since the L1 guest would presumably then be empty, there'd
no longer be a need to live migrate the L1 guest, and it can simply be
powered off. IOW, this is a pretty similar situation to doing physical
hardware replacement of virt hosts.

The serious pain point that I see is for people who have L1 guests which
have VMX enabled, but which were not, and will never be, used for
running L2 VMs. They can't live migrate their L1 guests, so they'd need
to restart their application workloads, which is very unpleasant.

I wonder if there's a case to be made to allow the QEMU migration
blocker to be overridden in this case. Libvirt has a VIR_MIGRATE_UNSAFE
flag that mgmt apps can set to tell libvirt to do the migration even if
it believes the config to be unsafe. Libvirt has some migration
restrictions around valid disk cache modes where this is used if libvirt
made the wrong decision. There's no way for us to plumb it into the QEMU
migration blocker for VMX though, so it can't currently be used in this
scenario.

> If Qemu 3.2 will be ready sooner (weeks?), I would wait before
> migrating, and combine the above two steps so that the new hypervisors
> would have both Linux 4.19.16+ and Qemu 3.2. But if Qemu 3.2 is months
> away, I would keep it as two steps.

NB the next release will be 4.0, as we switched policy to increment the
major release number at the start of each year. QEMU follows a fixed
schedule of three releases a year, which gives four-month gaps. So
assuming no slippage you can expect 4.0 in late April.

> To achieve this, I need a path to live migrate from Qemu 2.12.1+ with
> the VMX bit set in the guest, to Qemu 3.2. Further complicating things
> is that we are using OpenStack, so options for tweaking flags on a
> case-by-case basis would be limited or non-existent.
>
> I'm totally fine with the understanding that any machine not restarted
> is still broken under Qemu 3.2, just as it was broken under Qemu 2.12.
> New machines will be correct, and the broken machines can be fixed
> opportunistically and in discussion with the users.
>
> And, we don't need a system-wide restart of the whole cluster to deploy
> Qemu 3.2.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
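[Editor's note] The VIR_MIGRATE_UNSAFE flag Daniel mentions is one bit in libvirt's migration flags word (virsh exposes it as `migrate --unsafe`). As he says, it is not plumbed into QEMU's VMX migration blocker, so it does not help in this scenario; the sketch below only shows how such flags are normally combined. The numeric values are copied from libvirt's virDomainMigrateFlags enum and should be treated as an assumption here; with the libvirt-python binding installed you would use `libvirt.VIR_MIGRATE_LIVE` etc. rather than redefining them.

```python
# Bit values mirroring libvirt's virDomainMigrateFlags enum (assumed
# here for illustration; prefer the libvirt binding's own constants).
VIR_MIGRATE_LIVE = 1 << 0       # keep the guest running during migration
VIR_MIGRATE_PEER2PEER = 1 << 1  # source libvirtd drives the migration
VIR_MIGRATE_UNSAFE = 1 << 9     # skip libvirt's own safety checks

flags = VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER | VIR_MIGRATE_UNSAFE

# With a live connection this would be roughly:
#   dom.migrate(dest_conn, flags, None, None, 0)
# but note: VIR_MIGRATE_UNSAFE only overrides libvirt-side checks;
# the QEMU-side VMX blocker would still refuse the migration.
print(hex(flags))
```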
end of thread, other threads:[~2019-01-22 22:58 UTC | newest]

Thread overview: 15+ messages -- links below jump to the message on this page:
2019-01-18  5:32 [Qemu-devel] Live migration from Qemu 2.12 hosts to Qemu 3.2 hosts, with VMX flag enabled in the guest? Mark Mielke
2019-01-18  6:18 ` Christian Ehrhardt
2019-01-22  7:20   ` Like Xu
2019-01-18 10:02 ` Dr. David Alan Gilbert
2019-01-18 10:11   ` Paolo Bonzini
2019-01-18 10:16     ` Dr. David Alan Gilbert
2019-01-18 10:21       ` Daniel P. Berrangé
2019-01-18 12:57         ` Paolo Bonzini
2019-01-18 13:41           ` Mark Mielke
2019-01-18 15:25             ` Paolo Bonzini
2019-01-18 19:31               ` Dr. David Alan Gilbert
2019-01-22 22:58               ` Mark Mielke
2019-01-18 13:44           ` Daniel P. Berrangé
2019-01-18 14:09             ` Mark Mielke
2019-01-18 14:48               ` Daniel P. Berrangé