From mboxrd@z Thu Jan 1 00:00:00 1970
From: Liran Alon
Subject: Re: [QEMU PATCH v2 0/2]: KVM: i386: Add support for save and restore nested state
Date: Fri, 2 Nov 2018 14:59:20 +0200
Message-ID: <95850543-FE41-404F-9775-23E792F6428E@oracle.com>
References: <20181102034649.43559-1-liran.alon@oracle.com> <12c26c34-8dd1-a442-7826-86b93ff978f8@redhat.com>
In-Reply-To: <12c26c34-8dd1-a442-7826-86b93ff978f8@redhat.com>
To: Paolo Bonzini
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mtosatti@redhat.com, dgilbert@redhat.com, qemu-devel@nongnu.org, jmattson@google.com, rth@twiddle.net
List-Id: kvm.vger.kernel.org

> On 2 Nov 2018, at 11:40, Paolo Bonzini wrote:
>
> On 02/11/2018 04:46, Liran Alon wrote:
>>> On Thu, Nov 1, 2018 at 09:45 AM, Jim Mattson wrote:
>>
>>>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert wrote:
>>
>>>> So if I have matching host kernels it should always work?
>>>> What happens if I upgrade the source kernel to increase its maximum
>>>> nested size, can I force it to keep things small for some VMs?
>>
>>> Any change to the format of the nested state should be gated by a
>>> KVM_CAP set by userspace. (Unlike, say, how the
>>> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state
>>> in commit f077825a8758d.) KVM has traditionally been quite bad about
>>> maintaining backwards compatibility, but I hope the community is more
>>> cognizant of the issues now.
>>
>>> As a cloud provider, one would only enable the new capability from
>>> userspace once all hosts in the pool have a kernel that supports it.
>>> During the transition, the capability would not be enabled on the
>>> hosts with a new kernel, and these hosts would continue to provide
>>> nested state that could be consumed by hosts running the older kernel.
>>
>> Hmm, this makes sense.
>>
>> This means though that the patch I have submitted here isn't good enough.
>> My patch currently assumes that when it attempts to get nested state from KVM,
>> QEMU should always set nested_state->size to the max size supported by KVM as received
>> from kvm_check_extension(s, KVM_CAP_NESTED_STATE);
>> (See kvm_get_nested_state() introduced in my patch.)
>> This indeed won't allow migration from a host with new KVM to a host with old KVM if
>> the nested_state size was enlarged between these KVM versions.
>> Which is obviously an issue.
>
> Actually I think this is okay, because unless the "new" capability was
> enabled, KVM would always reduce nested_state->size to a value that is
> compatible with current kernels.
>
>> But on second thought, I'm not sure that this is the right approach as well.
>> We don't really want the used version of nested_state to be determined at kvm_init().
>> * On source QEMU, we actually want to determine it when preparing for migration, based
>> on the support given by our destination host. If it's an old host, we would like to
>> save an old-version nested_state, and if it's a new host, we would like to save our newest
>> supported nested_state.
>
> No, that's wrong because it would lead to losing state. If the source
> QEMU supports more state than the destination QEMU, and the current VM
> state needs to transmit it for migration to be _correct_, then migration
> to that destination QEMU must fail.
>
> In particular, enabling the new KVM capability needs to be gated by a
> new machine type and/or -cpu flag, if migration compatibility is needed.
> (In particular, this is one reason why I haven't considered this series
> for 3.1.
Right now, migration of nested hypervisors is completely
> busted, but if we make it "almost" work, pre-3.1 machine types would not
> ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD. Therefore,
> it's better for users if we wait for one more release, and add support
> for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time.)
>
> Personally, I would like to say that, starting from QEMU 3.2, enabling
> nested VMX requires a 4.20 kernel. It's a bit bold, but I think it's a
> good way to keep some sanity. Any opinions on that?
>
> Paolo

If I understand you correctly, you wish the nested_state version used to be
tied to the machine-type used to launch the guest.

The reason I am not fond of this approach is that it means that once a VM is
launched with some machine-type, its nested_state will be forever saved with a
specific version, even if this VM has already migrated to a host with a newer
kernel which knows how to save a more accurate state for the next migration.

The scheme I have described below should avoid this kind of issue while still
preserving the ability to migrate to older hosts.

Note that I believe it's the responsibility of the mgmt-layer to decide the
risk of migrating from a new host to an old host in case the old host cannot
receive the full nested_state that the new host is capable of generating. I
don't think migration should fail in this case, as happens currently with the
patch I have submitted.

So in general I still agree with Jim's approach, but I'm not convinced we
should use KVM_CAP for this. (See details on the proposed scheme below.)
What disadvantages do you see in the proposed scheme below? Why is using
KVM_CAP better?

BTW, I agree with the rest of the group here that it's too aggressive to make
QEMU 3.2 require kernel 4.20 for using nVMX. This will hurt common nVMX
workloads that don't care about the ability to migrate.
-Liran

>
>> Therefore, I don't think that we want this versioning to be based on KVM_CAP at all.
>> It seems that we would want the process to behave as follows:
>> 1) Mgmt-layer at dest queries the dest host's max supported nested_state size.
>> (Which should be returned from kvm_check_extension(KVM_CAP_NESTED_STATE).)
>> 2) Mgmt-layer at source initiates migration to dest, requesting that QEMU send a nested_state
>> matching the dest's max supported nested_state size.
>> When saving nested state using the KVM_GET_NESTED_STATE IOCTL, QEMU will specify in nested_state->size
>> the *requested* size to be saved, and KVM should be able to save only the information which matches
>> the version that worked with that size.
>> 3) After some sanity checks on the received migration stream, the dest host uses the KVM_SET_NESTED_STATE IOCTL.
>> This IOCTL should deduce which information it should deploy based on the given nested_state->size.
>>
>> This also makes me wonder if it's not just nicer to use nested_state->flags to specify which
>> information is actually present in nested_state instead of managing versioning with nested_state->size.
>>
>> What are your opinions on this?
>>
>> -Liran
>>
>