linux-kernel.vger.kernel.org archive mirror
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] KVM: nVMX: Map enlightened VMCS upon restore when possible
Date: Wed, 05 May 2021 11:33:09 +0300	[thread overview]
Message-ID: <fa744382453d7a196812e88fe9ae9e842c903e13.camel@redhat.com> (raw)
In-Reply-To: <20210503150854.1144255-5-vkuznets@redhat.com>

On Mon, 2021-05-03 at 17:08 +0200, Vitaly Kuznetsov wrote:
> It now looks like a bad idea to not restore eVMCS mapping directly from
> vmx_set_nested_state(). The restoration path now depends on whether KVM
> will continue executing L2 (vmx_get_nested_state_pages()) or will have to
> exit to L1 (nested_vmx_vmexit()), this complicates error propagation and
> diverges too much from the 'native' path when 'nested.current_vmptr' is
> set directly from vmx_get_nested_state_pages().
> 
> The existing solution postponing eVMCS mapping also seems to be fragile.
> In multiple places the code checks whether 'vmx->nested.hv_evmcs' is not
> NULL to distinguish between eVMCS and non-eVMCS cases. All these checks
> are 'incomplete' as we have a weird 'eVMCS is in use but not yet mapped'
> state.
> 
> Also, in case vmx_get_nested_state() is called right after
> vmx_set_nested_state() without executing the guest first, the resulting
> state is going to be incorrect as 'KVM_STATE_NESTED_EVMCS' flag will be
> missing.
> 
> Fix all these issues by making eVMCS restoration path closer to its
> 'native' sibling by putting eVMCS GPA to 'struct kvm_vmx_nested_state_hdr'.
> To avoid ABI incompatibility, do not introduce a new flag and keep the
> original eVMCS mapping path through KVM_REQ_GET_NESTED_STATE_PAGES in
> place. To distinguish between 'new' and 'old' formats consider eVMCS
> GPA == 0 as an unset GPA (thus forcing KVM_REQ_GET_NESTED_STATE_PAGES
> path). While technically possible, it seems to be an extremely unlikely
> case.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/uapi/asm/kvm.h |  2 ++
>  arch/x86/kvm/vmx/nested.c       | 27 +++++++++++++++++++++------
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 0662f644aad9..3845977b739e 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -441,6 +441,8 @@ struct kvm_vmx_nested_state_hdr {
>  
>  	__u32 flags;
>  	__u64 preemption_timer_deadline;
> +
> +	__u64 evmcs_pa;
>  };
>  
>  struct kvm_svm_nested_state_data {
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 37fdc34f7afc..4261cf4755c8 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6019,6 +6019,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
>  		.hdr.vmx.vmxon_pa = -1ull,
>  		.hdr.vmx.vmcs12_pa = -1ull,
>  		.hdr.vmx.preemption_timer_deadline = 0,
> +		.hdr.vmx.evmcs_pa = -1ull,
>  	};
>  	struct kvm_vmx_nested_state_data __user *user_vmx_nested_state =
>  		&user_kvm_nested_state->data.vmx[0];
> @@ -6037,8 +6038,10 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
>  		if (vmx_has_valid_vmcs12(vcpu)) {
>  			kvm_state.size += sizeof(user_vmx_nested_state->vmcs12);
>  
> -			if (vmx->nested.hv_evmcs)
> +			if (vmx->nested.hv_evmcs) {
>  				kvm_state.flags |= KVM_STATE_NESTED_EVMCS;
> +				kvm_state.hdr.vmx.evmcs_pa = vmx->nested.hv_evmcs_vmptr;
> +			}
>  
>  			if (is_guest_mode(vcpu) &&
>  			    nested_cpu_has_shadow_vmcs(vmcs12) &&
> @@ -6230,13 +6233,25 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>  
>  		set_current_vmptr(vmx, kvm_state->hdr.vmx.vmcs12_pa);
>  	} else if (kvm_state->flags & KVM_STATE_NESTED_EVMCS) {
> +		u64 evmcs_gpa = kvm_state->hdr.vmx.evmcs_pa;
> +
>  		/*
> -		 * nested_vmx_handle_enlightened_vmptrld() cannot be called
> -		 * directly from here as HV_X64_MSR_VP_ASSIST_PAGE may not be
> -		 * restored yet. EVMCS will be mapped from
> -		 * nested_get_vmcs12_pages().
> +		 * EVMCS GPA == 0 most likely indicates that the migration data is
> +		 * coming from an older KVM which doesn't support 'evmcs_pa' in
> +		 * 'struct kvm_vmx_nested_state_hdr'.
>  		 */
> -		kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +		if (evmcs_gpa && (evmcs_gpa != -1ull) &&
> +		    (__nested_vmx_handle_enlightened_vmptrld(vcpu, evmcs_gpa, false) !=
> +		     EVMPTRLD_SUCCEEDED)) {
> +			return -EINVAL;
> +		} else if (!evmcs_gpa) {
> +			/*
> +			 * EVMCS GPA can't be acquired from VP assist page here because
> +			 * HV_X64_MSR_VP_ASSIST_PAGE may not be restored yet.
> +			 * EVMCS will be mapped from nested_get_evmcs_page().
> +			 */
> +			kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
> +		}
>  	} else {
>  		return -EINVAL;
>  	}

Hi everyone!

Let me explain my concern about this patch, and also check whether I understand the situation correctly.

In a nutshell, if I understand this correctly, we are not allowed to access any guest
memory while setting the nested state.

If I also understand correctly, the reason for the above is that userspace is allowed
to set the nested state first, then fiddle with the KVM memslots, maybe even update
the guest memory, and only later issue the KVM_RUN ioctl.

And this is the major reason why the KVM_REQ_GET_NESTED_STATE_PAGES
request exists in the first place.

If that is correct, I assume that we either have to keep loading the eVMCS page on the
KVM_REQ_GET_NESTED_STATE_PAGES request, or we have to include the eVMCS contents
themselves in the migration state in addition to the physical address, similar to how
we treat the VMCS12 and the VMCB12.

I personally tinkered with qemu to try to reproduce this situation.
In my tests I wasn't able to make it update the memory map after loading
the nested state but prior to KVM_RUN, but neither was I able to prove
that this can't happen.

In addition to that, I don't know how qemu behaves when it does
guest RAM post-copy, because so far I haven't tried to tinker with it.

Finally, other userspace hypervisors exist, and they might rely on this assumption
as well.

Looking forward to any comments,
Best regards,
	Maxim Levitsky




Thread overview: 19+ messages
2021-05-03 15:08 [PATCH 0/4] KVM: nVMX: Fix migration of nested guests when eVMCS is in use Vitaly Kuznetsov
2021-05-03 15:08 ` [PATCH 1/4] KVM: nVMX: Always make an attempt to map eVMCS after migration Vitaly Kuznetsov
2021-05-05  8:22   ` Maxim Levitsky
2021-05-05  8:39     ` Vitaly Kuznetsov
2021-05-05  9:17       ` Maxim Levitsky
2021-05-05  9:23         ` Vitaly Kuznetsov
2021-05-03 15:08 ` [PATCH 2/4] KVM: nVMX: Properly pad 'struct kvm_vmx_nested_state_hdr' Vitaly Kuznetsov
2021-05-05  8:24   ` Maxim Levitsky
2021-05-05 17:34     ` Sean Christopherson
2021-05-03 15:08 ` [PATCH 3/4] KVM: nVMX: Introduce __nested_vmx_handle_enlightened_vmptrld() Vitaly Kuznetsov
2021-05-05  8:24   ` Maxim Levitsky
2021-05-03 15:08 ` [PATCH 4/4] KVM: nVMX: Map enlightened VMCS upon restore when possible Vitaly Kuznetsov
2021-05-03 15:53   ` Paolo Bonzini
2021-05-04  8:02     ` Vitaly Kuznetsov
2021-05-04  8:06       ` Paolo Bonzini
2021-05-05  8:33   ` Maxim Levitsky [this message]
2021-05-05  9:17     ` Vitaly Kuznetsov
2021-05-03 15:43 ` [PATCH 0/4] KVM: nVMX: Fix migration of nested guests when eVMCS is in use Paolo Bonzini
2021-05-03 15:52   ` Vitaly Kuznetsov
