kvm.vger.kernel.org archive mirror
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Maxim Levitsky <mlevitsk@redhat.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/7] KVM: nVMX: Introduce nested_evmcs_is_used()
Date: Thu, 27 May 2021 09:54:16 +0200
Message-ID: <87cztc7gt3.fsf@vitty.brq.redhat.com>
In-Reply-To: <2a3eae6089958956f707dbc55d1d2a410edb6983.camel@redhat.com>

Maxim Levitsky <mlevitsk@redhat.com> writes:

> On Mon, 2021-05-24 at 14:35 +0200, Vitaly Kuznetsov wrote:
>> Maxim Levitsky <mlevitsk@redhat.com> writes:
>> 
>> > On Mon, 2021-05-17 at 15:50 +0200, Vitaly Kuznetsov wrote:
>> > > Unlike regular set_current_vmptr(), nested_vmx_handle_enlightened_vmptrld()
>> > > cannot be called directly from vmx_set_nested_state() as KVM may not have
>> > > all the information yet (e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not be
>> > > restored yet). Enlightened VMCS is mapped later while getting nested state
>> > > pages. In the meantime, vmx->nested.hv_evmcs remains NULL and using it
>> > > for various checks is incorrect. In particular, if KVM_GET_NESTED_STATE is
>> > > called right after KVM_SET_NESTED_STATE, KVM_STATE_NESTED_EVMCS flag in the
>> > > resulting state will be unset (and such state will later fail to load).
>> > > 
>> > > Introduce nested_evmcs_is_used() and use 'is_guest_mode(vcpu) &&
>> > > vmx->nested.current_vmptr == -1ull' check to detect not-yet-mapped eVMCS
>> > > after restore.
>> > > 
>> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> > > ---
>> > >  arch/x86/kvm/vmx/nested.c | 31 ++++++++++++++++++++++++++-----
>> > >  1 file changed, 26 insertions(+), 5 deletions(-)
>> > > 
>> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> > > index 6058a65a6ede..3080e00c8f90 100644
>> > > --- a/arch/x86/kvm/vmx/nested.c
>> > > +++ b/arch/x86/kvm/vmx/nested.c
>> > > @@ -141,6 +141,27 @@ static void init_vmcs_shadow_fields(void)
>> > >  	max_shadow_read_write_fields = j;
>> > >  }
>> > >  
>> > > +static inline bool nested_evmcs_is_used(struct vcpu_vmx *vmx)
>> > > +{
>> > > +	struct kvm_vcpu *vcpu = &vmx->vcpu;
>> > > +
>> > > +	if (vmx->nested.hv_evmcs)
>> > > +		return true;
>> > > +
>> > > +	/*
>> > > +	 * After KVM_SET_NESTED_STATE, enlightened VMCS is mapped during
>> > > +	 * KVM_REQ_GET_NESTED_STATE_PAGES handling and until the request is
>> > > +	 * processed vmx->nested.hv_evmcs is NULL. It is, however, possible to
>> > > +	 * detect such state by checking 'nested.current_vmptr == -1ull' when
>> > > +	 * vCPU is in guest mode, it is only possible with eVMCS.
>> > > +	 */
>> > > +	if (unlikely(vmx->nested.enlightened_vmcs_enabled && is_guest_mode(vcpu) &&
>> > > +		     (vmx->nested.current_vmptr == -1ull)))
>> > > +		return true;
>> > > +
>> > > +	return false;
>> > > +}
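A minimal user-space model of the check nested_evmcs_is_used() performs may help when reading the rest of the thread. It assumes a hypothetical struct evmcs_check_model standing in for the few vcpu_vmx fields the helper reads; it is a sketch for illustration, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the vcpu_vmx fields nested_evmcs_is_used()
 * reads; this is a user-space model, not kernel code.
 */
struct evmcs_check_model {
	void *hv_evmcs;                /* non-NULL once the eVMCS is mapped */
	bool enlightened_vmcs_enabled; /* L1 has enabled eVMCS */
	bool guest_mode;               /* stands in for is_guest_mode(vcpu) */
	uint64_t current_vmptr;        /* -1ull when no VMCS12 was loaded with VMPTRLD */
};

static bool model_evmcs_is_used(const struct evmcs_check_model *m)
{
	assert(m != NULL);

	/* Fast path: the eVMCS is already mapped. */
	if (m->hv_evmcs)
		return true;

	/*
	 * Restored via KVM_SET_NESTED_STATE but not mapped yet: in guest mode
	 * without a VMPTRLD'd VMCS12, which only happens with eVMCS.
	 */
	return m->enlightened_vmcs_enabled && m->guest_mode &&
	       m->current_vmptr == (uint64_t)-1;
}
```

The second condition is what closes the window between KVM_SET_NESTED_STATE and the KVM_REQ_GET_NESTED_STATE_PAGES handling, during which hv_evmcs is still NULL.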
>> > 
>> > I think that this is a valid way to solve the issue,
>> > but it feels like there might be a better way.
>> > I don't mind accepting this patch as is, though.
>> > 
>> > So here are my 2 cents about this:
>> > 
>> > First of all, after studying how eVMCS works, I take back my words
>> > about needing to migrate its contents.
>> >
>> > It is indeed enough to migrate its physical address,
>> > or maybe even just a flag that eVMCS is loaded
>> > (and, to my surprise, we already do this: KVM_STATE_NESTED_EVMCS).
>> >
>> > So how about just having a boolean flag that indicates that eVMCS is in use
>> > but doesn't imply that we know its address or that it is mapped
>> > into the host address space, something like 'vmx->nested.enlightened_vmcs_loaded'?
>> >
>> > On migration, that flag is saved and restored as KVM_STATE_NESTED_EVMCS;
>> > otherwise, it is set when we load an eVMCS and cleared when it is released.
>> >
>> > Then, as far as I can see, we can use this flag in nested_evmcs_is_used(),
>> > since none of its callers touch the eVMCS and thus don't need it to be
>> > mapped.
>> > 
>> > What do you think?
>
>
>
>> > 
>> 
>> First, we need to stay compatible with older KVMs, which don't have the
>> flag, and this is challenging: currently, we always expect vmcs12 to
>> carry valid contents.
>
> All right, I understand this can be an issue!
>
> If userspace doesn't set KVM_STATE_NESTED_EVMCS
> but has a valid eVMCS, as later indicated by enabling it in the HV
> assist page, we can reuse the logic this patch uses
> to set the vmx->nested.enlightened_vmcs_loaded flag (or whatever
> we decide to name it).
> Later we can even deprecate and disable this with a new KVM cap.
>
>
> BTW, I like Paolo's idea of putting this flag into the evmcs_gpa,
> like this:
>
> -1: no eVMCS
> 0: eVMCS enabled but its GPA is not known
> anything else: valid GPA
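Paolo's three-way encoding of evmcs_gpa can be sketched with two sentinel values. The macro and function names are hypothetical; note that using 0 as "GPA not known" implicitly assumes 0 is never a real eVMCS address:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the sentinels described above. */
#define MODEL_EVMCS_GPA_NONE    ((uint64_t)-1) /* no eVMCS */
#define MODEL_EVMCS_GPA_UNKNOWN ((uint64_t)0)  /* eVMCS enabled, GPA not known yet */

static_assert(MODEL_EVMCS_GPA_NONE != MODEL_EVMCS_GPA_UNKNOWN,
	      "sentinel values must be distinct");

/* eVMCS is in use whether or not its GPA is known. */
static bool model_evmcs_in_use(uint64_t evmcs_gpa)
{
	return evmcs_gpa != MODEL_EVMCS_GPA_NONE;
}

/* Only a real address can be mapped. */
static bool model_evmcs_gpa_known(uint64_t evmcs_gpa)
{
	return model_evmcs_in_use(evmcs_gpa) &&
	       evmcs_gpa != MODEL_EVMCS_GPA_UNKNOWN;
}
```

Folding the flag into the address field means one variable answers both "is eVMCS in use?" and "can it be mapped now?".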
>
>
> Also as I said, I am not against this patch either, 
> I am just thinking maybe we can make it a bit better.
>

v3 implements a similar idea (I kept Paolo's 'Suggested-by' though :-):
we set hv_evmcs_vmptr to EVMPTR_MAP_PENDING after migration. I haven't
tried it yet, but I think we can eventually drop
KVM_REQ_GET_NESTED_STATE_PAGES usage.

For now, this series is a bugfix (multiple bugfixes, actually), so I'd
like to get it in and not try to make everything perfect regarding eVMCS
:-) I'll certainly take a look at other possible improvements later.

>
>> 
>> Second, vCPU can be migrated in three different states:
>> 1) While L2 was running ('true' nested state is in VMCS02)
>> 2) While L1 was running ('true' nested state is in eVMCS)
>> 3) Right after an exit from L2 to L1 was forced
>> ('need_vmcs12_to_shadow_sync = true') ('true' nested state is in
>> VMCS12).
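The three states above can be sketched as a lookup from each phase to the structure holding the "true" nested state. The enum and function names are hypothetical; the copy rule mirrors the pre-existing vmx_get_nested_state() branches visible in the draft diff further down (VMCS02 is synced into VMCS12 while L2 runs, the eVMCS is copied into VMCS12 while L1 runs, and nothing is copied once need_vmcs12_to_shadow_sync is set):

```c
#include <assert.h>
#include <stdbool.h>

/* The three migration-time states listed above (names are hypothetical). */
enum model_nested_phase {
	MODEL_L2_RUNNING,          /* 1) true state in VMCS02 */
	MODEL_L1_RUNNING,          /* 2) true state in eVMCS */
	MODEL_FORCED_EXIT_PENDING, /* 3) need_vmcs12_to_shadow_sync = true */
};

enum model_container { MODEL_VMCS02, MODEL_EVMCS, MODEL_VMCS12 };

/* Where the authoritative nested state lives in each phase. */
static enum model_container model_true_state(enum model_nested_phase phase)
{
	switch (phase) {
	case MODEL_L2_RUNNING:
		return MODEL_VMCS02;
	case MODEL_L1_RUNNING:
		return MODEL_EVMCS;
	case MODEL_FORCED_EXIT_PENDING:
		return MODEL_VMCS12;
	default:
		assert(!"unknown phase");
		return MODEL_VMCS12;
	}
}

/*
 * With VMCS12 as the only container, saving state means first copying
 * from the authoritative structure into VMCS12, unless it already is VMCS12.
 */
static bool model_needs_copy_into_vmcs12(enum model_nested_phase phase)
{
	return model_true_state(phase) != MODEL_VMCS12;
}
```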
>
> Yes, and this was quite a difficult thing to understand
> when I was trying to figure out how this code works.
>
> Also, you can add another interesting state:
>
> 4) Right after emulating VMLAUNCH/VMRESUME but before
> the actual entry to the nested guest (aka nested_run_pending=true)
>

Honestly, I haven't taken a closer look at this state. Do you envision
specific issues? Can we actually try to serve KVM_GET_NESTED_STATE in
between setting 'nested_run_pending=true' and an actual attempt to enter
L2?

>
>
>> 
>> The current solution is to always use VMCS12 as a container to transfer
>> the state; conceptually, it is at least easier to understand.
>> 
>> We can, indeed, transfer eVMCS (or VMCS12) in case 2) through guest
>> memory and I even tried that but that was making the code more complex
>> so eventually I gave up and decided to preserve the 'always use VMCS12
>> as a container' status quo.
>
>
> My only point of concern is that it feels like it is wrong to update eVMCS
> when not doing a nested vmexit, because then the eVMCS is owned by the
> L1 hypervisor.

I see your concern and ideally we wouldn't have to touch it.

> At least not the fields which aren't supposed to be updated by us.
>
>
> This is a rough code draft of what I had in mind (not tested).
> To me this seems reasonable, but I do agree that there are
> some complexity trade-offs involved.
>
> About the compatibility, it can be said that:
>
>
> Case 1:
> Both old and new kernels will send/receive an up-to-date vmcs12,
> while the eVMCS is not up to date, and its contents aren't even defined
> (since L2 runs).
>
> Case 2:
> The old kernel will send vmcs12 with the partial changes that L1 already
> made to the eVMCS, and the latest state of the eVMCS, with all changes,
> in guest memory.
>
> But these changes will be discarded on the receiving side,
> since once L1 asks us to enter L2, we will reload all the state from the eVMCS
> (at least the state that is marked as dirty, meaning it differs
> from vmcs12 as it was on the last nested vmexit).
>
> The new kernel will always send vmcs12 as it was on the last vmexit:
> a slightly older version, but an even more consistent one.
>
> But this doesn't matter either, as, just like in the case of the old kernel,
> vmcs12 will be updated from the eVMCS as soon as we do another L2 entry.
>
> So while in this case we send a 'more stale' vmcs12, it doesn't
> really matter, as it is stale anyway and will be reloaded from
> the eVMCS.
>
> Case 3:
> The old kernel will send an up-to-date vmcs12 (since L1 didn't have a chance
> to run anyway after the nested vmexit). The eVMCS will not be up to date
> in guest memory, but a newer kernel can fix this by updating it
> as you did in patch 6.
>
> The new kernel will send an up-to-date vmcs12 (same reason) and an up-to-date
> eVMCS, so in fact even an unchanged target kernel will be able to migrate
> from this state.
>
> So in fact my suggestion would actually allow migrating to a kernel
> without the fix applied.
> This is even better than I thought.
>
>
> This is a rough draft of the idea:
>
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 6058a65a6ede..98eb7526cae6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -167,15 +167,22 @@ static int nested_vmx_failInvalid(struct kvm_vcpu *vcpu)
>  static int nested_vmx_failValid(struct kvm_vcpu *vcpu,
>  				u32 vm_instruction_error)
>  {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu)
>  			& ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
>  			    X86_EFLAGS_SF | X86_EFLAGS_OF))
>  			| X86_EFLAGS_ZF);
>  	get_vmcs12(vcpu)->vm_instruction_error = vm_instruction_error;
> +
>  	/*
>  	 * We don't need to force a shadow sync because
>  	 * VM_INSTRUCTION_ERROR is not shadowed
> +	 * We do need to update the evmcs
>  	 */
> +
> +	if (vmx->nested.hv_evmcs)
> +		vmx->nested.hv_evmcs->vm_instruction_error = vm_instruction_error;
> +
>  	return kvm_skip_emulated_instruction(vcpu);
>  }
>  
> @@ -1962,6 +1969,10 @@ static int copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx)
>  
>  	evmcs->guest_bndcfgs = vmcs12->guest_bndcfgs;
>  
> +	/* All fields are clean */
> +	vmx->nested.hv_evmcs->hv_clean_fields |=
> +		HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
> +
>  	return 0;
>  }
>  
> @@ -2055,16 +2066,7 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
>  void nested_sync_vmcs12_to_shadow(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
> -	if (vmx->nested.hv_evmcs) {
> -		copy_vmcs12_to_enlightened(vmx);
> -		/* All fields are clean */
> -		vmx->nested.hv_evmcs->hv_clean_fields |=
> -			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
> -	} else {
> -		copy_vmcs12_to_shadow(vmx);
> -	}
> -
> +	copy_vmcs12_to_shadow(vmx);
>  	vmx->nested.need_vmcs12_to_shadow_sync = false;
>  }
>  
> @@ -3437,8 +3439,13 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>  
>  	load_vmcs12_host_state(vcpu, vmcs12);
>  	vmcs12->vm_exit_reason = exit_reason.full;
> -	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
> +
> +	if (enable_shadow_vmcs)
>  		vmx->nested.need_vmcs12_to_shadow_sync = true;
> +
> +	if (vmx->nested.hv_evmcs)
> +		copy_vmcs12_to_enlightened(vmx);
> +
>  	return NVMX_VMENTRY_VMEXIT;
>  }
>  
> @@ -4531,10 +4538,12 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>  	}
>  
> -	if ((vm_exit_reason != -1) &&
> -	    (enable_shadow_vmcs || vmx->nested.hv_evmcs))
> +	if ((vm_exit_reason != -1) && enable_shadow_vmcs)
>  		vmx->nested.need_vmcs12_to_shadow_sync = true;
>  
> +	if (vmx->nested.hv_evmcs)
> +		copy_vmcs12_to_enlightened(vmx);
> +
>  	/* in case we halted in L2 */
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  
> @@ -6111,12 +6120,8 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
>  		sync_vmcs02_to_vmcs12_rare(vcpu, vmcs12);
>  	} else  {
>  		copy_vmcs02_to_vmcs12_rare(vcpu, get_vmcs12(vcpu));
> -		if (!vmx->nested.need_vmcs12_to_shadow_sync) {
> -			if (vmx->nested.hv_evmcs)
> -				copy_enlightened_to_vmcs12(vmx);
> -			else if (enable_shadow_vmcs)
> -				copy_shadow_to_vmcs12(vmx);
> -		}
> +		if (enable_shadow_vmcs && !vmx->nested.need_vmcs12_to_shadow_sync)
> +			copy_shadow_to_vmcs12(vmx);
>  	}
>  
>  	BUILD_BUG_ON(sizeof(user_vmx_nested_state->vmcs12) < VMCS12_SIZE);
>
>

I don't see why this can't work; the only concern here is that,
conceptually, we'll be making eVMCS something different from shadow
VMCS. In any case, let's give this a try once this series, which fixes real
bugs, lands.

-- 
Vitaly


Thread overview: 36+ messages
2021-05-17 13:50 [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 1/7] KVM: nVMX: Introduce nested_evmcs_is_used() Vitaly Kuznetsov
2021-05-24 12:11   ` Maxim Levitsky
2021-05-24 12:35     ` Vitaly Kuznetsov
2021-05-26 14:34       ` Maxim Levitsky
2021-05-27  7:54         ` Vitaly Kuznetsov [this message]
2021-05-27 14:10           ` Maxim Levitsky
2021-05-24 13:54   ` Paolo Bonzini
2021-05-24 14:09     ` Vitaly Kuznetsov
2021-05-24 14:18       ` Paolo Bonzini
2021-05-24 14:37         ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 2/7] KVM: nVMX: Release enlightened VMCS on VMCLEAR Vitaly Kuznetsov
2021-05-24 12:13   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 3/7] KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in vmx_get_nested_state() Vitaly Kuznetsov
2021-05-24 12:26   ` Maxim Levitsky
2021-05-24 13:01     ` Vitaly Kuznetsov
2021-05-24 13:58       ` Paolo Bonzini
2021-05-26 14:44         ` Maxim Levitsky
2021-05-24 13:56   ` Paolo Bonzini
2021-05-24 14:12     ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 4/7] KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid() Vitaly Kuznetsov
2021-05-24 12:27   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 5/7] KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02() Vitaly Kuznetsov
2021-05-24 12:34   ` Maxim Levitsky
2021-05-24 13:07     ` Vitaly Kuznetsov
2021-05-17 13:50 ` [PATCH v2 6/7] KVM: nVMX: Request to sync eVMCS from VMCS12 after migration Vitaly Kuznetsov
2021-05-24 12:35   ` Maxim Levitsky
2021-05-17 13:50 ` [PATCH v2 7/7] KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never lost Vitaly Kuznetsov
2021-05-24 12:36   ` Maxim Levitsky
2021-05-24 12:08 ` [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use Maxim Levitsky
2021-05-24 12:44   ` Vitaly Kuznetsov
2021-05-26 14:41     ` Maxim Levitsky
2021-05-27  8:01       ` Vitaly Kuznetsov
2021-05-27 14:11         ` Maxim Levitsky
2021-05-27 14:17           ` Paolo Bonzini
2021-05-24 14:01 ` Paolo Bonzini
