From: Maxim Levitsky <mlevitsk@redhat.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>,
	kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Cathy Avery <cavery@redhat.com>,
	Emanuele Giuseppe Esposito <eesposit@redhat.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/6] KVM: nSVM: Fix L1 state corruption upon return from SMM
Date: Wed, 07 Jul 2021 13:32:18 +0300	[thread overview]
Message-ID: <db569845400963d057eedf063bba1599cb9ed77a.camel@redhat.com> (raw)
In-Reply-To: <20210628104425.391276-5-vkuznets@redhat.com>

On Mon, 2021-06-28 at 12:44 +0200, Vitaly Kuznetsov wrote:
> VMCB split commit 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the
> nested L2 guest") broke return from SMM when we entered there from guest
> (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
> manifests itself like this:
> 
>   kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
>   kvm_emulate_insn:     0:7ffbb280: 0f aa
>   kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
>   kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
>     nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
>     npt: on
>   kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
>     intercepts: fd44bfeb 0000217f 00000000
>   kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
>   kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
>     200000006 1ab000
>   kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
>     0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
>     error_code 0x00000000
>   kvm_page_fault:       address 1ab000 error_code 6
>   kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
>     int_info 0 int_info_err 0
>   kvm_entry:            vcpu 0, rip 0x7ffbb280
>   kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
>   kvm_emulate_insn:     0:7ffbb280: 0f aa
>   kvm_inj_exception:    #GP (0x0)
> 
> Note: the return to L2 succeeded, but upon the first exit to L1 its RIP
> points to the 'RSM' instruction even though we're no longer in SMM.
> 
> The problem appears to be that VMCB01 gets irreversibly destroyed during
> SMM execution. Previously, we used to have 'hsave' VMCB where regular
> (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
> switch to VMCB01 from VMCB02.
> 
> Pre-split (working) flow looked like:
> - SMM is triggered during L2's execution
> - L2's state is pushed to SMRAM
> - nested_svm_vmexit() restores L1's state from 'hsave'
> - SMM -> RSM
> - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
>   pre-SMM (and pre L2 VMRUN) L1's state there
> - L2's state is restored from SMRAM
> - upon first exit L1's state is restored from 'hsave'.
> 
> This was always broken with regard to svm_get_nested_state()/
> svm_set_nested_state(): 'hsave' was never part of what's being
> saved and restored, so a migration happening during SMM triggered from L2
> would never restore L1's state correctly.
> 
> Post-split flow (broken) looks like:
> - SMM is triggered during L2's execution
> - L2's state is pushed to SMRAM
> - nested_svm_vmexit() switches to VMCB01 from VMCB02
> - SMM -> RSM
> - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
>   is already lost.
> - L2's state is restored from SMRAM
> - upon first exit L1's state is restored from VMCB01 but it is corrupted
>  (reflects the state during 'RSM' execution).
> 
> VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
> and host state so when we switch back to VMCS02 L1's state is intact there.
> 
> To resolve the issue we need to save L1's state somewhere. We could've
> created a third VMCB for SMM but that would require us to modify saved
> state format. L1's architectural HSAVE area (pointed to by MSR_VM_HSAVE_PA)
> seems appropriate: L0 is free to save any (or none) of L1's state there.
> Currently, KVM does 'none'.
> 
> Note, for nested state migration to succeed, both source and destination
> hypervisors must have the fix. We, however, don't need to create a new
> flag indicating that the HSAVE area is now populated, as migration
> during SMM triggered from L2 was always broken.
> 
> Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> - RFC: I'm not 100% sure my 'smart' idea to use currently-unused HSAVE area
> is that smart. Also, we don't even seem to check that L1 set it up upon
> nested VMRUN so hypervisors which don't do that may remain broken. A very
> much needed selftest is also missing.
> ---
>  arch/x86/kvm/svm/svm.c | 39 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b6f85fd19f96..fbf1b352a9bb 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4291,6 +4291,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>  static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct kvm_host_map map_save;
>  	int ret;
>  
>  	if (is_guest_mode(vcpu)) {
> @@ -4306,6 +4307,29 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
>  		ret = nested_svm_vmexit(svm);
>  		if (ret)
>  			return ret;
> +
> +		/*
> +		 * KVM uses VMCB01 to cache L1 host state while L2 runs but
> +		 * VMCB01 is going to be used during SMM and thus the state will
> +		 * be lost. Temporarily save non-VMLOAD/VMSAVE state to the host
> +		 * save area pointed to by MSR_VM_HSAVE_PA. APM guarantees that
> +		 * the format of the area is identical to the guest save area
> +		 * offset by 0x400 (matches the offset of 'struct vmcb_save_area'
> +		 * within 'struct vmcb'). Note: HSAVE area may also be used by
> +		 * L1 hypervisor to save additional host context (e.g. KVM does
> +		 * that, see svm_prepare_guest_switch()) which must be
> +		 * preserved.
> +		 */

Minor nitpick: I wouldn't say that KVM "caches" L1 host state, as this implies
that there is a master copy somewhere.
I would write something like this instead:

"KVM stores L1 host state in VMCB01 because the CPU naturally stores it there."

Other than that the comment looks very good.



> +		if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
> +				 &map_save) == -EINVAL)
> +			return 1;

It looks like the return value of the vendor-specific 'enter_smm' callback
isn't checked in the common 'enter_smm' handler, which itself returns 'void'
and doesn't check the results of anything else it does either.

enter_smm is called from inject_pending_event, which is allowed to return
a negative value; that value is assumed to be -EBUSY and doesn't lead
to an exit to userspace with a negative value.
I vote to fix this: hide only -EBUSY from userspace,
and let all other errors from this function end up in userspace
so that userspace can kill the guest.

This isn't this patch's fault though, and I think I'll send a patch
to do the above.


> +
> +		BUILD_BUG_ON(offsetof(struct vmcb, save) != 0x400);
I doubt that this will ever change, but let it be.
> +
> +		svm_copy_nonvmloadsave_state(&svm->vmcb01.ptr->save,
> +					     map_save.hva + 0x400);
> +
> +		kvm_vcpu_unmap(vcpu, &map_save, true);
>  	}
>  	return 0;
>  }
> @@ -4313,7 +4337,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
>  static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> -	struct kvm_host_map map;
> +	struct kvm_host_map map, map_save;
>  	int ret = 0;
>  
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
> @@ -4337,6 +4361,19 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>  
>  			ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, map.hva);
>  			kvm_vcpu_unmap(vcpu, &map, true);
> +
> +			/*
> +			 * Restore L1 host state from L1 HSAVE area as VMCB01 was
> +			 * used during SMM (see svm_enter_smm())
> +			 */
Looks fine as well.

> +			if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr),
> +					 &map_save) == -EINVAL)
> +				return 1;
> +
> +			svm_copy_nonvmloadsave_state(map_save.hva + 0x400,
> +						     &svm->vmcb01.ptr->save);
> +
> +			kvm_vcpu_unmap(vcpu, &map_save, true);
>  		}
>  	}
>  

So, besides the nitpick about the comment:
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


