linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] KVM: SMM fixes for nVMX
@ 2021-08-26  9:57 Maxim Levitsky
  2021-08-26  9:57 ` [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation Maxim Levitsky
  2021-08-26  9:57 ` [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM Maxim Levitsky
  0 siblings, 2 replies; 10+ messages in thread
From: Maxim Levitsky @ 2021-08-26  9:57 UTC (permalink / raw)
  To: kvm
  Cc: Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Sean Christopherson, Ingo Molnar, Paolo Bonzini,
	Vitaly Kuznetsov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Maxim Levitsky

These are two patches that fix SMM entries while a nested guest
is active and either EPT or unrestricted guest mode is disabled
(disabling EPT also disables the latter).

1. The first patch makes sure that we don't run vmx_handle_exit_irqoff
   when we emulate a handful of real mode SMM instructions.

   When in emulation mode, the VMX exit reason is not updated,
   and thus this function would use outdated values and crash.

2. The second patch works around an incorrect restore of segment
   registers upon entry to a nested guest from SMM.

   When entering the nested guest from SMM we first enter real mode,
   and go from it straight to the nested guest; in particular,
   once we restore L2's CR0, enter_pmode is called, which
   'restores' the segment registers from the real mode segment
   cache.

   Normally this isn't a problem, since after we finish entering
   the nested guest we restore all of its registers from SMRAM,
   but for the brief period when L2's segment registers are not up to date
   we trip the 'vmx_guest_state_valid' check for non-unrestricted guest
   mode, even though the state will be valid later.

Note that I am still able to crash L1, on VMX, by migrating a VM with a
nested guest running and an SMM load in progress.

This happens even with the stock settings of ept=1,unrestricted_guest=1
and will be investigated soon.

Best regards,
	Maxim Levitsky

Maxim Levitsky (2):
  KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  VMX: nSVM: enter protected mode prior to returning to nested guest
    from SMM

 arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

-- 
2.26.3




* [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-08-26  9:57 [PATCH 0/2] KVM: SMM fixes for nVMX Maxim Levitsky
@ 2021-08-26  9:57 ` Maxim Levitsky
  2021-08-26 16:01   ` Sean Christopherson
  2021-08-26  9:57 ` [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM Maxim Levitsky
  1 sibling, 1 reply; 10+ messages in thread
From: Maxim Levitsky @ 2021-08-26  9:57 UTC (permalink / raw)
  To: kvm
  Cc: Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Sean Christopherson, Ingo Molnar, Paolo Bonzini,
	Vitaly Kuznetsov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Maxim Levitsky

If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fada1055f325..0c2c0d5ae873 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6382,6 +6382,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+	if (vmx->emulation_required)
+		return;
+
 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
 	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
-- 
2.26.3



* [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM
  2021-08-26  9:57 [PATCH 0/2] KVM: SMM fixes for nVMX Maxim Levitsky
  2021-08-26  9:57 ` [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation Maxim Levitsky
@ 2021-08-26  9:57 ` Maxim Levitsky
  2021-08-26 16:23   ` Sean Christopherson
  1 sibling, 1 reply; 10+ messages in thread
From: Maxim Levitsky @ 2021-08-26  9:57 UTC (permalink / raw)
  To: kvm
  Cc: Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Sean Christopherson, Ingo Molnar, Paolo Bonzini,
	Vitaly Kuznetsov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Maxim Levitsky

The SMM return code switches the CPU to real mode, and
then nested_vmx_enter_non_root_mode first switches to vmcs02
and then restores CR0 from the KVM register cache.

Unfortunately, restoring CR0 enables protected mode,
which leads us to "restore" the segment registers from the
"real mode segment cache", which is not up to date vs L2 and trips
the 'vmx_guest_state_valid' check later, when
unrestricted guest mode is not enabled.

This happens to work otherwise, because after we enter the nested guest,
we restore its register state again from SMRAM with correct values
and that includes the segment values.

As a workaround, if we enter protected mode first,
then setting CR0 won't cause this damage.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0c2c0d5ae873..805c415494cf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7507,6 +7507,13 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
 	}
 
 	if (vmx->nested.smm.guest_mode) {
+
+		/*
+		 * Enter protected mode to avoid clobbering L2's segment
+		 * registers during nested guest entry
+		 */
+		vmx_set_cr0(vcpu, vcpu->arch.cr0 | X86_CR0_PE);
+
 		ret = nested_vmx_enter_non_root_mode(vcpu, false);
 		if (ret)
 			return ret;
-- 
2.26.3



* Re: [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-08-26  9:57 ` [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation Maxim Levitsky
@ 2021-08-26 16:01   ` Sean Christopherson
  2021-08-30 12:27     ` Maxim Levitsky
  2021-09-06 10:09     ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2021-08-26 16:01 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Paolo Bonzini, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Aug 26, 2021, Maxim Levitsky wrote:
> If we are emulating an invalid guest state, we don't have a correct
> exit reason, and thus we shouldn't do anything in this function.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>

This should have Cc: stable.  I believe userspace could fairly easily trick KVM
into "handling" a spurious IRQ, e.g. trigger SIGALRM and stuff invalid state.
For all those evil folks running CPUs that are almost old enough to drive :-)

> ---
>  arch/x86/kvm/vmx/vmx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index fada1055f325..0c2c0d5ae873 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6382,6 +6382,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  
> +	if (vmx->emulation_required)
> +		return;

Rather than play whack-a-mole with flows consuming stale state, I'd much prefer
to synthesize a VM-Exit(INVALID_GUEST_STATE).  Alternatively, just skip ->run()
entirely by adding hooks in vcpu_enter_guest(), but that's a much larger change
and probably not worth the risk at this juncture.

---
 arch/x86/kvm/vmx/vmx.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 32e3a8b35b13..12fe63800889 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6618,10 +6618,21 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		     vmx->loaded_vmcs->soft_vnmi_blocked))
 		vmx->loaded_vmcs->entry_time = ktime_get();
 
-	/* Don't enter VMX if guest state is invalid, let the exit handler
-	   start emulation until we arrive back to a valid state */
-	if (vmx->emulation_required)
+	/*
+	 * Don't enter VMX if guest state is invalid, let the exit handler
+	 * start emulation until we arrive back to a valid state.  Synthesize a
+	 * consistency check VM-Exit due to invalid guest state and bail.
+	 */
+	if (unlikely(vmx->emulation_required)) {
+		vmx->fail = 0;
+		vmx->exit_reason.full = EXIT_REASON_INVALID_STATE;
+		vmx->exit_reason.failed_vmentry = 1;
+		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1);
+		vmx->exit_qualification = ENTRY_FAIL_DEFAULT;
+		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
+		vmx->exit_intr_info = 0;
 		return EXIT_FASTPATH_NONE;
+	}
 
 	trace_kvm_entry(vcpu);
 
--

or the beginnings of an aggressive refactor...

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf8fb6eb676a..a4fe0f78898a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9509,6 +9509,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
                goto cancel_injection;
        }

+       if (unlikely(static_call(kvm_x86_emulation_required)(vcpu)))
+               return static_call(kvm_x86_emulate_invalid_guest_state)(vcpu);
+
        preempt_disable();

        static_call(kvm_x86_prepare_guest_switch)(vcpu);

> +
>  	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
>  		handle_external_interrupt_irqoff(vcpu);
>  	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
> -- 
> 2.26.3
> 


* Re: [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM
  2021-08-26  9:57 ` [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM Maxim Levitsky
@ 2021-08-26 16:23   ` Sean Christopherson
  2021-08-30 12:45     ` Maxim Levitsky
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2021-08-26 16:23 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Paolo Bonzini, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Aug 26, 2021, Maxim Levitsky wrote:
> SMM return code switches CPU to real mode, and
> then the nested_vmx_enter_non_root_mode first switches to vmcs02,
> and then restores CR0 in the KVM register cache.
> 
> Unfortunately when it restores the CR0, this enables the protection mode
> which leads us to "restore" the segment registers from
> "real mode segment cache", which is not up to date vs L2 and trips
> 'vmx_guest_state_valid check' later, when the
> unrestricted guest mode is not enabled.

I suspect this is slightly inaccurate.  When loading vmcs02, vmx_switch_vmcs()
will do vmx_register_cache_reset(), which also causes the segment cache to be
reset.  enter_pmode() will still load stale values, but they'll come from vmcs02,
not KVM's segment register cache.

> This happens to work otherwise, because after we enter the nested guest,
> we restore its register state again from SMRAM with correct values
> and that includes the segment values.
> 
> As a workaround to this if we enter protected mode first,
> then setting CR0 won't cause this damage.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 0c2c0d5ae873..805c415494cf 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7507,6 +7507,13 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
>  	}
>  
>  	if (vmx->nested.smm.guest_mode) {
> +
> +		/*
> +		 * Enter protected mode to avoid clobbering L2's segment
> +		 * registers during nested guest entry
> +		 */
> +		vmx_set_cr0(vcpu, vcpu->arch.cr0 | X86_CR0_PE);

I'd really, really, reaaaally like to avoid stuffing state.  All of the instances
I've come across where KVM has stuffed state for something like this were just
papering over one symptom of an underlying bug.

For example, won't this now cause the same bad behavior if L2 is in Real Mode?

Is the problem purely that emulation_required is stale?  If so, how is it stale?
Every segment write as part of RSM emulation should reevaluate emulation_required
via vmx_set_segment().

Oooooh, or are you talking about the explicit vmx_guest_state_valid() in prepare_vmcs02()?
If that's the case, then we likely should skip that check entirely.  The only part
I'm not 100% clear on is whether or not it can/should be skipped for vmx_set_nested_state().

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bc6327950657..20bd84554c1f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
         * which means L1 attempted VMEntry to L2 with invalid state.
         * Fail the VMEntry.
         */
-       if (CC(!vmx_guest_state_valid(vcpu))) {
+       if (from_vmentry && CC(!vmx_guest_state_valid(vcpu))) {
                *entry_failure_code = ENTRY_FAIL_DEFAULT;
                return -EINVAL;
        }


If we want to retain the check for the common vmx_set_nested_state() path, i.e.
when the vCPU is truly being restored to guest mode, then we can simply exempt
the smm.guest_mode case (which also exempts that case when it's set via
vmx_set_nested_state()).  The argument would be that RSM is going to restore L2
state, so whatever happens to be in vmcs12/vmcs02 is stale.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bc6327950657..ac30ba6a8592 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
         * which means L1 attempted VMEntry to L2 with invalid state.
         * Fail the VMEntry.
         */
-       if (CC(!vmx_guest_state_valid(vcpu))) {
+       if (!vmx->nested.smm.guest_mode && CC(!vmx_guest_state_valid(vcpu))) {
                *entry_failure_code = ENTRY_FAIL_DEFAULT;
                return -EINVAL;
        }


* Re: [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-08-26 16:01   ` Sean Christopherson
@ 2021-08-30 12:27     ` Maxim Levitsky
  2021-09-06 10:09     ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Maxim Levitsky @ 2021-08-30 12:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Paolo Bonzini, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, 2021-08-26 at 16:01 +0000, Sean Christopherson wrote:
> On Thu, Aug 26, 2021, Maxim Levitsky wrote:
> > If we are emulating an invalid guest state, we don't have a correct
> > exit reason, and thus we shouldn't do anything in this function.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> 
> This should have Cc: stable.  I believe userspace could fairly easily trick KVM
> into "handling" a spurious IRQ, e.g. trigger SIGALRM and stuff invalid state.
> For all those evil folks running CPUs that are almost old enough to drive :-)
> 
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index fada1055f325..0c2c0d5ae873 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -6382,6 +6382,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >  
> > +	if (vmx->emulation_required)
> > +		return;
> 
> Rather than play whack-a-mole with flows consuming stale state, I'd much prefer
> to synthesize a VM-Exit(INVALID_GUEST_STATE).  Alternatively, just skip ->run()
> entirely by adding hooks in vcpu_enter_guest(), but that's a much larger change
> and probably not worth the risk at this juncture.
> 
> ---
>  arch/x86/kvm/vmx/vmx.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 32e3a8b35b13..12fe63800889 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6618,10 +6618,21 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  		     vmx->loaded_vmcs->soft_vnmi_blocked))
>  		vmx->loaded_vmcs->entry_time = ktime_get();
>  
> -	/* Don't enter VMX if guest state is invalid, let the exit handler
> -	   start emulation until we arrive back to a valid state */
> -	if (vmx->emulation_required)
> +	/*
> +	 * Don't enter VMX if guest state is invalid, let the exit handler
> +	 * start emulation until we arrive back to a valid state.  Synthesize a
> +	 * consistency check VM-Exit due to invalid guest state and bail.
> +	 */
> +	if (unlikely(vmx->emulation_required)) {
> +		vmx->fail = 0;
> +		vmx->exit_reason.full = EXIT_REASON_INVALID_STATE;
> +		vmx->exit_reason.failed_vmentry = 1;
> +		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_1);
> +		vmx->exit_qualification = ENTRY_FAIL_DEFAULT;
> +		kvm_register_mark_available(vcpu, VCPU_EXREG_EXIT_INFO_2);
> +		vmx->exit_intr_info = 0;
>  		return EXIT_FASTPATH_NONE;
> +	}

I was thinking exactly about this when I wrote the patch, and in fact the
first version of it did roughly what you suggest.

But I was afraid that this would also introduce a whack-a-mole, as now
it "appears" as if VM entry failed and we should thus kill the guest.

But I'll try that.

Thanks a lot for the review!

Best regards,
	Maxim Levitsky


>  
>  	trace_kvm_entry(vcpu);
>  
> --
> 
> or the beginnings of an aggressive refactor...





> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf8fb6eb676a..a4fe0f78898a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9509,6 +9509,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                 goto cancel_injection;
>         }
> 
> +       if (unlikely(static_call(kvm_x86_emulation_required)(vcpu)))
> +               return static_call(kvm_x86_emulate_invalid_guest_state)(vcpu);
> +
>         preempt_disable();
> 
>         static_call(kvm_x86_prepare_guest_switch)(vcpu);
> 
> > +
> >  	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
> >  		handle_external_interrupt_irqoff(vcpu);
> >  	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
> > -- 
> > 2.26.3
> > 




* Re: [PATCH 2/2] VMX: nSVM: enter protected mode prior to returning to nested guest from SMM
  2021-08-26 16:23   ` Sean Christopherson
@ 2021-08-30 12:45     ` Maxim Levitsky
  0 siblings, 0 replies; 10+ messages in thread
From: Maxim Levitsky @ 2021-08-30 12:45 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Paolo Bonzini, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, 2021-08-26 at 16:23 +0000, Sean Christopherson wrote:
> On Thu, Aug 26, 2021, Maxim Levitsky wrote:
> > SMM return code switches CPU to real mode, and
> > then the nested_vmx_enter_non_root_mode first switches to vmcs02,
> > and then restores CR0 in the KVM register cache.
> > 
> > Unfortunately when it restores the CR0, this enables the protection mode
> > which leads us to "restore" the segment registers from
> > "real mode segment cache", which is not up to date vs L2 and trips
> > 'vmx_guest_state_valid check' later, when the
> > unrestricted guest mode is not enabled.
> 
> I suspect this is slightly inaccurate.  When loading vmcs02, vmx_switch_vmcs()
> will do vmx_register_cache_reset(), which also causes the segment cache to be
> reset.  enter_pmode() will still load stale values, but they'll come from vmcs02,
> not KVM's segment register cache.
> 
> > This happens to work otherwise, because after we enter the nested guest,
> > we restore its register state again from SMRAM with correct values
> > and that includes the segment values.
> > 
> > As a workaround to this if we enter protected mode first,
> > then setting CR0 won't cause this damage.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 0c2c0d5ae873..805c415494cf 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7507,6 +7507,13 @@ static int vmx_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
> >  	}
> >  
> >  	if (vmx->nested.smm.guest_mode) {
> > +
> > +		/*
> > +		 * Enter protected mode to avoid clobbering L2's segment
> > +		 * registers during nested guest entry
> > +		 */
> > +		vmx_set_cr0(vcpu, vcpu->arch.cr0 | X86_CR0_PE);
> 
> I'd really, really, reaaaally like to avoid stuffing state.  All of the instances
> I've come across where KVM has stuffed state for something like this were just
> papering over one symptom of an underlying bug.

I can't agree more with you on this. I even called this patch a hack in the cover letter,
because I didn't like it either.


> 
> For example, won't this now cause the same bad behavior if L2 is in Real Mode?
> 
> Is the problem purely that emulation_required is stale?  If so, how is it stale?
> Every segment write as part of RSM emulation should reevaluate emulation_required
> via vmx_set_segment().

So this is what is happening:

1. rsm emulation switches the vCPU from 64 bit protected mode (since the BIOS SMM handler
   of course switches to it) to real mode via a CR0 write.

   Here 'enter_rmode' is called, which saves the current segment register values in the 'real mode segment cache'
   and then fixes up the values in the VMCS to 'work' in vm86 mode. The saved architectural values in that 'cache'
   are then used when trying to read the segments (e.g. via vmx_get_segment).

2. vmx_leave_smm is called, which calls nested_vmx_enter_non_root_mode.
   This is unusual in that it is done in real mode, while otherwise VMX non root mode entry is
   only possible from protected mode (all vmx instructions #UD in real mode).

3. nested_vmx_enter_non_root_mode first switches to vmcs02 via vmx_switch_vmcs,
   which 'loads' the L2 segments, because it zeroes the segment cache (via vmx_register_cache_reset),
   so any attempt to read the segments will read them from vmcs02.

   That means that at this point all the good segment values are loaded.

4. Now prepare_vmcs02 is called, which eventually sets KVM's CR0 using 'vmx_set_cr0'.

   At that point the function notices that we are entering protected mode, and thus
   enter_pmode is called, which first reads the segment values from the real mode segment
   cache (which sadly reflects the change to CS that rsm emulation did), updates their bases
   and selectors but not their segment types, and writes these segments back, corrupting the L2 state.

   The code is:

   vmx_get_segment(vcpu, &vmx->rmode.segs[VCPU_SREG_CS], VCPU_SREG_CS); // reads segment cache since vmx->rmode.vm86_active = 1;
   ...
   vmx->rmode.vm86_active = 0;
   ...
   fix_pmode_seg(vcpu, VCPU_SREG_CS, &vmx->rmode.segs[VCPU_SREG_CS]):
	__vmx_set_segment(vcpu, save, seg);


My hack was to avoid all this by setting protected mode first and then doing the nested
entry, which is more natural as I said above.


> 
> Oooooh, or are you talking about the explicit vmx_guest_state_valid() in prepare_vmcs02()?
> If that's the case, then we likely should skip that check entirely.  The only part
> I'm not 100% clear on is whether or not it can/should be skipped for vmx_set_nested_state().

Yes. Initially, in the first version (which I didn't post) of these patches, I indeed
just removed this check, and it works, sans another fix which is correct to have anyway
(see the note below).

The L2 will briefly have invalid state, which will be fixed by loading its registers from SMRAM.
 
For vmx_set_nested_state I suspect something similar can happen, at least in theory:
we load the nested state, then restore the registers, and only then does the state become valid.

So it makes sense to remove this check for all but the from_vmentry==true case.
 
However, we do need to extend the check in vmx_vcpu_run so that if the guest state
is not valid and we are nested, we fail instead of emulating.
I'll do this.
 

NOTE: There is another fix that has to be done if I remove the check for validity
of the nested state in nested_vmx_enter_non_root_mode, instead of stuffing in the
protected mode state hack:

This is what is happening:
 
1. rsm emulation switches the vCPU (that is, vmcs01) to real mode, and this state is left in vmcs01.
This means that now the L1 state is not valid as well!
(But with my hack, which switches the vCPU to protected mode, this accidentally doesn't happen!)
 

2. We switch to vmcs02; the L2 state is temporarily invalid, as it has protected mode segments while CR0 says real mode.

3. rsm emulation loads the L2 registers from SMRAM, and makes the L2 state valid again.
 
4. we (optionally) enter L2
 
5. We exit to L1. L1's guest state is real mode, and now invalid.

We overwrite L1's guest state with vmcs12's host state, which is *valid*; however, the way
'load_vmcs12_host_state' works is that it uses __vmx_set_segment, which doesn't update
'emulation_required', and thus the L1 state doesn't become valid:
we try to emulate it and eventually crash, as the emulator can't really emulate everything.

I am now posting a new version of my SMM fixes, titled '[PATCH v2 0/6] KVM: few more SMM fixes'
(I merged the SVM and VMX fixes into a single patch series), and I include all of the above there.

Thanks again for the review!
  
Best regards,
	Maxim Levitsky


> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index bc6327950657..20bd84554c1f 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>          * which means L1 attempted VMEntry to L2 with invalid state.
>          * Fail the VMEntry.
>          */
> -       if (CC(!vmx_guest_state_valid(vcpu))) {
> +       if (from_vmentry && CC(!vmx_guest_state_valid(vcpu))) {
>                 *entry_failure_code = ENTRY_FAIL_DEFAULT;
>                 return -EINVAL;
>         }
> 
> 
> If we want to retain the check for the common vmx_set_nested_state() path, i.e.
> when the vCPU is truly being restored to guest mode, then we can simply exempt
> the smm.guest_mode case (which also exempts that case when its set via
> vmx_set_nested_state()).  The argument would be that RSM is going to restore L2
> state, so whatever happens to be in vmcs12/vmcs02 is stale.
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index bc6327950657..ac30ba6a8592 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2547,7 +2547,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>          * which means L1 attempted VMEntry to L2 with invalid state.
>          * Fail the VMEntry.
>          */
> -       if (CC(!vmx_guest_state_valid(vcpu))) {
> +       if (!vmx->nested.smm.guest_mode && CC(!vmx_guest_state_valid(vcpu))) {
>                 *entry_failure_code = ENTRY_FAIL_DEFAULT;
>                 return -EINVAL;
>         }
> 






* Re: [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-08-26 16:01   ` Sean Christopherson
  2021-08-30 12:27     ` Maxim Levitsky
@ 2021-09-06 10:09     ` Paolo Bonzini
  2021-09-06 21:07       ` Maxim Levitsky
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2021-09-06 10:09 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On 26/08/21 18:01, Sean Christopherson wrote:
>> +	if (vmx->emulation_required)
>> +		return;
> Rather than play whack-a-mole with flows consuming stale state, I'd much prefer
> to synthesize a VM-Exit(INVALID_GUEST_STATE).  Alternatively, just skip ->run()
> entirely by adding hooks in vcpu_enter_guest(), but that's a much larger change
> and probably not worth the risk at this juncture.

I'm going with Maxim's patch for now (and for stable kernels especially)
but I like the


+       if (unlikely(static_call(kvm_x86_emulation_required)(vcpu)))
+               return static_call(kvm_x86_emulate_invalid_guest_state)(vcpu);
+

idea.  I'll add a Fixes: tag for 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and
async #PFs in common irqs-disabled fn", Linux 5.3).

Paolo



* Re: [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-09-06 10:09     ` Paolo Bonzini
@ 2021-09-06 21:07       ` Maxim Levitsky
  2021-09-07  6:50         ` Paolo Bonzini
  0 siblings, 1 reply; 10+ messages in thread
From: Maxim Levitsky @ 2021-09-06 21:07 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Mon, 2021-09-06 at 12:09 +0200, Paolo Bonzini wrote:
> On 26/08/21 18:01, Sean Christopherson wrote:
> > > +	if (vmx->emulation_required)
> > > +		return;
> > Rather than play whack-a-mole with flows consuming stale state, I'd much prefer
> > to synthesize a VM-Exit(INVALID_GUEST_STATE).  Alternatively, just skip ->run()
> > entirely by adding hooks in vcpu_enter_guest(), but that's a much larger change
> > and probably not worth the risk at this juncture.
> 
> I'm going with Maxim's patch for now (and for stable kernels especially)
> but I like the
> 
> 
> +       if (unlikely(static_call(kvm_x86_emulation_required)(vcpu)))
> +               return static_call(kvm_x86_emulate_invalid_guest_state)(vcpu);
> +
> 
> idea.  I'll put a Fixes for 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and
> async #PFs in common irqs-disabled fn", Linux 5.3).
> 
> Paolo
> 
Note that I posted v2 of this patch series ('[PATCH v2 0/6] KVM: few more SMM fixes').

There I addressed the review feedback from this patch series,
and for this particular case I synthesized an invalid-state VM exit, as was suggested.

Best regards,
	Maxim Levitsky



* Re: [PATCH 1/2] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation
  2021-09-06 21:07       ` Maxim Levitsky
@ 2021-09-07  6:50         ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2021-09-07  6:50 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Thomas Gleixner, Wanpeng Li, Joerg Roedel, H. Peter Anvin,
	Jim Mattson, Ingo Molnar, Vitaly Kuznetsov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Borislav Petkov, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On 06/09/21 23:07, Maxim Levitsky wrote:
> Note that I posted V2 of this patch series ([PATCH v2 0/6] KVM: few more SMM fixes)
> 
> There I addressed the review feedback from this patch series,
> and for this particular case, I synthesized invalid VM exit as was suggested.

Yes, that's intended.  I will revert this version in 5.16.

Paolo


