* [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer
@ 2020-04-14  0:09 Jim Mattson
  2020-04-14  0:09 ` [PATCH 2/2] kvm: nVMX: Single-step traps " Jim Mattson
  2020-04-22 21:06 ` [PATCH 1/2] kvm: nVMX: Pending debug exceptions " Sean Christopherson
  0 siblings, 2 replies; 21+ messages in thread
From: Jim Mattson @ 2020-04-14  0:09 UTC (permalink / raw)
  To: kvm; +Cc: Jim Mattson, Oliver Upton, Peter Shier

Previously, if L1 launched vmcs12 with both pending debug exceptions
and an already-expired VMX-preemption timer, the pending debug
exceptions were lost due to a priority inversion between a pending #DB
exception and a "VMX-preemption timer expired" VM-exit from L2 to L1.

In this scenario, L0 constructs a vmcs02 that has both a zero-valued
VMX-preemption timer (assuming enable_preemption_timer is set) and
pending debug exceptions. When the vmcs02 is launched/resumed, the
hardware correctly prioritizes the pending debug exceptions. At this
point, L0 intercepts the resulting #DB trap and queues it up for
redelivery. However, when checking for nested events in software, L0
incorrectly prioritizes the "VMX-preemption timer expired" VM-exit
from L2 to L1.

Technically, nested events should probably be blocked at this
point. Hardware has already determined that the #DB trap is the next
event that should happen. L0 just got in the way because it was
concerned about infinite IDT vectoring loops.

Logically, the enqueued #DB trap is quite similar to a "reinjected"
event resulting from interrupted IDT-vectoring. Treating it as such
fixes the problem, since nested events are blocked when a reinjected
event is present. However, there are some ways in which the enqueued
#DB trap differs from an event resulting from interrupted
IDT-vectoring. In particular, it should not be recorded in
the IDT-vectoring information field of the vmcs12 in the event of a
synthesized VM-exit from L2 to L1. I don't believe that path should
ever be taken, since the #DB trap should take priority over any
synthesized VM-exit from L2 to L1.

Recategorize the intercepted #DB and #AC exceptions as
"reinjected" exceptions. For consistency, do the same thing for SVM,
even though it doesn't have a VMX-preemption timer equivalent.

Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
---
 arch/x86/kvm/svm/svm.c | 4 ++--
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2be5bbae3a40..26b30099c4e4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1739,7 +1739,7 @@ static int db_interception(struct vcpu_svm *svm)
 	if (!(svm->vcpu.guest_debug &
 	      (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) &&
 		!svm->nmi_singlestep) {
-		kvm_queue_exception(&svm->vcpu, DB_VECTOR);
+		kvm_requeue_exception(&svm->vcpu, DB_VECTOR);
 		return 1;
 	}
 
@@ -1778,7 +1778,7 @@ static int ud_interception(struct vcpu_svm *svm)
 
 static int ac_interception(struct vcpu_svm *svm)
 {
-	kvm_queue_exception_e(&svm->vcpu, AC_VECTOR, 0);
+	kvm_requeue_exception_e(&svm->vcpu, AC_VECTOR, 0);
 	return 1;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 83050977490c..aae01253bfba 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4682,7 +4682,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 			if (is_icebp(intr_info))
 				WARN_ON(!skip_emulated_instruction(vcpu));
 
-			kvm_queue_exception(vcpu, DB_VECTOR);
+			kvm_requeue_exception(vcpu, DB_VECTOR);
 			return 1;
 		}
 		kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1;
@@ -4703,7 +4703,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 		break;
 	case AC_VECTOR:
 		if (guest_inject_ac(vcpu)) {
-			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			kvm_requeue_exception_e(vcpu, AC_VECTOR, error_code);
 			return 1;
 		}
 
-- 
2.26.0.110.g2183baf09c-goog



* [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-14  0:09 [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer Jim Mattson
@ 2020-04-14  0:09 ` Jim Mattson
  2020-04-14  3:17   ` Sean Christopherson
  2020-04-22  8:30   ` Paolo Bonzini
  2020-04-22 21:06 ` [PATCH 1/2] kvm: nVMX: Pending debug exceptions " Sean Christopherson
  1 sibling, 2 replies; 21+ messages in thread
From: Jim Mattson @ 2020-04-14  0:09 UTC (permalink / raw)
  To: kvm; +Cc: Jim Mattson, Oliver Upton, Peter Shier

Previously, if the hrtimer for the nested VMX-preemption timer fired
while L0 was emulating an L2 instruction with RFLAGS.TF set, the
synthesized single-step trap would be unceremoniously dropped when
synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.

To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
from L2 to L1 when there is a pending debug trap, such as a
single-step trap.

Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
---
 arch/x86/kvm/vmx/nested.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cbc9ea2de28f..6ab974debd44 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
 	    vmx->nested.preemption_timer_expired) {
 		if (block_nested_events)
 			return -EBUSY;
-		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
+		if (!vmx_pending_dbg_trap(vcpu))
+			nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER,
+					  0, 0);
 		return 0;
 	}
 
-- 
2.26.0.110.g2183baf09c-goog



* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-14  0:09 ` [PATCH 2/2] kvm: nVMX: Single-step traps " Jim Mattson
@ 2020-04-14  3:17   ` Sean Christopherson
  2020-04-14 16:47     ` Jim Mattson
  2020-04-22  8:30   ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2020-04-14  3:17 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Oliver Upton, Peter Shier

On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> Previously, if the hrtimer for the nested VMX-preemption timer fired
> while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> synthesized single-step trap would be unceremoniously dropped when
> synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> 
> To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> from L2 to L1 when there is a pending debug trap, such as a
> single-step trap.
> 
> Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index cbc9ea2de28f..6ab974debd44 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>  	    vmx->nested.preemption_timer_expired) {
>  		if (block_nested_events)
>  			return -EBUSY;
> -		nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> +		if (!vmx_pending_dbg_trap(vcpu))

IMO this one warrants a comment.  It's not immediately obvious that this
only applies to #DBs that are being injected into L2, and that returning
-EBUSY will do the wrong thing.

> +			nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER,
> +					  0, 0);

I'd just let the "0, 0);" poke out past 80 chars.

>  		return 0;
>  	}
>  
> -- 
> 2.26.0.110.g2183baf09c-goog
> 


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-14  3:17   ` Sean Christopherson
@ 2020-04-14 16:47     ` Jim Mattson
  2020-04-15  0:12       ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-14 16:47 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Oliver Upton, Peter Shier

On Mon, Apr 13, 2020 at 8:17 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > synthesized single-step trap would be unceremoniously dropped when
> > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> >
> > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > from L2 to L1 when there is a pending debug trap, such as a
> > single-step trap.
> >
> > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index cbc9ea2de28f..6ab974debd44 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >           vmx->nested.preemption_timer_expired) {
> >               if (block_nested_events)
> >                       return -EBUSY;
> > -             nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> > +             if (!vmx_pending_dbg_trap(vcpu))
>
> IMO this one warrants a comment.  It's not immediately obvious that this
> only applies to #DBs that are being injected into L2, and that returning
> -EBUSY will do the wrong thing.

Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
what the potential confusion is regarding the event. Are you
suggesting that one might think that we have a #DB to deliver to L1
while we're in guest mode? IIRC, that can happen under SVM, but I
don't believe it can happen under VMX.

> > +                     nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER,
> > +                                       0, 0);
>
> I'd just let the "0, 0);" poke out past 80 chars.

Oh, you rascal, you!

> >               return 0;
> >       }
> >
> > --
> > 2.26.0.110.g2183baf09c-goog
> >


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-14 16:47     ` Jim Mattson
@ 2020-04-15  0:12       ` Sean Christopherson
  2020-04-15  0:20         ` Sean Christopherson
  2020-04-15 23:33         ` Jim Mattson
  0 siblings, 2 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-15  0:12 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> On Mon, Apr 13, 2020 at 8:17 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> > > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > > synthesized single-step trap would be unceremoniously dropped when
> > > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > >
> > > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > > from L2 to L1 when there is a pending debug trap, such as a
> > > single-step trap.
> > >
> > > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > Reviewed-by: Peter Shier <pshier@google.com>
> > > ---
> > >  arch/x86/kvm/vmx/nested.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index cbc9ea2de28f..6ab974debd44 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> > >           vmx->nested.preemption_timer_expired) {
> > >               if (block_nested_events)
> > >                       return -EBUSY;
> > > -             nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> > > +             if (!vmx_pending_dbg_trap(vcpu))
> >
> > IMO this one warrants a comment.  It's not immediately obvious that this
> > only applies to #DBs that are being injected into L2, and that returning
> > -EBUSY will do the wrong thing.
> 
> Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> what the potential confusion is regarding the event. Are you
> suggesting that one might think that we have a #DB to deliver to L1
> while we're in guest mode? IIRC, that can happen under SVM, but I
> don't believe it can happen under VMX.

The potential confusion is that vcpu->arch.exception.pending was already
checked, twice.  It makes one wonder why it needs to be checked a third
time.  And actually, I think that's probably a good indicator that singling
out single-step #DB isn't the correct fix, it just happens to be the only
case that's been encountered thus far, e.g. a #PF when fetching the instr
for emulation should also get priority over the preemption timer.  On real
hardware, expiration of the preemption timer while vectoring a #PF
wouldn't get recognized until the next instruction boundary, i.e. at the
start of the first instruction of the #PF handler.  Dropping the #PF isn't
a problem in most cases, because unlike the single-step #DB, it will be
re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.

In general, interception of an event doesn't change the priority of events,
e.g. INTR shouldn't get priority over NMI just because L1 wants to
intercept INTR but not NMI.

TL;DR: I think the fix should instead be:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c868c64770e0..042d7a9037be 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
        /*
         * Process any exceptions that are not debug traps before MTF.
         */
-       if (vcpu->arch.exception.pending &&
-           !vmx_pending_dbg_trap(vcpu) &&
-           nested_vmx_check_exception(vcpu, &exit_qual)) {
+       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu))
+               if (!nested_vmx_check_exception(vcpu, &exit_qual))
+                       return 0;
+
                if (block_nested_events)
                        return -EBUSY;
                nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
@@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
                return 0;
        }

-       if (vcpu->arch.exception.pending &&
-           nested_vmx_check_exception(vcpu, &exit_qual)) {
+       if (vcpu->arch.exception.pending) {
+               if (!nested_vmx_check_exception(vcpu, &exit_qual))
+                       return 0;
+
                if (block_nested_events)
                        return -EBUSY;
                nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
@@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
                return 0;
        }

-       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+       if (vcpu->arch.nmi_pending) {
+               if (!nested_exit_on_nmi(vcpu))
+                       return 0;
+
                if (block_nested_events)
                        return -EBUSY;
                nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
@@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
                return 0;
        }

-       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
+       if (kvm_cpu_has_interrupt(vcpu) {
+               if (!nested_exit_on_intr(vcpu))
+                       return 0;
+
                if (block_nested_events)
                        return -EBUSY;
                nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);



* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-15  0:12       ` Sean Christopherson
@ 2020-04-15  0:20         ` Sean Christopherson
  2020-04-15  0:22           ` Sean Christopherson
  2020-04-15 23:33         ` Jim Mattson
  1 sibling, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2020-04-15  0:20 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Tue, Apr 14, 2020 at 05:12:12PM -0700, Sean Christopherson wrote:
> On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > On Mon, Apr 13, 2020 at 8:17 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> > > > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > > > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > > > synthesized single-step trap would be unceremoniously dropped when
> > > > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > > >
> > > > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > > > from L2 to L1 when there is a pending debug trap, such as a
> > > > single-step trap.
> > > >
> > > > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > > Reviewed-by: Peter Shier <pshier@google.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/nested.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index cbc9ea2de28f..6ab974debd44 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> > > >           vmx->nested.preemption_timer_expired) {
> > > >               if (block_nested_events)
> > > >                       return -EBUSY;
> > > > -             nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> > > > +             if (!vmx_pending_dbg_trap(vcpu))
> > >
> > > IMO this one warrants a comment.  It's not immediately obvious that this
> > > only applies to #DBs that are being injected into L2, and that returning
> > > -EBUSY will do the wrong thing.
> > 
> > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > what the potential confusion is regarding the event. Are you
> > suggesting that one might think that we have a #DB to deliver to L1
> > while we're in guest mode? IIRC, that can happen under SVM, but I
> > don't believe it can happen under VMX.
> 
> The potential confusion is that vcpu->arch.exception.pending was already
> checked, twice.  It makes one wonder why it needs to be checked a third
> time.  And actually, I think that's probably a good indicator that singling
> out single-step #DB isn't the correct fix, it just happens to be the only
> case that's been encountered thus far, e.g. a #PF when fetching the instr
> for emulation should also get priority over the preemption timer.  On real
> hardware, expiration of the preemption timer while vectoring a #PF
> wouldn't get recognized until the next instruction boundary, i.e. at the
> start of the first instruction of the #PF handler.  Dropping the #PF isn't
> a problem in most cases, because unlike the single-step #DB, it will be
> re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.
> 
> In general, interception of an event doesn't change the priority of events,
> e.g. INTR shouldn't get priority over NMI just because L1 wants to
> intercept INTR but not NMI.
> 
> TL;DR: I think the fix should instead be:
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index c868c64770e0..042d7a9037be 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>         /*
>          * Process any exceptions that are not debug traps before MTF.
>          */
> -       if (vcpu->arch.exception.pending &&
> -           !vmx_pending_dbg_trap(vcpu) &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu))
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
> 
> -       if (vcpu->arch.exception.pending &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending) {
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
> 
> -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +       if (vcpu->arch.nmi_pending) {
> +               if (!nested_exit_on_nmi(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
> 
> -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> +       if (kvm_cpu_has_interrupt(vcpu) {

Obviously untested, because this doesn't compile due to a missing ')'.

> +               if (!nested_exit_on_intr(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> 


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-15  0:20         ` Sean Christopherson
@ 2020-04-15  0:22           ` Sean Christopherson
  0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-15  0:22 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Tue, Apr 14, 2020 at 05:20:44PM -0700, Sean Christopherson wrote:
> On Tue, Apr 14, 2020 at 05:12:12PM -0700, Sean Christopherson wrote:
> > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > > what the potential confusion is regarding the event. Are you
> > > suggesting that one might think that we have a #DB to deliver to L1
> > > while we're in guest mode? IIRC, that can happen under SVM, but I
> > > don't believe it can happen under VMX.
> > 
> > The potential confusion is that vcpu->arch.exception.pending was already
> > checked, twice.  It makes one wonder why it needs to be checked a third
> > time.  And actually, I think that's probably a good indicator that singling
> > out single-step #DB isn't the correct fix, it just happens to be the only
> > case that's been encountered thus far, e.g. a #PF when fetching the instr
> > for emulation should also get priority over the preemption timer.  On real
> > hardware, expiration of the preemption timer while vectoring a #PF
> > wouldn't get recognized until the next instruction boundary, i.e. at the
> > start of the first instruction of the #PF handler.  Dropping the #PF isn't
> > a problem in most cases, because unlike the single-step #DB, it will be
> > re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.
> > 
> > In general, interception of an event doesn't change the priority of events,
> > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > intercept INTR but not NMI.
> > 
> > TL;DR: I think the fix should instead be:
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index c868c64770e0..042d7a9037be 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >         /*
> >          * Process any exceptions that are not debug traps before MTF.
> >          */
> > -       if (vcpu->arch.exception.pending &&
> > -           !vmx_pending_dbg_trap(vcpu) &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu))

*sigh*  And a missing '{' here.

> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> > 
> > -       if (vcpu->arch.exception.pending &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending) {
> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> > 
> > -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> > +       if (vcpu->arch.nmi_pending) {
> > +               if (!nested_exit_on_nmi(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> > @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> > 
> > -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> > +       if (kvm_cpu_has_interrupt(vcpu) {
> 
> Obviously untested, because this doesn't compile due to a missing ')'.
> 
> > +               if (!nested_exit_on_intr(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> > 


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-15  0:12       ` Sean Christopherson
  2020-04-15  0:20         ` Sean Christopherson
@ 2020-04-15 23:33         ` Jim Mattson
  2020-04-18  4:21           ` Sean Christopherson
  1 sibling, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-15 23:33 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Oliver Upton, Peter Shier

On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > On Mon, Apr 13, 2020 at 8:17 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Mon, Apr 13, 2020 at 05:09:46PM -0700, Jim Mattson wrote:
> > > > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > > > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > > > synthesized single-step trap would be unceremoniously dropped when
> > > > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > > >
> > > > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > > > from L2 to L1 when there is a pending debug trap, such as a
> > > > single-step trap.
> > > >
> > > > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > > Reviewed-by: Oliver Upton <oupton@google.com>
> > > > Reviewed-by: Peter Shier <pshier@google.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/nested.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > > index cbc9ea2de28f..6ab974debd44 100644
> > > > --- a/arch/x86/kvm/vmx/nested.c
> > > > +++ b/arch/x86/kvm/vmx/nested.c
> > > > @@ -3690,7 +3690,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> > > >           vmx->nested.preemption_timer_expired) {
> > > >               if (block_nested_events)
> > > >                       return -EBUSY;
> > > > -             nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
> > > > +             if (!vmx_pending_dbg_trap(vcpu))
> > >
> > > IMO this one warrants a comment.  It's not immediately obvious that this
> > > only applies to #DBs that are being injected into L2, and that returning
> > > -EBUSY will do the wrong thing.
> >
> > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > what the potential confusion is regarding the event. Are you
> > suggesting that one might think that we have a #DB to deliver to L1
> > while we're in guest mode? IIRC, that can happen under SVM, but I
> > don't believe it can happen under VMX.
>
> The potential confusion is that vcpu->arch.exception.pending was already
> checked, twice.  It makes one wonder why it needs to be checked a third
> time.  And actually, I think that's probably a good indicator that singling
> out single-step #DB isn't the correct fix, it just happens to be the only
> case that's been encountered thus far, e.g. a #PF when fetching the instr
> for emulation should also get priority over the preemption timer.  On real
> hardware, expiration of the preemption timer while vectoring a #PF
> wouldn't get recognized until the next instruction boundary, i.e. at the
> start of the first instruction of the #PF handler.  Dropping the #PF isn't
> a problem in most cases, because unlike the single-step #DB, it will be
> re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.

Yes, it's wrong in the abstract, but with respect to faults and the
VMX-preemption timer expiration, is there any way for either L1 or L2
to *know* that the virtual CPU has done something wrong?

Isn't it generally true that if you have an exception queued when you
transition from L2 to L1, then you've done something wrong? I wonder
if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
serves to sweep a whole collection of problems under the rug.

> In general, interception of an event doesn't change the priority of events,
> e.g. INTR shouldn't get priority over NMI just because L1 wants to
> intercept INTR but not NMI.

Yes, but that's a different problem altogether.

> TL;DR: I think the fix should instead be:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index c868c64770e0..042d7a9037be 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>         /*
>          * Process any exceptions that are not debug traps before MTF.
>          */
> -       if (vcpu->arch.exception.pending &&
> -           !vmx_pending_dbg_trap(vcpu) &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu))
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (vcpu->arch.exception.pending &&
> -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> +       if (vcpu->arch.exception.pending) {
> +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> +       if (vcpu->arch.nmi_pending) {
> +               if (!nested_exit_on_nmi(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>
> -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> +       if (kvm_cpu_has_interrupt(vcpu) {
> +               if (!nested_exit_on_intr(vcpu))
> +                       return 0;
> +
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
>


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-15 23:33         ` Jim Mattson
@ 2020-04-18  4:21           ` Sean Christopherson
  2020-04-20 17:18             ` Jim Mattson
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2020-04-18  4:21 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > > what the potential confusion is regarding the event. Are you
> > > suggesting that one might think that we have a #DB to deliver to L1
> > > while we're in guest mode? IIRC, that can happen under SVM, but I
> > > don't believe it can happen under VMX.
> >
> > The potential confusion is that vcpu->arch.exception.pending was already
> > checked, twice.  It makes one wonder why it needs to be checked a third
> > time.  And actually, I think that's probably a good indicator that singling
> > out single-step #DB isn't the correct fix, it just happens to be the only
> > case that's been encountered thus far, e.g. a #PF when fetching the instr
> > for emulation should also get priority over the preemption timer.  On real
> > hardware, expiration of the preemption timer while vectoring a #PF
> > wouldn't get recognized until the next instruction boundary, i.e. at the
> > start of the first instruction of the #PF handler.  Dropping the #PF isn't
> > a problem in most cases, because unlike the single-step #DB, it will be
> > re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.
> 
> Yes, it's wrong in the abstract, but with respect to faults and the
> VMX-preemption timer expiration, is there any way for either L1 or L2
> to *know* that the virtual CPU has done something wrong?

I don't think so?  But how is that relevant, i.e. if we can fix KVM instead
of fudging the result, why wouldn't we fix KVM?

> Isn't it generally true that if you have an exception queued when you
> transition from L2 to L1, then you've done something wrong? I wonder
> if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> serves to sweep a whole collection of problems under the rug.

More than likely, yes.

> > In general, interception of an event doesn't change the priority of events,
> > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > intercept INTR but not NMI.
> 
> Yes, but that's a different problem altogether.

But isn't the fix the same?  Stop processing events if a higher priority
event is pending, regardless of whether the event exits to L1.

> > TL;DR: I think the fix should instead be:
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index c868c64770e0..042d7a9037be 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3724,9 +3724,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >         /*
> >          * Process any exceptions that are not debug traps before MTF.
> >          */
> > -       if (vcpu->arch.exception.pending &&
> > -           !vmx_pending_dbg_trap(vcpu) &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending && !vmx_pending_dbg_trap(vcpu))
> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3741,8 +3742,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (vcpu->arch.exception.pending &&
> > -           nested_vmx_check_exception(vcpu, &exit_qual)) {
> > +       if (vcpu->arch.exception.pending) {
> > +               if (!nested_vmx_check_exception(vcpu, &exit_qual))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_inject_exception_vmexit(vcpu, exit_qual);
> > @@ -3757,7 +3760,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
> > +       if (vcpu->arch.nmi_pending) {
> > +               if (!nested_exit_on_nmi(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
> > @@ -3772,7 +3778,10 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
> >                 return 0;
> >         }
> >
> > -       if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
> > +       if (kvm_cpu_has_interrupt(vcpu) {
> > +               if (!nested_exit_on_intr(vcpu))
> > +                       return 0;
> > +
> >                 if (block_nested_events)
> >                         return -EBUSY;
> >                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
> >


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-18  4:21           ` Sean Christopherson
@ 2020-04-20 17:18             ` Jim Mattson
  2020-04-21  4:41               ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-20 17:18 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Oliver Upton, Peter Shier

On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > > Regarding -EBUSY, I'm in complete agreement. However, I'm not sure
> > > > what the potential confusion is regarding the event. Are you
> > > > suggesting that one might think that we have a #DB to deliver to L1
> > > > while we're in guest mode? IIRC, that can happen under SVM, but I
> > > > don't believe it can happen under VMX.
> > >
> > > The potential confusion is that vcpu->arch.exception.pending was already
> > > checked, twice.  It makes one wonder why it needs to be checked a third
> > > time.  And actually, I think that's probably a good indicator that singling
> > > out single-step #DB isn't the correct fix, it just happens to be the only
> > > case that's been encountered thus far, e.g. a #PF when fetching the instr
> > > for emulation should also get priority over the preemption timer.  On real
> > > hardware, expiration of the preemption timer while vectoring a #PF wouldn't
> > > wouldn't get recognized until the next instruction boundary, i.e. at the
> > > start of the first instruction of the #PF handler.  Dropping the #PF isn't
> > > a problem in most cases, because unlike the single-step #DB, it will be
> > > re-encountered when L1 resumes L2.  But, dropping the #PF is still wrong.
> >
> > Yes, it's wrong in the abstract, but with respect to faults and the
> > VMX-preemption timer expiration, is there any way for either L1 or L2
> > to *know* that the virtual CPU has done something wrong?
>
> I don't think so?  But how is that relevant, i.e. if we can fix KVM instead
> of fudging the result, why wouldn't we fix KVM?

I'm not sure that I can fix KVM. The missing #DB traps were relatively
straightforward, but as for the rest of this mess...

Since you seem to have a handle on what needs to be done, I will defer to you.

> > Isn't it generally true that if you have an exception queued when you
> > transition from L2 to L1, then you've done something wrong? I wonder
> > if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> > serves to sweep a whole collection of problems under the rug.
>
> More than likely, yes.
>
> > > In general, interception of an event doesn't change the priority of events,
> > > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > > intercept INTR but not NMI.
> >
> > Yes, but that's a different problem altogether.
>
> But isn't the fix the same?  Stop processing events if a higher priority
> event is pending, regardless of whether the event exits to L1.

That depends on how you see the scope of the problem. One could argue
that the fix for everything that is wrong with KVM is actually the
same: properly emulate the physical CPU.


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-20 17:18             ` Jim Mattson
@ 2020-04-21  4:41               ` Sean Christopherson
  2020-04-21 18:28                 ` Jim Mattson
  0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2020-04-21  4:41 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Mon, Apr 20, 2020 at 10:18:42AM -0700, Jim Mattson wrote:
> On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > > <sean.j.christopherson@intel.com> wrote:
> > > >
> > > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > Yes, it's wrong in the abstract, but with respect to faults and the
> > > VMX-preemption timer expiration, is there any way for either L1 or L2
> > > to *know* that the virtual CPU has done something wrong?
> >
> > I don't think so?  But how is that relevant, i.e. if we can fix KVM instead
> > of fudging the result, why wouldn't we fix KVM?
> 
> I'm not sure that I can fix KVM. The missing #DB traps were relatively
> straightforward, but as for the rest of this mess...
> 
> Since you seem to have a handle on what needs to be done, I will defer to
> you.

I wouldn't go so far as to say I have a handle on it, more like I have an
idea of how to fix one part of the overall problem with a generic "rule"
change that also happens to (hopefully) resolve the #DB+MTF issue.

Anyways, I'll send a patch.  Worst case scenario it fails miserably and we
go with this patch :-)

> > > Isn't it generally true that if you have an exception queued when you
> > > transition from L2 to L1, then you've done something wrong? I wonder
> > > if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> > > serves to sweep a whole collection of problems under the rug.
> >
> > More than likely, yes.
> >
> > > > In general, interception of an event doesn't change the priority of events,
> > > > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > > > intercept INTR but not NMI.
> > >
> > > Yes, but that's a different problem altogether.
> >
> > But isn't the fix the same?  Stop processing events if a higher priority
> > event is pending, regardless of whether the event exits to L1.
> 
> That depends on how you see the scope of the problem. One could argue
> that the fix for everything that is wrong with KVM is actually the
> same: properly emulate the physical CPU.

Heh, there is that.

What I'm arguing is that we shouldn't throw in a workaround knowing that
it's papering over the underlying issue.  Preserving event priority
irrespective of VM-Exit behavior is different, in that while it may not
resolve all issues that are being masked by kvm_clear_exception_queue(),
the change itself is correct when viewed in a vacuum.


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-21  4:41               ` Sean Christopherson
@ 2020-04-21 18:28                 ` Jim Mattson
  2020-04-22  0:16                   ` Sean Christopherson
  0 siblings, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-21 18:28 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Oliver Upton, Peter Shier

On Mon, Apr 20, 2020 at 9:41 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Apr 20, 2020 at 10:18:42AM -0700, Jim Mattson wrote:
> > On Fri, Apr 17, 2020 at 9:21 PM Sean Christopherson
> > <sean.j.christopherson@intel.com> wrote:
> > >
> > > On Wed, Apr 15, 2020 at 04:33:31PM -0700, Jim Mattson wrote:
> > > > On Tue, Apr 14, 2020 at 5:12 PM Sean Christopherson
> > > > <sean.j.christopherson@intel.com> wrote:
> > > > >
> > > > > On Tue, Apr 14, 2020 at 09:47:53AM -0700, Jim Mattson wrote:
> > > > Yes, it's wrong in the abstract, but with respect to faults and the
> > > > VMX-preemption timer expiration, is there any way for either L1 or L2
> > > > to *know* that the virtual CPU has done something wrong?
> > >
> > > I don't think so?  But how is that relevant, i.e. if we can fix KVM instead
> > > of fudging the result, why wouldn't we fix KVM?
> >
> > I'm not sure that I can fix KVM. The missing #DB traps were relatively
> > straightforward, but as for the rest of this mess...
> >
> > Since you seem to have a handle on what needs to be done, I will defer to
> > you.
>
> I wouldn't go so far as to say I have a handle on it, more like I have an
> idea of how to fix one part of the overall problem with a generic "rule"
> change that also happens to (hopefully) resolve the #DB+MTF issue.
>
> Anyways, I'll send a patch.  Worst case scenario it fails miserably and we
> go with this patch :-)
>
> > > > Isn't it generally true that if you have an exception queued when you
> > > > transition from L2 to L1, then you've done something wrong? I wonder
> > > > if the call to kvm_clear_exception_queue() in prepare_vmcs12() just
> > > > serves to sweep a whole collection of problems under the rug.
> > >
> > > More than likely, yes.
> > >
> > > > > In general, interception of an event doesn't change the priority of events,
> > > > > e.g. INTR shouldn't get priority over NMI just because L1 wants to
> > > > > intercept INTR but not NMI.
> > > >
> > > > Yes, but that's a different problem altogether.
> > >
> > > But isn't the fix the same?  Stop processing events if a higher priority
> > > event is pending, regardless of whether the event exits to L1.
> >
> > That depends on how you see the scope of the problem. One could argue
> > that the fix for everything that is wrong with KVM is actually the
> > same: properly emulate the physical CPU.
>
> Heh, there is that.
>
> What I'm arguing is that we shouldn't throw in a workaround knowing that
> it's papering over the underlying issue.  Preserving event priority
> irrespective of VM-Exit behavior is different, in that while it may not
> resolve all issues that are being masked by kvm_clear_exception_queue(),
> the change itself is correct when viewed in a vacuum.

The more I look at that call to kvm_clear_exception_queue(), the more
convinced I am that it's wrong. The comment above it says:

/*
* Drop what we picked up for L2 via vmx_complete_interrupts. It is
* preserved above and would only end up incorrectly in L1.
*/

The first sentence is just wrong. Vmx_complete_interrupts may not be
where the NMI/exception/interrupt came from. And the second sentence
is not entirely true. Only *injected* events are "preserved above" (by
the call to vmcs12_save_pending_event). However,
kvm_clear_exception_queue zaps both injected events and pending
events. Moreover, vmcs12_save_pending_event "preserves" the event by
stashing it in the IDT-vectoring info field of vmcs12, even when the
current VM-exit (from L2 to L1) did not (and in some cases cannot)
occur during event delivery (e.g. VMX-preemption timer expired).
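
For reference, those are the last statements in prepare_vmcs12(); the
comment sits directly above this (roughly, from memory, so the exact
surrounding context may differ slightly):

	vcpu->arch.nmi_injected = false;
	kvm_clear_exception_queue(vcpu);
	kvm_clear_interrupt_queue(vcpu);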


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-21 18:28                 ` Jim Mattson
@ 2020-04-22  0:16                   ` Sean Christopherson
  0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22  0:16 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Tue, Apr 21, 2020 at 11:28:23AM -0700, Jim Mattson wrote:
> The more I look at that call to kvm_clear_exception_queue(), the more
> convinced I am that it's wrong. The comment above it says:
> 
> /*
> * Drop what we picked up for L2 via vmx_complete_interrupts. It is
> * preserved above and would only end up incorrectly in L1.
> */
> 
> The first sentence is just wrong. Vmx_complete_interrupts may not be
> where the NMI/exception/interrupt came from. And the second sentence
> is not entirely true. Only *injected* events are "preserved above" (by
> the call to vmcs12_save_pending_event). However,
> kvm_clear_exception_queue zaps both injected events and pending
> events. Moreover, vmcs12_save_pending_event "preserves" the event by
> stashing it in the IDT-vectoring info field of vmcs12, even when the
> current VM-exit (from L2 to L1) did not (and in some cases cannot)
> occur during event delivery (e.g. VMX-preemption timer expired).

The comments are definitely wrong, or perhaps incomplete, but the behavior
isn't "wrong", assuming the rest of the nested event handling is correct
(hint, it's not).  Even with imperfect event handling, clearing
queued/pending exceptions on nested exit isn't terrible behavior as VMX
doesn't have a notion of queued/pending exceptions (ignoring #DBs for now).
If the VM-Exit occurred while vectoring an event, that event is captured in
IDT_VECTORING_INFO.  Everything else that might have happened is an
alternate timeline.

The piece that's likely missing is updating GUEST_PENDING_DBG_EXCEPTIONS if
a single-step #DB was pending after emulation.  KVM currently propagates
the field as-is from vmcs02, which would miss any emulated #DB.  But, that
is effectively orthogonal to kvm_clear_exception_queue(), e.g. KVM needs to
"save" the pending #DB like it "saves" an injected event, but clearing the
#DB from the queue is correct.
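
Something along these lines is what I have in mind, completely untested
and with a made-up helper name, just to illustrate the idea of "saving"
a pending #DB into vmcs12 before the exception queue is cleared:

	/*
	 * Hypothetical sketch: fold a pending (not yet injected) #DB into
	 * the pending debug exceptions field propagated to vmcs12, instead
	 * of silently dropping it via kvm_clear_exception_queue().  For a
	 * #DB, the exception payload carries the DR6-style bits.
	 */
	static void nested_vmx_save_pending_dbg(struct kvm_vcpu *vcpu,
						struct vmcs12 *vmcs12)
	{
		if (vcpu->arch.exception.pending &&
		    vcpu->arch.exception.nr == DB_VECTOR)
			vmcs12->guest_pending_dbg_exceptions |=
					vcpu->arch.exception.payload;
	}

called from prepare_vmcs12() before kvm_clear_exception_queue().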

The main issue I have with the nested event code is that the checks are
effectively grouped by exiting versus non-exiting instead of being grouped
by priority.  That makes it extremely difficult to correctly prioritize
events because each half needs to know the other's behavior, even though
the code is "separated".  E.g. my suggestion to return immediately if an NMI
is pending and won't VM-Exit is wrong, as the behavior should also be
conditioned on NMIs being unmasked in L2.  Actually implementing that check
is a gigantic mess in the current code base and simply isn't worth the
marginal benefit.

Unwinding the mess, i.e. processing each event class exactly once per run
loop, is extremely difficult because there are one-off cases that have been
piled on top, e.g. calling check_nested_events() from kvm_vcpu_running()
and processing INIT and SIPI in kvm_apic_accept_events(), whose full impact
is impossible to ascertain simply by reading the code.  The KVM_REQ_EVENT
optimization also exacerbates the problem, i.e. the event checking really
should be done unconditionally on every loop.


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-14  0:09 ` [PATCH 2/2] kvm: nVMX: Single-step traps " Jim Mattson
  2020-04-14  3:17   ` Sean Christopherson
@ 2020-04-22  8:30   ` Paolo Bonzini
  2020-04-22 15:48     ` Sean Christopherson
  2020-04-22 16:28     ` Jim Mattson
  1 sibling, 2 replies; 21+ messages in thread
From: Paolo Bonzini @ 2020-04-22  8:30 UTC (permalink / raw)
  To: Jim Mattson, kvm; +Cc: Oliver Upton, Peter Shier

On 14/04/20 02:09, Jim Mattson wrote:
> Previously, if the hrtimer for the nested VMX-preemption timer fired
> while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> synthesized single-step trap would be unceremoniously dropped when
> synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> 
> To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> from L2 to L1 when there is a pending debug trap, such as a
> single-step trap.

Do you have a testcase for these bugs?

Paolo



* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-22  8:30   ` Paolo Bonzini
@ 2020-04-22 15:48     ` Sean Christopherson
  2020-04-22 16:28     ` Jim Mattson
  1 sibling, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22 15:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jim Mattson, kvm, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 10:30:28AM +0200, Paolo Bonzini wrote:
> On 14/04/20 02:09, Jim Mattson wrote:
> > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > synthesized single-step trap would be unceremoniously dropped when
> > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > 
> > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > from L2 to L1 when there is a pending debug trap, such as a
> > single-step trap.
> 
> Do you have a testcase for these bugs?

Just in case you're feeling trigger happy, I'm working on a set of patches
to fix this in a more generic fashion.  Well, fixing this specific issue
can be done in a single patch, but NMIs and interrupts technically suffer
from the same bug and fixing those requires a bit of extra elbow grease.

There are also (theoretical) bugs related to nested exceptions and
interrupt injection that I'm trying to address.  Unfortunately I don't have
testcases for any of this :-(.


* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-22  8:30   ` Paolo Bonzini
  2020-04-22 15:48     ` Sean Christopherson
@ 2020-04-22 16:28     ` Jim Mattson
  2020-04-22 16:42       ` Sean Christopherson
  1 sibling, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-22 16:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm list, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 1:30 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 14/04/20 02:09, Jim Mattson wrote:
> > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > synthesized single-step trap would be unceremoniously dropped when
> > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> >
> > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > from L2 to L1 when there is a pending debug trap, such as a
> > single-step trap.
>
> Do you have a testcase for these bugs?

Indeed. They should be just prior to this in your inbox.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] kvm: nVMX: Single-step traps trump expired VMX-preemption timer
  2020-04-22 16:28     ` Jim Mattson
@ 2020-04-22 16:42       ` Sean Christopherson
  0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22 16:42 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Paolo Bonzini, kvm list, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 09:28:13AM -0700, Jim Mattson wrote:
> On Wed, Apr 22, 2020 at 1:30 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 14/04/20 02:09, Jim Mattson wrote:
> > > Previously, if the hrtimer for the nested VMX-preemption timer fired
> > > while L0 was emulating an L2 instruction with RFLAGS.TF set, the
> > > synthesized single-step trap would be unceremoniously dropped when
> > > synthesizing the "VMX-preemption timer expired" VM-exit from L2 to L1.
> > >
> > > To fix this, don't synthesize a "VMX-preemption timer expired" VM-exit
> > > from L2 to L1 when there is a pending debug trap, such as a
> > > single-step trap.
> >
> > Do you have a testcase for these bugs?
> 
> Indeed. They should be just prior to this in your inbox.

Ah, I missed those too and apparently didn't think to search for preemption
timer tests.  Thanks for the refresher!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer
  2020-04-14  0:09 [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer Jim Mattson
  2020-04-14  0:09 ` [PATCH 2/2] kvm: nVMX: Single-step traps " Jim Mattson
@ 2020-04-22 21:06 ` Sean Christopherson
  2020-04-22 21:23   ` Sean Christopherson
  2020-04-22 21:27   ` Jim Mattson
  1 sibling, 2 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22 21:06 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Oliver Upton, Peter Shier

On Mon, Apr 13, 2020 at 05:09:45PM -0700, Jim Mattson wrote:
> Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Oliver Upton <oupton@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 83050977490c..aae01253bfba 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4682,7 +4682,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  			if (is_icebp(intr_info))
>  				WARN_ON(!skip_emulated_instruction(vcpu));
>  
> -			kvm_queue_exception(vcpu, DB_VECTOR);
> +			kvm_requeue_exception(vcpu, DB_VECTOR);

This isn't wrong per se, but it's effectively papering over an underlying
bug, e.g. the same missed preemption timer bug can manifest if the timer
expires while in KVM context (because the hr timer is left running) and KVM
queues an exception for _any_ reason.  Most of the scenarios where L0 will
queue an exception for L2 are fairly contrived, but they are possible.

I believe the correct fix is to open a "preemption timer window" like we do
for pending SMI, NMI and IRQ.  It's effectively the same as handling a
pending SMI on VMX, i.e. set req_immediate_exit in the
!inject_pending_event() path.

Patches incoming soon-ish; I think I've finally got my head wrapped around
all the combinations, though I also thought that was true several hours
ago...
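
Very roughly, the shape I have in mind (toy sketch only, with hypothetical
names rather than the real KVM symbols): if the nested preemption timer
has expired but an event is being (re)injected, don't drop or reorder
anything, just request an immediate exit so the timer VM-exit gets
synthesized on the next pass:

#include <stdbool.h>
#include <stdio.h>

struct demo_vcpu {
        bool injecting_event;           /* e.g. a #DB being (re)injected */
        bool nested_timer_expired;      /* VMX-preemption timer hit zero */
        bool req_immediate_exit;        /* force a VM-exit right after entry */
};

/* Analogous to the existing SMI/NMI/IRQ "window" requests: the pending
 * timer VM-exit isn't dropped or reordered, we just come straight back. */
static void open_preemption_timer_window(struct demo_vcpu *vcpu)
{
        if (vcpu->nested_timer_expired && vcpu->injecting_event)
                vcpu->req_immediate_exit = true;
}

int main(void)
{
        struct demo_vcpu vcpu = {
                .injecting_event = true,
                .nested_timer_expired = true,
        };

        open_preemption_timer_window(&vcpu);
        printf("immediate exit requested: %d\n", vcpu.req_immediate_exit);
        return 0;
}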

>  			return 1;
>  		}
>  		kvm_run->debug.arch.dr6 = dr6 | DR6_FIXED_1;
> @@ -4703,7 +4703,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>  		break;
>  	case AC_VECTOR:
>  		if (guest_inject_ac(vcpu)) {
> -			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +			kvm_requeue_exception_e(vcpu, AC_VECTOR, error_code);
>  			return 1;
>  		}
>  
> -- 
> 2.26.0.110.g2183baf09c-goog
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer
  2020-04-22 21:06 ` [PATCH 1/2] kvm: nVMX: Pending debug exceptions " Sean Christopherson
@ 2020-04-22 21:23   ` Sean Christopherson
  2020-04-22 21:27   ` Jim Mattson
  1 sibling, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22 21:23 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 02:06:49PM -0700, Sean Christopherson wrote:
> On Mon, Apr 13, 2020 at 05:09:45PM -0700, Jim Mattson wrote:
> > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
> 
> ...
> 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 83050977490c..aae01253bfba 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4682,7 +4682,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >  			if (is_icebp(intr_info))
> >  				WARN_ON(!skip_emulated_instruction(vcpu));
> >  
> > -			kvm_queue_exception(vcpu, DB_VECTOR);
> > +			kvm_requeue_exception(vcpu, DB_VECTOR);
> 
> This isn't wrong per se, but it's effectively papering over an underlying
> bug, e.g. the same missed preemption timer bug can manifest if the timer
> expires while in KVM context (because the hr timer is left running) and KVM
> queues an exception for _any_ reason.

I just reread your changelog and realized this patch was intended to fix a
different symptom than what I observed, i.e. the above probably doesn't
make a whole lot of sense.  It just so happened that this change also
resolved my "missing timer" bug because directly injecting the #DB would
cause vmx_check_nested_events() to return -EBUSY on the preemption timer.

That being said, I'm 99% certain that the behavior you observed is fixed by
correctly handling priority of non-exiting events vs. exiting events, i.e.
slightly different justification, same net result.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer
  2020-04-22 21:06 ` [PATCH 1/2] kvm: nVMX: Pending debug exceptions " Sean Christopherson
  2020-04-22 21:23   ` Sean Christopherson
@ 2020-04-22 21:27   ` Jim Mattson
  2020-04-22 22:06     ` Sean Christopherson
  1 sibling, 1 reply; 21+ messages in thread
From: Jim Mattson @ 2020-04-22 21:27 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm list, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 2:06 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Apr 13, 2020 at 05:09:45PM -0700, Jim Mattson wrote:
> > Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Reviewed-by: Oliver Upton <oupton@google.com>
> > Reviewed-by: Peter Shier <pshier@google.com>
>
> ...
>
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 83050977490c..aae01253bfba 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -4682,7 +4682,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> >                       if (is_icebp(intr_info))
> >                               WARN_ON(!skip_emulated_instruction(vcpu));
> >
> > -                     kvm_queue_exception(vcpu, DB_VECTOR);
> > +                     kvm_requeue_exception(vcpu, DB_VECTOR);
>
> This isn't wrong per se, but it's effectively papering over an underlying
> bug, e.g. the same missed preemption timer bug can manifest if the timer
> expires while in KVM context (because the hr timer is left running) and KVM
> queues an exception for _any_ reason.  Most of the scenarios where L0 will
> queue an exception for L2 are fairly contrived, but they are possible.

I'm now thinking that this is actually wrong, since requeued
exceptions may make their way into the vmcs12 IDT-vectoring info, and
I believe that ultimately the vmcs12 IDT-vectoring info should be
populated only from the vmcs02 IDT-vectoring info (by whatever
circuitous route KVM likes).

> I believe the correct fix is to open a "preemption timer window" like we do
> for pending SMI, NMI and IRQ.  It's effectively the same handling a pending
> SMI on VMX, set req_immediate_exit in the !inject_pending_event() path.

The KVM code that deals with all of these events is really hard to
follow. I wish we could take a step back and just implement Table 6-2
from the SDM volume 3 (augmented with the scattered information about
VMX events and their priorities relative to their nearest neighbors).
Lumping priorities 7 - 10 together (faults that we either intercepted
or synthesized in emulation), I think these are the various things we
need to check, in this order...

0. Is there a fault to be delivered? (In L2, is it intercepted by L1?)
1. Is there a RESET or machine check event?
2. Is there a trap on task switch?
3. Is there an SMI or an INIT?
3.5 In L2, is there an MTF VM-exit?
4. Is there a #DB trap on the previous instruction? (In L2, is it
intercepted by L1?)
4.3 In L2, has the VMX-preemption timer expired?
4.6 In L2, do we need to synthesize an NMI-window VM-exit?
5. Is there an NMI? (In L2, is it intercepted by L1?)
5.3 In L2 do we need to synthesize an interrupt-window VM-exit?
5.6 In L2, do we need to virtualize virtual-interrupt delivery?
6. Is there a maskable interrupt? (In L2, is it intercepted by L1?)
7. Now, we can enter VMX non-root mode.
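
As a toy model of that ordering (illustrative only; the enum and bitmask
are invented here, not KVM code), a single-pass dispatcher would simply
walk the classes in priority order and act on the first pending one:

#include <stdio.h>

/* Hypothetical event classes, in the priority order listed above. */
enum event_class {
        EV_FAULT,               /* 0:   fault (possibly intercepted by L1) */
        EV_RESET_MCE,           /* 1:   RESET or machine check */
        EV_TASK_SWITCH_TRAP,    /* 2:   trap on task switch */
        EV_SMI_INIT,            /* 3:   SMI or INIT */
        EV_MTF,                 /* 3.5: MTF VM-exit (L2) */
        EV_DB_TRAP,             /* 4:   #DB trap on the previous instruction */
        EV_PREEMPT_TIMER,       /* 4.3: VMX-preemption timer expired (L2) */
        EV_NMI_WINDOW,          /* 4.6: NMI-window VM-exit (L2) */
        EV_NMI,                 /* 5:   NMI */
        EV_INTR_WINDOW,         /* 5.3: interrupt-window VM-exit (L2) */
        EV_VIRT_INTR,           /* 5.6: virtual-interrupt delivery (L2) */
        EV_INTR,                /* 6:   maskable interrupt */
        EV_NONE,                /* 7:   nothing pending, enter non-root mode */
};

/* Single pass: return the highest-priority pending event class. */
static enum event_class next_event(unsigned int pending)
{
        int ev;

        for (ev = EV_FAULT; ev < EV_NONE; ev++) {
                if (pending & (1u << ev))
                        return ev;
        }
        return EV_NONE;
}

int main(void)
{
        /* Both a #DB trap and an expired preemption timer are pending... */
        unsigned int pending = (1u << EV_DB_TRAP) | (1u << EV_PREEMPT_TIMER);

        /* ...and the #DB trap (priority 4 above) wins. */
        printf("next event class: %d\n", next_event(pending));
        return 0;
}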

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer
  2020-04-22 21:27   ` Jim Mattson
@ 2020-04-22 22:06     ` Sean Christopherson
  0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2020-04-22 22:06 UTC (permalink / raw)
  To: Jim Mattson; +Cc: kvm list, Oliver Upton, Peter Shier

On Wed, Apr 22, 2020 at 02:27:33PM -0700, Jim Mattson wrote:
> On Wed, Apr 22, 2020 at 2:06 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> The KVM code that deals with all of these events is really hard to
> follow. I wish we could take a step back and just implement Table 6-2
> from the SDM volume 3 (augmented with the scattered information about
> VMX events and their priorities relative to their nearest neighbors).
> Lumping priorities 7 - 10 together (faults that we either intercepted
> or synthesized in emulation), I think these are the various things we
> need to check, in this order...
> 
> 0. Is there a fault to be delivered? (In L2, is it intercepted by L1?)
> 1. Is there a RESET or machine check event?
> 2. Is there a trap on task switch?
> 3. Is there an SMI or an INIT?
> 3.5 In L2, is there an MTF VM-exit?
> 4. Is there a #DB trap on the previous instruction? (In L2, is it
> intercepted by L1?)
> 4.3 In L2, has the VMX-preemption timer expired?
> 4.6 In L2, do we need to synthesize an NMI-window VM-exit?
> 5. Is there an NMI? (In L2, is it intercepted by L1?)
> 5.3 In L2 do we need to synthesize an interrupt-window VM-exit?
> 5.6 In L2, do we need to virtualize virtual-interrupt delivery?
> 6. Is there a maskable interrupt? (In L2, is it intercepted by L1?)
> 7. Now, we can enter VMX non-root mode.

100% agreed.  I even tried to go down that path, multiple times, while
sorting this stuff out.  The big problem that isn't easily resolved is
kvm_vcpu_running(), which currently calls .check_nested_events()
even if KVM_REQ_EVENT isn't set.  Its existence makes it annoyingly
difficult to provide a unified single-pass flow for exiting and
non-exiting events, e.g. we'd either have to duplicate a big pile of
logic (eww) or significantly rework the event handling (scary).

Having the INIT and SIPI handling buried in kvm_apic_accept_events() is
also a pain, but that's less scary to change.

In the long term, I absolutely think it'd be worth revamping the event
handling so that it's not scattered all over tarnation, but that's
something that should probably have a full kernel cycle or two of
testing and performance analysis.

If someone does pick up that torch, I think it'd also be worth experimenting
with removing KVM_REQ_EVENT, i.e. processing events on _every_ run.  IMO
that would simplify the code, or at least how one reasons about the code, a
great deal.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2020-04-22 22:06 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-14  0:09 [PATCH 1/2] kvm: nVMX: Pending debug exceptions trump expired VMX-preemption timer Jim Mattson
2020-04-14  0:09 ` [PATCH 2/2] kvm: nVMX: Single-step traps " Jim Mattson
2020-04-14  3:17   ` Sean Christopherson
2020-04-14 16:47     ` Jim Mattson
2020-04-15  0:12       ` Sean Christopherson
2020-04-15  0:20         ` Sean Christopherson
2020-04-15  0:22           ` Sean Christopherson
2020-04-15 23:33         ` Jim Mattson
2020-04-18  4:21           ` Sean Christopherson
2020-04-20 17:18             ` Jim Mattson
2020-04-21  4:41               ` Sean Christopherson
2020-04-21 18:28                 ` Jim Mattson
2020-04-22  0:16                   ` Sean Christopherson
2020-04-22  8:30   ` Paolo Bonzini
2020-04-22 15:48     ` Sean Christopherson
2020-04-22 16:28     ` Jim Mattson
2020-04-22 16:42       ` Sean Christopherson
2020-04-22 21:06 ` [PATCH 1/2] kvm: nVMX: Pending debug exceptions " Sean Christopherson
2020-04-22 21:23   ` Sean Christopherson
2020-04-22 21:27   ` Jim Mattson
2020-04-22 22:06     ` Sean Christopherson
