All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Further hack request_interrupt_window handling to work around kvm_cpu_has_interrupt() nesting breakage
@ 2020-11-12 13:03 David Woodhouse
  2020-11-25 15:10 ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2020-11-12 13:03 UTC (permalink / raw)
  To: kvm; +Cc: Sironi, Filippo, Raslan, KarimAllah

[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]

In kvm_cpu_has_interrupt() we see the following FIXME:

	/*
	 * FIXME: interrupt.injected represents an interrupt that it's
	 * side-effects have already been applied (e.g. bit from IRR
	 * already moved to ISR). Therefore, it is incorrect to rely
	 * on interrupt.injected to know if there is a pending
	 * interrupt in the user-mode LAPIC.
	 * This leads to nVMX/nSVM not be able to distinguish
	 * if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on
	 * pending interrupt or should re-inject an injected
	 * interrupt.
	 */

I'm using nested VMX for testing, while I add split-irqchip support to
my VMM. I see the vCPU lock up when attempting to deliver an interrupt.

What seems to happen is that request_interrupt_window is set, causing
an immediate vmexit because an IRQ *can* be delivered. But then
kvm_vcpu_ready_for_interrupt_injection() returns false, because
kvm_cpu_has_interrupt() is true.

Because that returns false, the kernel just continues looping in
vcpu_run(), constantly vmexiting and going right back in.

This utterly naïve hack makes my L2 guest boot properly, by not
enabling the irq window when we were going to ignore the exit anyway.
Is there a better fix?

I must also confess I'm working on a slightly older kernel in L1, and
have forward-ported to a more recent tree without actually testing
because from inspection it looks like exactly the same issue still
exists.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 397f599b20e5..e23f0c8b4a16 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8830,7 +8830,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 
 		inject_pending_event(vcpu, &req_immediate_exit);
-		if (req_int_win)
+		/* Don't enable the interrupt window for userspace if
+		 * kvm_cpu_has_interrupt() is set and we'd never actually
+		 * exit with ready_for_interrupt_window set anyway. */
+		if (req_int_win && !kvm_cpu_has_interrupt(vcpu)
 			kvm_x86_ops.enable_irq_window(vcpu);
 
 		if (kvm_lapic_enabled(vcpu)) {

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-12 13:03 [RFC] Further hack request_interrupt_window handling to work around kvm_cpu_has_interrupt() nesting breakage David Woodhouse
@ 2020-11-25 15:10 ` David Woodhouse
  2020-11-25 21:19   ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2020-11-25 15:10 UTC (permalink / raw)
  To: kvm
  Cc: Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

On Thu, 2020-11-12 at 13:03 +0000, David Woodhouse wrote:
> I'm using nested VMX for testing, while I add split-irqchip support to
> my VMM. I see the vCPU lock up when attempting to deliver an interrupt.

Turns out I don't need nesting or my own VMM to reproduce this; all I
need to do is boot a guest in qemu with split-irqchip and 'noapic' on
the guest command line. It locks up before getting to a login prompt,
every time.

qemu-system-x86_64 -serial mon:stdio -machine q35,accel=kvm,kernel-irqchip=split -m 2G -display none -drive file=foo.qcow2,if=virtio

Commit 782d422bc ("KVM: x86: split kvm_vcpu_ready_for_interrupt_injection
out of dm_request_for_irq_injection") made dm_request_for_irq_injection()
return true even when kvm_cpu_has_interrupt() is true.

So we enable the vmexit on interrupt window because userspace asked for
it, but then kvm_vcpu_ready_for_interrupt_injection() returns false,
causing us *not* to exit all the way to userspace but just to loop in
vcpu_run() instead.

But we *didn't* have an injectable interrupt from the kernel, so we
just go straight back into the guest, vmexit again, loop again, ad
infinitum.

This appears to fix it:

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4028,7 +4028,7 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 {
        return kvm_arch_interrupt_allowed(vcpu) &&
-               !kvm_cpu_has_interrupt(vcpu) &&
+               !kvm_cpu_has_injectable_intr(vcpu) &&
                !kvm_event_needs_reinjection(vcpu) &&
                kvm_cpu_accept_dm_intr(vcpu);
 }


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-25 15:10 ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
@ 2020-11-25 21:19   ` Sean Christopherson
  2020-11-26 11:10     ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2020-11-25 21:19 UTC (permalink / raw)
  To: David Woodhouse
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

On Wed, Nov 25, 2020, David Woodhouse wrote:
> On Thu, 2020-11-12 at 13:03 +0000, David Woodhouse wrote:
> > I'm using nested VMX for testing, while I add split-irqchip support to
> > my VMM. I see the vCPU lock up when attempting to deliver an interrupt.
> 
> Turns out I don't need nesting or my own VMM to reproduce this; all I
> need to do is boot a guest in qemu with split-irqchip and 'noapic' on
> the guest command line. It locks up before getting to a login prompt,
> every time.
> 
> qemu-system-x86_64 -serial mon:stdio -machine q35,accel=kvm,kernel-irqchip=split -m 2G -display none -drive file=foo.qcow2,if=virtio
> 
> Commit 782d422bc ("KVM: x86: split kvm_vcpu_ready_for_interrupt_injection
> out of dm_request_for_irq_injection") made dm_request_for_irq_injection()
> return true even when kvm_cpu_has_interrupt() is true.
> 
> So we enable the vmexit on interrupt window because userspace asked for
> it, but then kvm_vcpu_ready_for_interrupt_injection() returns false,
> causing us *not* to exit all the way to userspace but just to loop in
> vcpu_run() instead.
> 
> But we *didn't* have an injectable interrupt from the kernel, so we
> just go straight back into the guest, vmexit again, loop again, ad
> infinitum.
> 
> This appears to fix it:
> 
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4028,7 +4028,7 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
>  static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
>  {
>         return kvm_arch_interrupt_allowed(vcpu) &&
> -               !kvm_cpu_has_interrupt(vcpu) &&
> +               !kvm_cpu_has_injectable_intr(vcpu) &&

Interrupt window exiting explicitly has priority over virtual interrupt delivery,
so this is correct from that perspective.

But I wonder if would make sense to be more precise so that KVM behavior is
consistent for APICv and legacy IRQ injection.  If the LAPIC is in-kernel,
KVM_INTERRUPT can only be used for EXTINT; whether or not there's an IRQ in the
LAPIC should be irrelevant when deciding to exit to userspace.  Note, the
reinjection check covers vcpu->arch.interrupt.injected for the case where LAPIC
is in userspace.

	return kvm_arch_interrupt_allowed(vcpu) &&
	       (!lapic_in_kernel(vcpu) || !kvm_cpu_has_extint(vcpu)) &&
	       !kvm_event_needs_reinjection(vcpu) &&
	       kvm_cpu_accept_dm_intr(vcpu);
}

>                 !kvm_event_needs_reinjection(vcpu) &&
>                 kvm_cpu_accept_dm_intr(vcpu);
>  }
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-25 21:19   ` Sean Christopherson
@ 2020-11-26 11:10     ` David Woodhouse
  2020-11-26 12:05       ` [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv David Woodhouse
  2020-11-26 17:29       ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
  0 siblings, 2 replies; 11+ messages in thread
From: David Woodhouse @ 2020-11-26 11:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 3999 bytes --]

On Wed, 2020-11-25 at 21:19 +0000, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -4028,7 +4028,7 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
> >   static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
> >   {
> >          return kvm_arch_interrupt_allowed(vcpu) &&
> > -               !kvm_cpu_has_interrupt(vcpu) &&
> > +               !kvm_cpu_has_injectable_intr(vcpu) &&
> 
> Interrupt window exiting explicitly has priority over virtual interrupt delivery,
> so this is correct from that perspective.
> 
> But I wonder if would make sense to be more precise so that KVM behavior is
> consistent for APICv and legacy IRQ injection.  If the LAPIC is in-kernel,
> KVM_INTERRUPT can only be used for EXTINT; 

I think it should be used for injecting Xen event channel vectors too,
shouldn't it? Those have their own EOI/mask mechanism and shouldn't be
injected through the LAPIC (for example by simulating MSI to the
appropriate APIC+vector) because the guest won't EOI them there.

Although to all extents and purposes that basically means we *are*
treating them like ExtINT.

Not that it makes a difference to the outcome right now, although there
was once a patch floating around to allow KVM_INTERRUPT even when the
PIC was in-kernel, precisely for that reason.

> whether or not there's an IRQ in the
> LAPIC should be irrelevant when deciding to exit to userspace.  Note, the
> reinjection check covers vcpu->arch.interrupt.injected for the case where LAPIC
> is in userspace.
> 
>         return kvm_arch_interrupt_allowed(vcpu) &&
>                (!lapic_in_kernel(vcpu) || !kvm_cpu_has_extint(vcpu)) &&
>                !kvm_event_needs_reinjection(vcpu) &&
>                kvm_cpu_accept_dm_intr(vcpu);
> }

Makes sense. I'm putting this version through some testing and will
post it later...


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..e85e2f1c661c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1656,6 +1656,7 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 99d118ffc67d..9c4ef1682b81 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -40,7 +40,7 @@ static int pending_userspace_extint(struct kvm_vcpu *v)
  * check if there is pending interrupt from
  * non-APIC source without intack.
  */
-static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+int kvm_cpu_has_extint(struct kvm_vcpu *v)
 {
 	u8 accept = kvm_apic_accept_pic_intr(v);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3fdc16cfd6f..b50ae8ba66e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4080,12 +4080,14 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
  * if userspace requested an interrupt window, check that the
  * interrupt window is open.
  *
- * No need to exit to userspace if we already have an interrupt queued.
+ * If there is already an event which needs reinjection or a
+ * pending ExtINT, allow that to be processed by the kernel
+ * before letting userspace have the opportunity.
  */
 static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
+		!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu)) &&
 		!kvm_event_needs_reinjection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 }

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv
  2020-11-26 11:10     ` David Woodhouse
@ 2020-11-26 12:05       ` David Woodhouse
  2020-11-26 18:00         ` Paolo Bonzini
  2020-11-26 17:29       ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
  1 sibling, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2020-11-26 12:05 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 3934 bytes --]

From: David Woodhouse <dwmw@amazon.co.uk>

Booting a guest with 'noapic' with split irqchip and APICv enabled leads
to a livelock in vcpu_run().

When the userspace PIC has an IRQ to deliver, the VMM sets
kvm_run->request_interrupt_window and vcpu_enter_guest() duly enables the
corresponding VMexit, which happens immediately.

However, if an interrupt also exists in the local APIC, that causes
kvm_cpu_has_interrupt() to be true and thus causes
kvm_vcpu_ready_for_interrupt_injection() to return false; the intent
being that the kernel will use this interrupt window to inject its own
interrupt from the LAPIC. So vcpu_run() doesn't exit all the way to
userspace, and just loops.

However, when APICv is in use there is no interrupt to be injected since
that happens automatically. So the guest vCPU is launched again, exits
again immediately, and the loop repeats ad infinitum without making any
progress.

It looks like this was introduced in commit 782d422bcaee, when
dm_request_for_irq_injection() started returning true based purely
on the fact that userspace had requested the interrupt window, without
heed to kvm_cpu_has_interrupt() also being true.

Fix it by allowing userspace to use the interrupt window with priority
over the interrupt that is already in the LAPIC, for both APICv and
legacy injection alike. Instead of '!kvm_cpu_has_interrupt()', the
condition becomes '!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu))'

Fixes: 782d422bcaee ("KVM: x86: split kvm_vcpu_ready_for_interrupt_injection out of dm_request_for_irq_injection")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/irq.c              | 2 +-
 arch/x86/kvm/x86.c              | 6 ++++--
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..e85e2f1c661c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1656,6 +1656,7 @@ int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start, unsigned long end,
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 99d118ffc67d..9c4ef1682b81 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -40,7 +40,7 @@ static int pending_userspace_extint(struct kvm_vcpu *v)
  * check if there is pending interrupt from
  * non-APIC source without intack.
  */
-static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+int kvm_cpu_has_extint(struct kvm_vcpu *v)
 {
 	u8 accept = kvm_apic_accept_pic_intr(v);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3fdc16cfd6f..b50ae8ba66e9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4080,12 +4080,14 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
  * if userspace requested an interrupt window, check that the
  * interrupt window is open.
  *
- * No need to exit to userspace if we already have an interrupt queued.
+ * If there is already an event which needs reinjection or a
+ * pending ExtINT, allow that to be processed by the kernel
+ * before letting userspace have the opportunity.
  */
 static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
+		!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu)) &&
 		!kvm_event_needs_reinjection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 }
-- 
2.17.1


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-26 11:10     ` David Woodhouse
  2020-11-26 12:05       ` [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv David Woodhouse
@ 2020-11-26 17:29       ` David Woodhouse
  2020-11-26 17:59         ` Paolo Bonzini
  1 sibling, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2020-11-26 17:29 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

On Thu, 2020-11-26 at 11:10 +0000, David Woodhouse wrote:
> 
> > whether or not there's an IRQ in the
> > LAPIC should be irrelevant when deciding to exit to userspace.  Note, the
> > reinjection check covers vcpu->arch.interrupt.injected for the case where LAPIC
> > is in userspace.
> > 
> >         return kvm_arch_interrupt_allowed(vcpu) &&
> >                (!lapic_in_kernel(vcpu) || !kvm_cpu_has_extint(vcpu)) &&
> >                !kvm_event_needs_reinjection(vcpu) &&
> >                kvm_cpu_accept_dm_intr(vcpu);
> > }
> 
> Makes sense. I'm putting this version through some testing and will
> post it later...

Hm, that survived enough test iterations to persuade me to post it, but
then seems to have fallen over later. I'm reverting to the
kvm_cpu_has_injectable_intr() version to leave that one running too and
be sure it's gone in that.

Without either patch it's 100% repeatable, and will happen as soon as
the 'noapic' guest enabled lapic timer interrupts. I'm not sure I see
why your version would simply make it less frequent...


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-26 17:29       ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
@ 2020-11-26 17:59         ` Paolo Bonzini
  2020-11-26 21:48           ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2020-11-26 17:59 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

On 26/11/20 18:29, David Woodhouse wrote:
> On Thu, 2020-11-26 at 11:10 +0000, David Woodhouse wrote:
>>
>>> whether or not there's an IRQ in the
>>> LAPIC should be irrelevant when deciding to exit to userspace.  Note, the
>>> reinjection check covers vcpu->arch.interrupt.injected for the case where LAPIC
>>> is in userspace.
>>>
>>>          return kvm_arch_interrupt_allowed(vcpu) &&
>>>                 (!lapic_in_kernel(vcpu) || !kvm_cpu_has_extint(vcpu)) &&
>>>                 !kvm_event_needs_reinjection(vcpu) &&
>>>                 kvm_cpu_accept_dm_intr(vcpu);
>>> }
>>
>> Makes sense. I'm putting this version through some testing and will
>> post it later...
> 
> Hm, that survived enough test iterations to persuade me to post it, but
> then seems to have fallen over later. I'm reverting to the
> kvm_cpu_has_injectable_intr() version to leave that one running too and
> be sure it's gone in that.

!kvm_cpu_has_injectable_intr(vcpu) boils down (assuming no nested virt) to

         if (!lapic_in_kernel(v))
                 return !v->arch.interrupt.injected;

         if (kvm_cpu_has_extint(v))
                 return 0;

         return 1;

and Sean's proposal instead is the same indeed (the first "if" doesn't 
matter), so there may be more than one bug.

But it turns out that with some more inlining and Boolean algebra, we 
can actually figure out what the code does. :)  I had just finished 
writing a looong review of your patch starting from that idea, so I'll 
post it.

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv
  2020-11-26 12:05       ` [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv David Woodhouse
@ 2020-11-26 18:00         ` Paolo Bonzini
  2020-11-26 19:07           ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2020-11-26 18:00 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

On 26/11/20 13:05, David Woodhouse wrote:
> |It looks like this was introduced in commit 782d422bcaee, when 
> dm_request_for_irq_injection() started returning true based purely on 
> the fact that userspace had requested the interrupt window, without heed 
> to kvm_cpu_has_interrupt() also being true. |

That patch had no semantic change, because 
dm_request_for_irq_injection() was split in two and the problematic bit 
was only split to kvm_vcpu_ready_for_interrupt_injection().

Even pre-patch there was a

	if (kvm_cpu_has_interrupt(vcpu))
		return false;

in dm_request_for_irq_injection() which your patch would have changed to

	if (lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu))
		return false;

Your patch certainly works, but _what_ does

		!(lapic_in_kernel(vcpu) && kvm_cpu_has_extint(vcpu)) &&
  		kvm_cpu_accept_dm_intr(vcpu)

mean in terms of the vcpu's state?  I have no idea, in fact at this 
point I barely have an idea of what 
kvm_vcpu_ready_for_interrupt_injection does.  Let's figure it out.


First act
~~~~~~~~~

First of all let's take a step back from your patch.  Let's just look at 
kvm_cpu_has_interrupt(vcpu) and trivially remove the APIC case from 
kvm_cpu_has_interrupt:

+static bool xxx(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+	if (!lapic_in_kernel(vcpu))
+		return vcpu->arch.interrupt.injected;
+	else
+		return kvm_cpu_has_extint(vcpu);
+}

  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
+		!xxx(vcpu) &&
  		kvm_cpu_accept_dm_intr(vcpu);

Again, no idea does "xxx" do, much less its combination with 
kvm_cpu_accept_dm_intr.  We need to dive further down.


Second act
~~~~~~~~~~

kvm_cpu_accept_dm_intr can be rewritten like this:

         if (!lapic_in_kernel(vcpu))
		return true;
	else
                 return kvm_apic_accept_pic_intr(vcpu));

Therefore, we can commonize the "if"s in our xxx function with those 
from kvm_cpu_accept_dm_intr.  Remembering that the first act used the 
negation of xxx, the patch now takes this shape

+static int yyy(struct kvm_vcpu *vcpu)
+{
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+	if (!lapic_in_kernel(vcpu))
+		return !vcpu->arch.interrupt.injected;
+	else
+		return (!kvm_cpu_has_extint(vcpu) &&
+			kvm_apic_accept_pic_intr(vcpu));
+}

  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
- 		kvm_cpu_accept_dm_intr(vcpu);
+		yyy(vcpu);

This doesn't seem like progress, but we're not done...


Third act
~~~~~~~~~

Let's look at the arms of yyy's "if" statement one by one.

If !lapic_in_kernel, the return statement will always be true because 
the function is called under !kvm_event_needs_reinjection(vcpu).  So 
we're already at

static int yyy(struct kvm_vcpu *vcpu)
{
	WARN_ON(pic_in_kernel(vcpu->kvm));
	if (!lapic_in_kernel(vcpu))
		return true;
	
	return (!kvm_cpu_has_extint(vcpu) &&
		kvm_apic_accept_pic_intr(vcpu));
}

As to the "else" branch, irqchip_split is true so 
kvm_cpu_has_extint(vcpu) is "kvm_apic_accept_pic_intr(v) && 
pending_userspace_extint(v)".  More simplifications ahead!

	!(A && B) && A
     =>  (!A || !B) && A
     =>  A && !B

that is:

static int yyy(struct kvm_vcpu *vcpu)
{
	WARN_ON(pic_in_kernel(vcpu->kvm));
	if (!lapic_in_kernel(vcpu))
		return true;
	
	return (kvm_apic_accept_pic_intr(vcpu) &&
		!pending_userspace_extint(vcpu));
}

which makes sense: focusing on ExtINT and ignoring event reinjection 
(which is handled by the caller), the vCPU is ready for interrupt 
injection if:

- there is no LAPIC (so ExtINT injection is in the hands of userspace), or

- PIC interrupts are being accepted, and userspace's last ExtINT isn't 
still pending.

Thus, the final patch is:

  static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
  {
+	WARN_ON(pic_in_kernel(vcpu->kvm));
+
  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
-		kvm_cpu_accept_dm_intr(vcpu);
+		(!lapic_in_kernel(vcpu)
+		 || (kvm_apic_accept_pic_intr(vcpu)
+		     && !pending_userspace_extint(v));
  }

I'm wondering if this one fails as well...

Paolo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv
  2020-11-26 18:00         ` Paolo Bonzini
@ 2020-11-26 19:07           ` David Woodhouse
  0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2020-11-26 19:07 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 1136 bytes --]

On Thu, 2020-11-26 at 19:00 +0100, Paolo Bonzini wrote:
>   static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu
> *vcpu)
>   {
> +       WARN_ON(pic_in_kernel(vcpu->kvm));
> +

But what if I *want* to inject Xen event channel interrupts while the
actual PIC is in the kernel? :)

Not that I'll have to once the kernel is fixed and I can enable my
shiny new userspace PIC/PIT/IOAPIC code, I suppose.... 

>         return kvm_arch_interrupt_allowed(vcpu) &&
> -               !kvm_cpu_has_interrupt(vcpu) &&
>                 !kvm_event_needs_reinjection(vcpu) &&
> -               kvm_cpu_accept_dm_intr(vcpu);
> +               (!lapic_in_kernel(vcpu)
> +                || (kvm_apic_accept_pic_intr(vcpu)
> +                    && !pending_userspace_extint(v));
>   }
> 

I'll give this version a spin...


static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
{
	return kvm_arch_interrupt_allowed(vcpu) &&
		!kvm_event_needs_reinjection(vcpu) &&
		(!lapic_in_kernel(vcpu)
		 || (kvm_apic_accept_pic_intr(vcpu)
		     && vcpu->arch.pending_external_vector == -1));
}

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-26 17:59         ` Paolo Bonzini
@ 2020-11-26 21:48           ` David Woodhouse
  2020-11-27  4:37             ` Paolo Bonzini
  0 siblings, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2020-11-26 21:48 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Borghorst, Hendrik
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

On Thu, 2020-11-26 at 18:59 +0100, Paolo Bonzini wrote:
> On 26/11/20 18:29, David Woodhouse wrote:
> > On Thu, 2020-11-26 at 11:10 +0000, David Woodhouse wrote:
> > > 
> > > > whether or not there's an IRQ in the
> > > > LAPIC should be irrelevant when deciding to exit to userspace.  Note, the
> > > > reinjection check covers vcpu->arch.interrupt.injected for the case where LAPIC
> > > > is in userspace.
> > > > 
> > > >           return kvm_arch_interrupt_allowed(vcpu) &&
> > > >                  (!lapic_in_kernel(vcpu) || !kvm_cpu_has_extint(vcpu)) &&
> > > >                  !kvm_event_needs_reinjection(vcpu) &&
> > > >                  kvm_cpu_accept_dm_intr(vcpu);
> > > > }
> > > 
> > > Makes sense. I'm putting this version through some testing and will
> > > post it later...
> > 
> > Hm, that survived enough test iterations to persuade me to post it, but
> > then seems to have fallen over later. I'm reverting to the
> > kvm_cpu_has_injectable_intr() version to leave that one running too and
> > be sure it's gone in that.

FWIW I've just reproduced that hang on one of the iterations *without*
"noapic" on its command line at all; this one was just with
'clearcpuid=450'. That's clearing the ARAT bit to force it to use the
HPET+MSI for timers.

The earlier one that had failed was 'noapic clearcpuid=450'. So that
one looks like a separate bug, and I get to go frown at our HPET
emulation instead. It probably wasn't a failure of the fix we're
looking at here.

I'm going to go and check if I can reproduce that even with the in-
kernel irqchip mode, and claim it's someone else's problem for now :)

> !kvm_cpu_has_injectable_intr(vcpu) boils down (assuming no nested virt) to
> 
>          if (!lapic_in_kernel(v))
>                  return !v->arch.interrupt.injected;
> 
>          if (kvm_cpu_has_extint(v))
>                  return 0;
> 
>          return 1;
> 
> and Sean's proposal instead is the same indeed (the first "if" doesn't 
> matter), so there may be more than one bug.
> 
> But it turns out that with some more inlining and Boolean algebra, we 
> can actually figure out what the code does. :)  I had just finished 
> writing a looong review of your patch starting from that idea, so I'll 
> post it.

Neat. Your version, once I made it build, ought to be functionally
identical to the one I posted; just a bit neater.

Although I do kind of like the symmetry of my original version using
kvm_cpu_has_injectable_intr(), which is the condition used in
vcpu_enter_guest() for enabling the interrupt window vmexit in the
first place. It makes sense for those to match.

We enable the irq window if kvm_cpu_has_injectable_intr() or if
userspace asks. And when the exit happens, we feed it to userspace
unless kvm_cpu_has_injectable_intr().

If we go with your simpler version, I wonder if it makes sense to make
similar changes to the conditions in vcpu_enter_guest() to make it
clearer that they match?





[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] Fix split-irqchip vs interrupt injection window request.
  2020-11-26 21:48           ` David Woodhouse
@ 2020-11-27  4:37             ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2020-11-27  4:37 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson, Borghorst, Hendrik
  Cc: kvm, Sironi, Filippo, Raslan, KarimAllah, Matt Gingell,
	Steve Rutherford, liran

On 26/11/20 22:48, David Woodhouse wrote:
> Although I do kind of like the symmetry of my original version using
> kvm_cpu_has_injectable_intr(), which is the condition used in
> vcpu_enter_guest() for enabling the interrupt window vmexit in the
> first place. It makes sense for those to match.

In inject_pending_event, actually.

However there's also an interrupt window request in vcpu_enter_guest():

         bool req_int_win =
                 dm_request_for_irq_injection(vcpu) &&
                 kvm_cpu_accept_dm_intr(vcpu);

and this one definitely should indeed stay in sync with 
kvm_vcpu_ready_for_interrupt_injection.  This gives an even neater 
version of the patch:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 447edc0d1d5a..a05a2be05552 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4052,7 +4052,8 @@ static int kvm_vcpu_ioctl_set_lapic(struct 
kvm_vcpu *vcpu,
  static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
  {
  	return (!lapic_in_kernel(vcpu) ||
-		kvm_apic_accept_pic_intr(vcpu));
+		(kvm_apic_accept_pic_intr(vcpu)
+		 && !pending_userspace_extint(vcpu));
  }

  /*
@@ -4064,7 +4065,6 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu 
*vcpu)
  static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
  {
  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
  		!kvm_event_needs_reinjection(vcpu) &&
  		kvm_cpu_accept_dm_intr(vcpu);
  }

or even better:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 447edc0d1d5a..adbb519eece4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4051,8 +4051,10 @@ static int kvm_vcpu_ioctl_set_lapic(struct 
kvm_vcpu *vcpu,

  static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
  {
-	return (!lapic_in_kernel(vcpu) ||
-		kvm_apic_accept_pic_intr(vcpu));
+	if (lapic_in_kernel(vcpu))
+		return !v->arch.interrupt.injected;
+
+	return !kvm_cpu_has_extint(vcpu);
  }

  /*
@@ -4064,8 +4066,6 @@ static int kvm_cpu_accept_dm_intr(struct kvm_vcpu 
*vcpu)
  static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
  {
  	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
-		!kvm_event_needs_reinjection(vcpu) &&
  		kvm_cpu_accept_dm_intr(vcpu);
  }


since the call to kvm_event_needs_reinjection(vcpu) isn't really needed 
(maybe it was when Matt sent his original patches, but since then 
inject_pending_event has seen a significant overhaul).

Now this second possibility is very similar to Sean's suggestion, but 
it's actually code that I can understand.

> We enable the irq window if kvm_cpu_has_injectable_intr() or if
> userspace asks. And when the exit happens, we feed it to userspace
> unless kvm_cpu_has_injectable_intr().

What I don't like about it is that kvm_cpu_has_injectable_intr() has the 
more complicated handling of APIC interrupts.  By definition they don't 
matter here, we're considering whether to exit to userspace.

Paolo


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-11-27  4:37 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-12 13:03 [RFC] Further hack request_interrupt_window handling to work around kvm_cpu_has_interrupt() nesting breakage David Woodhouse
2020-11-25 15:10 ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
2020-11-25 21:19   ` Sean Christopherson
2020-11-26 11:10     ` David Woodhouse
2020-11-26 12:05       ` [PATCH] kvm/x86: Fix simultaneous ExtINT and lapic interrupt handling with APICv David Woodhouse
2020-11-26 18:00         ` Paolo Bonzini
2020-11-26 19:07           ` David Woodhouse
2020-11-26 17:29       ` [RFC PATCH] Fix split-irqchip vs interrupt injection window request David Woodhouse
2020-11-26 17:59         ` Paolo Bonzini
2020-11-26 21:48           ` David Woodhouse
2020-11-27  4:37             ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.