linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism
@ 2020-05-02  9:24 Suravee Suthikulpanit
  2020-05-02 17:19 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Suravee Suthikulpanit @ 2020-05-02  9:24 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: pbonzini, rkrcmar, joro, jon.grimm, borisvk, Suravee Suthikulpanit

commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces
the following regression.

BUG: stack guard page was hit at 000000008f595917 \
(stack is 00000000bdefe5a4..00000000ae2b06f5)
kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
RIP: 0010:kvm_set_irq+0x51/0x160 [kvm]
Call Trace:
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
 irqfd_resampler_ack+0x32/0x90 [kvm]
 kvm_notify_acked_irq+0x62/0xd0 [kvm]
 kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm]
 ioapic_set_irq+0x20e/0x240 [kvm]
 kvm_ioapic_set_irq+0x5c/0x80 [kvm]
 kvm_set_irq+0xbb/0x160 [kvm]
 ? kvm_hv_set_sint+0x20/0x20 [kvm]
....

This is due to re-entrancy of the lazy update EOI logic
when enable APICv with VFIO pass-through device, which
sets up kvm_irqfd() w/ KVM_IRQFD_FLAG_RESAMPLE.

Fixes by adding re-entrancy check logic.

Reported-by: borisvk@bstnet.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://www.spinics.net/lists/kvm/msg213512.html
Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/ioapic.c | 11 +++++++++++
 arch/x86/kvm/ioapic.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 750ff0b..a68f624 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -188,6 +188,14 @@ static void ioapic_lazy_update_eoi(struct kvm_ioapic *ioapic, int irq)
 	struct kvm_vcpu *vcpu;
 	union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
 
+	/*
+	 * Re-entrancy check due to KVM_IRQFD_FLAG_RESAMPLE can cause
+	 * stack overflow where irqfd_resampler_ack() repeatedly calls
+	 * ioapic_set_irq().
+	 */
+	if (atomic_cmpxchg(&ioapic->in_lazy_update_eoi, 0, 1))
+		return;
+
 	kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
 		if (!kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
 					 entry->fields.dest_id,
@@ -205,6 +213,8 @@ static void ioapic_lazy_update_eoi(struct kvm_ioapic *ioapic, int irq)
 					  irq);
 		break;
 	}
+
+	atomic_set(&ioapic->in_lazy_update_eoi, 0);
 }
 
 static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
@@ -707,6 +717,7 @@ int kvm_ioapic_init(struct kvm *kvm)
 		kvm->arch.vioapic = NULL;
 		kfree(ioapic);
 	}
+	atomic_set(&ioapic->in_lazy_update_eoi, 0);
 
 	return ret;
 }
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 2fb2e3c..898a967 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -91,6 +91,7 @@ struct kvm_ioapic {
 	struct delayed_work eoi_inject;
 	u32 irq_eoi[IOAPIC_NUM_PINS];
 	u32 irr_delivered;
+	atomic_t in_lazy_update_eoi;
 };
 
 #ifdef DEBUG
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism
  2020-05-02  9:24 [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism Suravee Suthikulpanit
@ 2020-05-02 17:19 ` Paolo Bonzini
  2020-05-04  6:31   ` Suravee Suthikulpanit
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2020-05-02 17:19 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, kvm
  Cc: rkrcmar, joro, jon.grimm, borisvk

On 02/05/20 11:24, Suravee Suthikulpanit wrote:
> This is due to re-entrancy of the lazy update EOI logic
> when enable APICv with VFIO pass-through device, which
> sets up kvm_irqfd() w/ KVM_IRQFD_FLAG_RESAMPLE.
> 
> Fixes by adding re-entrancy check logic.

This does not explain why this is the right fix.

The questions to answer are: what is causing the re-entrancy? and why 
is dropping the second EOI update safe?

The answer to the latter could well be "because we've already processed
it", but the answer to the former is more important.

The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state.  That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd.  But if the interrupt has _not_ gone low then we get an
infinite loop.

So the actual root cause is that this is a level-triggered interrupt,
otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID
high and then low and you wouldn't have the infinite loop.  But in the
case of level-triggered interrupts the VMEXIT already happens because
TMR is set; only edge-triggered interrupts need the lazy invocation
of the ack notifier.  So this should be the fix:

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7668fed1ce65..ca2d73cd00a3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
 	}
 
 	/*
-	 * AMD SVM AVIC accelerate EOI write and do not trap,
-	 * in-kernel IOAPIC will not be able to receive the EOI.
+	 * AMD SVM AVIC accelerate EOI write iff the interrupt is level
+	 * triggered, in-kernel IOAPIC will not be able to receive the EOI.
 	 * In this case, we do lazy update of the pending EOI when
 	 * trying to set IOAPIC irq.
 	 */
-	if (kvm_apicv_activated(ioapic->kvm))
+	if (edge && kvm_apicv_activated(ioapic->kvm))
 		ioapic_lazy_update_eoi(ioapic, irq);
 
 	/*

Did I miss anything in the above analysis with respect to AVIC?

Paolo


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism
  2020-05-02 17:19 ` Paolo Bonzini
@ 2020-05-04  6:31   ` Suravee Suthikulpanit
  2020-05-04  9:19     ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Suravee Suthikulpanit @ 2020-05-04  6:31 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: rkrcmar, joro, jon.grimm, borisvk

Paolo,

On 5/3/20 12:19 AM, Paolo Bonzini wrote:
> On 02/05/20 11:24, Suravee Suthikulpanit wrote:
> ....
> The questions to answer are: what is causing the re-entrancy? and why
> is dropping the second EOI update safe?
> 
> The answer to the latter could well be "because we've already processed
> it", but the answer to the former is more important.
> 
> The re-entrancy happens because the irq state is the OR of
> the interrupt state and the resamplefd state.  That is, we don't
> want to show the state as 0 until we've had a chance to set the
> resamplefd.  But if the interrupt has _not_ gone low then we get an
> infinite loop.

I'm not too familiar w/ the resamplefd.  I must have missed this part.
Could you please point out to me where the OR logic is?

> So the actual root cause is that this is a level-triggered interrupt,
> otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID
> high and then low and you wouldn't have the infinite loop.  

Okay.

> But in the case of level-triggered interrupts the VMEXIT already happens because
> TMR is set; only edge-triggered interrupts need the lazy invocation
> of the ack notifier.  

For AVIC, EOI write for level-triggered would have also be trapped.
And yes, edge-triggered needs lazy ack notifier.

> So this should be the fix:
> 
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 7668fed1ce65..ca2d73cd00a3 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
>   	}
>   
>   	/*
> -	 * AMD SVM AVIC accelerate EOI write and do not trap,
> -	 * in-kernel IOAPIC will not be able to receive the EOI.
> +	 * AMD SVM AVIC accelerate EOI write iff the interrupt is level
> +	 * triggered, in-kernel IOAPIC will not be able to receive the EOI.

Actually, it should be "AMD SVM AVIC accelerate EOI write iff the interrupt is _edge_ triggered".

>   	 * In this case, we do lazy update of the pending EOI when
>   	 * trying to set IOAPIC irq.
>   	 */
> -	if (kvm_apicv_activated(ioapic->kvm))
> +	if (edge && kvm_apicv_activated(ioapic->kvm))
>   		ioapic_lazy_update_eoi(ioapic, irq);
>   
>   	/*
> 
> Did I miss anything in the above analysis with respect to AVIC?


For AMD:
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

Thanks,
Suravee

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism
  2020-05-04  6:31   ` Suravee Suthikulpanit
@ 2020-05-04  9:19     ` Paolo Bonzini
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2020-05-04  9:19 UTC (permalink / raw)
  To: Suravee Suthikulpanit, linux-kernel, kvm
  Cc: rkrcmar, joro, jon.grimm, borisvk

On 04/05/20 08:31, Suravee Suthikulpanit wrote:
>>
>>
>> The re-entrancy happens because the irq state is the OR of
>> the interrupt state and the resamplefd state.  That is, we don't
>> want to show the state as 0 until we've had a chance to set the
>> resamplefd.  But if the interrupt has _not_ gone low then we get an
>> infinite loop.
> 
> I'm not too familiar w/ the resamplefd.  I must have missed this part.
> Could you please point out to me where the OR logic is?

It's because kvm_ioapic_set_irq gets the actual state of the interrupt
line from __kvm_irq_line_state before calling ioapic_set_irq.

>> But in the case of level-triggered interrupts the VMEXIT already
>> happens because TMR is set; only edge-triggered interrupts need
>> the lazy invocation of the ack notifier.  
> 
> For AVIC, EOI write for level-triggered would have also be trapped.

Yes, I was talking about AVIC only there.

> And yes, edge-triggered needs lazy ack notifier.
> 
>> So this should be the fix:
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 7668fed1ce65..ca2d73cd00a3 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic
>> *ioapic, unsigned int irq,
>>       }
>>         /*
>> -     * AMD SVM AVIC accelerate EOI write and do not trap,
>> -     * in-kernel IOAPIC will not be able to receive the EOI.
>> +     * AMD SVM AVIC accelerate EOI write iff the interrupt is level
>> +     * triggered, in-kernel IOAPIC will not be able to receive the EOI.
> 
> Actually, it should be "AMD SVM AVIC accelerate EOI write iff the
> interrupt is _edge_ triggered".

Of course, thanks.

Paolo


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-05-04  9:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-02  9:24 [PATCH v2] kvm: ioapic: Introduce arch-specific check for lazy update EOI mechanism Suravee Suthikulpanit
2020-05-02 17:19 ` Paolo Bonzini
2020-05-04  6:31   ` Suravee Suthikulpanit
2020-05-04  9:19     ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).