Re: [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set

From: "Zhang, Yang Z" <yang.z.zhang@intel.com>
To: Zhang Haoyu <zhanghy@sangfor.com>,
	Jason Wang <jasowang@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>,
	Gleb Natapov <gleb@kernel.org>,
	Michael S.Tsirkin <mst@redhat.com>
Subject: Re: [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set
Date: Fri, 29 Aug 2014 04:07:42 +0000	[thread overview]
Message-ID: <A9667DDFB95DB7438FA9D7D576C3D87E0AB71066@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <201408291114352277288@sangfor.com>

Zhang Haoyu wrote on 2014-08-29:
> Hi, Yang, Gleb, Michael,
> Could you help review below patch please?

I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI.

> 
> Thanks,
> Zhang Haoyu
> 
>>> Hi Jason,
>>> I tested below patch, it's okay, the e1000 interrupt storm disappeared.
>>> But I am going to make a bit change on it, could you help review it?
>>> 
>>>> Currently, we call ioapic_service() immediately when we find the irq
>>>> is still active during eoi broadcast. But for real hardware, there's
>>>> some dealy between the EOI writing and irq delivery (system bus
>>>> latency?). So we need to emulate this behavior. Otherwise, for a
>>>> guest who haven't register a proper irq handler , it would stay in
>>>> the interrupt routine as this irq would be re-injected immediately
>>>> after guest enables interrupt. This would lead guest can't move
>>>> forward and may miss the possibility to get proper irq handler
>>>> registered (one example is windows guest resuming from hibernation).
>>>> 
>>>> As there's no way to differ the unhandled irq from new raised ones,
>>>> this patch solve this problems by scheduling a delayed work when the
>>>> count of irq injected during eoi broadcast exceeds a threshold value.
>>>> After this patch, the guest can move a little forward when there's no
>>>> suitable irq handler in case it may register one very soon and for
>>>> guest who has a bad irq detection routine ( such as note_interrupt()
>>>> in linux ), this bad irq would be recognized soon as in the past.
>>>> 
>>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> ---
>>>> virt/kvm/ioapic.c |   47
>>>> +++++++++++++++++++++++++++++++++++++++++++++-- virt/kvm/ioapic.h |  
>>>>  2 ++ 2 files changed, 47 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index
>>>> dcaf272..892253e 100644 --- a/virt/kvm/ioapic.c +++
>>>> b/virt/kvm/ioapic.c <at>  <at>  -221,6 +221,24  <at>  <at>  int
>>>> kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
>>>> 	return ret; }
>>>> 
>>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{
>>>> +	int i, ret; +	struct kvm_ioapic *ioapic = container_of(work, struct
>>>> kvm_ioapic, +						 eoi_inject.work); +	spin_lock(&ioapic->lock);
>>>> +	for (i = 0; i < IOAPIC_NUM_PINS; i++) { +		union
>>>> kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + +		if
>>>> (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) +			continue; + +		if
>>>> (ioapic->irr & (1 << i) && !ent->fields.remote_irr) +			ret =
>>>> ioapic_service(ioapic, i); +	} +	spin_unlock(&ioapic->lock); +} +
>>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int
>>>> vector, 				     int trigger_mode) { <at>  <at>  -249,8 +267,29  <at>
>>>>  <at>  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic,
>>>> int vector,
>>>> 
>>>> 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>>> 		ent->fields.remote_irr = 0;
>>>> -		if (!ent->fields.mask && (ioapic->irr & (1 << i)))
>>>> -			ioapic_service(ioapic, i);
>>>> +		if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
>>>> +			++ioapic->irq_eoi;
>>> -+			++ioapic->irq_eoi;
>>> ++		    ++ioapic->irq_eoi[i];
>>>> +			if (ioapic->irq_eoi == 100) {
>>> -+			if (ioapic->irq_eoi == 100) {
>>> ++			if (ioapic->irq_eoi[i] == 100) {
>>>> +				/*
>>>> +				 * Real hardware does not deliver the irq so
>>>> +				 * immediately during eoi broadcast, so we need
>>>> +				 * to emulate this behavior. Otherwise, for
>>>> +				 * guests who has not registered handler of a
>>>> +				 * level irq, this irq would be injected
>>>> +				 * immediately after guest enables interrupt
>>>> +				 * (which happens usually at the end of the
>>>> +				 * common interrupt routine). This would lead
>>>> +				 * guest can't move forward and may miss the
>>>> +				 * possibility to get proper irq handler
>>>> +				 * registered. So we need to give some breath to
>>>> +				 * guest. TODO: 1 is too long?
>>>> +				 */
>>>> +				schedule_delayed_work(&ioapic->eoi_inject, 1);
>>>> +				ioapic->irq_eoi = 0;
>>> -+				ioapic->irq_eoi = 0;
>>> ++				ioapic->irq_eoi[i] = 0;
>>>> +			} else {
>>>> +				ioapic_service(ioapic, i);
>>>> +			}
>>>> +		}
>>> ++		else {
>>> ++			ioapic->irq_eoi[i] = 0;
>>> ++		}
>>>> 	}
>>>> }
>>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi
>>> broadcast, it's possible that another interrupt's (not current eoi's
>>> corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow
>>> continually, and not too long, ioapic->irq_eoi will reach to 100. I
>>> want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;".
>>> Any ideas?
>>> 
>>> Zhang Haoyu
>> 
>> I don't have objection to this per irq counter, but I don't see obvious
>> difference.
>> 
>> Btw, the patch has been rejected in the past since we're not sure
>> whether this behaviour is architectural (or ask Intel?). You need
>> convince the Gleb and other kvm maintainers for this idea first :).

Best regards,
Yang