From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Zhang Haoyu" Subject: Re: [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits correspondingioapic->irr bit always set Date: Thu, 4 Sep 2014 09:56:51 +0800 Message-ID: <201409040956496187355@sangfor.com> References: <201408231836387399956@sangfor.com>, <53FAA874.70703@redhat.com>, <201408251517235889695@sangfor.com>, <53FAE5EB.8080809@redhat.com>, <201408261728240882530@sangfor.com>, <53FD681D.7070602@redhat.com>, <201408271731497879013@sangfor.com>, <201408282055150251894@sangfor.com>, <20140902154406.GA23374@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 7bit Cc: "Jason Wang" , "qemu-devel" , "kvm" To: "Michael S. Tsirkin" Return-path: Received: from smtp.sanfor.com ([58.251.49.30]:57239 "EHLO mail.sangfor.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1751039AbaIDE1Y (ORCPT ); Thu, 4 Sep 2014 00:27:24 -0400 Sender: kvm-owner@vger.kernel.org List-ID: >> Hi Jason, >> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >> But I am going to make a bit change on it, could you help review it? >> >> >Currently, we call ioapic_service() immediately when we find the irq is still >> >active during eoi broadcast. But for real hardware, there's some dealy between >> >the EOI writing and irq delivery (system bus latency?). So we need to emulate >> >this behavior. Otherwise, for a guest who haven't register a proper irq handler >> >, it would stay in the interrupt routine as this irq would be re-injected >> >immediately after guest enables interrupt. This would lead guest can't move >> >forward and may miss the possibility to get proper irq handler registered (one >> >example is windows guest resuming from hibernation). >> > >> >As there's no way to differ the unhandled irq from new raised ones, this patch >> >solve this problems by scheduling a delayed work when the count of irq injected >> >during eoi broadcast exceeds a threshold value. After this patch, the guest can >> >move a little forward when there's no suitable irq handler in case it may >> >register one very soon and for guest who has a bad irq detection routine ( such >> >as note_interrupt() in linux ), this bad irq would be recognized soon as in the >> >past. >> > >> >Signed-off-by: Jason Wang redhat.com> >> >--- >> > virt/kvm/ioapic.c | 47 +++++++++++++++++++++++++++++++++++++++++++++-- >> > virt/kvm/ioapic.h | 2 ++ >> > 2 files changed, 47 insertions(+), 2 deletions(-) >> > >> >diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c >> >index dcaf272..892253e 100644 >> >--- a/virt/kvm/ioapic.c >> >+++ b/virt/kvm/ioapic.c >> > -221,6 +221,24 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >> > return ret; >> > } >> > >> >+static void kvm_ioapic_eoi_inject_work(struct work_struct *work) >> >+{ >> >+ int i, ret; >> >+ struct kvm_ioapic *ioapic = container_of(work, struct kvm_ioapic, >> >+ eoi_inject.work); >> >+ spin_lock(&ioapic->lock); >> >+ for (i = 0; i < IOAPIC_NUM_PINS; i++) { >> >+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; >> >+ >> >+ if (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) >> >+ continue; >> >+ >> >+ if (ioapic->irr & (1 << i) && !ent->fields.remote_irr) >> >+ ret = ioapic_service(ioapic, i); >> >+ } >> >+ spin_unlock(&ioapic->lock); >> >+} >> >+ >> > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >> > int trigger_mode) >> > { >> > -249,8 +267,29 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, >> > >> > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >> > ent->fields.remote_irr = 0; >> >- if (!ent->fields.mask && (ioapic->irr & (1 << i))) >> >- ioapic_service(ioapic, i); >> >+ if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >> >+ ++ioapic->irq_eoi; >> -+ ++ioapic->irq_eoi; >> ++ ++ioapic->irq_eoi[i]; >> >+ if (ioapic->irq_eoi == 100) { >> -+ if (ioapic->irq_eoi == 100) { >> ++ if (ioapic->irq_eoi[i] == 100) { >> >+ /* >> >+ * Real hardware does not deliver the irq so >> >+ * immediately during eoi broadcast, so we need >> >+ * to emulate this behavior. Otherwise, for >> >+ * guests who has not registered handler of a >> >+ * level irq, this irq would be injected >> >+ * immediately after guest enables interrupt >> >+ * (which happens usually at the end of the >> >+ * common interrupt routine). This would lead >> >+ * guest can't move forward and may miss the >> >+ * possibility to get proper irq handler >> >+ * registered. So we need to give some breath to >> >+ * guest. TODO: 1 is too long? >> >+ */ >> >+ schedule_delayed_work(&ioapic->eoi_inject, 1); >> >+ ioapic->irq_eoi = 0; >> -+ ioapic->irq_eoi = 0; >> ++ ioapic->irq_eoi[i] = 0; >> >+ } else { >> >+ ioapic_service(ioapic, i); >> >+ } >> >+ } >> ++ else { >> ++ ioapic->irq_eoi[i] = 0; >> ++ } >> > } >> > } >> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi broadcast, >> it's possible that another interrupt's (not current eoi's corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow continually, >> and not too long, ioapic->irq_eoi will reach to 100. >> I want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >> Any ideas? >> >> Zhang Haoyu > >I'm a bit concerned how this will affect realtime guests. >Worth adding a flag to enable this, so that e.g. virtio is not >affected? > Your concern is reasonable. If applying Jason's original patch, sometimes the virtio's interrupt delay is more than 4ms(my host's HZ=250), but very rarely happened. And with my above change on it(per irq counter instead of total irq counter), the delayed virtio interrupt is more rarely happened, then I use 1000 instead of 100 on "if (ioapic->irq_eoi[i] == 1000)", I made a test for 10min, the delayed virtio interrupt has not happened. Thanks, Zhang Haoyu > >> > >> > -375,12 +414,14 void kvm_ioapic_reset(struct kvm_ioapic *ioapic) >> > { >> > int i; >> > >> >+ cancel_delayed_work_sync(&ioapic->eoi_inject); >> > for (i = 0; i < IOAPIC_NUM_PINS; i++) >> > ioapic->redirtbl[i].fields.mask = 1; >> > ioapic->base_address = IOAPIC_DEFAULT_BASE_ADDRESS; >> > ioapic->ioregsel = 0; >> > ioapic->irr = 0; >> > ioapic->id = 0; >> >+ ioapic->irq_eoi = 0; >> -+ ioapic->irq_eoi = 0; >> ++ memset(ioapic->irq_eoi, 0x00, IOAPIC_NUM_PINS); >> > update_handled_vectors(ioapic); >> > } >> > >> > -398,6 +439,7 int kvm_ioapic_init(struct kvm *kvm) >> > if (!ioapic) >> > return -ENOMEM; >> > spin_lock_init(&ioapic->lock); >> >+ INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work); >> > kvm->arch.vioapic = ioapic; >> > kvm_ioapic_reset(ioapic); >> > kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops); >> > -418,6 +460,7 void kvm_ioapic_destroy(struct kvm *kvm) >> > { >> > struct kvm_ioapic *ioapic = kvm->arch.vioapic; >> > >> >+ cancel_delayed_work_sync(&ioapic->eoi_inject); >> > if (ioapic) { >> > kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev); >> > kvm->arch.vioapic = NULL; >> >diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h >> >index 0b190c3..8938e66 100644 >> >--- a/virt/kvm/ioapic.h >> >+++ b/virt/kvm/ioapic.h >> > -47,6 +47,8 struct kvm_ioapic { >> > void (*ack_notifier)(void *opaque, int irq); >> > spinlock_t lock; >> > DECLARE_BITMAP(handled_vectors, 256); >> >+ struct delayed_work eoi_inject; >> >+ u32 irq_eoi; >> -+ u32 irq_eoi; >> ++ u32 irq_eoi[IOAPIC_NUM_PINS]; >> > }; >> > >> > #ifdef DEBUG >>