From: "Zhang, Yang Z" <yang.z.zhang@intel.com> To: Zhang Haoyu <zhanghy@sangfor.com>, Jason Wang <jasowang@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>, Gleb Natapov <gleb@kernel.org>, Michael S.Tsirkin <mst@redhat.com> Subject: Re: [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set Date: Fri, 29 Aug 2014 04:07:42 +0000 [thread overview] Message-ID: <A9667DDFB95DB7438FA9D7D576C3D87E0AB71066@SHSMSX104.ccr.corp.intel.com> (raw) In-Reply-To: <201408291114352277288@sangfor.com> Zhang Haoyu wrote on 2014-08-29: > Hi, Yang, Gleb, Michael, > Could you help review below patch please? I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI. > > Thanks, > Zhang Haoyu > >>> Hi Jason, >>> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >>> But I am going to make a bit change on it, could you help review it? >>> >>>> Currently, we call ioapic_service() immediately when we find the irq >>>> is still active during eoi broadcast. But for real hardware, there's >>>> some dealy between the EOI writing and irq delivery (system bus >>>> latency?). So we need to emulate this behavior. Otherwise, for a >>>> guest who haven't register a proper irq handler , it would stay in >>>> the interrupt routine as this irq would be re-injected immediately >>>> after guest enables interrupt. This would lead guest can't move >>>> forward and may miss the possibility to get proper irq handler >>>> registered (one example is windows guest resuming from hibernation). >>>> >>>> As there's no way to differ the unhandled irq from new raised ones, >>>> this patch solve this problems by scheduling a delayed work when the >>>> count of irq injected during eoi broadcast exceeds a threshold value. >>>> After this patch, the guest can move a little forward when there's no >>>> suitable irq handler in case it may register one very soon and for >>>> guest who has a bad irq detection routine ( such as note_interrupt() >>>> in linux ), this bad irq would be recognized soon as in the past. >>>> >>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> --- >>>> virt/kvm/ioapic.c | 47 >>>> +++++++++++++++++++++++++++++++++++++++++++++-- virt/kvm/ioapic.h | >>>> 2 ++ 2 files changed, 47 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index >>>> dcaf272..892253e 100644 --- a/virt/kvm/ioapic.c +++ >>>> b/virt/kvm/ioapic.c <at> <at> -221,6 +221,24 <at> <at> int >>>> kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >>>> return ret; } >>>> >>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ >>>> + int i, ret; + struct kvm_ioapic *ioapic = container_of(work, struct >>>> kvm_ioapic, + eoi_inject.work); + spin_lock(&ioapic->lock); >>>> + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union >>>> kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if >>>> (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + continue; + + if >>>> (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + ret = >>>> ioapic_service(ioapic, i); + } + spin_unlock(&ioapic->lock); +} + >>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int >>>> vector, int trigger_mode) { <at> <at> -249,8 +267,29 <at> >>>> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, >>>> int vector, >>>> >>>> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >>>> ent->fields.remote_irr = 0; >>>> - if (!ent->fields.mask && (ioapic->irr & (1 << i))) >>>> - ioapic_service(ioapic, i); >>>> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >>>> + ++ioapic->irq_eoi; >>> -+ ++ioapic->irq_eoi; >>> ++ ++ioapic->irq_eoi[i]; >>>> + if (ioapic->irq_eoi == 100) { >>> -+ if (ioapic->irq_eoi == 100) { >>> ++ if (ioapic->irq_eoi[i] == 100) { >>>> + /* >>>> + * Real hardware does not deliver the irq so >>>> + * immediately during eoi broadcast, so we need >>>> + * to emulate this behavior. Otherwise, for >>>> + * guests who has not registered handler of a >>>> + * level irq, this irq would be injected >>>> + * immediately after guest enables interrupt >>>> + * (which happens usually at the end of the >>>> + * common interrupt routine). This would lead >>>> + * guest can't move forward and may miss the >>>> + * possibility to get proper irq handler >>>> + * registered. So we need to give some breath to >>>> + * guest. TODO: 1 is too long? >>>> + */ >>>> + schedule_delayed_work(&ioapic->eoi_inject, 1); >>>> + ioapic->irq_eoi = 0; >>> -+ ioapic->irq_eoi = 0; >>> ++ ioapic->irq_eoi[i] = 0; >>>> + } else { >>>> + ioapic_service(ioapic, i); >>>> + } >>>> + } >>> ++ else { >>> ++ ioapic->irq_eoi[i] = 0; >>> ++ } >>>> } >>>> } >>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi >>> broadcast, it's possible that another interrupt's (not current eoi's >>> corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow >>> continually, and not too long, ioapic->irq_eoi will reach to 100. I >>> want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >>> Any ideas? >>> >>> Zhang Haoyu >> >> I don't have objection to this per irq counter, but I don't see obvious >> difference. >> >> Btw, the patch has been rejected in the past since we're not sure >> whether this behaviour is architectural (or ask Intel?). You need >> convince the Gleb and other kvm maintainers for this idea first :). Best regards, Yang
WARNING: multiple messages have this Message-ID (diff)
From: "Zhang, Yang Z" <yang.z.zhang@intel.com> To: Zhang Haoyu <zhanghy@sangfor.com>, Jason Wang <jasowang@redhat.com>, qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>, Gleb Natapov <gleb@kernel.org>, "Michael S.Tsirkin" <mst@redhat.com> Subject: Re: [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr bit always set Date: Fri, 29 Aug 2014 04:07:42 +0000 [thread overview] Message-ID: <A9667DDFB95DB7438FA9D7D576C3D87E0AB71066@SHSMSX104.ccr.corp.intel.com> (raw) In-Reply-To: <201408291114352277288@sangfor.com> Zhang Haoyu wrote on 2014-08-29: > Hi, Yang, Gleb, Michael, > Could you help review below patch please? I don't quite understand the background. Why ioacpi->irr is setting before EOI? It should be driver's responsibility to clear the interrupt before issuing EOI. > > Thanks, > Zhang Haoyu > >>> Hi Jason, >>> I tested below patch, it's okay, the e1000 interrupt storm disappeared. >>> But I am going to make a bit change on it, could you help review it? >>> >>>> Currently, we call ioapic_service() immediately when we find the irq >>>> is still active during eoi broadcast. But for real hardware, there's >>>> some dealy between the EOI writing and irq delivery (system bus >>>> latency?). So we need to emulate this behavior. Otherwise, for a >>>> guest who haven't register a proper irq handler , it would stay in >>>> the interrupt routine as this irq would be re-injected immediately >>>> after guest enables interrupt. This would lead guest can't move >>>> forward and may miss the possibility to get proper irq handler >>>> registered (one example is windows guest resuming from hibernation). >>>> >>>> As there's no way to differ the unhandled irq from new raised ones, >>>> this patch solve this problems by scheduling a delayed work when the >>>> count of irq injected during eoi broadcast exceeds a threshold value. >>>> After this patch, the guest can move a little forward when there's no >>>> suitable irq handler in case it may register one very soon and for >>>> guest who has a bad irq detection routine ( such as note_interrupt() >>>> in linux ), this bad irq would be recognized soon as in the past. >>>> >>>> Signed-off-by: Jason Wang <jasowang <at> redhat.com> --- >>>> virt/kvm/ioapic.c | 47 >>>> +++++++++++++++++++++++++++++++++++++++++++++-- virt/kvm/ioapic.h | >>>> 2 ++ 2 files changed, 47 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index >>>> dcaf272..892253e 100644 --- a/virt/kvm/ioapic.c +++ >>>> b/virt/kvm/ioapic.c <at> <at> -221,6 +221,24 <at> <at> int >>>> kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) >>>> return ret; } >>>> >>>> +static void kvm_ioapic_eoi_inject_work(struct work_struct *work) +{ >>>> + int i, ret; + struct kvm_ioapic *ioapic = container_of(work, struct >>>> kvm_ioapic, + eoi_inject.work); + spin_lock(&ioapic->lock); >>>> + for (i = 0; i < IOAPIC_NUM_PINS; i++) { + union >>>> kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; + + if >>>> (ent->fields.trig_mode != IOAPIC_LEVEL_TRIG) + continue; + + if >>>> (ioapic->irr & (1 << i) && !ent->fields.remote_irr) + ret = >>>> ioapic_service(ioapic, i); + } + spin_unlock(&ioapic->lock); +} + >>>> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int >>>> vector, int trigger_mode) { <at> <at> -249,8 +267,29 <at> >>>> <at> static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, >>>> int vector, >>>> >>>> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); >>>> ent->fields.remote_irr = 0; >>>> - if (!ent->fields.mask && (ioapic->irr & (1 << i))) >>>> - ioapic_service(ioapic, i); >>>> + if (!ent->fields.mask && (ioapic->irr & (1 << i))) { >>>> + ++ioapic->irq_eoi; >>> -+ ++ioapic->irq_eoi; >>> ++ ++ioapic->irq_eoi[i]; >>>> + if (ioapic->irq_eoi == 100) { >>> -+ if (ioapic->irq_eoi == 100) { >>> ++ if (ioapic->irq_eoi[i] == 100) { >>>> + /* >>>> + * Real hardware does not deliver the irq so >>>> + * immediately during eoi broadcast, so we need >>>> + * to emulate this behavior. Otherwise, for >>>> + * guests who has not registered handler of a >>>> + * level irq, this irq would be injected >>>> + * immediately after guest enables interrupt >>>> + * (which happens usually at the end of the >>>> + * common interrupt routine). This would lead >>>> + * guest can't move forward and may miss the >>>> + * possibility to get proper irq handler >>>> + * registered. So we need to give some breath to >>>> + * guest. TODO: 1 is too long? >>>> + */ >>>> + schedule_delayed_work(&ioapic->eoi_inject, 1); >>>> + ioapic->irq_eoi = 0; >>> -+ ioapic->irq_eoi = 0; >>> ++ ioapic->irq_eoi[i] = 0; >>>> + } else { >>>> + ioapic_service(ioapic, i); >>>> + } >>>> + } >>> ++ else { >>> ++ ioapic->irq_eoi[i] = 0; >>> ++ } >>>> } >>>> } >>> I think ioapic->irq_eoi is prone to reach to 100, because during a eoi >>> broadcast, it's possible that another interrupt's (not current eoi's >>> corresponding interrupt) irr is set, so the ioapic->irq_eoi will grow >>> continually, and not too long, ioapic->irq_eoi will reach to 100. I >>> want to add "u32 irq_eoi[IOAPIC_NUM_PINS];" instead of "u32 irq_eoi;". >>> Any ideas? >>> >>> Zhang Haoyu >> >> I don't have objection to this per irq counter, but I don't see obvious >> difference. >> >> Btw, the patch has been rejected in the past since we're not sure >> whether this behaviour is architectural (or ask Intel?). You need >> convince the Gleb and other kvm maintainers for this idea first :). Best regards, Yang
next prev parent reply other threads:[~2014-08-29 4:07 UTC|newest] Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top 2014-08-23 10:36 [question] e1000 interrupt storm happened because of its corresponding ioapic->irr bit always set Zhang Haoyu 2014-08-23 10:36 ` [Qemu-devel] " Zhang Haoyu 2014-08-25 3:07 ` Jason Wang 2014-08-25 7:17 ` [question] e1000 interrupt storm happened becauseof " Zhang Haoyu 2014-08-25 7:17 ` [Qemu-devel] " Zhang Haoyu 2014-08-25 7:29 ` Jason Wang 2014-08-25 7:29 ` [Qemu-devel] " Jason Wang 2014-08-25 8:27 ` [question] e1000 interrupt storm happened becauseof its correspondingioapic->irr " Zhang Haoyu 2014-08-25 8:27 ` [Qemu-devel] " Zhang Haoyu 2014-08-26 9:28 ` Zhang Haoyu 2014-08-26 9:28 ` [Qemu-devel] " Zhang Haoyu 2014-08-27 5:09 ` Jason Wang 2014-08-27 5:09 ` [Qemu-devel] " Jason Wang 2014-08-27 9:31 ` [Qemu-devel] [question] e1000 interrupt storm happened becauseofits " Zhang Haoyu 2014-08-28 7:12 ` Jason Wang 2014-08-28 12:55 ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits " Zhang Haoyu 2014-08-29 2:50 ` Jason Wang 2014-08-29 3:14 ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofitscorrespondingioapic->irr " Zhang Haoyu 2014-08-29 4:07 ` Zhang, Yang Z [this message] 2014-08-29 4:07 ` Zhang, Yang Z 2014-08-29 4:28 ` Jason Wang 2014-09-02 15:44 ` [Qemu-devel] [question] e1000 interrupt storm happenedbecauseofits correspondingioapic->irr " Michael S. Tsirkin 2014-09-04 1:56 ` [Qemu-devel] [question] e1000 interrupt stormhappenedbecauseofits " Zhang Haoyu 2014-09-04 4:57 ` Jason Wang 2014-08-25 7:32 ` [question] e1000 interrupt storm happened becauseof its corresponding ioapic->irr " Jason Wang 2014-08-25 7:32 ` [Qemu-devel] " Jason Wang
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=A9667DDFB95DB7438FA9D7D576C3D87E0AB71066@SHSMSX104.ccr.corp.intel.com \ --to=yang.z.zhang@intel.com \ --cc=gleb@kernel.org \ --cc=jasowang@redhat.com \ --cc=kvm@vger.kernel.org \ --cc=mst@redhat.com \ --cc=qemu-devel@nongnu.org \ --cc=zhanghy@sangfor.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.