From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932584AbeCLCeR convert rfc822-to-8bit (ORCPT ); Sun, 11 Mar 2018 22:34:17 -0400 Received: from mx01.hxt-semitech.com.96.203.223.in-addr.arpa ([223.203.96.7]:59786 "EHLO barracuda.hxt-semitech.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S932426AbeCLCeP (ORCPT ); Sun, 11 Mar 2018 22:34:15 -0400 X-ASG-Debug-ID: 1520822048-093b7e4ed70e7a0001-xx1T2L X-Barracuda-Envelope-From: shunyong.yang@hxt-semitech.com From: "Yang, Shunyong" To: "marc.zyngier@arm.com" , "cdall@kernel.org" CC: "linux-kernel@vger.kernel.org" , "ard.biesheuvel@linaro.org" , "kvmarm@lists.cs.columbia.edu" , "Zheng, Joey" , "will.deacon@arm.com" , "linux-arm-kernel@lists.infradead.org" , "david.daney@cavium.com" , "eric.auger@redhat.com" Subject: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling Thread-Topic: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling X-ASG-Orig-Subj: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling Thread-Index: AQHTuaqKUuRM9/vW0E6mL5YCFQL5MQ== Date: Mon, 12 Mar 2018 02:33:44 +0000 Message-ID: <1520822024.2985.12.camel@hxt-semitech.com> References: <1520492490-7943-1-git-send-email-shunyong.yang@hxt-semitech.com> <9ad47673-068e-f732-d2ca-9c76a8fbdfbc@arm.com> <0a15633d-8944-cb9b-3e6b-b08ee5ec42b9@arm.com> <20180308161900.GC1917@lvm> <86r2oubho3.wl-marc.zyngier@arm.com> <20180309213612.GD1917@lvm> <86k1ukazr0.wl-marc.zyngier@arm.com> <20180311121733.643fb1a5@why.wild-wind.fr.eu.org> In-Reply-To: <20180311121733.643fb1a5@why.wild-wind.fr.eu.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.64.6.51] Content-Type: text/plain; charset="iso-8859-15" Content-ID: <084D4A40BA59B6499E1A4EF86942AD23@hxt-semitech.com> Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-Barracuda-Connect: UNKNOWN[10.128.0.15] X-Barracuda-Start-Time: 1520822048 X-Barracuda-Encrypted: ECDHE-RSA-AES256-SHA X-Barracuda-URL: https://192.168.50.101:443/cgi-mod/mark.cgi X-Barracuda-BRTS-Status: 1 X-Barracuda-Bayes: SPAM GLOBAL 0.9694 1.0000 3.9909 X-Barracuda-Spam-Score: 3.99 X-Barracuda-Spam-Status: No, SCORE=3.99 using global scores of TAG_LEVEL=1000.0 QUARANTINE_LEVEL=1000.0 KILL_LEVEL=9.0 tests= X-Barracuda-Spam-Report: Code version 3.2, rules version 3.2.3.48845 Rule breakdown below pts rule name description ---- ---------------------- -------------------------------------------------- Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Marc, On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote: > On Sun, 11 Mar 2018 01:55:08 +0000 > Christoffer Dall wrote: > > > > > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier > m> wrote: > > > > > > On Fri, 09 Mar 2018 21:36:12 +0000, > > > Christoffer Dall wrote:   > > > > > > > > > > > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:   > > > > > > > > > > I'd be more confident if we did forbid P+A for such > > > > > interrupts > > > > > altogether, as they really feel like another kind of HW > > > > > interrupt.   > > > > How about a slightly bigger hammer:  Can we avoid doing P+A for > > > > level > > > > interrupts completely?  I don't think that really makes much > > > > sense, and > > > > I think we simply everything if we just come back out and > > > > resample the > > > > line.  For an edge, something like a network card, there's a > > > > potential > > > > performance win to appending a new pending state, but I doubt > > > > that this > > > > is the case for level interrupts.   > > > I started implementing the same thing yesterday. Somehow, it > > > feels > > > slightly better to have the same flow for all level interrupts, > > > including the timer, and we only use the MI on EOI as a way to > > > trigger > > > the next state of injection. Still testing, but looking good so > > > far. > > > > > > I'm still puzzled that we have this level-but-not-quite behaviour > > > for > > > VFIO interrupts. At some point, it is going to bite us badly. > > >   > > Where is the departure from level-triggered behavior with VFIO?  As > > far as I can tell, the GIC flow of the interrupts will be just a > > level > > interrupt,  > The GIC is fine, I believe. What is not exactly fine is the > signalling > from the device, which will never be dropped until the EOI has been > detected. > > > > > but we just need to make sure the resamplefd mechanism is > > supported for both types of interrupts.  Whether or not that's a > > decent mechanism seems orthogonal to me, but that's a discussion > > for > > another day I think. > Given that VFIO is built around this mechanism, I don't think we have > a > choice but to support it. Anyway, I came up with the following patch, > which I tested on Seattle with mtty. It also survived my usual > hammering of cyclictest, hackbench  and bulk VM installs. > > Shunyong, could you please give it a go? > > Thanks, > > M. > I have tested the patch. It works on QDF2400 platform and kvm_notify_acked_irq() is called when state is idle. BTW, I have following questions when I was debugging the issue. Coud you please give me some help? 1)what does "mi" mean in gic code? such as lr_signals_eoi_mi(); 2)In some __hyp_text code where printk() will cause "HYP panic:", such as in __kvm_vcpu_run(). How can I output debug information? Thanks. Shunyong. > From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 > 2001 > From: Marc Zyngier > Date: Fri, 9 Mar 2018 14:59:40 +0000 > Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for > level >  interrupts > > It was recently reported that VFIO mediated devices, and anything > that VFIO exposes as level interrupts, do no strictly follow the > expected logic of such interrupts as it only lowers the input > line when the guest has EOId the interrupt at the GIC level, rather > than when it Acked the interrupt at the device level. > > The GIC's Active+Pending state is fundamentally incompatible with > this behaviour, as it prevents KVM from observing the EOI, and in > turn results in VFIO never dropping the line. This results in an > interrupt storm in the guest, which it really never expected. > > As we cannot really change VFIO to follow the strict rules of level > signalling, let's forbid the A+P state altogether, as it is in the > end only an optimization. It ensures that we will transition via > an invalid state, which we can use to notify VFIO of the EOI. > > Signed-off-by: Marc Zyngier > --- >  virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------ > ------------ >  virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------ > ------------ >  2 files changed, 56 insertions(+), 38 deletions(-) > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic- > v2.c > index 29556f71b691..9356d749da1d 100644 > --- a/virt/kvm/arm/vgic/vgic-v2.c > +++ b/virt/kvm/arm/vgic/vgic-v2.c > @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu > *vcpu) >  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq > *irq, int lr) >  { >   u32 val = irq->intid; > + bool allow_pending = true; >   > - if (irq_is_pending(irq)) { > + if (irq->active) > + val |= GICH_LR_ACTIVE_BIT; > + > + if (irq->hw) { > + val |= GICH_LR_HW; > + val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; > + /* > +  * Never set pending+active on a HW interrupt, as > the > +  * pending state is kept at the physical distributor > +  * level. > +  */ > + if (irq->active) > + allow_pending = false; > + } else { > + if (irq->config == VGIC_CONFIG_LEVEL) { > + val |= GICH_LR_EOI; > + > + /* > +  * Software resampling doesn't work very > well > +  * if we allow P+A, so let's not do that. > +  */ > + if (irq->active) > + allow_pending = false; > + } > + } > + > + if (allow_pending && irq_is_pending(irq)) { >   val |= GICH_LR_PENDING_BIT; >   >   if (irq->config == VGIC_CONFIG_EDGE) > @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) >   } >   } >   > - if (irq->active) > - val |= GICH_LR_ACTIVE_BIT; > - > - if (irq->hw) { > - val |= GICH_LR_HW; > - val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; > - /* > -  * Never set pending+active on a HW interrupt, as > the > -  * pending state is kept at the physical distributor > -  * level. > -  */ > - if (irq->active && irq_is_pending(irq)) > - val &= ~GICH_LR_PENDING_BIT; > - } else { > - if (irq->config == VGIC_CONFIG_LEVEL) > - val |= GICH_LR_EOI; > - } > - >   /* >    * Level-triggered mapped IRQs are special because we only > observe >    * rising edges as input to the VGIC.  We therefore lower > the line > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic- > v3.c > index 0ff2006f3781..6b484575cafb 100644 > --- a/virt/kvm/arm/vgic/vgic-v3.c > +++ b/virt/kvm/arm/vgic/vgic-v3.c > @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) >  { >   u32 model = vcpu->kvm->arch.vgic.vgic_model; >   u64 val = irq->intid; > + bool allow_pending = true; >   > - if (irq_is_pending(irq)) { > + if (irq->active) > + val |= ICH_LR_ACTIVE_BIT; > + > + if (irq->hw) { > + val |= ICH_LR_HW; > + val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; > + /* > +  * Never set pending+active on a HW interrupt, as > the > +  * pending state is kept at the physical distributor > +  * level. > +  */ > + if (irq->active) > + allow_pending = false; > + } else { > + if (irq->config == VGIC_CONFIG_LEVEL) { > + val |= ICH_LR_EOI; > + > + /* > +  * Software resampling doesn't work very > well > +  * if we allow P+A, so let's not do that. > +  */ > + if (irq->active) > + allow_pending = false; > + } > + } > + > + if (allow_pending && irq_is_pending(irq)) { >   val |= ICH_LR_PENDING_BIT; >   >   if (irq->config == VGIC_CONFIG_EDGE) > @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) >   } >   } >   > - if (irq->active) > - val |= ICH_LR_ACTIVE_BIT; > - > - if (irq->hw) { > - val |= ICH_LR_HW; > - val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; > - /* > -  * Never set pending+active on a HW interrupt, as > the > -  * pending state is kept at the physical distributor > -  * level. > -  */ > - if (irq->active && irq_is_pending(irq)) > - val &= ~ICH_LR_PENDING_BIT; > - } else { > - if (irq->config == VGIC_CONFIG_LEVEL) > - val |= ICH_LR_EOI; > - } > - >   /* >    * Level-triggered mapped IRQs are special because we only > observe >    * rising edges as input to the VGIC.  We therefore lower > the line > --  > 2.14.2 > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: shunyong.yang@hxt-semitech.com (Yang, Shunyong) Date: Mon, 12 Mar 2018 02:33:44 +0000 Subject: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling In-Reply-To: <20180311121733.643fb1a5@why.wild-wind.fr.eu.org> References: <1520492490-7943-1-git-send-email-shunyong.yang@hxt-semitech.com> <9ad47673-068e-f732-d2ca-9c76a8fbdfbc@arm.com> <0a15633d-8944-cb9b-3e6b-b08ee5ec42b9@arm.com> <20180308161900.GC1917@lvm> <86r2oubho3.wl-marc.zyngier@arm.com> <20180309213612.GD1917@lvm> <86k1ukazr0.wl-marc.zyngier@arm.com> <20180311121733.643fb1a5@why.wild-wind.fr.eu.org> Message-ID: <1520822024.2985.12.camel@hxt-semitech.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi, Marc, On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote: > On Sun, 11 Mar 2018 01:55:08 +0000 > Christoffer Dall wrote: > > > > > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier > m> wrote: > > > > > > On Fri, 09 Mar 2018 21:36:12 +0000, > > > Christoffer Dall wrote:?? > > > > > > > > > > > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:?? > > > > > > > > > > I'd be more confident if we did forbid P+A for such > > > > > interrupts > > > > > altogether, as they really feel like another kind of HW > > > > > interrupt.?? > > > > How about a slightly bigger hammer:??Can we avoid doing P+A for > > > > level > > > > interrupts completely???I don't think that really makes much > > > > sense, and > > > > I think we simply everything if we just come back out and > > > > resample the > > > > line.??For an edge, something like a network card, there's a > > > > potential > > > > performance win to appending a new pending state, but I doubt > > > > that this > > > > is the case for level interrupts.?? > > > I started implementing the same thing yesterday. Somehow, it > > > feels > > > slightly better to have the same flow for all level interrupts, > > > including the timer, and we only use the MI on EOI as a way to > > > trigger > > > the next state of injection. Still testing, but looking good so > > > far. > > > > > > I'm still puzzled that we have this level-but-not-quite behaviour > > > for > > > VFIO interrupts. At some point, it is going to bite us badly. > > > ? > > Where is the departure from level-triggered behavior with VFIO???As > > far as I can tell, the GIC flow of the interrupts will be just a > > level > > interrupt,? > The GIC is fine, I believe. What is not exactly fine is the > signalling > from the device, which will never be dropped until the EOI has been > detected. > > > > > but we just need to make sure the resamplefd mechanism is > > supported for both types of interrupts.??Whether or not that's a > > decent mechanism seems orthogonal to me, but that's a discussion > > for > > another day I think. > Given that VFIO is built around this mechanism, I don't think we have > a > choice but to support it. Anyway, I came up with the following patch, > which I tested on Seattle with mtty. It also survived my usual > hammering of cyclictest, hackbench??and bulk VM installs. > > Shunyong, could you please give it a go? > > Thanks, > > M. > I have tested the patch. It works on QDF2400 platform and?kvm_notify_acked_irq() is called when state is idle. BTW, I have following questions when I was debugging the issue. Coud you please give me some help? 1)what does "mi" mean in gic code? such as lr_signals_eoi_mi(); 2)In some __hyp_text?code where printk() will cause "HYP panic:", such as in __kvm_vcpu_run(). How can I output debug information? Thanks. Shunyong. > From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 > 2001 > From: Marc Zyngier > Date: Fri, 9 Mar 2018 14:59:40 +0000 > Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for > level > ?interrupts > > It was recently reported that VFIO mediated devices, and anything > that VFIO exposes as level interrupts, do no strictly follow the > expected logic of such interrupts as it only lowers the input > line when the guest has EOId the interrupt at the GIC level, rather > than when it Acked the interrupt at the device level. > > The GIC's Active+Pending state is fundamentally incompatible with > this behaviour, as it prevents KVM from observing the EOI, and in > turn results in VFIO never dropping the line. This results in an > interrupt storm in the guest, which it really never expected. > > As we cannot really change VFIO to follow the strict rules of level > signalling, let's forbid the A+P state altogether, as it is in the > end only an optimization. It ensures that we will transition via > an invalid state, which we can use to notify VFIO of the EOI. > > Signed-off-by: Marc Zyngier > --- > ?virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------ > ------------ > ?virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------ > ------------ > ?2 files changed, 56 insertions(+), 38 deletions(-) > > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic- > v2.c > index 29556f71b691..9356d749da1d 100644 > --- a/virt/kvm/arm/vgic/vgic-v2.c > +++ b/virt/kvm/arm/vgic/vgic-v2.c > @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu > *vcpu) > ?void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq > *irq, int lr) > ?{ > ? u32 val = irq->intid; > + bool allow_pending = true; > ? > - if (irq_is_pending(irq)) { > + if (irq->active) > + val |= GICH_LR_ACTIVE_BIT; > + > + if (irq->hw) { > + val |= GICH_LR_HW; > + val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; > + /* > + ?* Never set pending+active on a HW interrupt, as > the > + ?* pending state is kept at the physical distributor > + ?* level. > + ?*/ > + if (irq->active) > + allow_pending = false; > + } else { > + if (irq->config == VGIC_CONFIG_LEVEL) { > + val |= GICH_LR_EOI; > + > + /* > + ?* Software resampling doesn't work very > well > + ?* if we allow P+A, so let's not do that. > + ?*/ > + if (irq->active) > + allow_pending = false; > + } > + } > + > + if (allow_pending && irq_is_pending(irq)) { > ? val |= GICH_LR_PENDING_BIT; > ? > ? if (irq->config == VGIC_CONFIG_EDGE) > @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) > ? } > ? } > ? > - if (irq->active) > - val |= GICH_LR_ACTIVE_BIT; > - > - if (irq->hw) { > - val |= GICH_LR_HW; > - val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT; > - /* > - ?* Never set pending+active on a HW interrupt, as > the > - ?* pending state is kept at the physical distributor > - ?* level. > - ?*/ > - if (irq->active && irq_is_pending(irq)) > - val &= ~GICH_LR_PENDING_BIT; > - } else { > - if (irq->config == VGIC_CONFIG_LEVEL) > - val |= GICH_LR_EOI; > - } > - > ? /* > ? ?* Level-triggered mapped IRQs are special because we only > observe > ? ?* rising edges as input to the VGIC.??We therefore lower > the line > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic- > v3.c > index 0ff2006f3781..6b484575cafb 100644 > --- a/virt/kvm/arm/vgic/vgic-v3.c > +++ b/virt/kvm/arm/vgic/vgic-v3.c > @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) > ?{ > ? u32 model = vcpu->kvm->arch.vgic.vgic_model; > ? u64 val = irq->intid; > + bool allow_pending = true; > ? > - if (irq_is_pending(irq)) { > + if (irq->active) > + val |= ICH_LR_ACTIVE_BIT; > + > + if (irq->hw) { > + val |= ICH_LR_HW; > + val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; > + /* > + ?* Never set pending+active on a HW interrupt, as > the > + ?* pending state is kept at the physical distributor > + ?* level. > + ?*/ > + if (irq->active) > + allow_pending = false; > + } else { > + if (irq->config == VGIC_CONFIG_LEVEL) { > + val |= ICH_LR_EOI; > + > + /* > + ?* Software resampling doesn't work very > well > + ?* if we allow P+A, so let's not do that. > + ?*/ > + if (irq->active) > + allow_pending = false; > + } > + } > + > + if (allow_pending && irq_is_pending(irq)) { > ? val |= ICH_LR_PENDING_BIT; > ? > ? if (irq->config == VGIC_CONFIG_EDGE) > @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, > struct vgic_irq *irq, int lr) > ? } > ? } > ? > - if (irq->active) > - val |= ICH_LR_ACTIVE_BIT; > - > - if (irq->hw) { > - val |= ICH_LR_HW; > - val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT; > - /* > - ?* Never set pending+active on a HW interrupt, as > the > - ?* pending state is kept at the physical distributor > - ?* level. > - ?*/ > - if (irq->active && irq_is_pending(irq)) > - val &= ~ICH_LR_PENDING_BIT; > - } else { > - if (irq->config == VGIC_CONFIG_LEVEL) > - val |= ICH_LR_EOI; > - } > - > ? /* > ? ?* Level-triggered mapped IRQs are special because we only > observe > ? ?* rising edges as input to the VGIC.??We therefore lower > the line > --? > 2.14.2 > >