From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefano Stabellini
Subject: Re: Bug report and patch about IRQ freezing after gic_restore_state
Date: Tue, 21 May 2013 13:00:58 +0100
References: <19843405.178081369010517219.JavaMail.weblogic@epv6ml12> <519A20FE.3030307@linaro.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <519A20FE.3030307@linaro.org>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Julien Grall
Cc: Stefano Stabellini, Ian Campbell, "jaeyong.yoo@samsung.com", "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On Mon, 20 May 2013, Julien Grall wrote:
> On 05/20/2013 01:41 AM, Jaeyong Yoo wrote:
> > Hello,
> >
> > I'm running xen on Arndale board and if I run both iperf and du command at Dom0,
> > one of IRQ (either SATA or network) suddenly stop occuring anymore.
> > After some investigation, I found out that when context switching at Xen,
> > IRQs in LR (about to be delivered to Doms) could be lost and never occur anymore.
> > Here goes function call sequence that this problem occurs:
> >
> > (in context switching)
> > - schedule_tail
> > - ctxt_switch_from
> > - local_irq_enable
> > - // after this part, some IRQ can occur and could be directly written to LR
> > - ctxt_switch_to
> > - ... (some more functions)
> > - // before the above IRQ is delivered to Dom (and maintenance IRQ not called),
> >   // gic_restore_state can be called
> > - gic_restore_state /* when restoring gic state, the above IRQ
> >                      * (written to LR) is overwritten
> >                      * to the previous values, and somehow,
> >                      * the corresponding IRQ never occur again */
> >
> > I made the following patch (i.e., enable local irq after gic_restore_state)
> > for preventing the above problem.
>
> Thanks for the patch, I was looking with a similar error on the Arndale
> Board for a couple of day.

Indeed, thanks for the analysis of the bug and the patch!
It is a particularly difficult bug to track down because it can only
happen if an irq arrives after ctxt_switch_from and before
ctxt_switch_to, and the irq is for the next vcpu to be scheduled on the
pcpu (otherwise the v == current check at the beginning of
gic_set_guest_irq would catch that).

Rather than extending the check in gic_set_guest_irq, I think it is wise
to run ctxt_switch_to with interrupts disabled.

> > Signed-off-by: Jaeyong Yoo
> > ---
> >  xen/arch/arm/domain.c |    4 ++--
> >  xen/arch/arm/gic.c    |    4 ++--
> >  2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index f71b582..2c3b132 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n)
> >      /* VGIC */
> >      gic_restore_state(n);
> > +    local_irq_enable();
> > +
>
> Could you move the local_irq_enable right after ctxt_switch_to?

Right, good idea.

> >      /* XXX VFP */
> >      /* XXX MPU */
> > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev)
> >  {
> >      ctxt_switch_from(prev);
> > -    local_irq_enable();
> > -
> >      /* TODO
> >         update_runstate_area(current);
> >      */
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index d4f0a43..8186ad8 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> > -    spin_lock_irq(&gic.lock);
> > +    spin_lock(&gic.lock);
> >      this_cpu(lr_mask) = v->arch.lr_mask;
> >      for ( i=0; i > GICH[GICH_LR + i] = v->arch.gic_lr[i];
> > -    spin_unlock_irq(&gic.lock);
> > +    spin_unlock(&gic.lock);
>
> As the IRQ is disabled and the GICH registers can only be modified by
> the current physical CPU, I think you can remove the spin_{,un}lock and
> replace it by a dsb.

Yes, we can remove the spin_lock but I don't think we need a dsb there.
See the presence of an isb() two lines below.
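
To make the ordering issue concrete, here is a minimal user-space sketch of
the race. It is only a toy model, not Xen code: NR_LRS, struct toy_vcpu,
hw_lr, raise_irq and the toy_* helpers are all hypothetical names invented
for the illustration, and a plain array stands in for the GICH list
registers. It assumes an irq for the incoming vcpu arrives between
ctxt_switch_from and the LR restore.

```c
#include <stdbool.h>
#include <string.h>

#define NR_LRS 4                      /* hypothetical number of list registers */

/* Toy stand-in for the per-vcpu saved GIC state (not Xen's struct vcpu). */
struct toy_vcpu {
    unsigned int gic_lr[NR_LRS];      /* LR contents saved at the last switch-out */
};

static unsigned int hw_lr[NR_LRS];    /* simulated hardware list registers */
static bool irqs_enabled;             /* simulated local irq mask state */
static bool irq_pending;              /* an irq held back while masked */
static unsigned int pending_virq;

static void toy_reset(void)
{
    memset(hw_lr, 0, sizeof(hw_lr));
    irqs_enabled = false;
    irq_pending = false;
}

/* An irq is written to LR0 at once if unmasked, otherwise it stays pending. */
static void raise_irq(unsigned int virq)
{
    if (irqs_enabled) {
        hw_lr[0] = virq;
    } else {
        irq_pending = true;
        pending_virq = virq;
    }
}

static void toy_local_irq_enable(void)
{
    irqs_enabled = true;
    if (irq_pending) {                /* deliver what was held back */
        hw_lr[0] = pending_virq;
        irq_pending = false;
    }
}

/* Models the restore: the saved values overwrite whatever is in the LRs. */
static void toy_restore_state(const struct toy_vcpu *n)
{
    memcpy(hw_lr, n->gic_lr, sizeof(hw_lr));
}

/* Old order: irqs enabled before the restore, so the new LR entry is lost. */
unsigned int switch_old_order(const struct toy_vcpu *n, unsigned int virq)
{
    toy_reset();
    toy_local_irq_enable();
    raise_irq(virq);                  /* lands in LR0 inside the window */
    toy_restore_state(n);             /* overwrites LR0 with the saved value */
    return hw_lr[0];
}

/* New order: the restore runs with irqs masked, the irq is delivered after. */
unsigned int switch_new_order(const struct toy_vcpu *n, unsigned int virq)
{
    toy_reset();
    raise_irq(virq);                  /* held pending while masked */
    toy_restore_state(n);
    toy_local_irq_enable();           /* now written to LR0 and kept */
    return hw_lr[0];
}
```

With a zero-initialised struct toy_vcpu and virq 42, switch_old_order leaves
LR0 at the stale saved value 0, while switch_new_order leaves 42 in LR0. The
model ignores lr_mask, the maintenance interrupt and LR allocation, so it
only illustrates why the restore should run with interrupts disabled.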