Date: Sat, 23 Jan 2016 18:01:21 -0800
From: "Paul E. McKenney"
To: Geert Uytterhoeven
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, jiangshanlai@gmail.com,
    dipankar@in.ibm.com, Andrew Morton, Mathieu Desnoyers, Josh Triplett,
    Thomas Gleixner, Peter Zijlstra, Steven Rostedt, David Howells,
    Eric Dumazet, Darren Hart, Frédéric Weisbecker, Oleg Nesterov,
    pranith kumar, linux-arm-kernel@lists.infradead.org,
    linux-renesas-soc@vger.kernel.org, arnd@arndb.de, olof@lixom.net
Subject: Re: RCU lockup? (was: Re: [PATCH v2 tip/core/rcu 10/14] rcu: Don't redundantly disable irqs in rcu_irq_{enter,exit}())
Message-ID: <20160124020121.GD4503@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160121160657.GW3818@linux.vnet.ibm.com> <20160122204412.GN3818@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jan 23, 2016 at 10:43:19AM +0100, Geert Uytterhoeven wrote:
> Hi Paul,
>
> On Fri, Jan 22, 2016 at 9:44 PM, Paul E. McKenney wrote:
> > On Fri, Jan 22, 2016 at 09:55:44AM +0100, Geert Uytterhoeven wrote:
> >> On Thu, Jan 21, 2016 at 5:06 PM, Paul E. McKenney wrote:
> >> > On Thu, Jan 21, 2016 at 02:22:56PM +0100, Geert Uytterhoeven wrote:
> >> >> On Thu, Dec 10, 2015 at 12:10 AM, Paul E. McKenney wrote:
> >> >> > This commit replaces a local_irq_save()/local_irq_restore() pair with
> >> >> > a lockdep assertion that interrupts are already disabled.  This should
> >> >> > remove the corresponding overhead from the interrupt entry/exit fastpaths.
> >> >> >
> >> >> > This change was inspired by the fact that Iftekhar Ahmed's mutation
> >> >> > testing showed that removing rcu_irq_enter()'s call to local_irq_restore()
> >> >> > had no effect, which might indicate that interrupts were always enabled
> >> >> > anyway.
> >> >> >
> >> >> > Signed-off-by: Paul E. McKenney
> >> >> > ---
> >> >> >  include/linux/rcupdate.h   |  4 ++--
> >> >> >  include/linux/rcutiny.h    |  8 ++++++++
> >> >> >  include/linux/rcutree.h    |  2 ++
> >> >> >  include/linux/tracepoint.h |  4 ++--
> >> >> >  kernel/rcu/tree.c          | 32 ++++++++++++++++++++++++++------
> >> >> >  5 files changed, 40 insertions(+), 10 deletions(-)
> >> >>
> >> >> This commit (7c9906ca5e582a773fff696975e312cef58a7386) is triggering lock ups
> >> >> during boot on r8a7791/koelsch (dual Cortex A15). Probably this commit does not
> >> >> contain the real bug, but a symptom.
> >> >
> >> > On the off-chance that it is related, here is Ding Tianhong's patch
> >> > that addressed some lockups:
> >> >
> >> > http://www.eenyhelp.com/patch-rfc-locking-mutexes-dont-spin-owner-when-wait-list-not-null-help-215929641.html
> >> >
> >> > Does that help in your case?
> >>
> >> Unfortunately not.
> >
> > We could revert the RCU patch without any real problems -- it is after
> > all just an optimization.
>
> I replaced the calls to rcu_irq_{enter,exit}() in irq_{enter,exit}() by their
> _irqson counterparts, which should be equivalent to the old code, but the issue
> persisted. Strange...

Indeed...

> Does it matter that arm has
>     #define __ARCH_IRQ_EXIT_IRQS_DISABLED 1
> ?

No idea.  I added Arnd and Olof on CC in case they can tell us more.

> I tried JTAG, but enabling JTAG on r8a7791/koelsch requires changing a switch
> on the board, which also disables the second CPU core, and thus makes the issue
> disappear... :-(
>
> > Hmmm...  One issue that we have seen before is that the irq-disabled
> > indication is a software flag that is not always in sync with
> > hardware conditions.  Might it be that we are hitting a situation where
> > irqs_disabled() is giving the wrong answer, thus suppressing the lockdep
> > warning?
>
> Possible. I tried adding 'if(!irqs_disabled) printk("something")' just before
> the RCU_LOCKDEP_WARN(), but it never triggered. Worse, the issue went away by
> doing that :-(

That would be "if (!irqs_disabled())..." with the "()", correct?

But if you had lockdep enabled, and if lockdep didn't complain, I would not
expect the "if" to complain either.  The fact that the problem was suppressed
by the extra check is a bit annoying, I will grant you that!

							Thanx, Paul

> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
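
For readers following the thread, here is a minimal user-space sketch of
the change under discussion: the old save/restore shape (which the
_irqson variants are said to preserve) next to the new shape that only
checks that the caller already has interrupts disabled.  Everything named
model_* is hypothetical and is not a kernel API; the interrupt state is
modelled as a plain boolean, which is an assumption made purely for
illustration and says nothing about how the real flag behaves on arm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
bool model_irqs_enabled = true;  /* stand-in for the CPU interrupt-enable state */
int model_nesting = 0;           /* stand-in for the irq nesting bookkeeping */

/* Models irqs_disabled(): it has to be *called*, with the "()". */
bool model_irqs_disabled(void)
{
    return !model_irqs_enabled;
}

/* Models local_irq_save(): remember the old state, then disable. */
unsigned long model_irq_save(void)
{
    unsigned long flags = model_irqs_enabled ? 1UL : 0UL;

    model_irqs_enabled = false;
    return flags;
}

/* Models local_irq_restore(): put back whatever state was saved. */
void model_irq_restore(unsigned long flags)
{
    assert(flags == 0UL || flags == 1UL);  /* only values model_irq_save() returns */
    model_irqs_enabled = (flags != 0UL);
}

/* Old shape: correct whatever interrupt state the caller is in. */
void model_enter_irqson(void)
{
    unsigned long flags = model_irq_save();

    model_nesting++;
    model_irq_restore(flags);
}

/*
 * New shape: no save/restore, only a check of the caller's promise.
 * Returns false where the real code would emit a lockdep warning.
 */
bool model_enter(void)
{
    bool promise_kept = model_irqs_disabled();

    model_nesting++;
    return promise_kept;
}
```

In this model, model_enter_irqson() leaves the interrupt state exactly as
it found it, while model_enter() does its bookkeeping either way and only
reports whether interrupts were disabled on entry.  That is why the new
shape can be described as just an optimization, provided every caller
keeps the promise.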