Date: Wed, 17 Feb 2016 11:28:17 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ross Green
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org, jiangshanlai@gmail.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org, Mathieu Desnoyers,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, Eric Dumazet,
	dvhart@linux.intel.com, Frédéric Weisbecker, oleg@redhat.com,
	pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160217192817.GA21818@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20160217054549.GB6719@linux.vnet.ibm.com>
References: <20160217054549.GB6719@linux.vnet.ibm.com>

On Tue, Feb 16, 2016 at 09:45:49PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
> > Continued testing with the latest linux-4.5-rc3 release.
> >
> > Please find attached a copy of traces from dmesg:
> >
> > There is a lot more debug and trace data, so hopefully this will shed
> > some light on what might be happening here.
> >
> > My testing remains the same: run a series of simple benchmarks, let
> > them run to completion, and then leave the system idle with just a
> > few daemons running.
> >
> > The self-detected stalls in this instance turned up after a day's run
> > time.  There were NO heavy artificial computational loads on the
> > machine.
>
> It does indeed look quiet in that dmesg for a good long time.
>
> The following insanely crude not-for-mainline hack -might- be producing
> good results in my testing.  It will take some time before I can claim
> statistically different results.  But please feel free to give it a go
> in the meantime.  (Thanks to Al Viro for pointing me in this direction.)

No joy, just a statistical anomaly.  :-(

							Thanx, Paul

> ------------------------------------------------------------------------
>
> commit 0c2c8d9fd1641809830a7a75f84dcad69936ef56
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Tue Feb 16 15:42:36 2016 -0800
>
>     rcu: Crude exploratory hack
>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 507d0ed48b97..5928e084620d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2194,8 +2194,10 @@ static int __noreturn rcu_gp_kthread(void *arg)
>  					       READ_ONCE(rsp->gpnum),
>  					       TPS("fqswait"));
>  			rsp->gp_state = RCU_GP_WAIT_FQS;
> -			ret = wait_event_interruptible_timeout(rsp->gp_wq,
> -					rcu_gp_fqs_check_wake(rsp, &gf), j);
> +			ret = schedule_timeout_interruptible(j > 0 ? j : 1);
> +			rcu_gp_fqs_check_wake(rsp, &gf);
> +			// ret = wait_event_interruptible_timeout(rsp->gp_wq,
> +			//		rcu_gp_fqs_check_wake(rsp, &gf), j);
>  			rsp->gp_state = RCU_GP_DOING_FQS;
>  			/* Locking provides needed memory barriers. */
>  			/* If grace period done, leave loop. */
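
[For reference, a minimal userspace sketch of the pattern the hack
exercises.  The patch replaces a condition-based timed wait
(wait_event_interruptible_timeout()) with an unconditional timed sleep
followed by an explicit check of the wakeup condition
(schedule_timeout_interruptible() plus a direct call to
rcu_gp_fqs_check_wake()), which reads as a probe for a lost wakeup: if
the stall disappears with the polling variant, the wait/wake handshake
is suspect rather than the condition itself.  The sketch below shows
that same pattern using POSIX threads instead of kernel wait queues;
fqs_requested, wait_for_fqs_condvar(), and wait_for_fqs_polling() are
hypothetical stand-ins, not kernel APIs.]

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool fqs_requested;	/* stand-in for the FQS wakeup condition */

/* Original pattern: sleep until woken up or until the deadline passes. */
static bool wait_for_fqs_condvar(const struct timespec *deadline)
{
	bool ret;

	pthread_mutex_lock(&lock);
	while (!fqs_requested) {
		/* Nonzero return means timeout (or error): stop waiting. */
		if (pthread_cond_timedwait(&cond, &lock, deadline) != 0)
			break;
	}
	ret = fqs_requested;
	pthread_mutex_unlock(&lock);
	return ret;
}

/* Crude hack: always sleep the full interval, then check exactly once. */
static bool wait_for_fqs_polling(const struct timespec *interval)
{
	bool ret;

	/* Unconditional sleep, like schedule_timeout_interruptible(). */
	nanosleep(interval, NULL);
	pthread_mutex_lock(&lock);
	ret = fqs_requested;	/* explicit check; no wakeup needed */
	pthread_mutex_unlock(&lock);
	return ret;
}

[If wait_for_fqs_polling() never misses the condition while
wait_for_fqs_condvar() occasionally stalls, the lost wakeup is in the
wait/wake path, which is what the hack probed in rcu_gp_kthread().
The cost is latency: the polling variant always sleeps the full
interval, even when the condition was set immediately.]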