Date: Fri, 19 Feb 2016 22:32:48 -0800
From: "Paul E. McKenney"
To: Ross Green
Cc: John Stultz, Mathieu Desnoyers, Thomas Gleixner, Peter Zijlstra, lkml,
	Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton,
	Josh Triplett, rostedt, David Howells, Eric Dumazet, Darren Hart,
	Frédéric Weisbecker, Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160220063248.GE3522@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160217192817.GA21818@linux.vnet.ibm.com>
	<20160217194554.GO6357@twins.programming.kicks-ass.net>
	<20160217202829.GO6719@linux.vnet.ibm.com>
	<20160217231945.GA21140@linux.vnet.ibm.com>
	<1568248905.2264.1455837260992.JavaMail.zimbra@efficios.com>
	<20160219173343.GB3522@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 20, 2016 at 03:34:30PM +1100, Ross Green wrote:
> On Sat, Feb 20, 2016 at 4:33 AM, Paul E. McKenney wrote:
> > On Thu, Feb 18, 2016 at 08:13:18PM -0800, John Stultz wrote:
> >> On Thu, Feb 18, 2016 at 7:56 PM, Ross Green wrote:
> >> > Well a bonus extra!
> >> > Kept everything running and there was another stall.
> >> > So I have included the dmesg output for perusal.
> >> >
> >> > Just to clear things up there is no hotplug involved in this system.
> >> > It is a standard Pandaboard ES Ti4460 two processor system.
> >> > I use this for testing as a generic armv7 processor, plus can keep it
> >> > just running along for testing for a long time. The system has a total
> >> > of 23-25 processes running on average. Mainly standard daemons. There is
> >> > certainly no heavy processing going on. I run a series of benchmarks
> >> > that are cpu intensive for the first 20 minutes after boot and then
> >> > just leave it to idle away, checking every so often to see how it has
> >> > gone.
> >> > As mentioned I have observed these stalls going back to 3.17 kernel.
> >> > It will often take up to a week to record such a stall. I will
> >> > typically test every new release kernel, so the -rc? series will get
> >> > around a week's testing.
> >>
> >> Sorry. Kind of hopping in a bit late here. Is this always happening
> >> with just the pandaboard? Or are you seeing this on different
> >> machines?
> >>
> >> Have you tried enabling CONFIG_DEBUG_TIMEKEEPING just in case
> >> something is going awry there?
> >
> > Excellent point -- timekeeping issues have caused this sort of issue
> > in the past.
> >
> > Ross, on your next test, could you please enable CONFIG_DEBUG_TIMEKEEPING
> > as John suggests?
> >
> > 							Thanx, Paul
> >
> As John has suggested have already enabled CONFIG_DEBUG_TIMEKEEPING.
>
> So far just on 1 day running.
>
> Sigh...!! Nothing to report as yet, only one day on the clock.
> It's like watching grass grow!

I hear you!  Though I was thinking in terms of watching paint dry...

							Thanx, Paul