Date: Fri, 19 Feb 2016 09:33:43 -0800
From: "Paul E. McKenney"
To: John Stultz
Cc: Ross Green, Mathieu Desnoyers, Thomas Gleixner, Peter Zijlstra, lkml, Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton, Josh Triplett, rostedt, David Howells, Eric Dumazet, Darren Hart, Frédéric Weisbecker, Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160219173343.GB3522@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160217054549.GB6719@linux.vnet.ibm.com> <20160217192817.GA21818@linux.vnet.ibm.com> <20160217194554.GO6357@twins.programming.kicks-ass.net> <20160217202829.GO6719@linux.vnet.ibm.com> <20160217231945.GA21140@linux.vnet.ibm.com> <1568248905.2264.1455837260992.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 18, 2016 at 08:13:18PM -0800, John Stultz wrote:
> On Thu, Feb 18, 2016 at 7:56 PM, Ross Green wrote:
> > Well, a bonus extra!
> > I kept everything running and there was another stall,
> > so I have included the dmesg output for perusal.
> >
> > Just to clear things up, there is no hotplug involved in this system.
> > It is a standard Pandaboard ES Ti4460 two-processor system.
> > I use it for testing as a generic armv7 processor, and I can keep it
> > running for testing for a long time. The system has a total
> > of 23-25 processes running on average, mainly standard daemons. There is
> > certainly no heavy processing going on. I run a series of CPU-intensive
> > benchmarks for the first 20 minutes after boot and then
> > just leave it to idle away, checking every so often to see how it has
> > gone.
> > As mentioned, I have observed these stalls going back to the 3.17 kernel.
> > It will often take up to a week to record such a stall. I
> > typically test every new release kernel, so each -rc? release gets
> > around a week's testing.
>
> Sorry, kind of hopping in a bit late here. Is this always happening
> with just the Pandaboard? Or are you seeing this on different
> machines?
>
> Have you tried enabling CONFIG_DEBUG_TIMEKEEPING just in case
> something is going awry there?

Excellent point -- timekeeping issues have caused this sort of stall
in the past.

Ross, on your next test, could you please enable
CONFIG_DEBUG_TIMEKEEPING as John suggests?

							Thanx, Paul