MIME-Version: 1.0
In-Reply-To: <20160219173343.GB3522@linux.vnet.ibm.com>
References: <20160217054549.GB6719@linux.vnet.ibm.com>
 <20160217192817.GA21818@linux.vnet.ibm.com>
 <20160217194554.GO6357@twins.programming.kicks-ass.net>
 <20160217202829.GO6719@linux.vnet.ibm.com>
 <20160217231945.GA21140@linux.vnet.ibm.com>
 <1568248905.2264.1455837260992.JavaMail.zimbra@efficios.com>
 <20160219173343.GB3522@linux.vnet.ibm.com>
Date: Sat, 20 Feb 2016 15:34:30 +1100
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
From: Ross Green
To: Paul McKenney
Cc: John Stultz, Mathieu Desnoyers, Thomas Gleixner, Peter Zijlstra, lkml,
 Ingo Molnar, Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton,
 Josh Triplett, rostedt, David Howells, Eric Dumazet, Darren Hart,
 Frédéric Weisbecker, Oleg Nesterov, pranith kumar
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 20, 2016 at 4:33 AM, Paul E. McKenney wrote:
> On Thu, Feb 18, 2016 at 08:13:18PM -0800, John Stultz wrote:
>> On Thu, Feb 18, 2016 at 7:56 PM, Ross Green wrote:
>> > Well a bonus extra!
>> > Kept everything running and there was another stall.
>> > So I have included the dmesg output for perusal.
>> >
>> > Just to clear things up there is no hotplug involved in this system.
>> > It is a standard Pandaboard ES Ti4460 two processor system.
>> > I use this for testing as a generic armv7 processor, plus can keep it
>> > just running along for testing for a long time.
>> > The system has a total
>> > of 23-25 processes running on average. Mainly standard daemons. There is
>> > certainly no heavy processing going on. I run a series of benchmarks
>> > that are CPU intensive for the first 20 minutes after boot and then
>> > just leave it to idle away, checking every so often to see how it has
>> > gone.
>> > As mentioned, I have observed these stalls going back to the 3.17 kernel.
>> > It will often take up to a week to record such a stall. I will
>> > typically test every new release kernel, so the -rc? series will get
>> > around a week's testing.
>>
>> Sorry. Kind of hopping in a bit late here. Is this always happening
>> with just the pandaboard? Or are you seeing this on different
>> machines?
>>
>> Have you tried enabling CONFIG_DEBUG_TIMEKEEPING just in case
>> something is going awry there?
>
> Excellent point -- timekeeping issues have caused this sort of issue
> in the past.
>
> Ross, on your next test, could you please enable CONFIG_DEBUG_TIMEKEEPING
> as John suggests?
>
> Thanx, Paul
>

As John suggested, I have already enabled CONFIG_DEBUG_TIMEKEEPING.
So far it has been running for just one day.

Sigh...!!
Nothing to report as yet, only one day on the clock.
It's like watching grass grow!

Regards,

Ross Green