From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757064AbcCUR0X (ORCPT <rfc822;w@1wt.eu>);
	Mon, 21 Mar 2016 13:26:23 -0400
Received: from e35.co.us.ibm.com ([32.97.110.153]:45774 "EHLO
	e35.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755732AbcCUR0V (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 21 Mar 2016 13:26:21 -0400
X-IBM-Helo: d03dlp03.boulder.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Date: Mon, 21 Mar 2016 10:26:16 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>, Ross Green <rgkernel@gmail.com>,
        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
        John Stultz <john.stultz@linaro.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Peter Zijlstra <peterz@infradead.org>,
        lkml <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@kernel.org>,
        Lai Jiangshan <jiangshanlai@gmail.com>, dipankar@in.ibm.com,
        Andrew Morton <akpm@linux-foundation.org>,
        rostedt <rostedt@goodmis.org>, David Howells <dhowells@redhat.com>,
        Eric Dumazet <edumazet@google.com>,
        Darren Hart <dvhart@linux.intel.com>,
        =?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker <fweisbec@gmail.com>,
        Oleg Nesterov <oleg@redhat.com>, pranith kumar <bobby.prani@gmail.com>,
        "Chatre, Reinette" <reinette.chatre@intel.com>
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160321172616.GU4287@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <CANfgCY2yWp65-f1ujJ_-8yDp8Xp7KPHLJ8u+rfZmz=evkpghVw@mail.gmail.com>
 <CANfgCY0SaXWkCFq=dHGG38AnFd3Rd+wvVGQ6TH9DYow881YUWA@mail.gmail.com>
 <686568926.5862.1456259651418.JavaMail.zimbra@efficios.com>
 <20160223205522.GT3522@linux.vnet.ibm.com>
 <CANfgCY04Cnjoq1jQvQCd48Pt+_4Y9tcG6A7bCHHPWWWL6TFuEQ@mail.gmail.com>
 <CANfgCY3XCmArP9sJsdsQPGygyzWhzAXMS+jnge3_DCNYsONQyg@mail.gmail.com>
 <20160226005638.GV3522@linux.vnet.ibm.com>
 <20160318210011.GA571@cloud>
 <20160318235641.GH4287@linux.vnet.ibm.com>
 <20160321092230.75f23fa9@yairi>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160321092230.75f23fa9@yairi>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16032117-0013-0000-0000-000020A7502F
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 21, 2016 at 09:22:30AM -0700, Jacob Pan wrote:
> On Fri, 18 Mar 2016 16:56:41 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > On Fri, Mar 18, 2016 at 02:00:11PM -0700, Josh Triplett wrote:
> > > On Thu, Feb 25, 2016 at 04:56:38PM -0800, Paul E. McKenney wrote:

[ . . . ]

> > > We're seeing a similar stall (~60 seconds) on an x86 development
> > > system here.  Any luck tracking down the cause of this?  If not, any
> > > suggestions for traces that might be helpful?
> > 
> > The dmesg containing the stall, the kernel version, and the .config
> > would be helpful!  Working on a torture test specific to this bug...
> > 
> > 							Thanx, Paul
> > 
> +Reinette, she has the system that can reproduce the issue. I
> believe she is having some other problems with it at the moment. But
> the .config should be available. Version is v4.5.

A couple of additional questions:

1.	Is the test running on bare metal or virtualized?  If the
	latter, what is the host?

2.	Does the workload involve CPU hotplug?

3.	Are you seeing things like this in dmesg?

	"rcu_preempt kthread starved for 21033 jiffies"
	"rcu_sched kthread starved for 32103 jiffies"
	"rcu_bh kthread starved for 84031 jiffies"

	If not, you are probably facing some other bug, and should
	proceed with debugging as described in Documentation/RCU/stallwarn.txt.
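
For question 3, a minimal sketch of one way to scan a saved dmesg log for the starvation messages quoted above. It assumes the log is plain text and that each message follows the "<flavor> kthread starved for <N> jiffies" pattern shown in those examples. The function name and the sample log are hypothetical, for illustration only.

```python
import re

# Matches messages such as "rcu_preempt kthread starved for 21033 jiffies".
# The pattern is inferred from the three example lines quoted above; real
# dmesg lines may carry a timestamp prefix or trailing detail, which
# finditer() tolerates because it searches anywhere in the text.
STARVED_RE = re.compile(r"(rcu_\w+) kthread starved for (\d+) jiffies")


def find_starved_kthreads(dmesg_text):
    """Return a list of (flavor, jiffies) pairs, one per message found."""
    return [(m.group(1), int(m.group(2)))
            for m in STARVED_RE.finditer(dmesg_text)]


# Hypothetical sample log built from the example messages above.
sample_log = "\n".join([
    "rcu_preempt kthread starved for 21033 jiffies",
    "some unrelated kernel message",
    "rcu_sched kthread starved for 32103 jiffies",
    "rcu_bh kthread starved for 84031 jiffies",
])

for flavor, jiffies in find_starved_kthreads(sample_log):
    print(f"{flavor}: starved for {jiffies} jiffies")
```

An empty result for a log that does contain the stall warning would suggest the "some other bug" case, in which stallwarn.txt is the place to start.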

							Thanx, Paul