Date: Thu, 30 Jun 2016 10:52:16 -0700
From: "Paul E. McKenney"
To: Ross Green
Cc: Peter Zijlstra, Mathieu Desnoyers, "Chatre, Reinette", Jacob Pan,
    Josh Triplett, John Stultz, Thomas Gleixner, lkml, Ingo Molnar,
    Lai Jiangshan, dipankar@in.ibm.com, Andrew Morton, rostedt,
    David Howells, Eric Dumazet, Darren Hart, Frédéric Weisbecker,
    Oleg Nesterov, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20160507152501.GW3593@linux.vnet.ibm.com>
Message-Id: <20160630175216.GA8535@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, May 07, 2016 at 08:25:01AM -0700, Paul E. McKenney wrote:
> On Fri, May 06, 2016 at 04:25:16PM +1000, Ross Green wrote:
> > On Sun, Apr 3, 2016 at 6:18 PM, Paul E. McKenney wrote:

[ . . . ]

> > Thought i would update a few runs with the linux-4.6-rc kernels.
> >
> > I have attached log outputs through dmesg showing rcu_preempt stall warnings.
> >
> >
> > Thought it might be interesting for someone else to look at.
> >
> > Currently running linux-4.6-rc6 in testing.
>
> Thank you for sending these, I will look them over!
>
> Still working to reproduce this quickly enough to do real debug...  :-/

And Peter Zijlstra's patch looks to have hugely reduced the rate of
occurrence of this bug in my testing:

lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com

Almost all of the issues I am seeing now are transient, do not trigger
RCU CPU stall warnings, and would not have been visible to me before I
upgraded my testing scripts and in-kernel code.  In all the tests I have
run with Peter's fix, I have seen only one run with RCU CPU stall warnings,
and all the stalls in that run were transient, unlike those that I was
seeing before his fix.

I will of course be tracking this stuff down, but the low reproduction
rates will make it slow going.

I am guessing that you no longer see this issue?

							Thanx, Paul