From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 10 Sep 2016 03:19:38 -0700
From: "Paul E. McKenney"
To: Rich Felker
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, tglx@linutronix.de
Subject: Re: rcu_sched stalls in idle task introduced in pre-4.8?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160802170414.GA20083@brightrain.aerifal.cx>
 <20160802181636.GJ3482@linux.vnet.ibm.com>
 <20160802192036.GW15995@brightrain.aerifal.cx>
 <20160802194802.GK3482@linux.vnet.ibm.com>
 <20160802203217.GZ15995@brightrain.aerifal.cx>
 <20160802204504.GL3482@linux.vnet.ibm.com>
 <20160803161631.GA20790@linux.vnet.ibm.com>
 <20160908221653.GR15995@brightrain.aerifal.cx>
In-Reply-To: <20160908221653.GR15995@brightrain.aerifal.cx>
Message-Id: <20160910101938.GO32751@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 08, 2016 at 06:16:53PM -0400, Rich Felker wrote:
> On Wed, Aug 03, 2016 at 09:16:31AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 02, 2016 at 01:45:04PM -0700, Paul E. McKenney wrote:
> > > On Tue, Aug 02, 2016 at 04:32:17PM -0400, Rich Felker wrote:
> > > > On Tue, Aug 02, 2016 at 12:48:02PM -0700, Paul E. McKenney wrote:
> > > > [ . . . ]
> > > >
> > > > > Does the problem reproduce easily?
> > > >
> > > > Yes, it happens right after boot and repeats every 30-90 seconds or
> > > > so.
> > >
> > > Well, that at least makes it easier to test any patches!
> > >
> > > > > A bisection might be very helpful.
> > > >
> > > > Bisection would require some manual work to set up because the whole
> > > > reason I was rebasing on Linus's tree was to adapt the drivers to
> > > > upstream infrastructure changes (the new cpuhp stuff replacing
> > > > notifier for cpu starting). The unfortunate way it was done, each
> > > > driver adds an enum to linux/cpuhotplug.h so all the patches have
> > > > gratuitous conflicts. In addition, for older revisions in Linus's
> > > > tree, there's at least one show-stopping (hang during boot) bug that
> > > > needs a cherry-pick to fix. There may be other small issues too. I
> > > > don't think they're at all insurmountable but it requires an annoying
> > > > amount of scripting.
> > >
> > > I had to ask! Might eventually be necessary, but let's see what we
> > > can learn from what you currently have.
> >
> > And at first glance, my overnight run looks uglier than I would expect.
> > I am now running tests at v4.7, and will run other tests to see if
> > there really is a statistically significant degradation. If there is,
> > then I might be able to bisect, though with nine-hour runs this could
> > take quite some time.
>
> Any more thoughts on this? I'm testing v4.8-rc5 (plus jcore drivers
> not yet upstream) and it's still happening.

Not seeing it, but please do send me a recent splat from your dmesg and
your .config. Because I am not seeing it, I also suggest inspecting
your jcore drivers with the information in
Documentation/RCU/stallwarn.txt in mind.

							Thanx, Paul