Date: Sat, 2 Dec 2017 11:24:19 -0800
From: "Paul E. McKenney"
To: Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, xiaolong.ye@intel.com, tglx@linutronix.de,
	cmetcalf@mellanox.com, cl@linux.com, torvalds@linux-foundation.org,
	lcapitulino@redhat.com, efault@gmx.de, peterz@infradead.org,
	riel@redhat.com, kernellwp@gmail.com, mingo@kernel.org,
	john.stultz@linaro.org
Subject: Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
Reply-To: paulmck@linux.vnet.ibm.com
References: <20171130202046.GA27138@linux.vnet.ibm.com>
Message-Id: <20171202192419.GN7829@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote:
> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney :
> > Commit 5c4991e24c69 ("sched/isolation: Split out new
> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU
> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
> > and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's
> > grace-period kthreads sleeping for a few jiffies, but never being
> > awakened:
> >
> > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208
> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000
> > [ 116.356543] Call Trace:
> > [ 116.357008] __schedule+0x493/0x620
> > [ 116.357682] schedule+0x24/0x40
> > [ 116.358291] schedule_timeout+0x330/0x3b0
> > [ 116.359024] ? preempt_count_sub+0xea/0x140
> > [ 116.359806] ? collect_expired_timers+0xb0/0xb0
> > [ 116.360660] rcu_gp_kthread+0x6bf/0xef0
> >
> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
> > prevents this behavior and seems like it was the original intention in
> > any case.
>
> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm
> surprised about this stall. I'm even more surprised that setting
> CONFIG_CPU_ISOLATION=y is enough to fix the issue because
> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which
> is not good, but work in progress...).

Yes, and after applying this patch, I get failures a few commits later,
which appears to be due to other changes that break CONFIG_NO_HZ_FULL_ALL=y.
So I have another patch staged that removes CONFIG_NO_HZ_FULL_ALL, on
the grounds that no one else has complained, so rcutorture is likely to
be the only user, and I don't see the point of having a Kconfig option
for only one user.

> Did you have any nohz_full= or isolcpus= boot options?

Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that is
CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 on an
eight-CPU test.

But it is relatively easy to test. Running the rcutorture TREE04
scenario on a four-socket x86 gets me RCU CPU stall warnings within a
few minutes more than half the time. ;-)

							Thanx, Paul
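
For anyone reproducing the nohz_full=1-7 workaround above, here is a minimal sketch of how that CPU list partitions an eight-CPU system. It is not taken from the kernel: the helper names parse_cpu_list() and housekeeping_cpus() are hypothetical, and only plain comma-separated numbers and ranges are handled (no stride syntax). It only illustrates that nohz_full=1-7 leaves CPU 0 as the one CPU outside the nohz_full set.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as "1-7" or "0,2-3" into a set of ints.

    Hypothetical helper for illustration; handles only single CPUs and
    "lo-hi" ranges separated by commas.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError("bad CPU range: %r" % part)
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(nr_cpus, nohz_full_spec):
    """Return the sorted CPUs that are not in the nohz_full list."""
    nohz_full = parse_cpu_list(nohz_full_spec)
    return sorted(set(range(nr_cpus)) - nohz_full)


if __name__ == "__main__":
    # Eight-CPU test with nohz_full=1-7, as in the message above.
    print(housekeeping_cpus(8, "1-7"))  # prints [0]
```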