From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750824AbdCIFYl (ORCPT ); Thu, 9 Mar 2017 00:24:41 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:45398 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750739AbdCIFYj (ORCPT ); Thu, 9 Mar 2017 00:24:39 -0500 Date: Wed, 8 Mar 2017 21:24:14 -0800 From: "Paul E. McKenney" To: Frederic Weisbecker Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org, tglx@linutronix.de Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called Reply-To: paulmck@linux.vnet.ibm.com References: <20170308221656.GA11949@linux.vnet.ibm.com> <20170308234152.GA14819@linux.vnet.ibm.com> <20170309035527.GB6416@lerouge> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170309035527.GB6416@lerouge> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17030905-0052-0000-0000-00000192F45B X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006748; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000206; SDB=6.00832022; UDB=6.00408273; IPR=6.00609610; BA=6.00005201; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00014567; XFM=3.00000013; UTC=2017-03-09 05:24:17 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17030905-0053-0000-0000-00004EF44EC7 Message-Id: <20170309052414.GQ30506@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-03-09_04:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1703090042 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Mar 09, 2017 at 04:55:29AM +0100, Frederic Weisbecker wrote: > On Wed, Mar 08, 2017 at 03:41:52PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 08, 2017 at 02:16:56PM -0800, Paul E. McKenney wrote: > > > Hello! > > > > > > I am seeing the following splat in rcutorture testing of v4.11-rc1: > > > > > > [ 30.694013] ============================= > > > [ 30.694013] WARNING: suspicious RCU usage > > > [ 30.694013] 4.11.0-rc1+ #1 Not tainted > > > [ 30.694013] ----------------------------- > > > [ 30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held! > > > [ 30.694013] > > > [ 30.694013] other info that might help us debug this: > > > [ 30.694013] > > > [ 30.694013] > > > [ 30.694013] RCU used illegally from offline CPU! > > > [ 30.694013] rcu_scheduler_active = 2, debug_locks = 0 > > > [ 30.694013] no locks held by swapper/1/0. > > > [ 30.694013] > > > [ 30.694013] stack backtrace: > > > [ 30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1 > > > [ 30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > > [ 30.694013] Call Trace: > > > [ 30.694013] dump_stack+0x67/0x99 > > > [ 30.694013] lockdep_rcu_suspicious+0xe7/0x120 > > > [ 30.694013] get_work_pool+0x82/0x90 > > > [ 30.694013] __queue_work+0x70/0x5f0 > > > [ 30.694013] queue_work_on+0x33/0x70 > > > [ 30.694013] clear_sched_clock_stable+0x33/0x40 > > > [ 30.694013] early_init_intel+0xe7/0x2f0 > > > [ 30.694013] init_intel+0x11/0x350 > > > [ 30.694013] identify_cpu+0x344/0x5a0 > > > [ 30.694013] identify_secondary_cpu+0x18/0x80 > > > [ 30.694013] smp_store_cpu_info+0x39/0x40 > > > [ 30.694013] start_secondary+0x4e/0x100 > > > [ 30.694013] start_cpu+0x14/0x14 > > > > > > Here is the relevant code from x86's smp_callin(): > > > > > > /* > > > * Save our processor parameters. Note: this information > > > * is needed for clock calibration. > > > */ > > > smp_store_cpu_info(cpuid); > > > > > > /* > > > * Get our bogomips. > > > * Update loops_per_jiffy in cpu_data. Previous call to > > > * smp_store_cpu_info() stored a value that is close but not as > > > * accurate as the value just calculated. > > > */ > > > calibrate_delay(); > > > cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy; > > > pr_debug("Stack at about %p\n", &cpuid); > > > > > > /* > > > * This must be done before setting cpu_online_mask > > > * or calling notify_cpu_starting. > > > */ > > > set_cpu_sibling_map(raw_smp_processor_id()); > > > wmb(); > > > > > > notify_cpu_starting(cpuid); > > > > > > The problem is that smp_store_cpu_info() indirectly invokes > > > schedule_work(), which wants to use RCU. But RCU isn't informed > > > of the incoming CPU until the call to notify_cpu_starting(), which > > > causes lockdep to complain bitterly about the use of RCU by the > > > premature call to schedule_work(). > > > > > > I considered just moving the notify_cpu_starting() earlier in the > > > sequence, but the comments make it seem like this would not be > > > a wise choice. > > > > > > Any suggestions? > > Calling schedule_work() from an offline (booting) CPU doesn't sound like a good > idea in the first place. And neither is it a good idea to allow using > RCU on a CPU that is not yet online. Fair point, though it is only RCU readers that are in use, so no RCU core code would be running, at least not until interrupts are enabled and so on. But I needed the patch to get this out of the way of my rcutorture testing, so it is serving a purpose whether or not it eventually hits mainline. > Perhaps we could delay this sched clock stability check to a later > stage in the secondary CPU boot-up code? Once the CPU is online > and RCU is initialized? For example it could be a CPU_ONLINE hotplug > callback. In theory, this does seem like a cleaner solution to me. In practice, I must defer to those who know the code better than I do. Thanx, Paul