From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751698AbdCIDzf (ORCPT ); Wed, 8 Mar 2017 22:55:35 -0500
Received: from mail-wr0-f194.google.com ([209.85.128.194]:33616 "EHLO
	mail-wr0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751313AbdCIDzd (ORCPT ); Wed, 8 Mar 2017 22:55:33 -0500
Date: Thu, 9 Mar 2017 04:55:29 +0100
From: Frederic Weisbecker
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org,
	tglx@linutronix.de
Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called
Message-ID: <20170309035527.GB6416@lerouge>
References: <20170308221656.GA11949@linux.vnet.ibm.com>
 <20170308234152.GA14819@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170308234152.GA14819@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 08, 2017 at 03:41:52PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 08, 2017 at 02:16:56PM -0800, Paul E. McKenney wrote:
> > Hello!
> >
> > I am seeing the following splat in rcutorture testing of v4.11-rc1:
> >
> > [ 30.694013] =============================
> > [ 30.694013] WARNING: suspicious RCU usage
> > [ 30.694013] 4.11.0-rc1+ #1 Not tainted
> > [ 30.694013] -----------------------------
> > [ 30.694013] /home/git/linux-2.6-tip/kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held!
> > [ 30.694013]
> > [ 30.694013] other info that might help us debug this:
> > [ 30.694013]
> > [ 30.694013]
> > [ 30.694013] RCU used illegally from offline CPU!
> > [ 30.694013] rcu_scheduler_active = 2, debug_locks = 0
> > [ 30.694013] no locks held by swapper/1/0.
> > [ 30.694013]
> > [ 30.694013] stack backtrace:
> > [ 30.694013] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc1+ #1
> > [ 30.694013] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 30.694013] Call Trace:
> > [ 30.694013]  dump_stack+0x67/0x99
> > [ 30.694013]  lockdep_rcu_suspicious+0xe7/0x120
> > [ 30.694013]  get_work_pool+0x82/0x90
> > [ 30.694013]  __queue_work+0x70/0x5f0
> > [ 30.694013]  queue_work_on+0x33/0x70
> > [ 30.694013]  clear_sched_clock_stable+0x33/0x40
> > [ 30.694013]  early_init_intel+0xe7/0x2f0
> > [ 30.694013]  init_intel+0x11/0x350
> > [ 30.694013]  identify_cpu+0x344/0x5a0
> > [ 30.694013]  identify_secondary_cpu+0x18/0x80
> > [ 30.694013]  smp_store_cpu_info+0x39/0x40
> > [ 30.694013]  start_secondary+0x4e/0x100
> > [ 30.694013]  start_cpu+0x14/0x14
> >
> > Here is the relevant code from x86's smp_callin():
> >
> > 	/*
> > 	 * Save our processor parameters. Note: this information
> > 	 * is needed for clock calibration.
> > 	 */
> > 	smp_store_cpu_info(cpuid);
> >
> > 	/*
> > 	 * Get our bogomips.
> > 	 * Update loops_per_jiffy in cpu_data. Previous call to
> > 	 * smp_store_cpu_info() stored a value that is close but not as
> > 	 * accurate as the value just calculated.
> > 	 */
> > 	calibrate_delay();
> > 	cpu_data(cpuid).loops_per_jiffy = loops_per_jiffy;
> > 	pr_debug("Stack at about %p\n", &cpuid);
> >
> > 	/*
> > 	 * This must be done before setting cpu_online_mask
> > 	 * or calling notify_cpu_starting.
> > 	 */
> > 	set_cpu_sibling_map(raw_smp_processor_id());
> > 	wmb();
> >
> > 	notify_cpu_starting(cpuid);
> >
> > The problem is that smp_store_cpu_info() indirectly invokes
> > schedule_work(), which wants to use RCU.
> > But RCU isn't informed
> > of the incoming CPU until the call to notify_cpu_starting(), which
> > causes lockdep to complain bitterly about the use of RCU by the
> > premature call to schedule_work().
> >
> > I considered just moving the notify_cpu_starting() earlier in the
> > sequence, but the comments make it seem like this would not be
> > a wise choice.
> >
> > Any suggestions?

Calling schedule_work() from an offline (booting) CPU doesn't sound like
a good idea in the first place. And neither is it a good idea to allow
RCU to be used on a CPU that is not yet online.

Perhaps we could delay this sched clock stability check to a later stage
in the secondary CPU boot-up code, once the CPU is online and RCU is
initialized? For example, it could be a CPU_ONLINE hotplug callback.
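Something along these lines, maybe. This is only a rough sketch of the
deferral, not a tested patch: sched_clock_unstable_seen,
note_sched_clock_unstable(), sched_clock_online() and the
"sched_clock:online" state name are made up for illustration, whereas
cpuhp_setup_state(), CPUHP_AP_ONLINE_DYN and clear_sched_clock_stable()
are the existing interfaces:

#include <linux/cpu.h>    /* cpuhp_setup_state(), CPUHP_AP_ONLINE_DYN */
#include <linux/init.h>   /* early_initcall() */
#include <linux/sched.h>  /* clear_sched_clock_stable(), modulo the 4.11 header split */

/*
 * Sketch: the early CPU identification path (early_init_intel() & co)
 * would call note_sched_clock_unstable() (made-up name) instead of
 * clear_sched_clock_stable(), so that nothing touches workqueues or
 * RCU before the CPU is online.
 */
static bool sched_clock_unstable_seen;

void note_sched_clock_unstable(void)
{
	/* Safe from an offline/booting CPU: it only records the fact. */
	sched_clock_unstable_seen = true;
}

static int sched_clock_online(unsigned int cpu)
{
	/*
	 * Runs on the incoming CPU once it is online and RCU already
	 * knows about it, so the schedule_work() buried inside
	 * clear_sched_clock_stable() is no longer an RCU use from an
	 * offline CPU.
	 */
	if (sched_clock_unstable_seen)
		clear_sched_clock_stable();
	return 0;
}

static int __init sched_clock_online_init(void)
{
	/*
	 * Dynamic state in the hotplug "online" section: the callback
	 * is invoked for each CPU as it comes online (and once for the
	 * boot CPU at registration time).
	 */
	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "sched_clock:online",
				 sched_clock_online, NULL);
}
early_initcall(sched_clock_online_init);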
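For reference, the check that fires at kernel/workqueue.c:712 in the
splat above is the assertion used by get_work_pool(). Paraphrasing the
v4.11 source from memory, so treat the exact form as approximate:

/* Rough paraphrase of the workqueue.c assertion, not a verbatim quote. */
#define assert_rcu_or_pool_mutex()					\
	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held() &&		\
			 !lockdep_is_held(&wq_pool_mutex),		\
			 "sched RCU or wq_pool_mutex should be held")

__queue_work() runs with irqs disabled, which would normally be enough
for rcu_read_lock_sched_held(), but IIRC that helper deliberately
returns false while rcu_lockdep_current_cpu_online() reports the CPU as
unknown to RCU, hence the "RCU used illegally from offline CPU!" hint:
the CPU stays in that state until rcu_cpu_starting() is called from
notify_cpu_starting().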