From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932241AbdDEQZ4 (ORCPT ); Wed, 5 Apr 2017 12:25:56 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:58395 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752207AbdDEQZT (ORCPT ); Wed, 5 Apr 2017 12:25:19 -0400 Date: Wed, 5 Apr 2017 09:25:15 -0700 From: "Paul E. McKenney" To: Steven Rostedt Cc: LKML Subject: Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt Reply-To: paulmck@linux.vnet.ibm.com References: <20170405093207.404f8deb@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170405093207.404f8deb@gandalf.local.home> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17040516-0048-0000-0000-0000014CE527 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00006886; HX=3.00000240; KW=3.00000007; PH=3.00000004; SC=3.00000208; SDB=6.00843503; UDB=6.00415614; IPR=6.00621681; BA=6.00005271; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00014923; XFM=3.00000013; UTC=2017-04-05 16:25:16 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17040516-0049-0000-0000-0000404124F0 Message-Id: <20170405162515.GF1600@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-04-05_13:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1702020001 definitions=main-1704050141 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Apr 05, 2017 at 09:32:07AM -0400, Steven Rostedt wrote: > Hi Paul, > > My tests are triggering the following when I have lockdep enabled and > running the stack tracer: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51 > Modules linked in: ppdev parport_pc parport [last unloaded: trace_events_sample] > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48 > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 > Call Trace: > > =============================== > [ ERR: suspicious RCU usage. ] > 4.11.0-rc3-test+ #48 Not tainted > ------------------------------- > /work/autotest/nobackup/linux-test.git/include/linux/rcupdate.h:837 rcu_read_lock() used illegally while idle! > > other info that might help us debug this: > > > RCU used illegally from idle CPU! > rcu_scheduler_active = 2, debug_locks = 0 > RCU used illegally from extended quiescent state! > 1 lock held by swapper/0/0: > #0: (rcu_read_lock){......}, at: [] rcu_read_lock+0x0/0x61 > > stack backtrace: > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48 > Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014 > Call Trace: > dump_stack+0x68/0x92 > __warn+0xc2/0xdd > ? 0xffffffffa0026077 > warn_slowpath_null+0x1d/0x1f > module_assert_mutex_or_preempt+0x4f/0x51 > __module_address+0x2a/0xac > ? 0xffffffffa0026077 > __module_text_address+0x12/0x59 > ? 0xffffffffa0026077 > is_module_text_address+0xe/0x16 > __kernel_text_address+0x2b/0x80 > ? 0xffffffffa0026077 > unwind_get_return_address+0x50/0x5c > __save_stack_trace+0x76/0xbf > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > save_stack_trace+0x1b/0x1d > check_stack+0xec/0x24a > stack_trace_call+0x40/0x53 <<-- This is the start of the stack tracer > 0xffffffffa0026077 > ? ftrace_graph_caller+0x78/0xa8 > ? trace_hardirqs_off+0xd/0xf > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > rcu_eqs_enter_common.constprop.71+0x5/0x108 <<-- It appears to be testing the rcu internal code > rcu_idle_enter+0x51/0x72 > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > ? rcu_idle_enter+0x51/0x72 > do_idle+0xbb/0x1e5 > cpu_startup_entry+0x1f/0x21 > rest_init+0x127/0x12d > start_kernel+0x3f5/0x402 > x86_64_start_reservations+0x2f/0x31 > x86_64_start_kernel+0x17f/0x192 > start_cpu+0x14/0x14 > ? start_cpu+0x14/0x14 > ---[ end trace eeaad31bc52457e7 ]--- > > > Now the check_stack() function in kernel/trace/trace_stack.c has the > following: > > /* > * RCU may not be watching, make it see us. > * The stack trace code uses rcu_sched. > */ > rcu_irq_enter(); > > > What I'm guessing, is that because it is tracing the internals of the > rcu code, and checking its stack, the stack trace triggered in a > location where calling rcu_irq_enter() isn't sufficient to have rcu > become watchable. > > I may add code to check if rcu is indeed watching after the > rcu_irq_enter() call, and if it still isn't to simply exit out. > > Thoughts? Ouch!!! I really blew it telling you that you can use rcu_irq_enter()!!! RCU's eqs code disables interrupts, but has a call to the trace code exactly where it cannot handle an interrupt. Your rcu_irq_enter() therefore breaks things -- the ->dynticks_nesting has been set to zero, but ->dynticks still indicates non-idle. Therefore, rcu_irq_enter() sees zero nesting, and thus incorrectly increments ->dynticks, which tells RCU "idle". The later rcu_is_watching() checks therefore complain bitterly. I really don't want you to have to hack around this. So I should adjust RCU to avoid doing tracing during this period of time, which would allow you to continue using rcu_irq_enter() as you are now. (You don't trace from NMIs, do you? If you do, I guess I need to somehow disable tracing during the vulnerable period?) One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s trace_rcu_dyntick() statement into all its callers. Another would be to give rcu_eqs_enter_common() an additional argument that is the amount to subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the decrment after the tracing. The latter seems most reasonable at first glance, especially given that it confines the risky data-inconsistent period to a single contiguous piece of code. So how about the following lightly tested patch? I will check to see if something similar is needed for eqs exit. Could you please let me know if tracing happens in NMI handlers? If so, a bit of additional code will be needed. Thanx, Paul PS. Which reminds me, any short-term uses of RCU_TASKS? This represents 3 of my 16 test scenarios, which is getting hard to justify for something that isn't used. Especially given that I will need to add more scenarios for parallel-callbacks SRCU... ------------------------------------------------------------------------ commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e Author: Paul E. McKenney Date: Wed Apr 5 09:05:18 2017 -0700 rcu: Fix dyntick-idle tracing The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit() (with my blessing) to allow the current _rcuidle alternative tracepoint name to be dispensed with while still maintaining good performance. Unfortunately, this causes RCU's dyntick-idle entry code's tracing to appear to RCU like an interrupt that occurs where RCU is not designed to handle interrupts. This commit fixes this problem by moving the zeroing of ->dynticks_nesting after the offending trace_rcu_dyntick() statement, which narrows the window of vulnerability to a pair of adjacent statements that are now marked with comments to that effect. Reported-by: Steven Rostedt Signed-off-by: Paul E. McKenney diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 50fee7689e71..7308c88ee57a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) } /* - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state * - * If the new value of the ->dynticks_nesting counter now is zero, - * we really have entered idle, and must do the appropriate accounting. - * The caller must have disabled interrupts. + * Enter idle, doing appropriate accounting. The caller must have + * disabled interrupts. */ -static void rcu_eqs_enter_common(long long oldval, bool user) +static void rcu_eqs_enter_common(bool user) { struct rcu_state *rsp; struct rcu_data *rdp; - RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);) + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); - trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting); + trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0); if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)) { struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); - trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0); + trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0); rcu_ftrace_dump(DUMP_ORIG); WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", current->pid, current->comm, @@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user) do_nocb_deferred_wakeup(rdp); } rcu_prepare_for_idle(); - rcu_dynticks_eqs_enter(); + rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */ + rcu_dynticks_eqs_enter(); /* After this, tracing works again. */ rcu_dynticks_task_enter(); /* @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user) */ static void rcu_eqs_enter(bool user) { - long long oldval; struct rcu_dynticks *rdtp; rdtp = this_cpu_ptr(&rcu_dynticks); - oldval = rdtp->dynticks_nesting; WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - (oldval & DYNTICK_TASK_NEST_MASK) == 0); - if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) { - rdtp->dynticks_nesting = 0; - rcu_eqs_enter_common(oldval, user); - } else { + (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0); + if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) + rcu_eqs_enter_common(user); + else rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; - } } /** @@ -892,19 +888,18 @@ void rcu_user_enter(void) */ void rcu_irq_exit(void) { - long long oldval; struct rcu_dynticks *rdtp; RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!"); rdtp = this_cpu_ptr(&rcu_dynticks); - oldval = rdtp->dynticks_nesting; - rdtp->dynticks_nesting--; WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - rdtp->dynticks_nesting < 0); - if (rdtp->dynticks_nesting) - trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting); - else - rcu_eqs_enter_common(oldval, true); + rdtp->dynticks_nesting < 1); + if (rdtp->dynticks_nesting <= 1) { + rcu_eqs_enter_common(true); + } else { + trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1); + rdtp->dynticks_nesting--; + } rcu_sysidle_enter(1); }