* [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Steven Rostedt @ 2017-04-05 13:32 UTC
To: Paul E. McKenney; +Cc: LKML

Hi Paul,

My tests are triggering the following when I have lockdep enabled and
running the stack tracer:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51
Modules linked in: ppdev parport_pc parport [last unloaded: trace_events_sample]
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:

===============================
[ ERR: suspicious RCU usage. ]
4.11.0-rc3-test+ #48 Not tainted
-------------------------------
/work/autotest/nobackup/linux-test.git/include/linux/rcupdate.h:837 rcu_read_lock() used illegally while idle!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0:  (rcu_read_lock){......}, at: [<ffffffff810ebee3>] rcu_read_lock+0x0/0x61

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
 dump_stack+0x68/0x92
 __warn+0xc2/0xdd
 ? 0xffffffffa0026077
 warn_slowpath_null+0x1d/0x1f
 module_assert_mutex_or_preempt+0x4f/0x51
 __module_address+0x2a/0xac
 ? 0xffffffffa0026077
 __module_text_address+0x12/0x59
 ? 0xffffffffa0026077
 is_module_text_address+0xe/0x16
 __kernel_text_address+0x2b/0x80
 ? 0xffffffffa0026077
 unwind_get_return_address+0x50/0x5c
 __save_stack_trace+0x76/0xbf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 save_stack_trace+0x1b/0x1d
 check_stack+0xec/0x24a
 stack_trace_call+0x40/0x53          <<-- This is the start of the stack tracer
 0xffffffffa0026077
 ? ftrace_graph_caller+0x78/0xa8
 ? trace_hardirqs_off+0xd/0xf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_eqs_enter_common.constprop.71+0x5/0x108   <<-- It appears to be tracing the rcu internal code
 rcu_idle_enter+0x51/0x72
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 ? rcu_idle_enter+0x51/0x72
 do_idle+0xbb/0x1e5
 cpu_startup_entry+0x1f/0x21
 rest_init+0x127/0x12d
 start_kernel+0x3f5/0x402
 x86_64_start_reservations+0x2f/0x31
 x86_64_start_kernel+0x17f/0x192
 start_cpu+0x14/0x14
 ? start_cpu+0x14/0x14
---[ end trace eeaad31bc52457e7 ]---

Now the check_stack() function in kernel/trace/trace_stack.c has the
following:

	/*
	 * RCU may not be watching, make it see us.
	 * The stack trace code uses rcu_sched.
	 */
	rcu_irq_enter();

What I'm guessing is that, because it is tracing the internals of the
rcu code and checking its stack, the stack trace triggered in a
location where calling rcu_irq_enter() isn't sufficient to have rcu
become watchable.

I may add code to check whether rcu is indeed watching after the
rcu_irq_enter() call, and if it still isn't, to simply exit out.

Thoughts?

-- Steve
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Paul E. McKenney @ 2017-04-05 16:25 UTC
To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 09:32:07AM -0400, Steven Rostedt wrote:
> Hi Paul,
>
> My tests are triggering the following when I have lockdep enabled and
> running the stack tracer:
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51

[ . . . ]

> Now the check_stack() function in kernel/trace/trace_stack.c has the
> following:
>
> 	/*
> 	 * RCU may not be watching, make it see us.
> 	 * The stack trace code uses rcu_sched.
> 	 */
> 	rcu_irq_enter();
>
> What I'm guessing is that, because it is tracing the internals of the
> rcu code and checking its stack, the stack trace triggered in a
> location where calling rcu_irq_enter() isn't sufficient to have rcu
> become watchable.
>
> I may add code to check whether rcu is indeed watching after the
> rcu_irq_enter() call, and if it still isn't, to simply exit out.
>
> Thoughts?

Ouch!!!

I really blew it telling you that you can use rcu_irq_enter()!!!

RCU's eqs code disables interrupts, but has a call to the trace code
exactly where it cannot handle an interrupt.  Your rcu_irq_enter()
therefore breaks things -- the ->dynticks_nesting has been set to zero,
but ->dynticks still indicates non-idle.  Therefore, rcu_irq_enter() sees
zero nesting, and thus incorrectly increments ->dynticks, which tells RCU
"idle".  The later rcu_is_watching() checks therefore complain bitterly.

I really don't want you to have to hack around this.  So I should adjust
RCU to avoid doing tracing during this period of time, which would allow
you to continue using rcu_irq_enter() as you are now.  (You don't trace
from NMIs, do you?  If you do, I guess I need to somehow disable tracing
during the vulnerable period?)

One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
trace_rcu_dyntick() statement into all its callers.  Another would be to
give rcu_eqs_enter_common() an additional argument that is the amount to
subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
decrement after the tracing.  The latter seems most reasonable at first
glance, especially given that it confines the risky data-inconsistent
period to a single contiguous piece of code.

So how about the following lightly tested patch?

I will check to see if something similar is needed for eqs exit.
Could you please let me know if tracing happens in NMI handlers?
If so, a bit of additional code will be needed.

							Thanx, Paul

PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
     3 of my 16 test scenarios, which is getting hard to justify for
     something that isn't used.  Especially given that I will need to
     add more scenarios for parallel-callbacks SRCU...

------------------------------------------------------------------------

commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Apr 5 09:05:18 2017 -0700

    rcu: Fix dyntick-idle tracing
    
    The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
    (with my blessing) to allow the current _rcuidle alternative tracepoint
    name to be dispensed with while still maintaining good performance.
    Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
    appear to RCU like an interrupt that occurs where RCU is not designed
    to handle interrupts.
    
    This commit fixes this problem by moving the zeroing of ->dynticks_nesting
    after the offending trace_rcu_dyntick() statement, which narrows the
    window of vulnerability to a pair of adjacent statements that are now
    marked with comments to that effect.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee7689e71..7308c88ee57a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 }
 
 /*
- * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
+ * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
  *
- * If the new value of the ->dynticks_nesting counter now is zero,
- * we really have entered idle, and must do the appropriate accounting.
- * The caller must have disabled interrupts.
+ * Enter idle, doing appropriate accounting.  The caller must have
+ * disabled interrupts.
  */
-static void rcu_eqs_enter_common(long long oldval, bool user)
+static void rcu_eqs_enter_common(bool user)
 {
 	struct rcu_state *rsp;
 	struct rcu_data *rdp;
-	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
-	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
+	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 	    !user && !is_idle_task(current)) {
 		struct task_struct *idle __maybe_unused =
 			idle_task(smp_processor_id());
 
-		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
+		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
 		rcu_ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
@@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
-	rcu_dynticks_eqs_enter();
+	rdtp->dynticks_nesting = 0;	/* Breaks tracing momentarily. */
+	rcu_dynticks_eqs_enter();	/* After this, tracing works again. */
 	rcu_dynticks_task_enter();
 
 	/*
@@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
  */
 static void rcu_eqs_enter(bool user)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
-	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
-		rdtp->dynticks_nesting = 0;
-		rcu_eqs_enter_common(oldval, user);
-	} else {
+		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
+	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
+		rcu_eqs_enter_common(user);
+	else
 		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
-	}
 }
 
 /**
@@ -892,19 +888,18 @@ void rcu_user_enter(void)
  */
 void rcu_irq_exit(void)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
-	rdtp->dynticks_nesting--;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     rdtp->dynticks_nesting < 0);
-	if (rdtp->dynticks_nesting)
-		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
-	else
-		rcu_eqs_enter_common(oldval, true);
+		     rdtp->dynticks_nesting < 1);
+	if (rdtp->dynticks_nesting <= 1) {
+		rcu_eqs_enter_common(true);
+	} else {
+		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
+		rdtp->dynticks_nesting--;
+	}
 	rcu_sysidle_enter(1);
 }
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Steven Rostedt @ 2017-04-05 16:45 UTC
To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 09:25:15 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Thoughts?
>
> Ouch!!!
>
> I really blew it telling you that you can use rcu_irq_enter()!!!
>
> RCU's eqs code disables interrupts, but has a call to the trace code
> exactly where it cannot handle an interrupt.  Your rcu_irq_enter()
> therefore breaks things -- the ->dynticks_nesting has been set to zero,
> but ->dynticks still indicates non-idle.  Therefore, rcu_irq_enter() sees
> zero nesting, and thus incorrectly increments ->dynticks, which tells RCU
> "idle".  The later rcu_is_watching() checks therefore complain bitterly.
>
> I really don't want you to have to hack around this.  So I should adjust
> RCU to avoid doing tracing during this period of time, which would allow
> you to continue using rcu_irq_enter() as you are now.  (You don't trace
> from NMIs, do you?  If you do, I guess I need to somehow disable tracing
> during the vulnerable period?)

Note, this is the stack tracer.  It attaches itself to the function
tracer, which traces all functions and checks the stack size for each
function it calls.  It performs a stack trace when it sees a stack
larger than any stack it has seen before.  It just so happens that it
saw a larger stack when tracing inside the rcu functions.

Note, this has nothing to do with tracepoints or TRACE_EVENT()s.  Nor
does it have any issue with function tracing itself, and yes, function
tracing does trace NMIs, but doesn't need RCU.  And yes, stack tracing
just punts (returns doing nothing) if it sees that it is in an NMI,
especially since stack tracing grabs locks.

The problem is that the recording of the stack (save_stack_trace) does
utilize RCU, and that's where it complains.

We don't need to stop tracing, we just need to stop stack tracing.  I'm
fine with adding another check to see if we are in a critical RCU
section, and just not trace the stack there.

> One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
> trace_rcu_dyntick() statement into all its callers.  Another would be to
> give rcu_eqs_enter_common() an additional argument that is the amount to
> subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
> decrement after the tracing.  The latter seems most reasonable at first
> glance, especially given that it confines the risky data-inconsistent
> period to a single contiguous piece of code.

Note, this has nothing to do with trace_rcu_dyntick().  It's the
function tracer tracing inside RCU, calling the stack tracer to record
a new stack if it sees it is larger than any stack before.  All I need is
a way to tell the stack tracer to not record a stack if it is in this
RCU critical section.

If you can add an "in_rcu_critical_section()" function that the stack
tracer can test, and simply exit out like it does if in_nmi() is set,
that would work too.  Below is my current work around.

> So how about the following lightly tested patch?
>
> I will check to see if something similar is needed for eqs exit.
> Could you please let me know if tracing happens in NMI handlers?
> If so, a bit of additional code will be needed.
>
> PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
>      3 of my 16 test scenarios, which is getting hard to justify for
>      something that isn't used.  Especially given that I will need to
>      add more scenarios for parallel-callbacks SRCU...

The RCU_TASKS implementation is next on my todo list.  Yes, there's going
to be plenty of users very soon.  Not for 4.12, but definitely for 4.13.

Sorry for the delay in implementing that :-/

-- Steve

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 5fb1f2c87e6b..ac7241cf8c9c 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -105,6 +105,10 @@ check_stack(unsigned long ip, unsigned long *stack)
 	 */
 	rcu_irq_enter();
 
+	/* If we traced rcu internals, rcu may still not be watching */
+	if (unlikely(!rcu_is_watching()))
+		goto out;
+
 	/* In case another CPU set the tracer_frame on us */
 	if (unlikely(!frame_size))
 		this_size -= tracer_frame;
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Paul E. McKenney @ 2017-04-05 17:59 UTC
To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 12:45:39PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 09:25:15 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > I really don't want you to have to hack around this.  So I should adjust
> > RCU to avoid doing tracing during this period of time, which would allow
> > you to continue using rcu_irq_enter() as you are now.  (You don't trace
> > from NMIs, do you?  If you do, I guess I need to somehow disable tracing
> > during the vulnerable period?)
>
> Note, this is the stack tracer.  It attaches itself to the function
> tracer, which traces all functions and checks the stack size for each
> function it calls.  It performs a stack trace when it sees a stack
> larger than any stack it has seen before.  It just so happens that it
> saw a larger stack when tracing inside the rcu functions.
>
> Note, this has nothing to do with tracepoints or TRACE_EVENT()s.  Nor
> does it have any issue with function tracing itself, and yes, function
> tracing does trace NMIs, but doesn't need RCU.  And yes, stack tracing
> just punts (returns doing nothing) if it sees that it is in an NMI,
> especially since stack tracing grabs locks.

Whew!!!  ;-)

> The problem is that the recording of the stack (save_stack_trace) does
> utilize RCU, and that's where it complains.
>
> We don't need to stop tracing, we just need to stop stack tracing.  I'm
> fine with adding another check to see if we are in a critical RCU
> section, and just not trace the stack there.
>
> > One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
> > trace_rcu_dyntick() statement into all its callers.  [ . . . ]
>
> Note, this has nothing to do with trace_rcu_dyntick().  It's the
> function tracer tracing inside RCU, calling the stack tracer to record
> a new stack if it sees it is larger than any stack before.  All I need is
> a way to tell the stack tracer to not record a stack if it is in this
> RCU critical section.
>
> If you can add an "in_rcu_critical_section()" function that the stack
> tracer can test, and simply exit out like it does if in_nmi() is set,
> that would work too.  Below is my current work around.

Except that the rcu_irq_enter() would already have triggered the bug
that was (allegedly) fixed by my earlier patch.  So, yes, the check for
rcu_is_watching() would work around this bug, but the hope is that
with my earlier fix, this workaround would not be needed.

So could you please test my earlier patch?

This patch does not conflict with anything on -rcu, so you could
carry it if that helps.

> > So how about the following lightly tested patch?
> >
> > I will check to see if something similar is needed for eqs exit.

I did verify that the eqs-exit path does not have this problem, FWIW.

> > Could you please let me know if tracing happens in NMI handlers?
> > If so, a bit of additional code will be needed.
> >
> > PS.  Which reminds me, any short-term uses of RCU_TASKS?  [ . . . ]
>
> The RCU_TASKS implementation is next on my todo list.  Yes, there's going
> to be plenty of users very soon.  Not for 4.12, but definitely for 4.13.
>
> Sorry for the delay in implementing that :-/

OK, I will wait a few months before checking again...

							Thanx, Paul

> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 5fb1f2c87e6b..ac7241cf8c9c 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,6 +105,10 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	 */
>  	rcu_irq_enter();
>  
> +	/* If we traced rcu internals, rcu may still not be watching */
> +	if (unlikely(!rcu_is_watching()))
> +		goto out;
> +
>  	/* In case another CPU set the tracer_frame on us */
>  	if (unlikely(!frame_size))
>  		this_size -= tracer_frame;
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Steven Rostedt @ 2017-04-05 18:54 UTC
To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 10:59:25 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Note, this has nothing to do with trace_rcu_dyntick().  It's the
> > function tracer tracing inside RCU, calling the stack tracer to record
> > a new stack if it sees it is larger than any stack before.  All I need is
> > a way to tell the stack tracer to not record a stack if it is in this
> > RCU critical section.
> >
> > If you can add an "in_rcu_critical_section()" function that the stack
> > tracer can test, and simply exit out like it does if in_nmi() is set,
> > that would work too.  Below is my current work around.
>
> Except that the rcu_irq_enter() would already have triggered the bug
> that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> rcu_is_watching() would work around this bug, but the hope is that
> with my earlier fix, this workaround would not be needed.

Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
rcu_irq_enter().  I could fall out before that.  My current workaround
does the check; even though it breaks things, it would hopefully fix
them, as it calls rcu_irq_exit() immediately.

What I would have is:

	if (in_rcu_critical_section())
		goto out;

	rcu_irq_enter();

which would probably be the easiest fix.

> So could you please test my earlier patch?

I could, but it wouldn't tell me anything immediately.  It's a hard race
to hit.  I never could hit it when I tried, but it would appear to hit
immediately when testing other things :-p

Remember, it only triggers when a new max stack size is hit.  The bug
happens when that new max stack size is in the rcu critical section.

I guess I could force it to trigger by inserting a call in your code
that clears the max stack size.

-- Steve

> This patch does not conflict with anything on -rcu, so you could
> carry it if that helps.
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Paul E. McKenney @ 2017-04-05 19:08 UTC
To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 02:54:25PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 10:59:25 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > > If you can add an "in_rcu_critical_section()" function that the stack
> > > tracer can test, and simply exit out like it does if in_nmi() is set,
> > > that would work too.  Below is my current work around.
> >
> > Except that the rcu_irq_enter() would already have triggered the bug
> > that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> > rcu_is_watching() would work around this bug, but the hope is that
> > with my earlier fix, this workaround would not be needed.
>
> Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
> rcu_irq_enter().  I could fall out before that.  My current workaround
> does the check; even though it breaks things, it would hopefully fix
> them, as it calls rcu_irq_exit() immediately.

OK, color me confused.  What would "in_rcu_critical_section()" do?

The rcu_is_watching() function tells you that RCU is not in an extended
quiescent state, though its return value can be iffy in the middle of
rcu_eqs_enter_common() -- which is why interrupts are disabled there.
In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth()
to determine whether you are within an RCU read-side critical section,
which is what in_rcu_critical_section() sounds like to me, but I don't
see how this information would help in this situation.

What am I missing here?

> What I would have is:
>
> 	if (in_rcu_critical_section())
> 		goto out;
>
> 	rcu_irq_enter();
>
> which would probably be the easiest fix.
>
> > So could you please test my earlier patch?
>
> I could, but it wouldn't tell me anything immediately.  It's a hard race
> to hit.  I never could hit it when I tried, but it would appear to hit
> immediately when testing other things :-p
>
> Remember, it only triggers when a new max stack size is hit.  The bug
> happens when that new max stack size is in the rcu critical section.
>
> I guess I could force it to trigger by inserting a call in your code
> that clears the max stack size.

I know it is a bit of a hassle, but I would really appreciate the
testing.

							Thanx, Paul

> > This patch does not conflict with anything on -rcu, so you could
> > carry it if that helps.
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
From: Steven Rostedt @ 2017-04-05 19:21 UTC
To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 12:08:10 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
> > rcu_irq_enter().  I could fall out before that.
>
> OK, color me confused.  What would "in_rcu_critical_section()" do?
>
> The rcu_is_watching() function tells you that RCU is not in an extended
> quiescent state, though its return value can be iffy in the middle of
> rcu_eqs_enter_common() -- which is why interrupts are disabled there.
> In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth()
> to determine whether you are within an RCU read-side critical section,
> which is what in_rcu_critical_section() sounds like to me, but I don't
> see how this information would help in this situation.
>
> What am I missing here?

Would in_guts_of_internal_rcu_infrastructure_code() work? :-)

Here's the crucial part of that stack dump again:

 save_stack_trace+0x1b/0x1d
 check_stack+0xec/0x24a
 stack_trace_call+0x40/0x53
 0xffffffffa0026077
 ? ftrace_graph_caller+0x78/0xa8
 ? trace_hardirqs_off+0xd/0xf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_idle_enter+0x51/0x72

The stack trace was triggered on rcu_eqs_enter_common() inside the
rcu_idle_enter() function call.

Perhaps I should just let rcu disable stack tracing?  Something like this:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee76..f894fc3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -853,8 +853,10 @@ void rcu_idle_enter(void)
 	unsigned long flags;
 
 	local_irq_save(flags);
+	disable_stack_tracer();
 	rcu_eqs_enter(false);
 	rcu_sysidle_enter(0);
+	enable_stack_tracer();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);

-- Steve
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-05 19:21 ` Steven Rostedt @ 2017-04-05 19:39 ` Paul E. McKenney 2017-04-05 19:52 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2017-04-05 19:39 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML On Wed, Apr 05, 2017 at 03:21:41PM -0400, Steven Rostedt wrote: > On Wed, 5 Apr 2017 12:08:10 -0700 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > On Wed, Apr 05, 2017 at 02:54:25PM -0400, Steven Rostedt wrote: > > > On Wed, 5 Apr 2017 10:59:25 -0700 > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > > > > > > Note, this has nothing to do with trace_rcu_dyntick(). It's the > > > > > function tracer tracing inside RCU, calling the stack tracer to record > > > > > a new stack if it sees its larger than any stack before. All I need is > > > > > a way to tell the stack tracer to not record a stack if it is in this > > > > > RCU critical section. > > > > > > > > > > If you can add a "in_rcu_critical_section()" function, that the stack > > > > > tracer can test, and simply exit out like it does if in_nmi() is set, > > > > > that would work too. Below is my current work around. > > > > > > > > Except that the rcu_irq_enter() would already have triggered the bug > > > > that was (allegedly) fixed by my earlier patch. So, yes, the check for > > > > rcu_is_watching() would work around this bug, but the hope is that > > > > with my earlier fix, this workaround would not be needed. > > > > > > Note, if I had a "in_rcu_critical_section()" I wouldn't need to call > > > rcu_irq_enter(). I could fall out before that. My current workaround > > > does the check, even though it breaks things, it would hopefully fix > > > things as it calls rcu_irq_exit() immediately. > > > > OK, color me confused. What would "in_rcu_critical_section()" do? 
> > > > The rcu_is_watching() function tells you that RCU is not in an extended > > quiescent state, though its return value can be iffy in the middle of > > rcu_eqs_enter_common() -- which is why interrupts are disabled there. > > In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth() > > to determine whether you are within an RCU read-side critical section, > > which is what in_rcu_critical_section() sounds like to me, but I don't > > see how this information would help in this situation. > > > > What am I missing here? > > Would in_guts_of_internal_rcu_infrastructure_code() work? :-) something_is_happening_here_what_it_is_isnt_clear_there_is_a_man_with_a_gun_over_there_telling_me_to_beware()? ;-) > Here's the crucial part of that stack dump again: > > save_stack_trace+0x1b/0x1d > check_stack+0xec/0x24a > stack_trace_call+0x40/0x53 > 0xffffffffa0026077 > ? ftrace_graph_caller+0x78/0xa8 > ? trace_hardirqs_off+0xd/0xf > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > rcu_eqs_enter_common.constprop.71+0x5/0x108 > rcu_idle_enter+0x51/0x72 > > > The stack trace was called on rcu_eqs_enter_common() inside the > rcu_idle_enter() function call. > > Perhaps if I just let rcu disable stack tracing? Something like this: > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 50fee76..f894fc3 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -853,8 +853,10 @@ void rcu_idle_enter(void) > unsigned long flags; > > local_irq_save(flags); > + disable_stack_tracer(); > rcu_eqs_enter(false); > rcu_sysidle_enter(0); > + enable_stack_tracer(); > local_irq_restore(flags); > } > EXPORT_SYMBOL_GPL(rcu_idle_enter); That only covers one of two code paths, and we still end up with the trace code thinking that RCU is paying attention to it when it really is not. But if there is a disable_stack_tracer() and enable_stack_tracer(), I could use them as shown below. (Hand-edited patch, probably doesn't apply, let alone build.) 
Thanx, Paul ------------------------------------------------------------------------ commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Date: Wed Apr 5 09:05:18 2017 -0700 rcu: Fix dyntick-idle tracing The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit() (with my blessing) to allow the current _rcuidle alternative tracepoint name to be dispensed with while still maintaining good performance. Unfortunately, this causes RCU's dyntick-idle entry code's tracing to appear to RCU like an interrupt that occurs where RCU is not designed to handle interrupts. This commit fixes this problem by moving the zeroing of ->dynticks_nesting after the offending trace_rcu_dyntick() statement, which narrows the window of vulnerability to a pair of adjacent statements that are now marked with comments to that effect. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 50fee7689e71..7308c88ee57a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) } /* - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state * - * If the new value of the ->dynticks_nesting counter now is zero, - * we really have entered idle, and must do the appropriate accounting. - * The caller must have disabled interrupts. + * Enter idle, doing appropriate accounting. The caller must have + * disabled interrupts. 
*/ -static void rcu_eqs_enter_common(long long oldval, bool user) +static void rcu_eqs_enter_common(bool user) { struct rcu_state *rsp; struct rcu_data *rdp; - RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);) + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); - trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting); + trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0); if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && !is_idle_task(current)) { struct task_struct *idle __maybe_unused = idle_task(smp_processor_id()); - trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0); + trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0); rcu_ftrace_dump(DUMP_ORIG); WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", current->pid, current->comm, @@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user) do_nocb_deferred_wakeup(rdp); } rcu_prepare_for_idle(); - rcu_dynticks_eqs_enter(); + disable_stack_tracer(); + rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */ + rcu_dynticks_eqs_enter(); /* After this, tracing works again. 
*/ + enable_stack_tracer(); rcu_dynticks_task_enter(); /* @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user) */ static void rcu_eqs_enter(bool user) { - long long oldval; struct rcu_dynticks *rdtp; rdtp = this_cpu_ptr(&rcu_dynticks); - oldval = rdtp->dynticks_nesting; WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - (oldval & DYNTICK_TASK_NEST_MASK) == 0); - if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) { - rdtp->dynticks_nesting = 0; - rcu_eqs_enter_common(oldval, user); - } else { + (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0); + if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) + rcu_eqs_enter_common(user); + else rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; - } } /** @@ -892,19 +888,18 @@ void rcu_user_enter(void) */ void rcu_irq_exit(void) { - long long oldval; struct rcu_dynticks *rdtp; RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!"); rdtp = this_cpu_ptr(&rcu_dynticks); - oldval = rdtp->dynticks_nesting; - rdtp->dynticks_nesting--; WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && - rdtp->dynticks_nesting < 0); - if (rdtp->dynticks_nesting) - trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting); - else - rcu_eqs_enter_common(oldval, true); + rdtp->dynticks_nesting < 1); + if (rdtp->dynticks_nesting <= 1) { + rcu_eqs_enter_common(true); + } else { + trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1); + rdtp->dynticks_nesting--; + } rcu_sysidle_enter(1); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
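The control flow of the reworked rcu_irq_exit() in Paul's patch above can be modeled with a toy nesting counter. This is an illustrative userspace sketch of the patch's logic only, not kernel code; the `_model` names are invented.

```c
#include <assert.h>

static long dynticks_nesting = 2;	/* pretend two nested interrupts */
static int in_eqs;			/* nonzero once in the quiescent state */

static void rcu_eqs_enter_common_model(void)
{
	/* In the patch, the stack tracer is disabled around these two
	 * statements: tracing is broken between zeroing the nesting
	 * count and actually entering the EQS. */
	dynticks_nesting = 0;
	in_eqs = 1;
}

static void rcu_irq_exit_model(void)
{
	assert(dynticks_nesting >= 1);	/* models the WARN_ON_ONCE() */
	if (dynticks_nesting <= 1)
		rcu_eqs_enter_common_model();
	else
		dynticks_nesting--;	/* trace_rcu_dyntick() fires first */
}
```

The point of the rework is visible here: the counter is only zeroed inside rcu_eqs_enter_common(), after the tracepoint, so tracing triggered on the way in still sees a nonzero nesting count.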
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-05 19:39 ` Paul E. McKenney @ 2017-04-05 19:52 ` Steven Rostedt 2017-04-05 20:42 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2017-04-05 19:52 UTC (permalink / raw) To: Paul E. McKenney; +Cc: LKML On Wed, 5 Apr 2017 12:39:28 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > Here's the crucial part of that stack dump again: > > > > save_stack_trace+0x1b/0x1d > > check_stack+0xec/0x24a > > stack_trace_call+0x40/0x53 > > 0xffffffffa0026077 > > ? ftrace_graph_caller+0x78/0xa8 > > ? trace_hardirqs_off+0xd/0xf > > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > > rcu_eqs_enter_common.constprop.71+0x5/0x108 > > rcu_idle_enter+0x51/0x72 > > > > > > The stack trace was called on rcu_eqs_enter_common() inside the > > rcu_idle_enter() function call. > > > > Perhaps if I just let rcu disable stack tracing? Something like this: > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 50fee76..f894fc3 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -853,8 +853,10 @@ void rcu_idle_enter(void) > > unsigned long flags; > > > > local_irq_save(flags); > > + disable_stack_tracer(); > > rcu_eqs_enter(false); > > rcu_sysidle_enter(0); > > + enable_stack_tracer(); > > local_irq_restore(flags); > > } > > EXPORT_SYMBOL_GPL(rcu_idle_enter); > > That only covers one of two code paths, and still end up with the > trace code thinking that RCU is paying attention to it when it really > is not. OK, that sounds like a separate bug to what I'm tracking. > > But if there is a disable_stack_tracer() and enable_stack_tracer(), > I could use them as shown below. (Hand-edited patch, probably doesn't > apply, let alone build.) > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e > Author: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > Date: Wed Apr 5 09:05:18 2017 -0700 > > rcu: Fix dyntick-idle tracing > > The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit() > (with my blessing) to allow the current _rcuidle alternative tracepoint > name to be dispensed with while still maintaining good performance. > Unfortunately, this causes RCU's dyntick-idle entry code's tracing to > appear to RCU like an interrupt that occurs where RCU is not designed > to handle interrupts. > > This commit fixes this problem by moving the zeroing of ->dynticks_nesting > after the offending trace_rcu_dyntick() statement, which narrows the > window of vulnerability to a pair of adjacent statements that are now > marked with comments to that effect. > > Reported-by: Steven Rostedt <rostedt@goodmis.org> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index 50fee7689e71..7308c88ee57a 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) > } > > /* > - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state > + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state > * > - * If the new value of the ->dynticks_nesting counter now is zero, > - * we really have entered idle, and must do the appropriate accounting. > - * The caller must have disabled interrupts. > + * Enter idle, doing appropriate accounting. The caller must have > + * disabled interrupts. 
> */ > -static void rcu_eqs_enter_common(long long oldval, bool user) > +static void rcu_eqs_enter_common(bool user) > { > struct rcu_state *rsp; > struct rcu_data *rdp; > - RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);) > + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); > > - trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting); > + trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0); > if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > !user && !is_idle_task(current)) { > struct task_struct *idle __maybe_unused = > idle_task(smp_processor_id()); > > - trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0); > + trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0); > rcu_ftrace_dump(DUMP_ORIG); > WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", > current->pid, current->comm, > @@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user) > do_nocb_deferred_wakeup(rdp); > } > rcu_prepare_for_idle(); > - rcu_dynticks_eqs_enter(); > + disable_stack_tracer(); > + rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */ > + rcu_dynticks_eqs_enter(); /* After this, tracing works again. */ > + enable_stack_tracer(); OK, do you want me to send you a patch, or should I take your patch into my tree and add this? 
-- Steve > rcu_dynticks_task_enter(); > > /* > @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user) > */ > static void rcu_eqs_enter(bool user) > { > - long long oldval; > struct rcu_dynticks *rdtp; > > rdtp = this_cpu_ptr(&rcu_dynticks); > - oldval = rdtp->dynticks_nesting; > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > - (oldval & DYNTICK_TASK_NEST_MASK) == 0); > - if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) { > - rdtp->dynticks_nesting = 0; > - rcu_eqs_enter_common(oldval, user); > - } else { > + (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0); > + if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) > + rcu_eqs_enter_common(user); > + else > rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; > - } > } > > /** > @@ -892,19 +888,18 @@ void rcu_user_enter(void) > */ > void rcu_irq_exit(void) > { > - long long oldval; > struct rcu_dynticks *rdtp; > > RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!"); > rdtp = this_cpu_ptr(&rcu_dynticks); > - oldval = rdtp->dynticks_nesting; > - rdtp->dynticks_nesting--; > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > - rdtp->dynticks_nesting < 0); > - if (rdtp->dynticks_nesting) > - trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting); > - else > - rcu_eqs_enter_common(oldval, true); > + rdtp->dynticks_nesting < 1); > + if (rdtp->dynticks_nesting <= 1) { > + rcu_eqs_enter_common(true); > + } else { > + trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1); > + rdtp->dynticks_nesting--; > + } > rcu_sysidle_enter(1); > } > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-05 19:52 ` Steven Rostedt @ 2017-04-05 20:42 ` Paul E. McKenney 2017-04-06 1:31 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2017-04-05 20:42 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML On Wed, Apr 05, 2017 at 03:52:23PM -0400, Steven Rostedt wrote: > On Wed, 5 Apr 2017 12:39:28 -0700 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > Here's the crucial part of that stack dump again: > > > > > > save_stack_trace+0x1b/0x1d > > > check_stack+0xec/0x24a > > > stack_trace_call+0x40/0x53 > > > 0xffffffffa0026077 > > > ? ftrace_graph_caller+0x78/0xa8 > > > ? trace_hardirqs_off+0xd/0xf > > > ? rcu_eqs_enter_common.constprop.71+0x5/0x108 > > > rcu_eqs_enter_common.constprop.71+0x5/0x108 > > > rcu_idle_enter+0x51/0x72 > > > > > > > > > The stack trace was called on rcu_eqs_enter_common() inside the > > > rcu_idle_enter() function call. > > > > > > Perhaps if I just let rcu disable stack tracing? Something like this: > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index 50fee76..f894fc3 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -853,8 +853,10 @@ void rcu_idle_enter(void) > > > unsigned long flags; > > > > > > local_irq_save(flags); > > > + disable_stack_tracer(); > > > rcu_eqs_enter(false); > > > rcu_sysidle_enter(0); > > > + enable_stack_tracer(); > > > local_irq_restore(flags); > > > } > > > EXPORT_SYMBOL_GPL(rcu_idle_enter); > > > > That only covers one of two code paths, and still end up with the > > trace code thinking that RCU is paying attention to it when it really > > is not. > > OK, that sounds like a separate bug to what I'm tracking. > > > > > But if there is a disable_stack_tracer() and enable_stack_tracer(), > > I could use them as shown below. (Hand-edited patch, probably doesn't > > apply, let alone build.) 
> > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Date: Wed Apr 5 09:05:18 2017 -0700 > > > > rcu: Fix dyntick-idle tracing > > > > The tracing subsystem started using rcu_irq_entry() and rcu_irq_exit() > > (with my blessing) to allow the current _rcuidle alternative tracepoint > > name to be dispensed with while still maintaining good performance. > > Unfortunately, this causes RCU's dyntick-idle entry code's tracing to > > appear to RCU like an interrupt that occurs where RCU is not designed > > to handle interrupts. > > > > This commit fixes this problem by moving the zeroing of ->dynticks_nesting > > after the offending trace_rcu_dyntick() statement, which narrows the > > window of vulnerability to a pair of adjacent statements that are now > > marked with comments to that effect. > > > > Reported-by: Steven Rostedt <rostedt@goodmis.org> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index 50fee7689e71..7308c88ee57a 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) > > } > > > > /* > > - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state > > + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state > > * > > - * If the new value of the ->dynticks_nesting counter now is zero, > > - * we really have entered idle, and must do the appropriate accounting. > > - * The caller must have disabled interrupts. > > + * Enter idle, doing appropriate accounting. The caller must have > > + * disabled interrupts. 
> > */ > > -static void rcu_eqs_enter_common(long long oldval, bool user) > > +static void rcu_eqs_enter_common(bool user) > > { > > struct rcu_state *rsp; > > struct rcu_data *rdp; > > - RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);) > > + struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks); > > > > - trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting); > > + trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0); > > if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > > !user && !is_idle_task(current)) { > > struct task_struct *idle __maybe_unused = > > idle_task(smp_processor_id()); > > > > - trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0); > > + trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0); > > rcu_ftrace_dump(DUMP_ORIG); > > WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s", > > current->pid, current->comm, > > @@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user) > > do_nocb_deferred_wakeup(rdp); > > } > > rcu_prepare_for_idle(); > > - rcu_dynticks_eqs_enter(); > > + disable_stack_tracer(); > > + rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */ > > + rcu_dynticks_eqs_enter(); /* After this, tracing works again. */ > > + enable_stack_tracer(); > > OK, do you want me to send you a patch, or should I take your patch > into my tree and add this? Given that I don't seem to have disable_stack_tracer() and enable_stack_tracer(), could you please add them in? 
Thanx, Paul > -- Steve > > > rcu_dynticks_task_enter(); > > > > /* > > @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user) > > */ > > static void rcu_eqs_enter(bool user) > > { > > - long long oldval; > > struct rcu_dynticks *rdtp; > > > > rdtp = this_cpu_ptr(&rcu_dynticks); > > - oldval = rdtp->dynticks_nesting; > > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > > - (oldval & DYNTICK_TASK_NEST_MASK) == 0); > > - if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) { > > - rdtp->dynticks_nesting = 0; > > - rcu_eqs_enter_common(oldval, user); > > - } else { > > + (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0); > > + if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) > > + rcu_eqs_enter_common(user); > > + else > > rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE; > > - } > > } > > > > /** > > @@ -892,19 +888,18 @@ void rcu_user_enter(void) > > */ > > void rcu_irq_exit(void) > > { > > - long long oldval; > > struct rcu_dynticks *rdtp; > > > > RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!"); > > rdtp = this_cpu_ptr(&rcu_dynticks); > > - oldval = rdtp->dynticks_nesting; > > - rdtp->dynticks_nesting--; > > WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && > > - rdtp->dynticks_nesting < 0); > > - if (rdtp->dynticks_nesting) > > - trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting); > > - else > > - rcu_eqs_enter_common(oldval, true); > > + rdtp->dynticks_nesting < 1); > > + if (rdtp->dynticks_nesting <= 1) { > > + rcu_eqs_enter_common(true); > > + } else { > > + trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1); > > + rdtp->dynticks_nesting--; > > + } > > rcu_sysidle_enter(1); > > } > > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-05 20:42 ` Paul E. McKenney @ 2017-04-06 1:31 ` Steven Rostedt 2017-04-06 4:07 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2017-04-06 1:31 UTC (permalink / raw) To: Paul E. McKenney; +Cc: LKML On Wed, 5 Apr 2017 13:42:29 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > OK, do you want me to send you a patch, or should I take your patch > > into my tree and add this? > > Given that I don't seem to have disable_stack_tracer() and > enable_stack_tracer(), could you please add them in? That's what I meant about "send you a patch" :-) -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-06 1:31 ` Steven Rostedt @ 2017-04-06 4:07 ` Paul E. McKenney 0 siblings, 0 replies; 16+ messages in thread From: Paul E. McKenney @ 2017-04-06 4:07 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML On Wed, Apr 05, 2017 at 09:31:45PM -0400, Steven Rostedt wrote: > On Wed, 5 Apr 2017 13:42:29 -0700 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > > OK, do you want me to send you a patch, or should I take your patch > > > into my tree and add this? > > > > Given that I don't seem to have disable_stack_tracer() and > > enable_stack_tracer(), could you please add them in? > > That's what I meant about "send you a patch" :-) Reading/comprehension-challenged today, aren't I? This doesn't conflict with anything in my tree, so either your tree or my tree is fine by me. The advantage of your tree is that you could easily sync it up with the other changes needed. Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-05 17:59 ` Paul E. McKenney 2017-04-05 18:54 ` Steven Rostedt @ 2017-04-06 2:12 ` Steven Rostedt 2017-04-06 4:15 ` Paul E. McKenney 1 sibling, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2017-04-06 2:12 UTC (permalink / raw) To: Paul E. McKenney; +Cc: LKML On Wed, 5 Apr 2017 10:59:25 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > Could you please let me know if tracing happens in NMI handlers? > > > If so, a bit of additional code will be needed. > > > > > > Thanx, Paul > > > > > > PS. Which reminds me, any short-term uses of RCU_TASKS? This represents > > > 3 of my 16 test scenarios, which is getting hard to justify for > > > something that isn't used. Especially given that I will need to > > > add more scenarios for parallel-callbacks SRCU... > > > > The RCU_TASK implementation is next on my todo list. Yes, there's going > > to be plenty of users very soon. Not for 4.12 but definitely for 4.13. > > > > Sorry for the delay in implementing that :-/ > > OK, I will wait a few months before checking again... > Actually, I took a quick look at what needs to be done, and I think it is *really* easy, and may be available in 4.12! Here's the current patch. I can probably do a patch to allow optimized kprobes on PREEMPT kernels as well. -- Steve diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 8efd9fe..28e3019 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command) * callers are done before leaving this function. * The same goes for freeing the per_cpu data of the per_cpu * ops. - * - * Again, normal synchronize_sched() is not good enough. - * We need to do a hard force of sched synchronization. 
- * This is because we use preempt_disable() to do RCU, but - * the function tracers can be called where RCU is not watching - * (like before user_exit()). We can not rely on the RCU - * infrastructure to do the synchronization, thus we must do it - * ourselves. */ if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) { + /* + * We need to do a hard force of sched synchronization. + * This is because we use preempt_disable() to do RCU, but + * the function tracers can be called where RCU is not watching + * (like before user_exit()). We can not rely on the RCU + * infrastructure to do the synchronization, thus we must do it + * ourselves. + */ schedule_on_each_cpu(ftrace_sync); +#ifdef CONFIG_PREEMPT + /* + * When the kernel is preemptive, tasks can be preempted + * while on a ftrace trampoline. Just scheduling a task on + * a CPU is not good enough to flush them. Calling + * synchronize_rcu_tasks() will wait for those tasks to + * execute and either schedule voluntarily or enter user space. + */ + synchronize_rcu_tasks(); +#endif arch_ftrace_trampoline_free(ops); if (ops->flags & FTRACE_OPS_FL_PER_CPU) @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops) static void ftrace_update_trampoline(struct ftrace_ops *ops) { - -/* - * Currently there's no safe way to free a trampoline when the kernel - * is configured with PREEMPT. That is because a task could be preempted - * when it jumped to the trampoline, it may be preempted for a long time - * depending on the system load, and currently there's no way to know - * when it will be off the trampoline. If the trampoline is freed - * too early, when the task runs again, it will be executing on freed - * memory and crash. - */ -#ifdef CONFIG_PREEMPT - /* Currently, only non dynamic ops can have a trampoline */ - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) - return; -#endif - arch_ftrace_update_trampoline(ops); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-06 2:12 ` Steven Rostedt @ 2017-04-06 4:15 ` Paul E. McKenney 2017-04-06 14:14 ` Steven Rostedt 0 siblings, 1 reply; 16+ messages in thread From: Paul E. McKenney @ 2017-04-06 4:15 UTC (permalink / raw) To: Steven Rostedt; +Cc: LKML On Wed, Apr 05, 2017 at 10:12:24PM -0400, Steven Rostedt wrote: > On Wed, 5 Apr 2017 10:59:25 -0700 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > > > Could you please let me know if tracing happens in NMI handlers? > > > > If so, a bit of additional code will be needed. > > > > > > > > Thanx, Paul > > > > > > > > PS. Which reminds me, any short-term uses of RCU_TASKS? This represents > > > > 3 of my 16 test scenarios, which is getting hard to justify for > > > > something that isn't used. Especially given that I will need to > > > > add more scenarios for parallel-callbacks SRCU... > > > > > > The RCU_TASK implementation is next on my todo list. Yes, there's going > > > to be plenty of users very soon. Not for 4.12 but definitely for 4.13. > > > > > > Sorry for the delay in implementing that :-/ > > > > OK, I will wait a few months before checking again... > > > > Actually, I took a quick look at what needs to be done, and I think it > is *really* easy, and may be available in 4.12! Here's the current > patch. Cool!!! > I can probably do a patch to allow optimized kprobes on PREEMPT kernels > as well. > > -- Steve > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 8efd9fe..28e3019 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command) > * callers are done before leaving this function. > * The same goes for freeing the per_cpu data of the per_cpu > * ops. > - * > - * Again, normal synchronize_sched() is not good enough. > - * We need to do a hard force of sched synchronization. 
> - * This is because we use preempt_disable() to do RCU, but > - * the function tracers can be called where RCU is not watching > - * (like before user_exit()). We can not rely on the RCU > - * infrastructure to do the synchronization, thus we must do it > - * ourselves. > */ > if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) { > + /* > + * We need to do a hard force of sched synchronization. > + * This is because we use preempt_disable() to do RCU, but > + * the function tracers can be called where RCU is not watching > + * (like before user_exit()). We can not rely on the RCU > + * infrastructure to do the synchronization, thus we must do it > + * ourselves. > + */ > schedule_on_each_cpu(ftrace_sync); Great header comment on ftrace_sync(): "Yes, function tracing is rude." And schedule_on_each_cpu() looks like a great workqueue gatling gun! ;-) > +#ifdef CONFIG_PREEMPT > + /* > + * When the kernel is preemptive, tasks can be preempted > + * while on a ftrace trampoline. Just scheduling a task on > + * a CPU is not good enough to flush them. Calling > + * synchronize_rcu_tasks() will wait for those tasks to > + * execute and either schedule voluntarily or enter user space. > + */ > + synchronize_rcu_tasks(); > +#endif How about this to save a line? if (IS_ENABLED(CONFIG_PREEMPT)) synchronize_rcu_tasks(); One thing that might speed this up a bit (or might not) would be to do the schedule_on_each_cpu() from a delayed workqueue. That way, if any of the activity from schedule_on_each_cpu() involved a voluntary context switch (from a cond_resched() or some such), then synchronize_rcu_tasks() would get the benefit of that context switch. You would need a flush_work() to wait for that delayed workqueue as well, of course. Not sure whether it is worth it, but figured I should pass it along. 
> arch_ftrace_trampoline_free(ops); > > if (ops->flags & FTRACE_OPS_FL_PER_CPU) > @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops) > > static void ftrace_update_trampoline(struct ftrace_ops *ops) > { > - > -/* > - * Currently there's no safe way to free a trampoline when the kernel > - * is configured with PREEMPT. That is because a task could be preempted > - * when it jumped to the trampoline, it may be preempted for a long time > - * depending on the system load, and currently there's no way to know > - * when it will be off the trampoline. If the trampoline is freed > - * too early, when the task runs again, it will be executing on freed > - * memory and crash. > - */ > -#ifdef CONFIG_PREEMPT > - /* Currently, only non dynamic ops can have a trampoline */ > - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) > - return; > -#endif > - > arch_ftrace_update_trampoline(ops); > } Agreed, straightforward patch! Thanx, Paul ^ permalink raw reply [flat|nested] 16+ messages in thread
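Paul's IS_ENABLED() suggestion above works because the condition is a compile-time constant: the compiler discards the branch when the option is off, yet the code inside it is still parsed and type-checked in both configurations, unlike an #ifdef block. A hedged userspace sketch of the idiom — CONFIG_PREEMPT_DEMO, the simplified IS_ENABLED(), and the stub function are all invented for illustration; the real kernel macro lives in include/linux/kconfig.h and is considerably more involved:

```c
#include <assert.h>

#define CONFIG_PREEMPT_DEMO 1		/* stand-in for the real config option */
#define IS_ENABLED(option) (option)	/* greatly simplified for the demo */

static int sync_calls;			/* counts stub invocations */

static void synchronize_rcu_tasks_stub(void)
{
	sync_calls++;
}

static void ftrace_shutdown_tail(void)
{
	/* With the option off, this branch is constant-false and the
	 * call is dead-code-eliminated, so no #ifdef is needed. */
	if (IS_ENABLED(CONFIG_PREEMPT_DEMO))
		synchronize_rcu_tasks_stub();
}
```

This also explains Steve's observation later in the thread: with a plain #ifdef and CONFIG_TASKS_RCU off, the unresolved synchronize_rcu_tasks() call breaks the build, whereas the constant-false IS_ENABLED() branch is eliminated before it can cause a link error.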
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt 2017-04-06 4:15 ` Paul E. McKenney @ 2017-04-06 14:14 ` Steven Rostedt 2017-04-06 14:59 ` Paul E. McKenney 0 siblings, 1 reply; 16+ messages in thread From: Steven Rostedt @ 2017-04-06 14:14 UTC (permalink / raw) To: Paul E. McKenney; +Cc: LKML On Wed, 5 Apr 2017 21:15:15 -0700 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > > index 8efd9fe..28e3019 100644 > > --- a/kernel/trace/ftrace.c > > +++ b/kernel/trace/ftrace.c > > @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command) > > * callers are done before leaving this function. > > * The same goes for freeing the per_cpu data of the per_cpu > > * ops. > > - * > > - * Again, normal synchronize_sched() is not good enough. > > - * We need to do a hard force of sched synchronization. > > - * This is because we use preempt_disable() to do RCU, but > > - * the function tracers can be called where RCU is not watching > > - * (like before user_exit()). We can not rely on the RCU > > - * infrastructure to do the synchronization, thus we must do it > > - * ourselves. > > */ > > if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) { > > + /* > > + * We need to do a hard force of sched synchronization. > > + * This is because we use preempt_disable() to do RCU, but > > + * the function tracers can be called where RCU is not watching > > + * (like before user_exit()). We can not rely on the RCU > > + * infrastructure to do the synchronization, thus we must do it > > + * ourselves. > > + */ > > schedule_on_each_cpu(ftrace_sync); > > Great header comment on ftrace_sync(): "Yes, function tracing is rude." > And schedule_on_each_cpu() looks like a great workqueue gatling gun! ;-) > > > +#ifdef CONFIG_PREEMPT > > + /* > > + * When the kernel is preemptive, tasks can be preempted > > + * while on a ftrace trampoline. 
Just scheduling a task on > > + * a CPU is not good enough to flush them. Calling > > + * synchronize_rcu_tasks() will wait for those tasks to > > + * execute and either schedule voluntarily or enter user space. > > + */ > > + synchronize_rcu_tasks(); > > +#endif > > How about this to save a line? > > if (IS_ENABLED(CONFIG_PREEMPT)) > synchronize_rcu_tasks(); Ah, this works as gcc optimizes it out. Otherwise I received a compile error with synchronize_rcu_tasks() not defined. But that's because I never enabled CONFIG_TASKS_RCU. > > One thing that might speed this up a bit (or might not) would be to > doe the schedule_on_each_cpu() from a delayed workqueue. That way, > if any of the activity from schedule_on_each_cpu() involved a voluntary > context switch (from a cond_resched() or some such), then > synchronize_rcu_tasks() would get the benefit of that context switch. > > You would need a flush_work() to wait for that delayed workqueue > as well, of course. This is a very slow path, I'm not too interested in making it complex to speed it up. > > Not sure whether it is worth it, but figured I should pass it along. > > > arch_ftrace_trampoline_free(ops); > > > > if (ops->flags & FTRACE_OPS_FL_PER_CPU) > > @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops) > > > > static void ftrace_update_trampoline(struct ftrace_ops *ops) > > { > > - > > -/* > > - * Currently there's no safe way to free a trampoline when the kernel > > - * is configured with PREEMPT. That is because a task could be preempted > > - * when it jumped to the trampoline, it may be preempted for a long time > > - * depending on the system load, and currently there's no way to know > > - * when it will be off the trampoline. If the trampoline is freed > > - * too early, when the task runs again, it will be executing on freed > > - * memory and crash. 
> > - */ > > -#ifdef CONFIG_PREEMPT > > - /* Currently, only non dynamic ops can have a trampoline */ > > - if (ops->flags & FTRACE_OPS_FL_DYNAMIC) > > - return; > > -#endif > > - > > arch_ftrace_update_trampoline(ops); > > } > > Agreed, straightforward patch! Great, I'll start making it official then. -- Steve ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-06 14:14               ` Steven Rostedt
@ 2017-04-06 14:59                 ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-06 14:59 UTC (permalink / raw)
To: Steven Rostedt; +Cc: LKML

On Thu, Apr 06, 2017 at 10:14:25AM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 21:15:15 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > > index 8efd9fe..28e3019 100644
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
> > >  	 * callers are done before leaving this function.
> > >  	 * The same goes for freeing the per_cpu data of the per_cpu
> > >  	 * ops.
> > > -	 *
> > > -	 * Again, normal synchronize_sched() is not good enough.
> > > -	 * We need to do a hard force of sched synchronization.
> > > -	 * This is because we use preempt_disable() to do RCU, but
> > > -	 * the function tracers can be called where RCU is not watching
> > > -	 * (like before user_exit()). We can not rely on the RCU
> > > -	 * infrastructure to do the synchronization, thus we must do it
> > > -	 * ourselves.
> > >  	 */
> > >  	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
> > > +		/*
> > > +		 * We need to do a hard force of sched synchronization.
> > > +		 * This is because we use preempt_disable() to do RCU, but
> > > +		 * the function tracers can be called where RCU is not watching
> > > +		 * (like before user_exit()). We can not rely on the RCU
> > > +		 * infrastructure to do the synchronization, thus we must do it
> > > +		 * ourselves.
> > > +		 */
> > >  		schedule_on_each_cpu(ftrace_sync);
> > 
> > Great header comment on ftrace_sync(): "Yes, function tracing is rude."
> > And schedule_on_each_cpu() looks like a great workqueue gatling gun!  ;-)
> > 
> > > +#ifdef CONFIG_PREEMPT
> > > +		/*
> > > +		 * When the kernel is preemptive, tasks can be preempted
> > > +		 * while on a ftrace trampoline. Just scheduling a task on
> > > +		 * a CPU is not good enough to flush them. Calling
> > > +		 * synchronize_rcu_tasks() will wait for those tasks to
> > > +		 * execute and either schedule voluntarily or enter user space.
> > > +		 */
> > > +		synchronize_rcu_tasks();
> > > +#endif
> > 
> > How about this to save a line?
> > 
> > 	if (IS_ENABLED(CONFIG_PREEMPT))
> > 		synchronize_rcu_tasks();
> 
> Ah, this works as gcc optimizes it out. Otherwise I received a compile
> error with synchronize_rcu_tasks() not defined. But that's because I
> never enabled CONFIG_TASKS_RCU.

Should I define a synchronize_rcu_tasks() that does BUG() for this case?

> > One thing that might speed this up a bit (or might not) would be to
> > do the schedule_on_each_cpu() from a delayed workqueue. That way,
> > if any of the activity from schedule_on_each_cpu() involved a voluntary
> > context switch (from a cond_resched() or some such), then
> > synchronize_rcu_tasks() would get the benefit of that context switch.
> > 
> > You would need a flush_work() to wait for that delayed workqueue
> > as well, of course.
> 
> This is a very slow path, I'm not too interested in making it complex
> to speed it up.

Makes sense to me!

> > Not sure whether it is worth it, but figured I should pass it along.
> > 
> > >  	arch_ftrace_trampoline_free(ops);
> > > 
> > >  	if (ops->flags & FTRACE_OPS_FL_PER_CPU)
> > > @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> > > 
> > >  static void ftrace_update_trampoline(struct ftrace_ops *ops)
> > >  {
> > > -
> > > -/*
> > > - * Currently there's no safe way to free a trampoline when the kernel
> > > - * is configured with PREEMPT. That is because a task could be preempted
> > > - * when it jumped to the trampoline, it may be preempted for a long time
> > > - * depending on the system load, and currently there's no way to know
> > > - * when it will be off the trampoline. If the trampoline is freed
> > > - * too early, when the task runs again, it will be executing on freed
> > > - * memory and crash.
> > > - */
> > > -#ifdef CONFIG_PREEMPT
> > > -	/* Currently, only non dynamic ops can have a trampoline */
> > > -	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
> > > -		return;
> > > -#endif
> > > -
> > > 	arch_ftrace_update_trampoline(ops);
> > >  }
> > 
> > Agreed, straightforward patch!
> 
> Great, I'll start making it official then.

Sounds very good!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-04-06 14:59 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-05 13:32 [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt Steven Rostedt
2017-04-05 16:25 ` Paul E. McKenney
2017-04-05 16:45   ` Steven Rostedt
2017-04-05 17:59     ` Paul E. McKenney
2017-04-05 18:54       ` Steven Rostedt
2017-04-05 19:08         ` Paul E. McKenney
2017-04-05 19:21           ` Steven Rostedt
2017-04-05 19:39             ` Paul E. McKenney
2017-04-05 19:52               ` Steven Rostedt
2017-04-05 20:42                 ` Paul E. McKenney
2017-04-06  1:31                   ` Steven Rostedt
2017-04-06  4:07                     ` Paul E. McKenney
2017-04-06  2:12                   ` Steven Rostedt
2017-04-06  4:15                     ` Paul E. McKenney
2017-04-06 14:14                       ` Steven Rostedt
2017-04-06 14:59                         ` Paul E. McKenney