LKML Archive on lore.kernel.org
* [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
@ 2017-04-05 13:32 Steven Rostedt
  2017-04-05 16:25 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-05 13:32 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

Hi Paul,

My tests are triggering the following when I have lockdep enabled and
running the stack tracer:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51
Modules linked in: ppdev parport_pc parport [last unloaded: trace_events_sample]
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:

===============================
[ ERR: suspicious RCU usage.  ]
4.11.0-rc3-test+ #48 Not tainted
-------------------------------
/work/autotest/nobackup/linux-test.git/include/linux/rcupdate.h:837 rcu_read_lock() used illegally while idle!

other info that might help us debug this:


RCU used illegally from idle CPU!
rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
 #0:  (rcu_read_lock){......}, at: [<ffffffff810ebee3>] rcu_read_lock+0x0/0x61

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
Call Trace:
 dump_stack+0x68/0x92
 __warn+0xc2/0xdd
 ? 0xffffffffa0026077
 warn_slowpath_null+0x1d/0x1f
 module_assert_mutex_or_preempt+0x4f/0x51
 __module_address+0x2a/0xac
 ? 0xffffffffa0026077
 __module_text_address+0x12/0x59
 ? 0xffffffffa0026077
 is_module_text_address+0xe/0x16
 __kernel_text_address+0x2b/0x80
 ? 0xffffffffa0026077
 unwind_get_return_address+0x50/0x5c
 __save_stack_trace+0x76/0xbf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 save_stack_trace+0x1b/0x1d
 check_stack+0xec/0x24a
 stack_trace_call+0x40/0x53     <<-- This is the start of the stack tracer
 0xffffffffa0026077
 ? ftrace_graph_caller+0x78/0xa8
 ? trace_hardirqs_off+0xd/0xf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_eqs_enter_common.constprop.71+0x5/0x108   <<-- It appears to be testing the rcu internal code
 rcu_idle_enter+0x51/0x72
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 ? rcu_idle_enter+0x51/0x72
 do_idle+0xbb/0x1e5
 cpu_startup_entry+0x1f/0x21
 rest_init+0x127/0x12d
 start_kernel+0x3f5/0x402
 x86_64_start_reservations+0x2f/0x31
 x86_64_start_kernel+0x17f/0x192
 start_cpu+0x14/0x14
 ? start_cpu+0x14/0x14
---[ end trace eeaad31bc52457e7 ]---
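(For context: lockdep aside, the stack tracer itself is flipped on at run
time. With CONFIG_STACK_TRACER built in, something like the following,
assuming the usual debugfs mount point:)

```shell
# Enable the stack tracer (CONFIG_STACK_TRACER).
echo 1 > /proc/sys/kernel/stack_tracer_enabled

# Largest stack seen so far, and the backtrace that produced it.
cat /sys/kernel/debug/tracing/stack_max_size
cat /sys/kernel/debug/tracing/stack_trace
```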


Now the check_stack() function in kernel/trace/trace_stack.c has the
following:

	/*
	 * RCU may not be watching, make it see us.
	 * The stack trace code uses rcu_sched.
	 */
	rcu_irq_enter();


What I'm guessing is that, because it is tracing the internals of the
rcu code and checking its stack, the stack trace triggered in a
location where calling rcu_irq_enter() isn't sufficient to have rcu
become watchable.

I may add code to check if rcu is indeed watching after the
rcu_irq_enter() call, and if it still isn't, simply exit out.

Thoughts?

-- Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 13:32 [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt Steven Rostedt
@ 2017-04-05 16:25 ` Paul E. McKenney
  2017-04-05 16:45   ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-05 16:25 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 09:32:07AM -0400, Steven Rostedt wrote:
> Hi Paul,
> 
> My tests are triggering the following when I have lockdep enabled and
> running the stack tracer:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 0 at /work/autotest/nobackup/linux-test.git/kernel/module.c:271 module_assert_mutex_or_preempt+0x4f/0x51
> Modules linked in: ppdev parport_pc parport [last unloaded: trace_events_sample]
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
> 
> ===============================
> [ ERR: suspicious RCU usage.  ]
> 4.11.0-rc3-test+ #48 Not tainted
> -------------------------------
> /work/autotest/nobackup/linux-test.git/include/linux/rcupdate.h:837 rcu_read_lock() used illegally while idle!
> 
> other info that might help us debug this:
> 
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 2, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
>  #0:  (rcu_read_lock){......}, at: [<ffffffff810ebee3>] rcu_read_lock+0x0/0x61
> 
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc3-test+ #48
> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
> Call Trace:
>  dump_stack+0x68/0x92
>  __warn+0xc2/0xdd
>  ? 0xffffffffa0026077
>  warn_slowpath_null+0x1d/0x1f
>  module_assert_mutex_or_preempt+0x4f/0x51
>  __module_address+0x2a/0xac
>  ? 0xffffffffa0026077
>  __module_text_address+0x12/0x59
>  ? 0xffffffffa0026077
>  is_module_text_address+0xe/0x16
>  __kernel_text_address+0x2b/0x80
>  ? 0xffffffffa0026077
>  unwind_get_return_address+0x50/0x5c
>  __save_stack_trace+0x76/0xbf
>  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
>  save_stack_trace+0x1b/0x1d
>  check_stack+0xec/0x24a
>  stack_trace_call+0x40/0x53     <<-- This is the start of the stack tracer
>  0xffffffffa0026077
>  ? ftrace_graph_caller+0x78/0xa8
>  ? trace_hardirqs_off+0xd/0xf
>  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
>  rcu_eqs_enter_common.constprop.71+0x5/0x108   <<-- It appears to be testing the rcu internal code
>  rcu_idle_enter+0x51/0x72
>  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
>  ? rcu_idle_enter+0x51/0x72
>  do_idle+0xbb/0x1e5
>  cpu_startup_entry+0x1f/0x21
>  rest_init+0x127/0x12d
>  start_kernel+0x3f5/0x402
>  x86_64_start_reservations+0x2f/0x31
>  x86_64_start_kernel+0x17f/0x192
>  start_cpu+0x14/0x14
>  ? start_cpu+0x14/0x14
> ---[ end trace eeaad31bc52457e7 ]---
> 
> 
> Now the check_stack() function in kernel/trace/trace_stack.c has the
> following:
> 
> 	/*
> 	 * RCU may not be watching, make it see us.
> 	 * The stack trace code uses rcu_sched.
> 	 */
> 	rcu_irq_enter();
> 
> 
> What I'm guessing is that, because it is tracing the internals of the
> rcu code and checking its stack, the stack trace triggered in a
> location where calling rcu_irq_enter() isn't sufficient to have rcu
> become watchable.
> 
> I may add code to check if rcu is indeed watching after the
> rcu_irq_enter() call, and if it still isn't, simply exit out.
> 
> Thoughts?

Ouch!!!

I really blew it telling you that you can use rcu_irq_enter()!!!

RCU's eqs code disables interrupts, but has a call to the trace code
exactly where it cannot handle an interrupt.  Your rcu_irq_enter()
therefore breaks things -- the ->dynticks_nesting has been set to zero,
but ->dynticks still indicates non-idle.  Therefore, rcu_irq_enter() sees
zero nesting, and thus incorrectly increments ->dynticks, which tells RCU
"idle".  The later rcu_is_watching() checks therefore complain bitterly.

I really don't want you to have to hack around this.  So I should adjust
RCU to avoid doing tracing during this period of time, which would allow
you to continue using rcu_irq_enter() as you are now.  (You don't trace
from NMIs, do you?  If you do, I guess I need to somehow disable tracing
during the vulnerable period?)

One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
trace_rcu_dyntick() statement into all its callers.  Another would be to
give rcu_eqs_enter_common() an additional argument that is the amount to
subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
decrement after the tracing.  The latter seems most reasonable at first
glance, especially given that it confines the risky data-inconsistent
period to a single contiguous piece of code.

So how about the following lightly tested patch?

I will check to see if something similar is needed for eqs exit.
Could you please let me know if tracing happens in NMI handlers?
If so, a bit of additional code will be needed.

							Thanx, Paul

PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
     3 of my 16 test scenarios, which is getting hard to justify for
     something that isn't used.  Especially given that I will need to
     add more scenarios for parallel-callbacks SRCU...

------------------------------------------------------------------------

commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Apr 5 09:05:18 2017 -0700

    rcu: Fix dyntick-idle tracing
    
    The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
    (with my blessing) to allow the current _rcuidle alternative tracepoint
    name to be dispensed with while still maintaining good performance.
    Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
    appear to RCU like an interrupt that occurs where RCU is not designed
    to handle interrupts.
    
    This commit fixes this problem by moving the zeroing of ->dynticks_nesting
    after the offending trace_rcu_dyntick() statement, which narrows the
    window of vulnerability to a pair of adjacent statements that are now
    marked with comments to that effect.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee7689e71..7308c88ee57a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 }
 
 /*
- * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
+ * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
  *
- * If the new value of the ->dynticks_nesting counter now is zero,
- * we really have entered idle, and must do the appropriate accounting.
- * The caller must have disabled interrupts.
+ * Enter idle, doing appropriate accounting.  The caller must have
+ * disabled interrupts.
  */
-static void rcu_eqs_enter_common(long long oldval, bool user)
+static void rcu_eqs_enter_common(bool user)
 {
 	struct rcu_state *rsp;
 	struct rcu_data *rdp;
-	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
-	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
+	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 	    !user && !is_idle_task(current)) {
 		struct task_struct *idle __maybe_unused =
 			idle_task(smp_processor_id());
 
-		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
+		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
 		rcu_ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
@@ -800,7 +799,8 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
-	rcu_dynticks_eqs_enter();
+	rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
+	rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
 	rcu_dynticks_task_enter();
 
 	/*
@@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
  */
 static void rcu_eqs_enter(bool user)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
-	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
-		rdtp->dynticks_nesting = 0;
-		rcu_eqs_enter_common(oldval, user);
-	} else {
+		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
+	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
+		rcu_eqs_enter_common(user);
+	else
 		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
-	}
 }
 
 /**
@@ -892,19 +888,18 @@ void rcu_user_enter(void)
  */
 void rcu_irq_exit(void)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
-	rdtp->dynticks_nesting--;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     rdtp->dynticks_nesting < 0);
-	if (rdtp->dynticks_nesting)
-		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
-	else
-		rcu_eqs_enter_common(oldval, true);
+		     rdtp->dynticks_nesting < 1);
+	if (rdtp->dynticks_nesting <= 1) {
+		rcu_eqs_enter_common(true);
+	} else {
+		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
+		rdtp->dynticks_nesting--;
+	}
 	rcu_sysidle_enter(1);
 }
 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 16:25 ` Paul E. McKenney
@ 2017-04-05 16:45   ` Steven Rostedt
  2017-04-05 17:59     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-05 16:45 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 09:25:15 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Thoughts?  
> 
> Ouch!!!
> 
> I really blew it telling you that you can use rcu_irq_enter()!!!
> 
> RCU's eqs code disables interrupts, but has a call to the trace code
> exactly where it cannot handle an interrupt.  Your rcu_irq_enter()
> therefore breaks things -- the ->dynticks_nesting has been set to zero,
> but ->dynticks still indicates non-idle.  Therefore, rcu_irq_enter() sees
> zero nesting, and thus incorrectly increments ->dynticks, which tells RCU
> "idle".  The later rcu_is_watching() checks therefore complain bitterly.
> 
> I really don't want you to have to hack around this.  So I should adjust
> RCU to avoid doing tracing during this period of time, which would allow
> you to continue using rcu_irq_enter() as you are now.  (You don't trace
> from NMIs, do you?  If you do, I guess I need to somehow disable tracing
> during the vulnerable period?)

Note, this is the stack tracer. It attaches itself to the function
tracer, which traces all functions and checks the stack size for each
function it calls. It performs a stack trace when it sees a stack
larger than any stack it has seen before. It just so happens that it
saw a larger stack when tracing inside the rcu functions.

Note, this has nothing to do with tracepoints or TRACE_EVENT()s. Nor
does it have any issue with function tracing itself and yes, function
tracing does trace NMIs, but doesn't need RCU. And yes, stack tracing
just punts (returns doing nothing) if it sees that it is in an NMI.
Especially since stack tracing grabs locks.
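As a sketch of that gating (hypothetical names; the real checks live in
check_stack() in kernel/trace/trace_stack.c):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the stack tracer's entry checks, not the real code. */
static unsigned long stack_trace_max_size; /* largest stack seen so far */
static bool in_nmi_flag;                   /* stands in for in_nmi() */

static bool stack_trace_should_record(unsigned long this_size)
{
	/* Punt from NMI context: recording a stack grabs locks. */
	if (in_nmi_flag)
		return false;
	/* Only a new maximum is worth the cost of a full backtrace. */
	if (this_size <= stack_trace_max_size)
		return false;
	stack_trace_max_size = this_size;
	return true;
}
```

Every traced function call runs the cheap checks; the expensive
save_stack_trace() path is taken only on a new maximum, which is why
the bug only fires when that maximum happens to land inside the RCU
eqs code.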

The problem is that the recording of the stack (save_stack_trace) does
utilize RCU, and that's where it complains.

We don't need to stop tracing, we just need to stop stack tracing. I'm
fine with adding another check to see if we are in a critical RCU
section, and just not trace the stack there.

> 
> One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
> trace_rcu_dyntick() statement into all its callers.  Another would be to
> give rcu_eqs_enter_common() an additional argument that is the amount to
> subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
> decrment after the tracing.  The latter seems most reasonable at first
> glance, especially given that it confines the risky data-inconsistent
> period to a single contiguous piece of code.

Note, this has nothing to do with trace_rcu_dyntick(). It's the
function tracer tracing inside RCU, calling the stack tracer to record
a new stack if it sees it's larger than any stack before. All I need is
a way to tell the stack tracer to not record a stack if it is in this
RCU critical section.

If you can add a "in_rcu_critical_section()" function, that the stack
tracer can test, and simply exit out like it does if in_nmi() is set,
that would work too. Below is my current work around.

> 
> So how about the following lightly tested patch?
> 
> I will check to see if something similar is needed for eqs exit.
> Could you please let me know if tracing happens in NMI handlers?
> If so, a bit of additional code will be needed.
> 
> 							Thanx, Paul
> 
> PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
>      3 of my 16 test scenarios, which is getting hard to justify for
>      something that isn't used.  Especially given that I will need to
>      add more scenarios for parallel-callbacks SRCU...

The RCU_TASKS implementation is next on my todo list. Yes, there's going
to be plenty of users very soon. Not for 4.12 but definitely for 4.13.

Sorry for the delay in implementing that :-/

-- Steve


diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 5fb1f2c87e6b..ac7241cf8c9c 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -105,6 +105,10 @@ check_stack(unsigned long ip, unsigned long *stack)
 	 */
 	rcu_irq_enter();
 
+	/* If we traced rcu internals, rcu may still not be watching */
+	if (unlikely(!rcu_is_watching()))
+		goto out;
+
 	/* In case another CPU set the tracer_frame on us */
 	if (unlikely(!frame_size))
 		this_size -= tracer_frame;


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 16:45   ` Steven Rostedt
@ 2017-04-05 17:59     ` Paul E. McKenney
  2017-04-05 18:54       ` Steven Rostedt
  2017-04-06  2:12       ` Steven Rostedt
  0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-05 17:59 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 12:45:39PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 09:25:15 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > Thoughts?  
> > 
> > Ouch!!!
> > 
> > I really blew it telling you that you can use rcu_irq_enter()!!!
> > 
> > RCU's eqs code disables interrupts, but has a call to the trace code
> > exactly where it cannot handle an interrupt.  Your rcu_irq_enter()
> > therefore breaks things -- the ->dynticks_nesting has been set to zero,
> > but ->dynticks still indicates non-idle.  Therefore, rcu_irq_enter() sees
> > zero nesting, and thus incorrectly increments ->dynticks, which tells RCU
> > "idle".  The later rcu_is_watching() checks therefore complain bitterly.
> > 
> > I really don't want you to have to hack around this.  So I should adjust
> > RCU to avoid doing tracing during this period of time, which would allow
> > you to continue using rcu_irq_enter() as you are now.  (You don't trace
> > from NMIs, do you?  If you do, I guess I need to somehow disable tracing
> > during the vulnerable period?)
> 
> Note, this is the stack tracer. It attaches itself to the function
> tracer, which traces all functions and checks the stack size for each
> function it calls. It performs a stack trace when it sees a stack
> larger than any stack it has seen before. It just so happens that it
> saw a larger stack when tracing inside the rcu functions.
> 
> Note, this has nothing to do with tracepoints or TRACE_EVENT()s. Nor
> does it have any issue with function tracing itself and yes, function
> tracing does trace NMIs, but doesn't need RCU. And yes, stack tracing
> just punts (returns doing nothing) if it sees that it is in an NMI.
> Especially since stack tracing grabs locks.

Whew!!!  ;-)

> The problem is that the recording of the stack (save_stack_trace) does
> utilize RCU, and that's where it complains.
> 
> We don't need to stop tracing, we just need to stop stack tracing. I'm
> fine with adding another check to see if we are in a critical RCU
> section, and just not trace the stack there.
> 
> > 
> > One simple-and-stupid fix would be for me to move rcu_eqs_enter_common()'s
> > trace_rcu_dyntick() statement into all its callers.  Another would be to
> > give rcu_eqs_enter_common() an additional argument that is the amount to
> > subtract from ->dynticks_nesting, and have rcu_eqs_enter_common() do the
> > decrement after the tracing.  The latter seems most reasonable at first
> > glance, especially given that it confines the risky data-inconsistent
> > period to a single contiguous piece of code.
> 
> Note, this has nothing to do with trace_rcu_dyntick(). It's the
> function tracer tracing inside RCU, calling the stack tracer to record
> a new stack if it sees it's larger than any stack before. All I need is
> a way to tell the stack tracer to not record a stack if it is in this
> RCU critical section.
> 
> If you can add a "in_rcu_critical_section()" function, that the stack
> tracer can test, and simply exit out like it does if in_nmi() is set,
> that would work too. Below is my current work around.

Except that the rcu_irq_enter() would already have triggered the bug
that was (allegedly) fixed by my earlier patch.  So, yes, the check for
rcu_is_watching() would work around this bug, but the hope is that
with my earlier fix, this workaround would not be needed.

So could you please test my earlier patch?

This patch does not conflict with anything on -rcu, so you could
carry it if that helps.

> > So how about the following lightly tested patch?
> > 
> > I will check to see if something similar is needed for eqs exit.

I did verify that the eqs-exit path does not have this problem, FWIW.

> > Could you please let me know if tracing happens in NMI handlers?
> > If so, a bit of additional code will be needed.
> > 
> > 							Thanx, Paul
> > 
> > PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
> >      3 of my 16 test scenarios, which is getting hard to justify for
> >      something that isn't used.  Especially given that I will need to
> >      add more scenarios for parallel-callbacks SRCU...
> 
> The RCU_TASKS implementation is next on my todo list. Yes, there's going
> to be plenty of users very soon. Not for 4.12 but definitely for 4.13.
> 
> Sorry for the delay in implementing that :-/

OK, I will wait a few months before checking again...

							Thanx, Paul

> -- Steve
> 
> 
> diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
> index 5fb1f2c87e6b..ac7241cf8c9c 100644
> --- a/kernel/trace/trace_stack.c
> +++ b/kernel/trace/trace_stack.c
> @@ -105,6 +105,10 @@ check_stack(unsigned long ip, unsigned long *stack)
>  	 */
>  	rcu_irq_enter();
> 
> +	/* If we traced rcu internals, rcu may still not be watching */
> +	if (unlikely(!rcu_is_watching()))
> +		goto out;
> +
>  	/* In case another CPU set the tracer_frame on us */
>  	if (unlikely(!frame_size))
>  		this_size -= tracer_frame;
> 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 17:59     ` Paul E. McKenney
@ 2017-04-05 18:54       ` Steven Rostedt
  2017-04-05 19:08         ` Paul E. McKenney
  2017-04-06  2:12       ` Steven Rostedt
  1 sibling, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-05 18:54 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 10:59:25 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > Note, this has nothing to do with trace_rcu_dyntick(). It's the
> > function tracer tracing inside RCU, calling the stack tracer to record
> > a new stack if it sees it's larger than any stack before. All I need is
> > a way to tell the stack tracer to not record a stack if it is in this
> > RCU critical section.
> > 
> > If you can add a "in_rcu_critical_section()" function, that the stack
> > tracer can test, and simply exit out like it does if in_nmi() is set,
> > that would work too. Below is my current work around.  
> 
> Except that the rcu_irq_enter() would already have triggered the bug
> that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> rcu_is_watching() would work around this bug, but the hope is that
> with my earlier fix, this workaround would not be needed.

Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
rcu_irq_enter(); I could fall out before that. My current workaround
does the check after rcu_irq_enter(); even though that breaks things,
it would hopefully fix them, as it calls rcu_irq_exit() immediately.

What I would have is:

	if (in_rcu_critical_section())
		goto out;

	rcu_irq_enter();

which would probably be the easiest fix.


> 
> So could you please test my earlier patch?

I could, but it wouldn't tell me anything immediately. It's a hard race
to hit. I never could hit it when I tried, but it would appear to
hit immediately when testing other things :-p

Remember, it only triggers when a new max stack size is hit. The bug
happens when that new max stack size is in the rcu critical section.

I guess I could force it to trigger by inserting a call in your code
that clears the max stack size.
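(For what it's worth, the recorded maximum can also be cleared from
user space through tracefs, which forces the next deep-enough stack to
re-run check_stack(); assuming the usual debugfs mount point:)

```shell
# Reset the recorded maximum so the next deep stack re-triggers the check.
echo 0 > /sys/kernel/debug/tracing/stack_max_size
```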

-- Steve

> 
> This patch does not conflict with anything on -rcu, so you could
> carry it if that helps.
> 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 18:54       ` Steven Rostedt
@ 2017-04-05 19:08         ` Paul E. McKenney
  2017-04-05 19:21           ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-05 19:08 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 02:54:25PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 10:59:25 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > Note, this has nothing to do with trace_rcu_dyntick(). It's the
> > > function tracer tracing inside RCU, calling the stack tracer to record
> > > a new stack if it sees it's larger than any stack before. All I need is
> > > a way to tell the stack tracer to not record a stack if it is in this
> > > RCU critical section.
> > > 
> > > If you can add a "in_rcu_critical_section()" function, that the stack
> > > tracer can test, and simply exit out like it does if in_nmi() is set,
> > > that would work too. Below is my current work around.  
> > 
> > Except that the rcu_irq_enter() would already have triggered the bug
> > that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> > rcu_is_watching() would work around this bug, but the hope is that
> > with my earlier fix, this workaround would not be needed.
> 
> Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
> rcu_irq_enter(); I could fall out before that. My current workaround
> does the check after rcu_irq_enter(); even though that breaks things,
> it would hopefully fix them, as it calls rcu_irq_exit() immediately.

OK, color me confused.  What would "in_rcu_critical_section()" do?

The rcu_is_watching() function tells you that RCU is not in an extended
quiescent state, though its return value can be iffy in the middle of
rcu_eqs_enter_common() -- which is why interrupts are disabled there.
In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth()
to determine whether you are within an RCU read-side critical section,
which is what in_rcu_critical_section() sounds like to me, but I don't
see how this information would help in this situation.

What am I missing here?

> What I would have is:
> 
> 	if (in_rcu_critical_section())
> 		goto out;
> 
> 	rcu_irq_enter();
> 
> which would probably be the easiest fix.
> 
> 
> > 
> > So could you please test my earlier patch?
> 
> I could, but it wouldn't tell me anything immediately. It's a hard race
> to hit. I never could hit it when I tried, but it would appear to
> hit immediately when testing other things :-p
> 
> Remember, it only triggers when a new max stack size is hit. The bug
> happens when that new max stack size is in the rcu critical section.
> 
> I guess I could force it to trigger by inserting a call in your code
> that clears the max stack size.

I know it is a bit of a hassle, but I would really appreciate the
testing.

							Thanx, Paul

> -- Steve
> 
> > 
> > This patch does not conflict with anything on -rcu, so you could
> > carry it if that helps.
> > 
> 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 19:08         ` Paul E. McKenney
@ 2017-04-05 19:21           ` Steven Rostedt
  2017-04-05 19:39             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-05 19:21 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 12:08:10 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Apr 05, 2017 at 02:54:25PM -0400, Steven Rostedt wrote:
> > On Wed, 5 Apr 2017 10:59:25 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> >   
> > > > Note, this has nothing to do with trace_rcu_dyntick(). It's the
> > > > function tracer tracing inside RCU, calling the stack tracer to record
> > > > a new stack if it sees it's larger than any stack before. All I need is
> > > > a way to tell the stack tracer to not record a stack if it is in this
> > > > RCU critical section.
> > > > 
> > > > If you can add a "in_rcu_critical_section()" function, that the stack
> > > > tracer can test, and simply exit out like it does if in_nmi() is set,
> > > > that would work too. Below is my current work around.    
> > > 
> > > Except that the rcu_irq_enter() would already have triggered the bug
> > > that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> > > rcu_is_watching() would work around this bug, but the hope is that
> > > with my earlier fix, this workaround would not be needed.  
> > 
> > Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
> > rcu_irq_enter(); I could fall out before that. My current workaround
> > does the check after rcu_irq_enter(); even though that breaks things,
> > it would hopefully fix them, as it calls rcu_irq_exit() immediately.
> 
> OK, color me confused.  What would "in_rcu_critical_section()" do?
> 
> The rcu_is_watching() function tells you that RCU is not in an extended
> quiescent state, though its return value can be iffy in the middle of
> rcu_eqs_enter_common() -- which is why interrupts are disabled there.
> In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth()
> to determine whether you are within an RCU read-side critical section,
> which is what in_rcu_critical_section() sounds like to me, but I don't
> see how this information would help in this situation.
> 
> What am I missing here?
> 

Would in_guts_of_internal_rcu_infrastructure_code() work? :-)

Here's the crucial part of that stack dump again:

 save_stack_trace+0x1b/0x1d
 check_stack+0xec/0x24a
 stack_trace_call+0x40/0x53
 0xffffffffa0026077
 ? ftrace_graph_caller+0x78/0xa8
 ? trace_hardirqs_off+0xd/0xf
 ? rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_eqs_enter_common.constprop.71+0x5/0x108
 rcu_idle_enter+0x51/0x72


The stack trace was called on rcu_eqs_enter_common() inside the
rcu_idle_enter() function call.

Perhaps if I just let rcu disable stack tracing? Something like this:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee76..f894fc3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -853,8 +853,10 @@ void rcu_idle_enter(void)
 	unsigned long flags;
 
 	local_irq_save(flags);
+	disable_stack_tracer();
 	rcu_eqs_enter(false);
 	rcu_sysidle_enter(0);
+	enable_stack_tracer();
 	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);


-- Steve

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 19:21           ` Steven Rostedt
@ 2017-04-05 19:39             ` Paul E. McKenney
  2017-04-05 19:52               ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-05 19:39 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 03:21:41PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 12:08:10 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Apr 05, 2017 at 02:54:25PM -0400, Steven Rostedt wrote:
> > > On Wed, 5 Apr 2017 10:59:25 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >   
> > > > > Note, this has nothing to do with trace_rcu_dyntick(). It's the
> > > > > function tracer tracing inside RCU, calling the stack tracer to record
> > > > > a new stack if it sees its larger than any stack before. All I need is
> > > > > a way to tell the stack tracer to not record a stack if it is in this
> > > > > RCU critical section.
> > > > > 
> > > > > If you can add a "in_rcu_critical_section()" function, that the stack
> > > > > tracer can test, and simply exit out like it does if in_nmi() is set,
> > > > > that would work too. Below is my current work around.    
> > > > 
> > > > Except that the rcu_irq_enter() would already have triggered the bug
> > > > that was (allegedly) fixed by my earlier patch.  So, yes, the check for
> > > > rcu_is_watching() would work around this bug, but the hope is that
> > > > with my earlier fix, this workaround would not be needed.  
> > > 
> > > Note, if I had an "in_rcu_critical_section()" I wouldn't need to call
> > > rcu_irq_enter(); I could bail out before that. My current workaround
> > > does the check anyway: even though it briefly breaks things, it should
> > > fix them right back up, as it calls rcu_irq_exit() immediately.  
> > 
> > OK, color me confused.  What would "in_rcu_critical_section()" do?
> > 
> > The rcu_is_watching() function tells you that RCU is not in an extended
> > quiescent state, though its return value can be iffy in the middle of
> > rcu_eqs_enter_common() -- which is why interrupts are disabled there.
> > In preemptible RCU, you can (but shouldn't) use rcu_preempt_depth()
> > to determine whether you are within an RCU read-side critical section,
> > which is what in_rcu_critical_section() sounds like to me, but I don't
> > see how this information would help in this situation.
> > 
> > What am I missing here?
> 
> Would in_guts_of_internal_rcu_infrastructure_code() work? :-)

something_is_happening_here_what_it_is_isnt_clear_there_is_a_man_with_a_gun_over_there_telling_me_to_beware()?  ;-)

> Here's the crucial part of that stack dump again:
> 
>  save_stack_trace+0x1b/0x1d
>  check_stack+0xec/0x24a
>  stack_trace_call+0x40/0x53
>  0xffffffffa0026077
>  ? ftrace_graph_caller+0x78/0xa8
>  ? trace_hardirqs_off+0xd/0xf
>  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
>  rcu_eqs_enter_common.constprop.71+0x5/0x108
>  rcu_idle_enter+0x51/0x72
> 
> 
> The stack trace was called on rcu_eqs_enter_common() inside the
> rcu_idle_enter() function call.
> 
> Perhaps if I just let rcu disable stack tracing? Something like this:
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 50fee76..f894fc3 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -853,8 +853,10 @@ void rcu_idle_enter(void)
>  	unsigned long flags;
> 
>  	local_irq_save(flags);
> +	disable_stack_tracer();
>  	rcu_eqs_enter(false);
>  	rcu_sysidle_enter(0);
> +	enable_stack_tracer();
>  	local_irq_restore(flags);
>  }
>  EXPORT_SYMBOL_GPL(rcu_idle_enter);

That only covers one of the two code paths, and we still end up with the
trace code thinking that RCU is paying attention to it when it really
is not.

But if there is a disable_stack_tracer() and enable_stack_tracer(),
I could use them as shown below.  (Hand-edited patch, probably doesn't
apply, let alone build.)

							Thanx, Paul

------------------------------------------------------------------------

commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Apr 5 09:05:18 2017 -0700

    rcu: Fix dyntick-idle tracing
    
    The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
    (with my blessing) to allow the current _rcuidle alternative tracepoint
    name to be dispensed with while still maintaining good performance.
    Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
    appear to RCU like an interrupt that occurs where RCU is not designed
    to handle interrupts.
    
    This commit fixes this problem by moving the zeroing of ->dynticks_nesting
    after the offending trace_rcu_dyntick() statement, which narrows the
    window of vulnerability to a pair of adjacent statements that are now
    marked with comments to that effect.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 50fee7689e71..7308c88ee57a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 }
 
 /*
- * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
+ * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
  *
- * If the new value of the ->dynticks_nesting counter now is zero,
- * we really have entered idle, and must do the appropriate accounting.
- * The caller must have disabled interrupts.
+ * Enter idle, doing appropriate accounting.  The caller must have
+ * disabled interrupts.
  */
-static void rcu_eqs_enter_common(long long oldval, bool user)
+static void rcu_eqs_enter_common(bool user)
 {
 	struct rcu_state *rsp;
 	struct rcu_data *rdp;
-	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
+	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
 
-	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
+	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
 	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 	    !user && !is_idle_task(current)) {
 		struct task_struct *idle __maybe_unused =
 			idle_task(smp_processor_id());
 
-		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
+		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
 		rcu_ftrace_dump(DUMP_ORIG);
 		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
 			  current->pid, current->comm,
@@ -800,7 +799,10 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
 		do_nocb_deferred_wakeup(rdp);
 	}
 	rcu_prepare_for_idle();
-	rcu_dynticks_eqs_enter();
+	disable_stack_tracer();
+	rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
+	rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
+	enable_stack_tracer();
 	rcu_dynticks_task_enter();
 
 	/*
@@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
  */
 static void rcu_eqs_enter(bool user)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
-	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
-		rdtp->dynticks_nesting = 0;
-		rcu_eqs_enter_common(oldval, user);
-	} else {
+		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
+	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
+		rcu_eqs_enter_common(user);
+	else
 		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
-	}
 }
 
 /**
@@ -892,19 +888,18 @@ void rcu_user_enter(void)
  */
 void rcu_irq_exit(void)
 {
-	long long oldval;
 	struct rcu_dynticks *rdtp;
 
 	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
 	rdtp = this_cpu_ptr(&rcu_dynticks);
-	oldval = rdtp->dynticks_nesting;
-	rdtp->dynticks_nesting--;
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
-		     rdtp->dynticks_nesting < 0);
-	if (rdtp->dynticks_nesting)
-		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
-	else
-		rcu_eqs_enter_common(oldval, true);
+		     rdtp->dynticks_nesting < 1);
+	if (rdtp->dynticks_nesting <= 1) {
+		rcu_eqs_enter_common(true);
+	} else {
+		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
+		rdtp->dynticks_nesting--;
+	}
 	rcu_sysidle_enter(1);
 }
 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 19:39             ` Paul E. McKenney
@ 2017-04-05 19:52               ` Steven Rostedt
  2017-04-05 20:42                 ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-05 19:52 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 12:39:28 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:


> > Here's the crucial part of that stack dump again:
> > 
> >  save_stack_trace+0x1b/0x1d
> >  check_stack+0xec/0x24a
> >  stack_trace_call+0x40/0x53
> >  0xffffffffa0026077
> >  ? ftrace_graph_caller+0x78/0xa8
> >  ? trace_hardirqs_off+0xd/0xf
> >  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
> >  rcu_eqs_enter_common.constprop.71+0x5/0x108
> >  rcu_idle_enter+0x51/0x72
> > 
> > 
> > The stack trace was called on rcu_eqs_enter_common() inside the
> > rcu_idle_enter() function call.
> > 
> > Perhaps if I just let rcu disable stack tracing? Something like this:
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 50fee76..f894fc3 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -853,8 +853,10 @@ void rcu_idle_enter(void)
> >  	unsigned long flags;
> > 
> >  	local_irq_save(flags);
> > +	disable_stack_tracer();
> >  	rcu_eqs_enter(false);
> >  	rcu_sysidle_enter(0);
> > +	enable_stack_tracer();
> >  	local_irq_restore(flags);
> >  }
> >  EXPORT_SYMBOL_GPL(rcu_idle_enter);  
> 
> That only covers one of the two code paths, and we still end up with the
> trace code thinking that RCU is paying attention to it when it really
> is not.

OK, that sounds like a separate bug from the one I'm tracking.

> 
> But if there is a disable_stack_tracer() and enable_stack_tracer(),
> I could use them as shown below.  (Hand-edited patch, probably doesn't
> apply, let alone build.)
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Wed Apr 5 09:05:18 2017 -0700
> 
>     rcu: Fix dyntick-idle tracing
>     
>     The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
>     (with my blessing) to allow the current _rcuidle alternative tracepoint
>     name to be dispensed with while still maintaining good performance.
>     Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
>     appear to RCU like an interrupt that occurs where RCU is not designed
>     to handle interrupts.
>     
>     This commit fixes this problem by moving the zeroing of ->dynticks_nesting
>     after the offending trace_rcu_dyntick() statement, which narrows the
>     window of vulnerability to a pair of adjacent statements that are now
>     marked with comments to that effect.
>     
>     Reported-by: Steven Rostedt <rostedt@goodmis.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 50fee7689e71..7308c88ee57a 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
>  }
>  
>  /*
> - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
> + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
>   *
> - * If the new value of the ->dynticks_nesting counter now is zero,
> - * we really have entered idle, and must do the appropriate accounting.
> - * The caller must have disabled interrupts.
> + * Enter idle, doing appropriate accounting.  The caller must have
> + * disabled interrupts.
>   */
> -static void rcu_eqs_enter_common(long long oldval, bool user)
> +static void rcu_eqs_enter_common(bool user)
>  {
>  	struct rcu_state *rsp;
>  	struct rcu_data *rdp;
> -	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
> +	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
>  
> -	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
> +	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
>  	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
>  	    !user && !is_idle_task(current)) {
>  		struct task_struct *idle __maybe_unused =
>  			idle_task(smp_processor_id());
>  
> -		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
> +		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
>  		rcu_ftrace_dump(DUMP_ORIG);
>  		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
>  			  current->pid, current->comm,
> @@ -800,7 +799,10 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
>  		do_nocb_deferred_wakeup(rdp);
>  	}
>  	rcu_prepare_for_idle();
> -	rcu_dynticks_eqs_enter();
> +	disable_stack_tracer();
> +	rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
> +	rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
> +	enable_stack_tracer();

OK, do you want me to send you a patch, or should I take your patch
into my tree and add this?

-- Steve

>  	rcu_dynticks_task_enter();
>  
>  	/*
> @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
>   */
>  static void rcu_eqs_enter(bool user)
>  {
> -	long long oldval;
>  	struct rcu_dynticks *rdtp;
>  
>  	rdtp = this_cpu_ptr(&rcu_dynticks);
> -	oldval = rdtp->dynticks_nesting;
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> -		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
> -	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
> -		rdtp->dynticks_nesting = 0;
> -		rcu_eqs_enter_common(oldval, user);
> -	} else {
> +		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
> +	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
> +		rcu_eqs_enter_common(user);
> +	else
>  		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
> -	}
>  }
>  
>  /**
> @@ -892,19 +888,18 @@ void rcu_user_enter(void)
>   */
>  void rcu_irq_exit(void)
>  {
> -	long long oldval;
>  	struct rcu_dynticks *rdtp;
>  
>  	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
>  	rdtp = this_cpu_ptr(&rcu_dynticks);
> -	oldval = rdtp->dynticks_nesting;
> -	rdtp->dynticks_nesting--;
>  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> -		     rdtp->dynticks_nesting < 0);
> -	if (rdtp->dynticks_nesting)
> -		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
> -	else
> -		rcu_eqs_enter_common(oldval, true);
> +		     rdtp->dynticks_nesting < 1);
> +	if (rdtp->dynticks_nesting <= 1) {
> +		rcu_eqs_enter_common(true);
> +	} else {
> +		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
> +		rdtp->dynticks_nesting--;
> +	}
>  	rcu_sysidle_enter(1);
>  }
>  


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 19:52               ` Steven Rostedt
@ 2017-04-05 20:42                 ` Paul E. McKenney
  2017-04-06  1:31                   ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-05 20:42 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 03:52:23PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 12:39:28 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > Here's the crucial part of that stack dump again:
> > > 
> > >  save_stack_trace+0x1b/0x1d
> > >  check_stack+0xec/0x24a
> > >  stack_trace_call+0x40/0x53
> > >  0xffffffffa0026077
> > >  ? ftrace_graph_caller+0x78/0xa8
> > >  ? trace_hardirqs_off+0xd/0xf
> > >  ? rcu_eqs_enter_common.constprop.71+0x5/0x108
> > >  rcu_eqs_enter_common.constprop.71+0x5/0x108
> > >  rcu_idle_enter+0x51/0x72
> > > 
> > > 
> > > The stack trace was called on rcu_eqs_enter_common() inside the
> > > rcu_idle_enter() function call.
> > > 
> > > Perhaps if I just let rcu disable stack tracing? Something like this:
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 50fee76..f894fc3 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -853,8 +853,10 @@ void rcu_idle_enter(void)
> > >  	unsigned long flags;
> > > 
> > >  	local_irq_save(flags);
> > > +	disable_stack_tracer();
> > >  	rcu_eqs_enter(false);
> > >  	rcu_sysidle_enter(0);
> > > +	enable_stack_tracer();
> > >  	local_irq_restore(flags);
> > >  }
> > >  EXPORT_SYMBOL_GPL(rcu_idle_enter);  
> > 
> > That only covers one of the two code paths, and we still end up with the
> > trace code thinking that RCU is paying attention to it when it really
> > is not.
> 
> OK, that sounds like a separate bug to what I'm tracking.
> 
> > 
> > But if there is a disable_stack_tracer() and enable_stack_tracer(),
> > I could use them as shown below.  (Hand-edited patch, probably doesn't
> > apply, let alone build.)
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 1a11d6fd0ae344f7dccd57d3c587d9ef2a4bf08e
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Wed Apr 5 09:05:18 2017 -0700
> > 
> >     rcu: Fix dyntick-idle tracing
> >     
> >     The tracing subsystem started using rcu_irq_enter() and rcu_irq_exit()
> >     (with my blessing) to allow the current _rcuidle alternative tracepoint
> >     name to be dispensed with while still maintaining good performance.
> >     Unfortunately, this causes RCU's dyntick-idle entry code's tracing to
> >     appear to RCU like an interrupt that occurs where RCU is not designed
> >     to handle interrupts.
> >     
> >     This commit fixes this problem by moving the zeroing of ->dynticks_nesting
> >     after the offending trace_rcu_dyntick() statement, which narrows the
> >     window of vulnerability to a pair of adjacent statements that are now
> >     marked with comments to that effect.
> >     
> >     Reported-by: Steven Rostedt <rostedt@goodmis.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 50fee7689e71..7308c88ee57a 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -771,25 +771,24 @@ cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
> >  }
> >  
> >  /*
> > - * rcu_eqs_enter_common - current CPU is moving towards extended quiescent state
> > + * rcu_eqs_enter_common - current CPU is entering an extended quiescent state
> >   *
> > - * If the new value of the ->dynticks_nesting counter now is zero,
> > - * we really have entered idle, and must do the appropriate accounting.
> > - * The caller must have disabled interrupts.
> > + * Enter idle, doing appropriate accounting.  The caller must have
> > + * disabled interrupts.
> >   */
> > -static void rcu_eqs_enter_common(long long oldval, bool user)
> > +static void rcu_eqs_enter_common(bool user)
> >  {
> >  	struct rcu_state *rsp;
> >  	struct rcu_data *rdp;
> > -	RCU_TRACE(struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);)
> > +	struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> >  
> > -	trace_rcu_dyntick(TPS("Start"), oldval, rdtp->dynticks_nesting);
> > +	trace_rcu_dyntick(TPS("Start"), rdtp->dynticks_nesting, 0);
> >  	if (IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> >  	    !user && !is_idle_task(current)) {
> >  		struct task_struct *idle __maybe_unused =
> >  			idle_task(smp_processor_id());
> >  
> > -		trace_rcu_dyntick(TPS("Error on entry: not idle task"), oldval, 0);
> > +		trace_rcu_dyntick(TPS("Error on entry: not idle task"), rdtp->dynticks_nesting, 0);
> >  		rcu_ftrace_dump(DUMP_ORIG);
> >  		WARN_ONCE(1, "Current pid: %d comm: %s / Idle pid: %d comm: %s",
> >  			  current->pid, current->comm,
> > @@ -800,7 +799,10 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
> >  		do_nocb_deferred_wakeup(rdp);
> >  	}
> >  	rcu_prepare_for_idle();
> > -	rcu_dynticks_eqs_enter();
> > +	disable_stack_tracer();
> > +	rdtp->dynticks_nesting = 0; /* Breaks tracing momentarily. */
> > +	rcu_dynticks_eqs_enter(); /* After this, tracing works again. */
> > +	enable_stack_tracer();
> 
> OK, do you want me to send you a patch, or should I take your patch
> into my tree and add this?

Given that I don't seem to have disable_stack_tracer() and
enable_stack_tracer(), could you please add them in?

							Thanx, Paul

> -- Steve
> 
> >  	rcu_dynticks_task_enter();
> >  
> >  	/*
> > @@ -821,19 +821,15 @@ static void rcu_eqs_enter_common(long long oldval, bool user)
> >   */
> >  static void rcu_eqs_enter(bool user)
> >  {
> > -	long long oldval;
> >  	struct rcu_dynticks *rdtp;
> >  
> >  	rdtp = this_cpu_ptr(&rcu_dynticks);
> > -	oldval = rdtp->dynticks_nesting;
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> > -		     (oldval & DYNTICK_TASK_NEST_MASK) == 0);
> > -	if ((oldval & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE) {
> > -		rdtp->dynticks_nesting = 0;
> > -		rcu_eqs_enter_common(oldval, user);
> > -	} else {
> > +		     (rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == 0);
> > +	if ((rdtp->dynticks_nesting & DYNTICK_TASK_NEST_MASK) == DYNTICK_TASK_NEST_VALUE)
> > +		rcu_eqs_enter_common(user);
> > +	else
> >  		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
> > -	}
> >  }
> >  
> >  /**
> > @@ -892,19 +888,18 @@ void rcu_user_enter(void)
> >   */
> >  void rcu_irq_exit(void)
> >  {
> > -	long long oldval;
> >  	struct rcu_dynticks *rdtp;
> >  
> >  	RCU_LOCKDEP_WARN(!irqs_disabled(), "rcu_irq_exit() invoked with irqs enabled!!!");
> >  	rdtp = this_cpu_ptr(&rcu_dynticks);
> > -	oldval = rdtp->dynticks_nesting;
> > -	rdtp->dynticks_nesting--;
> >  	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> > -		     rdtp->dynticks_nesting < 0);
> > -	if (rdtp->dynticks_nesting)
> > -		trace_rcu_dyntick(TPS("--="), oldval, rdtp->dynticks_nesting);
> > -	else
> > -		rcu_eqs_enter_common(oldval, true);
> > +		     rdtp->dynticks_nesting < 1);
> > +	if (rdtp->dynticks_nesting <= 1) {
> > +		rcu_eqs_enter_common(true);
> > +	} else {
> > +		trace_rcu_dyntick(TPS("--="), rdtp->dynticks_nesting, rdtp->dynticks_nesting - 1);
> > +		rdtp->dynticks_nesting--;
> > +	}
> >  	rcu_sysidle_enter(1);
> >  }
> >  
> 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 20:42                 ` Paul E. McKenney
@ 2017-04-06  1:31                   ` Steven Rostedt
  2017-04-06  4:07                     ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-06  1:31 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 13:42:29 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > OK, do you want me to send you a patch, or should I take your patch
> > into my tree and add this?  
> 
> Given that I don't seem to have disable_stack_tracer() and
> enable_stack_tracer(), could you please add them in?

That's what I meant about "send you a patch" :-)

-- Steve


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-05 17:59     ` Paul E. McKenney
  2017-04-05 18:54       ` Steven Rostedt
@ 2017-04-06  2:12       ` Steven Rostedt
  2017-04-06  4:15         ` Paul E. McKenney
  1 sibling, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-06  2:12 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 10:59:25 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > > Could you please let me know if tracing happens in NMI handlers?
> > > If so, a bit of additional code will be needed.
> > > 
> > > 							Thanx, Paul
> > > 
> > > PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
> > >      3 of my 16 test scenarios, which is getting hard to justify for
> > >      something that isn't used.  Especially given that I will need to
> > >      add more scenarios for parallel-callbacks SRCU...  
> > 
> > The RCU_TASKS implementation is next on my todo list. Yes, there's going
> > to be plenty of users very soon. Not for 4.12 but definitely for 4.13.
> > 
> > Sorry for the delay in implementing that :-/  
> 
> OK, I will wait a few months before checking again...
> 

Actually, I took a quick look at what needs to be done, and I think it
is *really* easy, and may be available in 4.12! Here's the current
patch.

I can probably do a patch to allow optimized kprobes on PREEMPT kernels
as well.

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8efd9fe..28e3019 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
 	 * callers are done before leaving this function.
 	 * The same goes for freeing the per_cpu data of the per_cpu
 	 * ops.
-	 *
-	 * Again, normal synchronize_sched() is not good enough.
-	 * We need to do a hard force of sched synchronization.
-	 * This is because we use preempt_disable() to do RCU, but
-	 * the function tracers can be called where RCU is not watching
-	 * (like before user_exit()). We can not rely on the RCU
-	 * infrastructure to do the synchronization, thus we must do it
-	 * ourselves.
 	 */
 	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
+		/*
+		 * We need to do a hard force of sched synchronization.
+		 * This is because we use preempt_disable() to do RCU, but
+		 * the function tracers can be called where RCU is not watching
+		 * (like before user_exit()). We can not rely on the RCU
+		 * infrastructure to do the synchronization, thus we must do it
+		 * ourselves.
+		 */
 		schedule_on_each_cpu(ftrace_sync);
 
+#ifdef CONFIG_PREEMPT
+		/*
+		 * When the kernel is preemptive, tasks can be preempted
+		 * while on a ftrace trampoline. Just scheduling a task on
+		 * a CPU is not good enough to flush them. Calling
+		 * synchronize_rcu_tasks() will wait for those tasks to
+		 * execute and either schedule voluntarily or enter user space.
+		 */
+		synchronize_rcu_tasks();
+#endif
 		arch_ftrace_trampoline_free(ops);
 
 		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
@@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops)
 {
-
-/*
- * Currently there's no safe way to free a trampoline when the kernel
- * is configured with PREEMPT. That is because a task could be preempted
- * when it jumped to the trampoline, it may be preempted for a long time
- * depending on the system load, and currently there's no way to know
- * when it will be off the trampoline. If the trampoline is freed
- * too early, when the task runs again, it will be executing on freed
- * memory and crash.
- */
-#ifdef CONFIG_PREEMPT
-	/* Currently, only non dynamic ops can have a trampoline */
-	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
-		return;
-#endif
-
 	arch_ftrace_update_trampoline(ops);
 }
 


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-06  1:31                   ` Steven Rostedt
@ 2017-04-06  4:07                     ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-06  4:07 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 09:31:45PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 13:42:29 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > OK, do you want me to send you a patch, or should I take your patch
> > > into my tree and add this?  
> > 
> > Given that I don't seem to have disable_stack_tracer() and
> > enable_stack_tracer(), could you please add them in?
> 
> That's what I meant about "send you a patch" :-)

Reading/comprehension-challenged today, aren't I?

This doesn't conflict with anything in my tree, so either your tree
or my tree is fine by me.  The advantage of your tree is that you
could easily sync it up with the other changes needed.

							Thanx, Paul


* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-06  2:12       ` Steven Rostedt
@ 2017-04-06  4:15         ` Paul E. McKenney
  2017-04-06 14:14           ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-06  4:15 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Wed, Apr 05, 2017 at 10:12:24PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 10:59:25 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > > Could you please let me know if tracing happens in NMI handlers?
> > > > If so, a bit of additional code will be needed.
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > PS.  Which reminds me, any short-term uses of RCU_TASKS?  This represents
> > > >      3 of my 16 test scenarios, which is getting hard to justify for
> > > >      something that isn't used.  Especially given that I will need to
> > > >      add more scenarios for parallel-callbacks SRCU...  
> > > 
> > > The RCU_TASKS implementation is next on my todo list. Yes, there's going
> > > to be plenty of users very soon. Not for 4.12 but definitely for 4.13.
> > > 
> > > Sorry for the delay in implementing that :-/  
> > 
> > OK, I will wait a few months before checking again...
> > 
> 
> Actually, I took a quick look at what needs to be done, and I think it
> is *really* easy, and may be available in 4.12! Here's the current
> patch.

Cool!!!

> I can probably do a patch to allow optimized kprobes on PREEMPT kernels
> as well.
> 
> -- Steve
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 8efd9fe..28e3019 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
>  	 * callers are done before leaving this function.
>  	 * The same goes for freeing the per_cpu data of the per_cpu
>  	 * ops.
> -	 *
> -	 * Again, normal synchronize_sched() is not good enough.
> -	 * We need to do a hard force of sched synchronization.
> -	 * This is because we use preempt_disable() to do RCU, but
> -	 * the function tracers can be called where RCU is not watching
> -	 * (like before user_exit()). We can not rely on the RCU
> -	 * infrastructure to do the synchronization, thus we must do it
> -	 * ourselves.
>  	 */
>  	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
> +		/*
> +		 * We need to do a hard force of sched synchronization.
> +		 * This is because we use preempt_disable() to do RCU, but
> +		 * the function tracers can be called where RCU is not watching
> +		 * (like before user_exit()). We can not rely on the RCU
> +		 * infrastructure to do the synchronization, thus we must do it
> +		 * ourselves.
> +		 */
>  		schedule_on_each_cpu(ftrace_sync);

Great header comment on ftrace_sync(): "Yes, function tracing is rude."
And schedule_on_each_cpu() looks like a great workqueue gatling gun!  ;-)

> +#ifdef CONFIG_PREEMPT
> +		/*
> +		 * When the kernel is preemptive, tasks can be preempted
> +		 * while on an ftrace trampoline. Just scheduling a task on
> +		 * a CPU is not good enough to flush them. Calling
> +		 * synchronize_rcu_tasks() will wait for those tasks to
> +		 * execute and either schedule voluntarily or enter user space.
> +		 */
> +		synchronize_rcu_tasks();
> +#endif

How about this to save a line?

		if (IS_ENABLED(CONFIG_PREEMPT))
			synchronize_rcu_tasks();

One thing that might speed this up a bit (or might not) would be to
do the schedule_on_each_cpu() from a delayed workqueue.  That way,
if any of the activity from schedule_on_each_cpu() involved a voluntary
context switch (from a cond_resched() or some such), then
synchronize_rcu_tasks() would get the benefit of that context switch.

You would need a flush_work() to wait for that delayed workqueue
as well, of course.

Not sure whether it is worth it, but figured I should pass it along.

>  		arch_ftrace_trampoline_free(ops);
> 
>  		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
> @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> 
>  static void ftrace_update_trampoline(struct ftrace_ops *ops)
>  {
> -
> -/*
> - * Currently there's no safe way to free a trampoline when the kernel
> - * is configured with PREEMPT. That is because a task could be preempted
> - * when it jumped to the trampoline, it may be preempted for a long time
> - * depending on the system load, and currently there's no way to know
> - * when it will be off the trampoline. If the trampoline is freed
> - * too early, when the task runs again, it will be executing on freed
> - * memory and crash.
> - */
> -#ifdef CONFIG_PREEMPT
> -	/* Currently, only non dynamic ops can have a trampoline */
> -	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
> -		return;
> -#endif
> -
>  	arch_ftrace_update_trampoline(ops);
>  }

Agreed, straightforward patch!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-06  4:15         ` Paul E. McKenney
@ 2017-04-06 14:14           ` Steven Rostedt
  2017-04-06 14:59             ` Paul E. McKenney
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2017-04-06 14:14 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: LKML

On Wed, 5 Apr 2017 21:15:15 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 8efd9fe..28e3019 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
> >  	 * callers are done before leaving this function.
> >  	 * The same goes for freeing the per_cpu data of the per_cpu
> >  	 * ops.
> > -	 *
> > -	 * Again, normal synchronize_sched() is not good enough.
> > -	 * We need to do a hard force of sched synchronization.
> > -	 * This is because we use preempt_disable() to do RCU, but
> > -	 * the function tracers can be called where RCU is not watching
> > -	 * (like before user_exit()). We can not rely on the RCU
> > -	 * infrastructure to do the synchronization, thus we must do it
> > -	 * ourselves.
> >  	 */
> >  	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
> > +		/*
> > +		 * We need to do a hard force of sched synchronization.
> > +		 * This is because we use preempt_disable() to do RCU, but
> > +		 * the function tracers can be called where RCU is not watching
> > +		 * (like before user_exit()). We can not rely on the RCU
> > +		 * infrastructure to do the synchronization, thus we must do it
> > +		 * ourselves.
> > +		 */
> >  		schedule_on_each_cpu(ftrace_sync);  
> 
> Great header comment on ftrace_sync(): "Yes, function tracing is rude."
> And schedule_on_each_cpu() looks like a great workqueue gatling gun!  ;-)
> 
> > +#ifdef CONFIG_PREEMPT
> > +		/*
> > +		 * When the kernel is preemptive, tasks can be preempted
> > +		 * while on an ftrace trampoline. Just scheduling a task on
> > +		 * a CPU is not good enough to flush them. Calling
> > +		 * synchronize_rcu_tasks() will wait for those tasks to
> > +		 * execute and either schedule voluntarily or enter user space.
> > +		 */
> > +		synchronize_rcu_tasks();
> > +#endif  
> 
> How about this to save a line?
> 
> 		if (IS_ENABLED(CONFIG_PREEMPT))
> 			synchronize_rcu_tasks();

Ah, this works as gcc optimizes it out. Otherwise I received a compile
error with synchronize_rcu_tasks() not defined. But that's because I
never enabled CONFIG_TASKS_RCU.


> 
> One thing that might speed this up a bit (or might not) would be to
> > do the schedule_on_each_cpu() from a delayed workqueue.  That way,
> if any of the activity from schedule_on_each_cpu() involved a voluntary
> context switch (from a cond_resched() or some such), then
> synchronize_rcu_tasks() would get the benefit of that context switch.
> 
> You would need a flush_work() to wait for that delayed workqueue
> as well, of course.

This is a very slow path; I'm not too interested in making it complex
to speed it up.

> 
> Not sure whether it is worth it, but figured I should pass it along.
> 
> >  		arch_ftrace_trampoline_free(ops);
> > 
> >  		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
> > @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> > 
> >  static void ftrace_update_trampoline(struct ftrace_ops *ops)
> >  {
> > -
> > -/*
> > - * Currently there's no safe way to free a trampoline when the kernel
> > - * is configured with PREEMPT. That is because a task could be preempted
> > - * when it jumped to the trampoline, it may be preempted for a long time
> > - * depending on the system load, and currently there's no way to know
> > - * when it will be off the trampoline. If the trampoline is freed
> > - * too early, when the task runs again, it will be executing on freed
> > - * memory and crash.
> > - */
> > -#ifdef CONFIG_PREEMPT
> > -	/* Currently, only non dynamic ops can have a trampoline */
> > -	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
> > -		return;
> > -#endif
> > -
> >  	arch_ftrace_update_trampoline(ops);
> >  }  
> 
> Agreed, straightforward patch!

Great, I'll start making it official then.

-- Steve

* Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt
  2017-04-06 14:14           ` Steven Rostedt
@ 2017-04-06 14:59             ` Paul E. McKenney
  0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2017-04-06 14:59 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML

On Thu, Apr 06, 2017 at 10:14:25AM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 21:15:15 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > > index 8efd9fe..28e3019 100644
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, int command)
> > >  	 * callers are done before leaving this function.
> > >  	 * The same goes for freeing the per_cpu data of the per_cpu
> > >  	 * ops.
> > > -	 *
> > > -	 * Again, normal synchronize_sched() is not good enough.
> > > -	 * We need to do a hard force of sched synchronization.
> > > -	 * This is because we use preempt_disable() to do RCU, but
> > > -	 * the function tracers can be called where RCU is not watching
> > > -	 * (like before user_exit()). We can not rely on the RCU
> > > -	 * infrastructure to do the synchronization, thus we must do it
> > > -	 * ourselves.
> > >  	 */
> > >  	if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
> > > +		/*
> > > +		 * We need to do a hard force of sched synchronization.
> > > +		 * This is because we use preempt_disable() to do RCU, but
> > > +		 * the function tracers can be called where RCU is not watching
> > > +		 * (like before user_exit()). We can not rely on the RCU
> > > +		 * infrastructure to do the synchronization, thus we must do it
> > > +		 * ourselves.
> > > +		 */
> > >  		schedule_on_each_cpu(ftrace_sync);  
> > 
> > Great header comment on ftrace_sync(): "Yes, function tracing is rude."
> > And schedule_on_each_cpu() looks like a great workqueue gatling gun!  ;-)
> > 
> > > +#ifdef CONFIG_PREEMPT
> > > +		/*
> > > +		 * When the kernel is preemptive, tasks can be preempted
> > > +		 * while on an ftrace trampoline. Just scheduling a task on
> > > +		 * a CPU is not good enough to flush them. Calling
> > > +		 * synchronize_rcu_tasks() will wait for those tasks to
> > > +		 * execute and either schedule voluntarily or enter user space.
> > > +		 */
> > > +		synchronize_rcu_tasks();
> > > +#endif  
> > 
> > How about this to save a line?
> > 
> > 		if (IS_ENABLED(CONFIG_PREEMPT))
> > 			synchronize_rcu_tasks();
> 
> Ah, this works as gcc optimizes it out. Otherwise I received a compile
> error with synchronize_rcu_tasks() not defined. But that's because I
> never enabled CONFIG_TASKS_RCU.

Should I define a synchronize_rcu_tasks() that does BUG() for this case?

> > One thing that might speed this up a bit (or might not) would be to
> > > do the schedule_on_each_cpu() from a delayed workqueue.  That way,
> > if any of the activity from schedule_on_each_cpu() involved a voluntary
> > context switch (from a cond_resched() or some such), then
> > synchronize_rcu_tasks() would get the benefit of that context switch.
> > 
> > You would need a flush_work() to wait for that delayed workqueue
> > as well, of course.
> 
> This is a very slow path; I'm not too interested in making it complex
> to speed it up.

Makes sense to me!

> > Not sure whether it is worth it, but figured I should pass it along.
> > 
> > >  		arch_ftrace_trampoline_free(ops);
> > > 
> > >  		if (ops->flags & FTRACE_OPS_FL_PER_CPU)
> > > @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct ftrace_ops *ops)
> > > 
> > >  static void ftrace_update_trampoline(struct ftrace_ops *ops)
> > >  {
> > > -
> > > -/*
> > > - * Currently there's no safe way to free a trampoline when the kernel
> > > - * is configured with PREEMPT. That is because a task could be preempted
> > > - * when it jumped to the trampoline, it may be preempted for a long time
> > > - * depending on the system load, and currently there's no way to know
> > > - * when it will be off the trampoline. If the trampoline is freed
> > > - * too early, when the task runs again, it will be executing on freed
> > > - * memory and crash.
> > > - */
> > > -#ifdef CONFIG_PREEMPT
> > > -	/* Currently, only non dynamic ops can have a trampoline */
> > > -	if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
> > > -		return;
> > > -#endif
> > > -
> > >  	arch_ftrace_update_trampoline(ops);
> > >  }  
> > 
> > Agreed, straightforward patch!
> 
> Great, I'll start making it official then.

Sounds very good!

							Thanx, Paul

Thread overview: 16+ messages
2017-04-05 13:32 [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt Steven Rostedt
2017-04-05 16:25 ` Paul E. McKenney
2017-04-05 16:45   ` Steven Rostedt
2017-04-05 17:59     ` Paul E. McKenney
2017-04-05 18:54       ` Steven Rostedt
2017-04-05 19:08         ` Paul E. McKenney
2017-04-05 19:21           ` Steven Rostedt
2017-04-05 19:39             ` Paul E. McKenney
2017-04-05 19:52               ` Steven Rostedt
2017-04-05 20:42                 ` Paul E. McKenney
2017-04-06  1:31                   ` Steven Rostedt
2017-04-06  4:07                     ` Paul E. McKenney
2017-04-06  2:12       ` Steven Rostedt
2017-04-06  4:15         ` Paul E. McKenney
2017-04-06 14:14           ` Steven Rostedt
2017-04-06 14:59             ` Paul E. McKenney
