All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
@ 2022-12-13 22:24 Steven Rostedt
  2022-12-13 23:49 ` Masami Hiramatsu
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-13 22:24 UTC (permalink / raw)
  To: LKML, Linux Trace Kernel; +Cc: Masami Hiramatsu, Andrew Morton

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

If a trigger filter on the kernel command line fails to apply (due to
syntax error), it will be freed. The freeing will call
tracepoint_synchronize_unregister(), but this is not needed during early
boot up, and will even trigger a lockdep splat.

Avoid calling the synchronization function when system_state is
SYSTEM_BOOTING.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 kernel/trace/trace_events_trigger.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 19ce9d22bfd7..e535959939d3 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
 	rcu_assign_pointer(data->filter, filter);
 
 	if (tmp) {
-		/* Make sure the call is done with the filter */
-		tracepoint_synchronize_unregister();
+		/*
+		 * Make sure the call is done with the filter.
+		 * It is possible that a filter could fail at boot up,
+		 * and then this path will be called. Avoid the synchronization
+		 * in that case.
+		 */
+		if (system_state != SYSTEM_BOOTING)
+			tracepoint_synchronize_unregister();
 		free_event_filter(tmp);
 	}
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-13 22:24 [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up Steven Rostedt
@ 2022-12-13 23:49 ` Masami Hiramatsu
  2022-12-14  0:03   ` Steven Rostedt
  2022-12-14 20:03   ` Paul E. McKenney
  0 siblings, 2 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2022-12-13 23:49 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Linux Trace Kernel, Masami Hiramatsu, Andrew Morton

On Tue, 13 Dec 2022 17:24:29 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> If a trigger filter on the kernel command line fails to apply (due to
> syntax error), it will be freed. The freeing will call
> tracepoint_synchronize_unregister(), but this is not needed during early
> boot up, and will even trigger a lockdep splat.
> 
> Avoid calling the synchronization function when system_state is
> SYSTEM_BOOTING.

Shouldn't this be done inside tracepoint_synchronize_unregister()?
Then, it will prevent similar warnings if we expand boot time feature.

Thank you,

> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_events_trigger.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 19ce9d22bfd7..e535959939d3 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -1085,8 +1085,14 @@ int set_trigger_filter(char *filter_str,
>  	rcu_assign_pointer(data->filter, filter);
>  
>  	if (tmp) {
> -		/* Make sure the call is done with the filter */
> -		tracepoint_synchronize_unregister();
> +		/*
> +		 * Make sure the call is done with the filter.
> +		 * It is possible that a filter could fail at boot up,
> +		 * and then this path will be called. Avoid the synchronization
> +		 * in that case.
> +		 */
> +		if (system_state != SYSTEM_BOOTING)
> +			tracepoint_synchronize_unregister();
>  		free_event_filter(tmp);
>  	}
>  
> -- 
> 2.35.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-13 23:49 ` Masami Hiramatsu
@ 2022-12-14  0:03   ` Steven Rostedt
  2022-12-14  7:31     ` Masami Hiramatsu
  2022-12-14 20:03   ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-14  0:03 UTC (permalink / raw)
  To: Masami Hiramatsu (Google); +Cc: LKML, Linux Trace Kernel, Andrew Morton

On Wed, 14 Dec 2022 08:49:54 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.  
> 
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

I thought about that, but because this is a special case, I was thinking
that we don't just gloss over it and have something call this when it
shouldn't be in early boot up. I was surprised when it was called, and then
had to debug it a bit to figure out why it was. When I discovered the path
it took to get there, I realized it was legitimate.

Right now, I only see this one case as an exception. If more pop up that
are correct, then we can switch it over. But right now, I still want
calling tracepoint_synchronize_unregister() to warn if it is called in
early boot up, because that could be a sign that something went wrong.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-14  0:03   ` Steven Rostedt
@ 2022-12-14  7:31     ` Masami Hiramatsu
  0 siblings, 0 replies; 19+ messages in thread
From: Masami Hiramatsu @ 2022-12-14  7:31 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: LKML, Linux Trace Kernel, Andrew Morton

On Tue, 13 Dec 2022 19:03:21 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 14 Dec 2022 08:49:54 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.  
> > 
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.
> 
> I thought about that, but because this is a special case, I was thinking
> that we don't just gloss over it and have something call this when it
> shouldn't be in early boot up. I was surprised when it was called, and then
> had to debug it a bit to figure out why it was. When I discovered the path
> it took to get there, I realized it was legitimate.
> 
> Right now, I only see this one case as an exception. If more pop up that
> are correct, then we can switch it over. But right now, I still want
> calling tracepoint_synchronize_unregister() to warn if it is called in
> early boot up, because that could be a sign that something went wrong.

Thanks, I got it. :) So when we find more synchronization is involved,
we'll revisit it.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-13 23:49 ` Masami Hiramatsu
  2022-12-14  0:03   ` Steven Rostedt
@ 2022-12-14 20:03   ` Paul E. McKenney
  2022-12-14 20:37     ` Mathieu Desnoyers
  2022-12-15 15:02     ` Steven Rostedt
  1 sibling, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-14 20:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> On Tue, 13 Dec 2022 17:24:29 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > 
> > If a trigger filter on the kernel command line fails to apply (due to
> > syntax error), it will be freed. The freeing will call
> > tracepoint_synchronize_unregister(), but this is not needed during early
> > boot up, and will even trigger a lockdep splat.
> > 
> > Avoid calling the synchronization function when system_state is
> > SYSTEM_BOOTING.
> 
> Shouldn't this be done inside tracepoint_synchronize_unregister()?
> Then, it will prevent similar warnings if we expand boot time feature.

How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
Just in case there are ever additional issues of this sort?

							Thanx, Paul

------------------------------------------------------------------------

commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Wed Dec 14 11:41:44 2022 -0800

    rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
    
    Currently, RCU_LOCKDEP_WARN() checks the condition before checking
    to see if lockdep is still enabled.  This is necessary to avoid the
    false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
    RCU_LOCKDEP_WARN() false positives").  However, the current state can
    result in false-positive splats during early boot before lockdep is fully
    initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
    both before and after checking the condition, thus avoiding both sets
    of false-positive error reports.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index aa1b1c3546d7a..1aec1d53b0c91 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
  * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
  * @c: condition to check
  * @s: informative message
+ *
+ * This checks debug_lockdep_rcu_enabled() before checking (c) to
+ * prevent early boot splats due to lockdep not yet being initialized,
+ * and rechecks it after checking (c) to prevent false-positive splats
+ * due to races with lockdep being disabled.  See commit 3066820034b5dd
+ * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
  */
 #define RCU_LOCKDEP_WARN(c, s)						\
 	do {								\
 		static bool __section(".data.unlikely") __warned;	\
-		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
+		if (debug_lockdep_rcu_enabled() && (c) &&		\
+		    debug_lockdep_rcu_enabled() && !__warned) {		\
 			__warned = true;				\
 			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
 		}							\

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-14 20:03   ` Paul E. McKenney
@ 2022-12-14 20:37     ` Mathieu Desnoyers
  2022-12-14 21:26       ` Paul E. McKenney
  2022-12-15 15:02     ` Steven Rostedt
  1 sibling, 1 reply; 19+ messages in thread
From: Mathieu Desnoyers @ 2022-12-14 20:37 UTC (permalink / raw)
  To: paulmck, Masami Hiramatsu
  Cc: Steven Rostedt, LKML, Linux Trace Kernel, Andrew Morton

On 2022-12-14 15:03, Paul E. McKenney wrote:
> On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
>> On Tue, 13 Dec 2022 17:24:29 -0500
>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>
>>> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>>>
>>> If a trigger filter on the kernel command line fails to apply (due to
>>> syntax error), it will be freed. The freeing will call
>>> tracepoint_synchronize_unregister(), but this is not needed during early
>>> boot up, and will even trigger a lockdep splat.
>>>
>>> Avoid calling the synchronization function when system_state is
>>> SYSTEM_BOOTING.
>>
>> Shouldn't this be done inside tracepoint_synchronize_unregister()?
>> Then, it will prevent similar warnings if we expand boot time feature.
> 
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Hi Paul,

Your approach makes sense. Thanks for looking into this.

Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Dec 14 11:41:44 2022 -0800
> 
>      rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
>      
>      Currently, RCU_LOCKDEP_WARN() checks the condition before checking
>      to see if lockdep is still enabled.  This is necessary to avoid the
>      false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
>      RCU_LOCKDEP_WARN() false positives").  However, the current state can
>      result in false-positive splats during early boot before lockdep is fully
>      initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
>      both before and after checking the condition, thus avoiding both sets
>      of false-positive error reports.
>      
>      Reported-by: Steven Rostedt <rostedt@goodmis.org>
>      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>      Cc: Boqun Feng <boqun.feng@gmail.com>
>      Cc: Matthew Wilcox <willy@infradead.org>
>      Cc: Thomas Gleixner <tglx@linutronix.de>
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index aa1b1c3546d7a..1aec1d53b0c91 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
>    * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
>    * @c: condition to check
>    * @s: informative message
> + *
> + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> + * prevent early boot splats due to lockdep not yet being initialized,
> + * and rechecks it after checking (c) to prevent false-positive splats
> + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
>    */
>   #define RCU_LOCKDEP_WARN(c, s)						\
>   	do {								\
>   		static bool __section(".data.unlikely") __warned;	\
> -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> +		    debug_lockdep_rcu_enabled() && !__warned) {		\
>   			__warned = true;				\
>   			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
>   		}							\

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-14 20:37     ` Mathieu Desnoyers
@ 2022-12-14 21:26       ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-14 21:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Masami Hiramatsu, Steven Rostedt, LKML, Linux Trace Kernel,
	Andrew Morton

On Wed, Dec 14, 2022 at 03:37:13PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-14 15:03, Paul E. McKenney wrote:
> > On Wed, Dec 14, 2022 at 08:49:54AM +0900, Masami Hiramatsu wrote:
> > > On Tue, 13 Dec 2022 17:24:29 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > > > 
> > > > If a trigger filter on the kernel command line fails to apply (due to
> > > > syntax error), it will be freed. The freeing will call
> > > > tracepoint_synchronize_unregister(), but this is not needed during early
> > > > boot up, and will even trigger a lockdep splat.
> > > > 
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.
> > > 
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.
> > 
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
> 
> Hi Paul,
> 
> Your approach makes sense. Thanks for looking into this.
> 
> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Thank you, I will apply this on my next rebase.

							Thanx, Paul

> > ------------------------------------------------------------------------
> > 
> > commit d493ffca2df6c1963bd1d7b8f8c652a172f095ae
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date:   Wed Dec 14 11:41:44 2022 -0800
> > 
> >      rcu: Make RCU_LOCKDEP_WARN() avoid early lockdep checks
> >      Currently, RCU_LOCKDEP_WARN() checks the condition before checking
> >      to see if lockdep is still enabled.  This is necessary to avoid the
> >      false-positive splats fixed by commit 3066820034b5dd ("rcu: Reject
> >      RCU_LOCKDEP_WARN() false positives").  However, the current state can
> >      result in false-positive splats during early boot before lockdep is fully
> >      initialized.  This commit therefore checks debug_lockdep_rcu_enabled()
> >      both before and after checking the condition, thus avoiding both sets
> >      of false-positive error reports.
> >      Reported-by: Steven Rostedt <rostedt@goodmis.org>
> >      Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >      Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> >      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> >      Cc: Boqun Feng <boqun.feng@gmail.com>
> >      Cc: Matthew Wilcox <willy@infradead.org>
> >      Cc: Thomas Gleixner <tglx@linutronix.de>
> > 
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index aa1b1c3546d7a..1aec1d53b0c91 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -364,11 +364,18 @@ static inline int debug_lockdep_rcu_enabled(void)
> >    * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met
> >    * @c: condition to check
> >    * @s: informative message
> > + *
> > + * This checks debug_lockdep_rcu_enabled() before checking (c) to
> > + * prevent early boot splats due to lockdep not yet being initialized,
> > + * and rechecks it after checking (c) to prevent false-positive splats
> > + * due to races with lockdep being disabled.  See commit 3066820034b5dd
> > + * ("rcu: Reject RCU_LOCKDEP_WARN() false positives") for more detail.
> >    */
> >   #define RCU_LOCKDEP_WARN(c, s)						\
> >   	do {								\
> >   		static bool __section(".data.unlikely") __warned;	\
> > -		if ((c) && debug_lockdep_rcu_enabled() && !__warned) {	\
> > +		if (debug_lockdep_rcu_enabled() && (c) &&		\
> > +		    debug_lockdep_rcu_enabled() && !__warned) {		\
> >   			__warned = true;				\
> >   			lockdep_rcu_suspicious(__FILE__, __LINE__, s);	\
> >   		}							\
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-14 20:03   ` Paul E. McKenney
  2022-12-14 20:37     ` Mathieu Desnoyers
@ 2022-12-15 15:02     ` Steven Rostedt
  2022-12-15 17:02       ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-15 15:02 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Wed, 14 Dec 2022 12:03:33 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > > Avoid calling the synchronization function when system_state is
> > > SYSTEM_BOOTING.  
> > 
> > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > Then, it will prevent similar warnings if we expand boot time feature.  
> 
> How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> Just in case there are ever additional issues of this sort?

Adding more tracing command line parameters is triggering this more. I now
hit:

[    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.232023] ------------[ cut here ]------------
[    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.234241] Modules linked in:
[    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
[    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
[    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
[    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
[    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
[    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
[    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
[    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
[    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
[    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
[    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
[    0.251122] Call Trace:
[    0.251491]  <TASK>
[    0.251810]  synchronize_rcu+0xb6/0x1e0
[    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
[    0.253243]  ? enable_trace_buffered_event+0x10/0x10
[    0.254021]  trace_buffered_event_disable+0x7c/0x140
[    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
[    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
[    0.256321]  ftrace_set_clr_event+0x9a/0xe0
[    0.256976]  early_enable_events+0x64/0xa8
[    0.257647]  trace_event_init+0xd6/0x2dd
[    0.258305]  start_kernel+0x760/0x9e0
[    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
[    0.259634]  </TASK>
[    0.259966] irq event stamp: 0
[    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.264345] ---[ end trace 0000000000000000 ]---
[    0.265088] ------------[ cut here ]------------

Where kernel/rcu/tree.c:1410 is at:

// Make the polled API aware of the beginning of a grace period, but
// where caller does not hold the root rcu_node structure's lock.
static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
{
	unsigned long flags;
	struct rcu_node *rnp = rcu_get_root();

	if (rcu_init_invoked()) {
		lockdep_assert_irqs_enabled(); <<-------------- 1410
		raw_spin_lock_irqsave_rcu_node(rnp, flags);
	}
	rcu_poll_gp_seq_start(snap);
	if (rcu_init_invoked())
		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}


-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 15:02     ` Steven Rostedt
@ 2022-12-15 17:02       ` Paul E. McKenney
  2022-12-15 17:25         ` Paul E. McKenney
  2022-12-15 18:51         ` Steven Rostedt
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-15 17:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> On Wed, 14 Dec 2022 12:03:33 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > > Avoid calling the synchronization function when system_state is
> > > > SYSTEM_BOOTING.  
> > > 
> > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > Then, it will prevent similar warnings if we expand boot time feature.  
> > 
> > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > Just in case there are ever additional issues of this sort?
> 
> Adding more tracing command line parameters is triggering this more. I now
> hit:

Fair point, and apologies for the hassle.

Any chance of getting an official "it is now late enough in boot to
safely invoke lockdep" API?  ;-)

In the meantime, does the (untested and quite crude) patch at the end
of this message help?

							Thanx, Paul

> [    0.213442] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> [    0.214653] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    0.232023] ------------[ cut here ]------------
> [    0.232704] WARNING: CPU: 0 PID: 0 at kernel/rcu/tree.c:1410 rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.234241] Modules linked in:
> [    0.234715] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc6-test-00046-ga0ca17bfe724-dirty #63
> [    0.236069] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
> [    0.237549] RIP: 0010:rcu_poll_gp_seq_start_unlocked+0x8a/0xa0
> [    0.238495] Code: 02 0f 0b 48 89 ee 5b 48 c7 c7 00 ca b1 b3 5d e9 cc 0f c1 00 65 8b 05 85 f3 dd 4d 85 c0 75 a1 65 8b 05 d6 f0 dd 4d 85 c0 75 96 <0f> 0b eb 92 5b 5d e9 eb 18 fc 00 66 66 2e 0f 1f 84 00 00 00 00 00
> [    0.241433] RSP: 0000:ffffffffb3a03d28 EFLAGS: 00010046
> [    0.242289] RAX: 0000000000000000 RBX: ffffffffb3bcfc48 RCX: 0000000000000001
> [    0.243355] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffffb3bcfc48
> [    0.244421] RBP: 0000000000000000 R08: ffff97a3c012b8f8 R09: fffffffffffffffe
> [    0.245569] R10: ffffffffb3a03e78 R11: 00000000ffffffff R12: 0000000000000040
> [    0.246752] R13: ffffffffb3a30500 R14: ffffffffb3be6180 R15: ffff97a3c0355d10
> [    0.247826] FS:  0000000000000000(0000) GS:ffff97a537c00000(0000) knlGS:0000000000000000
> [    0.249084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.250012] CR2: ffff97a51e201000 CR3: 000000025c826001 CR4: 00000000000606b0
> [    0.251122] Call Trace:
> [    0.251491]  <TASK>
> [    0.251810]  synchronize_rcu+0xb6/0x1e0
> [    0.252393]  ? smp_call_function_many_cond+0x36d/0x3b0
> [    0.253243]  ? enable_trace_buffered_event+0x10/0x10
> [    0.254021]  trace_buffered_event_disable+0x7c/0x140
> [    0.254786]  __ftrace_event_enable_disable+0x11c/0x250
> [    0.255559]  __ftrace_set_clr_event_nolock+0xe3/0x130
> [    0.256321]  ftrace_set_clr_event+0x9a/0xe0
> [    0.256976]  early_enable_events+0x64/0xa8
> [    0.257647]  trace_event_init+0xd6/0x2dd
> [    0.258305]  start_kernel+0x760/0x9e0
> [    0.258862]  secondary_startup_64_no_verify+0xe5/0xeb
> [    0.259634]  </TASK>
> [    0.259966] irq event stamp: 0
> [    0.260427] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.261448] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.262424] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.263402] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.264345] ---[ end trace 0000000000000000 ]---
> [    0.265088] ------------[ cut here ]------------
> 
> Where kernel/rcu/tree.c:1410 is at:
> 
> // Make the polled API aware of the beginning of a grace period, but
> // where caller does not hold the root rcu_node structure's lock.
> static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
> {
> 	unsigned long flags;
> 	struct rcu_node *rnp = rcu_get_root();
> 
> 	if (rcu_init_invoked()) {
> 		lockdep_assert_irqs_enabled(); <<-------------- 1410
> 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 	}
> 	rcu_poll_gp_seq_start(snap);
> 	if (rcu_init_invoked())
> 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..af8c99920e1a4 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -151,7 +151,6 @@ static void sync_sched_exp_online_cleanup(int cpu);
 static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);
 static bool rcu_rdp_cpu_online(struct rcu_data *rdp);
-static bool rcu_init_invoked(void);
 static void rcu_cleanup_dead_rnp(struct rcu_node *rnp_leaf);
 static void rcu_init_new_rnp(struct rcu_node *rnp_leaf);
 
@@ -1314,7 +1313,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1329,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1352,12 +1351,12 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -1368,12 +1367,12 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	unsigned long flags;
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked()) {
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE) {
 		lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
 
@@ -4042,13 +4041,6 @@ EXPORT_SYMBOL_GPL(rcu_lockdep_current_cpu_online);
 
 #endif /* #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU) */
 
-// Has rcu_init() been invoked?  This is used (for example) to determine
-// whether spinlocks may be acquired safely.
-static bool rcu_init_invoked(void)
-{
-	return !!rcu_state.n_online_cpus;
-}
-
 /*
  * Near the end of the offline process.  Trace the fact that this CPU
  * is going offline.
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index ed6c3cce28f23..26146530d48f9 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -1019,14 +1019,14 @@ unsigned long start_poll_synchronize_rcu_expedited(void)
 	s = get_state_synchronize_rcu();
 	rdp = per_cpu_ptr(&rcu_data, raw_smp_processor_id());
 	rnp = rdp->mynode;
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_lock_irqsave(&rnp->exp_poll_lock, flags);
 	if (!poll_state_synchronize_rcu(s)) {
 		rnp->exp_seq_poll_rq = s;
-		if (rcu_init_invoked())
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 			queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
 	}
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_spin_unlock_irqrestore(&rnp->exp_poll_lock, flags);
 
 	return s;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 17:02       ` Paul E. McKenney
@ 2022-12-15 17:25         ` Paul E. McKenney
  2022-12-15 18:51         ` Steven Rostedt
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-15 17:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 09:02:56AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > 
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.  
> > > > 
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.  
> > > 
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?
> > 
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:
> 
> Fair point, and apologies for the hassle.
> 
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)
> 
> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

OK, I was clearly not yet awake.  :-/

The more targeted (but still untested) patch below might be more
appropriate...

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 17:02       ` Paul E. McKenney
  2022-12-15 17:25         ` Paul E. McKenney
@ 2022-12-15 18:51         ` Steven Rostedt
  2022-12-15 19:01           ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-15 18:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, 15 Dec 2022 09:02:56 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > On Wed, 14 Dec 2022 12:03:33 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >   
> > > > > Avoid calling the synchronization function when system_state is
> > > > > SYSTEM_BOOTING.    
> > > > 
> > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > Then, it will prevent similar warnings if we expand boot time feature.    
> > > 
> > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > Just in case there are ever additional issues of this sort?  
> > 
> > Adding more tracing command line parameters is triggering this more. I now
> > hit:  
> 
> Fair point, and apologies for the hassle.
> 
> Any chance of getting an official "it is now late enough in boot to
> safely invoke lockdep" API?  ;-)

lockdep API isn't the problem, it's that we are still in the earlyboot stage
where interrupts are disabled, and you can't enable them. Lockdep works
just fine there, and is reporting interrupts being disabled correctly. The
backtrace reported *does* have interrupts disabled.

The thing is, because we are still running on a single CPU with interrupts
disabled, there is no need for synchronization. Even grabbing a mutex is
safe because there's guaranteed to be no contention (unless it's not
released). This is why a lot of warnings are suppressed if system_state is
SYSTEM_BOOTING.

> 
> In the meantime, does the (untested and quite crude) patch at the end
> of this message help?

I'll go and test it, but I'm guessing it will work fine. You could also test
against system_state != SYSTEM_BOOTING, as that gets cleared just before
kernel_init() can continue (it waits for the complete() that is called
after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
after rcu_scheduler_starting().

I wonder if you could even replace RCU_SCHEDULER_RUNNING with
system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
call.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 18:51         ` Steven Rostedt
@ 2022-12-15 19:01           ` Paul E. McKenney
  2022-12-15 22:39             ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-15 19:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 01:51:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 09:02:56 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Thu, Dec 15, 2022 at 10:02:41AM -0500, Steven Rostedt wrote:
> > > On Wed, 14 Dec 2022 12:03:33 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >   
> > > > > > Avoid calling the synchronization function when system_state is
> > > > > > SYSTEM_BOOTING.    
> > > > > 
> > > > > Shouldn't this be done inside tracepoint_synchronize_unregister()?
> > > > > Then, it will prevent similar warnings if we expand boot time feature.    
> > > > 
> > > > How about the following wide-spectrum fix within RCU_LOCKDEP_WARN()?
> > > > Just in case there are ever additional issues of this sort?  
> > > 
> > > Adding more tracing command line parameters is triggering this more. I now
> > > hit:  
> > 
> > Fair point, and apologies for the hassle.
> > 
> > Any chance of getting an official "it is now late enough in boot to
> > safely invoke lockdep" API?  ;-)
> 
> lockdep API isn't the problem, it's that we are still in the earlyboot stage
> where interrupts are disabled, and you can't enable them. Lockdep works
> just fine there, and is reporting interrupts being disabled correctly. The
> backtrace reported *does* have interrupts disabled.
> 
> The thing is, because we are still running on a single CPU with interrupts
> disabled, there is no need for synchronization. Even grabbing a mutex is
> safe because there's guaranteed to be no contention (unless it's not
> released). This is why a lot of warnings are suppressed if system_state is
> SYSTEM_BOOTING.

Agreed, and my second attempt is a bit more surgical.  (Please see below
for a more official version of it.)

> > In the meantime, does the (untested and quite crude) patch at the end
> > of this message help?
> 
> I'll go and test it, but I'm guessing it will work fine. You could also test
> against system_state != SYSTEM_BOOTING, as that gets cleared just before
> kernel_init() can continue (it waits for the complete() that is called
> after system_state is set to SYSTEM_SCHEDULING). Which happens shortly
> after rcu_scheduler_starting().
> 
> I wonder if you could even replace RCU_SCHEDULER_RUNNING with
> system_state != SYSTEM_BOOTING, and remove the rcu_scheduler_starting()
> call.

In this particular case, agreed, I could use system_state.  But there are
other cases that must change behavior as soon as preemption can happen,
which is upon return from that call to user_mode_thread().  The update to
system_state doesn't happen until much later.  So I don't get to remove
that rcu_scheduler_starting() call.

What case?

Here is one:

o	The newly spawned init process does something that uses RCU,
	but is preempted while holding rcu_read_lock().

o	The boot thread, which did the preempting, waits for a grace
	period.  If we use rcu_scheduler_active, all is well because
	synchronize_rcu() will do a real run-time grace period, thus
	waiting for that reader.

	But system_state has not yet been updated, so if synchronize_rcu()
	were instead to pay attention to that one, there might be a
	tragically too-short RCU grace period.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 876c5ac113fa66a64fa241e69d9a2251b8daa5ee
Author: Paul E. McKenney <paulmck@kernel.org>
Date:   Thu Dec 15 09:26:09 2022 -0800

    rcu: Don't assert interrupts enabled too early in boot
    
    The rcu_poll_gp_seq_end() and rcu_poll_gp_seq_end_unlocked() both check
    that interrupts are enabled, as they normally should be when waiting for
    an RCU grace period.  Except that it is legal to wait for grace periods
    during early boot, before interrupts have been enabled for the first time,
    and polling for grace periods is required to work during this time.
    This can result in false-positive lockdep splats in the presence of
    boot-time-initiated tracing.
    
    This commit therefore conditions those interrupts-enabled checks on
    rcu_scheduler_active having advanced past RCU_SCHEDULER_INACTIVE, by
    which time interrupts have been enabled.
    
    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ee8a6a711719a..f627888715dca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1314,7 +1314,7 @@ static void rcu_poll_gp_seq_start(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If RCU was idle, note beginning of GP.
@@ -1330,7 +1330,7 @@ static void rcu_poll_gp_seq_end(unsigned long *snap)
 {
 	struct rcu_node *rnp = rcu_get_root();
 
-	if (rcu_init_invoked())
+	if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
 		raw_lockdep_assert_held_rcu_node(rnp);
 
 	// If the previously noted GP is still in effect, record the
@@ -1353,7 +1353,8 @@ static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_start(snap);
@@ -1369,7 +1370,8 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
 	struct rcu_node *rnp = rcu_get_root();
 
 	if (rcu_init_invoked()) {
-		lockdep_assert_irqs_enabled();
+		if (rcu_scheduler_active != RCU_SCHEDULER_INACTIVE)
+			lockdep_assert_irqs_enabled();
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 	}
 	rcu_poll_gp_seq_end(snap);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 19:01           ` Paul E. McKenney
@ 2022-12-15 22:39             ` Steven Rostedt
  2022-12-15 23:10               ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-15 22:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, 15 Dec 2022 11:01:58 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> What case?
> 
> Here is one:
> 
> o	The newly spawned init process does something that uses RCU,
> 	but is preempted while holding rcu_read_lock().
> 
> o	The boot thread, which did the preempting, waits for a grace
> 	period.  If we use rcu_scheduler_active, all is well because
> 	synchronize_rcu() will do a real run-time grace period, thus
> 	waiting for that reader.
> 
> 	But system_state has not yet been updated, so if synchronize_rcu()
> 	were instead to pay attention to that one, there might be a
> 	tragically too-short RCU grace period.

The thing is, preemption is disabled the entire time here.

That is, from:

void kthread_show_list(void);
noinline void __ref rest_init(void)
{
	struct task_struct *tsk;
	int pid;

	rcu_scheduler_starting();

through:

	system_state = SYSTEM_SCHEDULING;

	complete(&kthreadd_done);


Preemption is disabled and other CPUs have not even been started yet.

Although the might_sleep() call might schedule the kernel_init() task but
that will only block on the completion.

In other words, I don't think anything can cause any issues this early in
the boot up.

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 22:39             ` Steven Rostedt
@ 2022-12-15 23:10               ` Paul E. McKenney
  2022-12-15 23:42                 ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-15 23:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 05:39:13PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 11:01:58 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > What case?
> > 
> > Here is one:
> > 
> > o	The newly spawned init process does something that uses RCU,
> > 	but is preempted while holding rcu_read_lock().
> > 
> > o	The boot thread, which did the preempting, waits for a grace
> > 	period.  If we use rcu_scheduler_active, all is well because
> > 	synchronize_rcu() will do a real run-time grace period, thus
> > 	waiting for that reader.
> > 
> > 	But system_state has not yet been updated, so if synchronize_rcu()
> > 	were instead to pay attention to that one, there might be a
> > 	tragically too-short RCU grace period.
> 
> The thing is, preemption is disabled the entire time here.
> 
> That is, from:
> 
> void kthread_show_list(void);
> noinline void __ref rest_init(void)
> {
> 	struct task_struct *tsk;
> 	int pid;
> 
> 	rcu_scheduler_starting();
> 
> through:
> 
> 	system_state = SYSTEM_SCHEDULING;
> 
> 	complete(&kthreadd_done);
> 
> 
> Preemption is disabled and other CPUs have not even been started yet.
> 
> Although the might_sleep() call might schedule the kernel_init() task but
> that will only block on the completion.
> 
> In other words, I don't think anything can cause any issues this early in
> the boot up.

The nice thing about the current placement of rcu_scheduler_starting()
is that there is not yet any other task to switch to.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 23:10               ` Paul E. McKenney
@ 2022-12-15 23:42                 ` Steven Rostedt
  2022-12-16  1:01                   ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-15 23:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, 15 Dec 2022 15:10:27 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> The nice thing about the current placement of rcu_scheduler_starting()
> is that there is not yet any other task to switch to.  ;-)

Fair enough. Anyway the last patch appears to do the job.

Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-15 23:42                 ` Steven Rostedt
@ 2022-12-16  1:01                   ` Paul E. McKenney
  2022-12-16  4:40                     ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-16  1:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 15:10:27 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > The nice thing about the current placement of rcu_scheduler_starting()
> > is that there is not yet any other task to switch to.  ;-)
> 
> Fair enough. Anyway the last patch appears to do the job.
> 
> Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Applied, thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-16  1:01                   ` Paul E. McKenney
@ 2022-12-16  4:40                     ` Paul E. McKenney
  2022-12-16 11:08                       ` Steven Rostedt
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-16  4:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:
> > On Thu, 15 Dec 2022 15:10:27 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > 
> > > The nice thing about the current placement of rcu_scheduler_starting()
> > > is that there is not yet any other task to switch to.  ;-)
> > 
> > Fair enough. Anyway the last patch appears to do the job.
> > 
> > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> 
> Applied, thank you!

And I should have checked up on urgency.  Normal process would hold this
one until the v6.3 merge window, but please let me know if you would
like it in v6.2.  It is after all a bug fix.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-16  4:40                     ` Paul E. McKenney
@ 2022-12-16 11:08                       ` Steven Rostedt
  2022-12-18  0:29                         ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Rostedt @ 2022-12-16 11:08 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Thu, 15 Dec 2022 20:40:22 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:  
> > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > >   
> > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > is that there is not yet any other task to switch to.  ;-)  
> > > 
> > > Fair enough. Anyway the last patch appears to do the job.
> > > 
> > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>  
> > 
> > Applied, thank you!  
> 
> And I should have checked up on urgency.  Normal process would hold this
> one until the v6.3 merge window, but please let me know if you would
> like it in v6.2.  It is after all a bug fix.
> 

Yes, any fix that fixes a lockdep splat is considered urgent and should be
pushed ASAP. As it makes finding other lockdep splats more difficult (as
the first splat shuts down lockdep).

Please send this during the merge window, as I have to carry it internally
until it makes it into mainline.

Thanks!

-- Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up
  2022-12-16 11:08                       ` Steven Rostedt
@ 2022-12-18  0:29                         ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2022-12-18  0:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, LKML, Linux Trace Kernel, Andrew Morton,
	Mathieu Desnoyers

On Fri, Dec 16, 2022 at 06:08:10AM -0500, Steven Rostedt wrote:
> On Thu, 15 Dec 2022 20:40:22 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Thu, Dec 15, 2022 at 05:01:18PM -0800, Paul E. McKenney wrote:
> > > On Thu, Dec 15, 2022 at 06:42:02PM -0500, Steven Rostedt wrote:  
> > > > On Thu, 15 Dec 2022 15:10:27 -0800
> > > > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > >   
> > > > > The nice thing about the current placement of rcu_scheduler_starting()
> > > > > is that there is not yet any other task to switch to.  ;-)  
> > > > 
> > > > Fair enough. Anyway the last patch appears to do the job.
> > > > 
> > > > Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > > Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>  
> > > 
> > > Applied, thank you!  
> > 
> > And I should have checked up on urgency.  Normal process would hold this
> > one until the v6.3 merge window, but please let me know if you would
> > like it in v6.2.  It is after all a bug fix.
> > 
> 
> Yes, any fix that fixes a lockdep splat is considered urgent and should be
> pushed ASAP. As it makes finding other lockdep splats more difficult (as
> the first splat shuts down lockdep).
> 
> Please send this during the merge window, as I have to carry it internally
> until it makes it into mainline.

This is now rebased and marked with tag rcu-urgent.2022.12.17a.
If testing goes well, I will send it by the middle of next week.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-12-18  0:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-13 22:24 [PATCH] tracing: Do not synchronize freeing of trigger filter on boot up Steven Rostedt
2022-12-13 23:49 ` Masami Hiramatsu
2022-12-14  0:03   ` Steven Rostedt
2022-12-14  7:31     ` Masami Hiramatsu
2022-12-14 20:03   ` Paul E. McKenney
2022-12-14 20:37     ` Mathieu Desnoyers
2022-12-14 21:26       ` Paul E. McKenney
2022-12-15 15:02     ` Steven Rostedt
2022-12-15 17:02       ` Paul E. McKenney
2022-12-15 17:25         ` Paul E. McKenney
2022-12-15 18:51         ` Steven Rostedt
2022-12-15 19:01           ` Paul E. McKenney
2022-12-15 22:39             ` Steven Rostedt
2022-12-15 23:10               ` Paul E. McKenney
2022-12-15 23:42                 ` Steven Rostedt
2022-12-16  1:01                   ` Paul E. McKenney
2022-12-16  4:40                     ` Paul E. McKenney
2022-12-16 11:08                       ` Steven Rostedt
2022-12-18  0:29                         ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.