All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] rcu/tree: Add a trace event for RCU stall warnings
       [not found] <CGME20210215090241epcas1p17b1319d38de5a09a9d65a06ebf936866@epcas1p1.samsung.com>
@ 2021-02-15  8:53 ` Sangmoon Kim
  2021-02-16 17:50   ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-15  8:53 UTC (permalink / raw)
  To: paulmck, josh
  Cc: rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu, Sangmoon Kim

The event allows us to trace the RCU stall when
sysctl_panic_on_rcu_stall is disabled.

The first parameter is the name of RCU flavour like other trace
events. The second one shows us which function detected stalls.

The RCU stall is mainly caused by external factors such as interrupt
handling or task scheduling or something else. Therefore, this event
uses TRACE_EVENT macro, not dedicated one, so that someone interested
in the RCU stall can use it without CONFIG_RCU_TRACE.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
---
 include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
 kernel/rcu/tree_exp.h      |  2 ++
 kernel/rcu/tree_stall.h    |  2 ++
 3 files changed, 32 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..8476f3161bd0 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
 		  __entry->cpu, __entry->qsevent)
 );
 
+/*
+ * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
+ * and a string identifying which function detected the RCU stall as follows:
+ *
+ *	"StallDetected": Scheduler-tick detects other CPU's stalls.
+ *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
+ *	"ExpeditedStall": Expedited grace period detects stalls.
+ */
+TRACE_EVENT(rcu_stall_warning,
+
+	TP_PROTO(const char *rcuname, const char *msg),
+
+	TP_ARGS(rcuname, msg),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(const char *, msg)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->msg = msg;
+	),
+
+	TP_printk("%s %s",
+		  __entry->rcuname, __entry->msg)
+);
+
 #endif /* #if defined(CONFIG_TREE_RCU) */
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8760b6ead770..c16618284cb2 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
 				dump_cpu_task(cpu);
 			}
 		}
+		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
+
 		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
 	}
 }
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 70d48c52fabc..e93df4fac5b1 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	rcu_check_gp_kthread_starvation();
 
+	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
 	panic_on_rcu_stall();
 
 	rcu_force_quiescent_state();  /* Kick them all. */
@@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
 			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
+	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	panic_on_rcu_stall();
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-15  8:53 ` [PATCH] rcu/tree: Add a trace event for RCU stall warnings Sangmoon Kim
@ 2021-02-16 17:50   ` Paul E. McKenney
       [not found]     ` <CGME20210217135813epcas1p2b3bd79ca06ac847472b048b889100978@epcas1p2.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2021-02-16 17:50 UTC (permalink / raw)
  To: Sangmoon Kim; +Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, joel, rcu

On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> The event allows us to trace the RCU stall when
> sysctl_panic_on_rcu_stall is disabled.
> 
> The first parameter is the name of RCU flavour like other trace
> events. The second one shows us which function detected stalls.
> 
> The RCU stall is mainly caused by external factors such as interrupt
> handling or task scheduling or something else. Therefore, this event
> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> in the RCU stall can use it without CONFIG_RCU_TRACE.
> 
> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>

The patch looks plausible, but I have to ask...  Why not instead just
get the existing information out of the console log?

							Thanx, Paul

> ---
>  include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
>  kernel/rcu/tree_exp.h      |  2 ++
>  kernel/rcu/tree_stall.h    |  2 ++
>  3 files changed, 32 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 155b5cb43cfd..8476f3161bd0 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
>  		  __entry->cpu, __entry->qsevent)
>  );
>  
> +/*
> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> + * and a string identifying which function detected the RCU stall as follows:
> + *
> + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
> + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
> + *	"ExpeditedStall": Expedited grace period detects stalls.
> + */
> +TRACE_EVENT(rcu_stall_warning,
> +
> +	TP_PROTO(const char *rcuname, const char *msg),
> +
> +	TP_ARGS(rcuname, msg),
> +
> +	TP_STRUCT__entry(
> +		__field(const char *, rcuname)
> +		__field(const char *, msg)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->rcuname = rcuname;
> +		__entry->msg = msg;
> +	),
> +
> +	TP_printk("%s %s",
> +		  __entry->rcuname, __entry->msg)
> +);
> +
>  #endif /* #if defined(CONFIG_TREE_RCU) */
>  
>  /*
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 8760b6ead770..c16618284cb2 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
>  				dump_cpu_task(cpu);
>  			}
>  		}
> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
> +
>  		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
>  	}
>  }
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 70d48c52fabc..e93df4fac5b1 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  
>  	rcu_check_gp_kthread_starvation();
>  
> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
>  	panic_on_rcu_stall();
>  
>  	rcu_force_quiescent_state();  /* Kick them all. */
> @@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
>  			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  
> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
>  	panic_on_rcu_stall();
>  
>  	/*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
       [not found]     ` <CGME20210217135813epcas1p2b3bd79ca06ac847472b048b889100978@epcas1p2.samsung.com>
@ 2021-02-17 13:49       ` Sangmoon Kim
  2021-02-17 18:17         ` Neeraj Upadhyay
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-17 13:49 UTC (permalink / raw)
  To: paulmck
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, rcu, rostedt, sangmoon.kim

> -----Original Message-----
> From: Paul E. McKenney <paulmck@kernel.org>
> Sent: Wednesday, February 17, 2021 2:50 AM
> 
> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> > The event allows us to trace the RCU stall when
> > sysctl_panic_on_rcu_stall is disabled.
> >
> > The first parameter is the name of RCU flavour like other trace
> > events. The second one shows us which function detected stalls.
> >
> > The RCU stall is mainly caused by external factors such as interrupt
> > handling or task scheduling or something else. Therefore, this event
> > uses TRACE_EVENT macro, not dedicated one, so that someone interested
> > in the RCU stall can use it without CONFIG_RCU_TRACE.
> >
> > Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> 
> The patch looks plausible, but I have to ask...  Why not instead just
> get the existing information out of the console log?
> 
> 							Thanx, Paul

This can provide a trigger point for the RCU stall warning.
If a module in the kernel wants to trace the stall for debugging purposes,
there is a cost of continuing to parse the console log.
This tracepoint is useful because it is hard to pay these costs
especially on mobile devices.

Thanks,
Sangmoon

> 
> > ---
> >  include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
> >  kernel/rcu/tree_exp.h      |  2 ++
> >  kernel/rcu/tree_stall.h    |  2 ++
> >  3 files changed, 32 insertions(+)
> >
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index 155b5cb43cfd..8476f3161bd0 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
> >  		  __entry->cpu, __entry->qsevent)
> >  );
> >
> > +/*
> > + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> > + * and a string identifying which function detected the RCU stall as follows:
> > + *
> > + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
> > + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
> > + *	"ExpeditedStall": Expedited grace period detects stalls.
> > + */
> > +TRACE_EVENT(rcu_stall_warning,
> > +
> > +	TP_PROTO(const char *rcuname, const char *msg),
> > +
> > +	TP_ARGS(rcuname, msg),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(const char *, rcuname)
> > +		__field(const char *, msg)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->rcuname = rcuname;
> > +		__entry->msg = msg;
> > +	),
> > +
> > +	TP_printk("%s %s",
> > +		  __entry->rcuname, __entry->msg)
> > +);
> > +
> >  #endif /* #if defined(CONFIG_TREE_RCU) */
> >
> >  /*
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 8760b6ead770..c16618284cb2 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
> >  				dump_cpu_task(cpu);
> >  			}
> >  		}
> > +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
> > +
> >  		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
> >  	}
> >  }
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 70d48c52fabc..e93df4fac5b1 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >
> >  	rcu_check_gp_kthread_starvation();
> >
> > +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
> >  	panic_on_rcu_stall();
> >
> >  	rcu_force_quiescent_state();  /* Kick them all. */
> > @@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
> >  			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >
> > +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
> >  	panic_on_rcu_stall();
> >
> >  	/*
> > --
> > 2.17.1
> >


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-17 13:49       ` Sangmoon Kim
@ 2021-02-17 18:17         ` Neeraj Upadhyay
       [not found]           ` <CGME20210218130056epcas1p21e5b2e873f8c3cb0aa6953473d4c416c@epcas1p2.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Neeraj Upadhyay @ 2021-02-17 18:17 UTC (permalink / raw)
  To: Sangmoon Kim, paulmck
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, rcu, rostedt

Hi Sangmoon,

On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> -----Original Message-----
>> From: Paul E. McKenney <paulmck@kernel.org>
>> Sent: Wednesday, February 17, 2021 2:50 AM
>>
>> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>>> The event allows us to trace the RCU stall when
>>> sysctl_panic_on_rcu_stall is disabled.
>>>
>>> The first parameter is the name of RCU flavour like other trace
>>> events. The second one shows us which function detected stalls.
>>>
>>> The RCU stall is mainly caused by external factors such as interrupt
>>> handling or task scheduling or something else. Therefore, this event
>>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>>>
>>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
>>
>> The patch looks plausible, but I have to ask...  Why not instead just
>> get the existing information out of the console log?
>>
>> 							Thanx, Paul
> 
> This can provide a trigger point for the RCU stall warning.
> If a module in the kernel wants to trace the stall for debugging purposes,
> there is a cost of continuing to parse the console log.
> This tracepoint is useful because it is hard to pay these costs
> especially on mobile devices.
> 
> Thanks,
> Sangmoon
> 

So, the idea here is to register to these trace events from kernel 
module and use that for debugging? Just curious what debugging action
module does on these traces, as they have limited information
about the stall, compared to console stall warnings, which gives a
much more detailed information about stall.


Thanks
Neeraj

>>
>>> ---
>>>   include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
>>>   kernel/rcu/tree_exp.h      |  2 ++
>>>   kernel/rcu/tree_stall.h    |  2 ++
>>>   3 files changed, 32 insertions(+)
>>>
>>> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
>>> index 155b5cb43cfd..8476f3161bd0 100644
>>> --- a/include/trace/events/rcu.h
>>> +++ b/include/trace/events/rcu.h
>>> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
>>>   		  __entry->cpu, __entry->qsevent)
>>>   );
>>>
>>> +/*
>>> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
>>> + * and a string identifying which function detected the RCU stall as follows:
>>> + *
>>> + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
>>> + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
>>> + *	"ExpeditedStall": Expedited grace period detects stalls.
>>> + */
>>> +TRACE_EVENT(rcu_stall_warning,
>>> +
>>> +	TP_PROTO(const char *rcuname, const char *msg),
>>> +
>>> +	TP_ARGS(rcuname, msg),
>>> +
>>> +	TP_STRUCT__entry(
>>> +		__field(const char *, rcuname)
>>> +		__field(const char *, msg)
>>> +	),
>>> +
>>> +	TP_fast_assign(
>>> +		__entry->rcuname = rcuname;
>>> +		__entry->msg = msg;
>>> +	),
>>> +
>>> +	TP_printk("%s %s",
>>> +		  __entry->rcuname, __entry->msg)
>>> +);
>>> +
>>>   #endif /* #if defined(CONFIG_TREE_RCU) */
>>>
>>>   /*
>>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
>>> index 8760b6ead770..c16618284cb2 100644
>>> --- a/kernel/rcu/tree_exp.h
>>> +++ b/kernel/rcu/tree_exp.h
>>> @@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
>>>   				dump_cpu_task(cpu);
>>>   			}
>>>   		}
>>> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
>>> +
>>>   		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
>>>   	}
>>>   }
>>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
>>> index 70d48c52fabc..e93df4fac5b1 100644
>>> --- a/kernel/rcu/tree_stall.h
>>> +++ b/kernel/rcu/tree_stall.h
>>> @@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>>>
>>>   	rcu_check_gp_kthread_starvation();
>>>
>>> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
>>>   	panic_on_rcu_stall();
>>>
>>>   	rcu_force_quiescent_state();  /* Kick them all. */
>>> @@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
>>>   			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
>>>   	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>>>
>>> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
>>>   	panic_on_rcu_stall();
>>>
>>>   	/*
>>> --
>>> 2.17.1
>>>
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
       [not found]           ` <CGME20210218130056epcas1p21e5b2e873f8c3cb0aa6953473d4c416c@epcas1p2.samsung.com>
@ 2021-02-18 12:52             ` Sangmoon Kim
  2021-02-18 14:17               ` Mathieu Desnoyers
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-18 12:52 UTC (permalink / raw)
  To: neeraju
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, paulmck, rcu,
	rostedt, sangmoon.kim

> -----Original Message-----
> From: neeraju=codeaurora.org@mg.codeaurora.org <neeraju=codeaurora.org@mg.codeaurora.org>
> Sent: Thursday, February 18, 2021 3:18 AM
> 
> Hi Sangmoon,
> 
> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> -----Original Message-----
> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >>
> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >>> The event allows us to trace the RCU stall when
> >>> sysctl_panic_on_rcu_stall is disabled.
> >>>
> >>> The first parameter is the name of RCU flavour like other trace
> >>> events. The second one shows us which function detected stalls.
> >>>
> >>> The RCU stall is mainly caused by external factors such as interrupt
> >>> handling or task scheduling or something else. Therefore, this event
> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >>>
> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >>
> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> get the existing information out of the console log?
> >>
> >> 							Thanx, Paul
> >
> > This can provide a trigger point for the RCU stall warning.
> > If a module in the kernel wants to trace the stall for debugging purposes,
> > there is a cost of continuing to parse the console log.
> > This tracepoint is useful because it is hard to pay these costs
> > especially on mobile devices.
> >
> > Thanks,
> > Sangmoon
> >
> 
> So, the idea here is to register to these trace events from kernel
> module and use that for debugging? Just curious what debugging action
> module does on these traces, as they have limited information
> about the stall, compared to console stall warnings, which gives a
> much more detailed information about stall.
> 
> 
> Thanks
> Neeraj

Hi Neeraj,

Yes, a module can log the stall occurence using the trace, although
there is no detailed information. If the kernel panic occurs for some
reasons, the debugging report generated by the module can include that
RCU stall warning has occurred before.

In addition, it's just an idea now, when a trace event happens, the
module can store the console log including detailed information, or may
also obtain CPU/task information by parsing the console log.

Thanks,
Sangmoon

> 
> >>
> >>> ---
> >>>   include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
> >>>   kernel/rcu/tree_exp.h      |  2 ++
> >>>   kernel/rcu/tree_stall.h    |  2 ++
> >>>   3 files changed, 32 insertions(+)
> >>>
> >>> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> >>> index 155b5cb43cfd..8476f3161bd0 100644
> >>> --- a/include/trace/events/rcu.h
> >>> +++ b/include/trace/events/rcu.h
> >>> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
> >>>   		  __entry->cpu, __entry->qsevent)
> >>>   );
> >>>
> >>> +/*
> >>> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> >>> + * and a string identifying which function detected the RCU stall as follows:
> >>> + *
> >>> + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
> >>> + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
> >>> + *	"ExpeditedStall": Expedited grace period detects stalls.
> >>> + */
> >>> +TRACE_EVENT(rcu_stall_warning,
> >>> +
> >>> +	TP_PROTO(const char *rcuname, const char *msg),
> >>> +
> >>> +	TP_ARGS(rcuname, msg),
> >>> +
> >>> +	TP_STRUCT__entry(
> >>> +		__field(const char *, rcuname)
> >>> +		__field(const char *, msg)
> >>> +	),
> >>> +
> >>> +	TP_fast_assign(
> >>> +		__entry->rcuname = rcuname;
> >>> +		__entry->msg = msg;
> >>> +	),
> >>> +
> >>> +	TP_printk("%s %s",
> >>> +		  __entry->rcuname, __entry->msg)
> >>> +);
> >>> +
> >>>   #endif /* #if defined(CONFIG_TREE_RCU) */
> >>>
> >>>   /*
> >>> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> >>> index 8760b6ead770..c16618284cb2 100644
> >>> --- a/kernel/rcu/tree_exp.h
> >>> +++ b/kernel/rcu/tree_exp.h
> >>> @@ -566,6 +566,8 @@ static void synchronize_rcu_expedited_wait(void)
> >>>   				dump_cpu_task(cpu);
> >>>   			}
> >>>   		}
> >>> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
> >>> +
> >>>   		jiffies_stall = 3 * rcu_jiffies_till_stall_check() + 3;
> >>>   	}
> >>>   }
> >>> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> >>> index 70d48c52fabc..e93df4fac5b1 100644
> >>> --- a/kernel/rcu/tree_stall.h
> >>> +++ b/kernel/rcu/tree_stall.h
> >>> @@ -531,6 +531,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >>>
> >>>   	rcu_check_gp_kthread_starvation();
> >>>
> >>> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
> >>>   	panic_on_rcu_stall();
> >>>
> >>>   	rcu_force_quiescent_state();  /* Kick them all. */
> >>> @@ -575,6 +576,7 @@ static void print_cpu_stall(unsigned long gps)
> >>>   			   jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
> >>>   	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >>>
> >>> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
> >>>   	panic_on_rcu_stall();
> >>>
> >>>   	/*
> >>> --
> >>> 2.17.1
> >>>
> >
> 
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 12:52             ` Sangmoon Kim
@ 2021-02-18 14:17               ` Mathieu Desnoyers
  2021-02-18 15:06                 ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2021-02-18 14:17 UTC (permalink / raw)
  To: Sangmoon Kim
  Cc: neeraju, Lai Jiangshan, Joel Fernandes, Google, Josh Triplett,
	paulmck, rcu, rostedt

----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:

>> -----Original Message-----
>> From: neeraju=codeaurora.org@mg.codeaurora.org
>> <neeraju=codeaurora.org@mg.codeaurora.org>
>> Sent: Thursday, February 18, 2021 3:18 AM
>> 
>> Hi Sangmoon,
>> 
>> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> -----Original Message-----
>> >> From: Paul E. McKenney <paulmck@kernel.org>
>> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >>
>> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >>> The event allows us to trace the RCU stall when
>> >>> sysctl_panic_on_rcu_stall is disabled.
>> >>>
>> >>> The first parameter is the name of RCU flavour like other trace
>> >>> events. The second one shows us which function detected stalls.
>> >>>
>> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >>> handling or task scheduling or something else. Therefore, this event
>> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >>>
>> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
>> >>
>> >> The patch looks plausible, but I have to ask...  Why not instead just
>> >> get the existing information out of the console log?
>> >>
>> >> 							Thanx, Paul
>> >
>> > This can provide a trigger point for the RCU stall warning.
>> > If a module in the kernel wants to trace the stall for debugging purposes,
>> > there is a cost of continuing to parse the console log.
>> > This tracepoint is useful because it is hard to pay these costs
>> > especially on mobile devices.
>> >
>> > Thanks,
>> > Sangmoon
>> >
>> 
>> So, the idea here is to register to these trace events from kernel
>> module and use that for debugging? Just curious what debugging action
>> module does on these traces, as they have limited information
>> about the stall, compared to console stall warnings, which gives a
>> much more detailed information about stall.
>> 
>> 
>> Thanks
>> Neeraj
> 
> Hi Neeraj,
> 
> Yes, a module can log the stall occurence using the trace, although
> there is no detailed information. If the kernel panic occurs for some
> reasons, the debugging report generated by the module can include that
> RCU stall warning has occurred before.
> 
> In addition, it's just an idea now, when a trace event happens, the
> module can store the console log including detailed information, or may
> also obtain CPU/task information by parsing the console log.

Adding a new tracepoint is not just about what is extracted by this specific
tracepoint, but rather how it can be analyzed when combined with all other relevant
tracepoints.

For instance, if we have this added RCU stall warning added, here is how it can be
used with the upcoming LTTng 2.13, which implements the "event notification" triggers
feature:

1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring buffer,
   enabling the following tracepoints:
   - kernel activity (meaning all other RCU event, scheduling, irq, workqueues, ...),
   - this new RCU stall warning event.

2) Add a "callstack-kernel" context to the RCU stall warning event. This will sample
   the kernel stack when the event is hit. This will provide information similar to
   the stack trace gathered into the console log on OOPS.

3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On this
   trigger, actions can be associated, such as capturing a snapshot or waking up
   an external user-space process to perform specific actions.

So you end up with a snapshot containing the sequence of events leading to the
RCU stall warning, with a kernel stack trace of the context causing the stall
warning to be emitted.

I would argue that this information is more complete than just the stack trace
extracted through the console log.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 14:17               ` Mathieu Desnoyers
@ 2021-02-18 15:06                 ` Paul E. McKenney
  2021-02-18 15:50                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2021-02-18 15:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes, Google,
	Josh Triplett, rcu, rostedt

On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
> 
> >> -----Original Message-----
> >> From: neeraju=codeaurora.org@mg.codeaurora.org
> >> <neeraju=codeaurora.org@mg.codeaurora.org>
> >> Sent: Thursday, February 18, 2021 3:18 AM
> >> 
> >> Hi Sangmoon,
> >> 
> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> -----Original Message-----
> >> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >>
> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >>> The event allows us to trace the RCU stall when
> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >>>
> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >>> events. The second one shows us which function detected stalls.
> >> >>>
> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >>>
> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >> >>
> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> get the existing information out of the console log?
> >> >>
> >> >> 							Thanx, Paul
> >> >
> >> > This can provide a trigger point for the RCU stall warning.
> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> > there is a cost of continuing to parse the console log.
> >> > This tracepoint is useful because it is hard to pay these costs
> >> > especially on mobile devices.
> >> >
> >> > Thanks,
> >> > Sangmoon
> >> >
> >> 
> >> So, the idea here is to register to these trace events from kernel
> >> module and use that for debugging? Just curious what debugging action
> >> module does on these traces, as they have limited information
> >> about the stall, compared to console stall warnings, which gives a
> >> much more detailed information about stall.
> >> 
> >> 
> >> Thanks
> >> Neeraj
> > 
> > Hi Neeraj,
> > 
> > Yes, a module can log the stall occurence using the trace, although
> > there is no detailed information. If the kernel panic occurs for some
> > reasons, the debugging report generated by the module can include that
> > RCU stall warning has occurred before.
> > 
> > In addition, it's just an idea now, when a trace event happens, the
> > module can store the console log including detailed information, or may
> > also obtain CPU/task information by parsing the console log.
> 
> Adding a new tracepoint is not just about what is extracted by this specific
> tracepoint, but rather how it can be analyzed when combined with all other relevant
> tracepoints.
> 
> For instance, if we have this added RCU stall warning added, here is how it can be
> used with the upcoming LTTng 2.13, which implements the "event notification" triggers
> feature:
> 
> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring buffer,
>    enabling the following tracepoints:
>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues, ...),
>    - this new RCU stall warning event.
> 
> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will sample
>    the kernel stack when the event is hit. This will provide information similar to
>    the stack trace gathered into the console log on OOPS.
> 
> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On this
>    trigger, actions can be associated, such as capturing a snapshot or waking up
>    an external user-space process to perform specific actions.
> 
> So you end up with a snapshot containing the sequence of events leading to the
> RCU stall warning, with a kernel stack trace of the context causing the stall
> warning to be emitted.
> 
> I would argue that this information is more complete than just the stack trace
> extracted through the console log.

I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
than a stack trace to the console.  Which is why I am concerned about the
proverbial camel's nose in the tent.  ;-)

So Sangmoon, what is it that you really need for this to be useful to you?

Or am I missing your point?  (Either Mathieu's or Sangmoon's.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 15:06                 ` Paul E. McKenney
@ 2021-02-18 15:50                   ` Mathieu Desnoyers
  2021-02-18 16:58                     ` Uladzislau Rezki
  2021-02-18 17:00                     ` Paul E. McKenney
  0 siblings, 2 replies; 24+ messages in thread
From: Mathieu Desnoyers @ 2021-02-18 15:50 UTC (permalink / raw)
  To: paulmck
  Cc: Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes, Google,
	Josh Triplett, rcu, rostedt

----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:

> On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
>> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
>> 
>> >> -----Original Message-----
>> >> From: neeraju=codeaurora.org@mg.codeaurora.org
>> >> <neeraju=codeaurora.org@mg.codeaurora.org>
>> >> Sent: Thursday, February 18, 2021 3:18 AM
>> >> 
>> >> Hi Sangmoon,
>> >> 
>> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> >> -----Original Message-----
>> >> >> From: Paul E. McKenney <paulmck@kernel.org>
>> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >> >>
>> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >> >>> The event allows us to trace the RCU stall when
>> >> >>> sysctl_panic_on_rcu_stall is disabled.
>> >> >>>
>> >> >>> The first parameter is the name of RCU flavour like other trace
>> >> >>> events. The second one shows us which function detected stalls.
>> >> >>>
>> >> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >> >>> handling or task scheduling or something else. Therefore, this event
>> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >> >>>
>> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
>> >> >>
>> >> >> The patch looks plausible, but I have to ask...  Why not instead just
>> >> >> get the existing information out of the console log?
>> >> >>
>> >> >> 							Thanx, Paul
>> >> >
>> >> > This can provide a trigger point for the RCU stall warning.
>> >> > If a module in the kernel wants to trace the stall for debugging purposes,
>> >> > there is a cost of continuing to parse the console log.
>> >> > This tracepoint is useful because it is hard to pay these costs
>> >> > especially on mobile devices.
>> >> >
>> >> > Thanks,
>> >> > Sangmoon
>> >> >
>> >> 
>> >> So, the idea here is to register to these trace events from kernel
>> >> module and use that for debugging? Just curious what debugging action
>> >> module does on these traces, as they have limited information
>> >> about the stall, compared to console stall warnings, which gives a
>> >> much more detailed information about stall.
>> >> 
>> >> 
>> >> Thanks
>> >> Neeraj
>> > 
>> > Hi Neeraj,
>> > 
>> > Yes, a module can log the stall occurence using the trace, although
>> > there is no detailed information. If the kernel panic occurs for some
>> > reasons, the debugging report generated by the module can include that
>> > RCU stall warning has occurred before.
>> > 
>> > In addition, it's just an idea now, when a trace event happens, the
>> > module can store the console log including detailed information, or may
>> > also obtain CPU/task information by parsing the console log.
>> 
>> Adding a new tracepoint is not just about what is extracted by this specific
>> tracepoint, but rather how it can be analyzed when combined with all other
>> relevant
>> tracepoints.
>> 
>> For instance, if we have this added RCU stall warning added, here is how it can
>> be
>> used with the upcoming LTTng 2.13, which implements the "event notification"
>> triggers
>> feature:
>> 
>> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
>> buffer,
>>    enabling the following tracepoints:
>>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
>>    ...),
>>    - this new RCU stall warning event.
>> 
>> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
>> sample
>>    the kernel stack when the event is hit. This will provide information similar to
>>    the stack trace gathered into the console log on OOPS.
>> 
>> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
>> this
>>    trigger, actions can be associated, such as capturing a snapshot or waking up
>>    an external user-space process to perform specific actions.
>> 
>> So you end up with a snapshot containing the sequence of events leading to the
>> RCU stall warning, with a kernel stack trace of the context causing the stall
>> warning to be emitted.
>> 
>> I would argue that this information is more complete than just the stack trace
>> extracted through the console log.
> 
> I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
> than a stack trace to the console.  Which is why I am concerned about the
> proverbial camel's nose in the tent.  ;-)
> 
> So Sangmoon, what is it that you really need for this to be useful to you?
> 
> Or am I missing your point?  (Either Mathieu's or Sangmoon's.)

Well there is a tracepoint to dump the console's content into the tracing buffers
as well, so technically this existing RCU stall warning information could be extracted
into a trace as well.

AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to hook on a specific
event to trigger trace capture, without requiring to parse the console log continuously,
and a way to know when the stall warning happens in time within the trace time-line.

That being said, there may be use-cases for extracting more details about the RCU stall
as tracepoint event fields to make it more convenient, but it does not appear to be
necessary considered that the console can be saved into trace buffers as well.

Thanks,

Mathieu

> 
> 							Thanx, Paul

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 15:50                   ` Mathieu Desnoyers
@ 2021-02-18 16:58                     ` Uladzislau Rezki
  2021-02-18 17:24                       ` Mathieu Desnoyers
  2021-02-18 17:00                     ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Uladzislau Rezki @ 2021-02-18 16:58 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paulmck, Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes,
	Google, Josh Triplett, rcu, rostedt

On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
> 
> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
> >> 
> >> >> -----Original Message-----
> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
> >> >> Sent: Thursday, February 18, 2021 3:18 AM
> >> >> 
> >> >> Hi Sangmoon,
> >> >> 
> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >> >>
> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >> >>> The event allows us to trace the RCU stall when
> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >> >>>
> >> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >> >>> events. The second one shows us which function detected stalls.
> >> >> >>>
> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >> >>>
> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >> >> >>
> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> >> get the existing information out of the console log?
> >> >> >>
> >> >> >> 							Thanx, Paul
> >> >> >
> >> >> > This can provide a trigger point for the RCU stall warning.
> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> >> > there is a cost of continuing to parse the console log.
> >> >> > This tracepoint is useful because it is hard to pay these costs
> >> >> > especially on mobile devices.
> >> >> >
> >> >> > Thanks,
> >> >> > Sangmoon
> >> >> >
> >> >> 
> >> >> So, the idea here is to register to these trace events from kernel
> >> >> module and use that for debugging? Just curious what debugging action
> >> >> module does on these traces, as they have limited information
> >> >> about the stall, compared to console stall warnings, which gives a
> >> >> much more detailed information about stall.
> >> >> 
> >> >> 
> >> >> Thanks
> >> >> Neeraj
> >> > 
> >> > Hi Neeraj,
> >> > 
> >> > Yes, a module can log the stall occurence using the trace, although
> >> > there is no detailed information. If the kernel panic occurs for some
> >> > reasons, the debugging report generated by the module can include that
> >> > RCU stall warning has occurred before.
> >> > 
> >> > In addition, it's just an idea now, when a trace event happens, the
> >> > module can store the console log including detailed information, or may
> >> > also obtain CPU/task information by parsing the console log.
> >> 
> >> Adding a new tracepoint is not just about what is extracted by this specific
> >> tracepoint, but rather how it can be analyzed when combined with all other
> >> relevant
> >> tracepoints.
> >> 
> >> For instance, if we have this added RCU stall warning added, here is how it can
> >> be
> >> used with the upcoming LTTng 2.13, which implements the "event notification"
> >> triggers
> >> feature:
> >> 
> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
> >> buffer,
> >>    enabling the following tracepoints:
> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
> >>    ...),
> >>    - this new RCU stall warning event.
> >> 
> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
> >> sample
> >>    the kernel stack when the event is hit. This will provide information similar to
> >>    the stack trace gathered into the console log on OOPS.
> >> 
> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
> >> this
> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
> >>    an external user-space process to perform specific actions.
> >> 
> >> So you end up with a snapshot containing the sequence of events leading to the
> >> RCU stall warning, with a kernel stack trace of the context causing the stall
> >> warning to be emitted.
> >> 
> >> I would argue that this information is more complete than just the stack trace
> >> extracted through the console log.
> > 
> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
> > than a stack trace to the console.  Which is why I am concerned about the
> > proverbial camel's nose in the tent.  ;-)
> > 
> > So Sangmoon, what is it that you really need for this to be useful to you?
> > 
> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
> 
> Well there is a tracepoint to dump the console's content into the tracing buffers
> as well, so technically this existing RCU stall warning information could be extracted
> into a trace as well.
> 
> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to hook on a specific
> event to trigger trace capture, without requiring to parse the console log continuously,
> and a way to know when the stall warning happens in time within the trace time-line.
> 
> That being said, there may be use-cases for extracting more details about the RCU stall
> as tracepoint event fields to make it more convenient, but it does not appear to be
> necessary considered that the console can be saved into trace buffers as well.
> 
Could you please clarify how the kernel ring-buffer can also be routed over trace buffer?
Probably i am not aware of if it is possible on latest kernel.

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 15:50                   ` Mathieu Desnoyers
  2021-02-18 16:58                     ` Uladzislau Rezki
@ 2021-02-18 17:00                     ` Paul E. McKenney
  2021-02-18 17:56                       ` Mathieu Desnoyers
  2021-02-18 18:24                       ` Steven Rostedt
  1 sibling, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2021-02-18 17:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes, Google,
	Josh Triplett, rcu, rostedt

On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
> 
> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
> >> 
> >> >> -----Original Message-----
> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
> >> >> Sent: Thursday, February 18, 2021 3:18 AM
> >> >> 
> >> >> Hi Sangmoon,
> >> >> 
> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> >> -----Original Message-----
> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >> >>
> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >> >>> The event allows us to trace the RCU stall when
> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >> >>>
> >> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >> >>> events. The second one shows us which function detected stalls.
> >> >> >>>
> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >> >>>
> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >> >> >>
> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> >> get the existing information out of the console log?
> >> >> >>
> >> >> >> 							Thanx, Paul
> >> >> >
> >> >> > This can provide a trigger point for the RCU stall warning.
> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> >> > there is a cost of continuing to parse the console log.
> >> >> > This tracepoint is useful because it is hard to pay these costs
> >> >> > especially on mobile devices.
> >> >> >
> >> >> > Thanks,
> >> >> > Sangmoon
> >> >> >
> >> >> 
> >> >> So, the idea here is to register to these trace events from kernel
> >> >> module and use that for debugging? Just curious what debugging action
> >> >> module does on these traces, as they have limited information
> >> >> about the stall, compared to console stall warnings, which gives a
> >> >> much more detailed information about stall.
> >> >> 
> >> >> 
> >> >> Thanks
> >> >> Neeraj
> >> > 
> >> > Hi Neeraj,
> >> > 
> >> > Yes, a module can log the stall occurence using the trace, although
> >> > there is no detailed information. If the kernel panic occurs for some
> >> > reasons, the debugging report generated by the module can include that
> >> > RCU stall warning has occurred before.
> >> > 
> >> > In addition, it's just an idea now, when a trace event happens, the
> >> > module can store the console log including detailed information, or may
> >> > also obtain CPU/task information by parsing the console log.
> >> 
> >> Adding a new tracepoint is not just about what is extracted by this specific
> >> tracepoint, but rather how it can be analyzed when combined with all other
> >> relevant
> >> tracepoints.
> >> 
> >> For instance, if we have this added RCU stall warning added, here is how it can
> >> be
> >> used with the upcoming LTTng 2.13, which implements the "event notification"
> >> triggers
> >> feature:
> >> 
> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
> >> buffer,
> >>    enabling the following tracepoints:
> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
> >>    ...),
> >>    - this new RCU stall warning event.
> >> 
> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
> >> sample
> >>    the kernel stack when the event is hit. This will provide information similar to
> >>    the stack trace gathered into the console log on OOPS.
> >> 
> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
> >> this
> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
> >>    an external user-space process to perform specific actions.
> >> 
> >> So you end up with a snapshot containing the sequence of events leading to the
> >> RCU stall warning, with a kernel stack trace of the context causing the stall
> >> warning to be emitted.
> >> 
> >> I would argue that this information is more complete than just the stack trace
> >> extracted through the console log.
> > 
> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
> > than a stack trace to the console.  Which is why I am concerned about the
> > proverbial camel's nose in the tent.  ;-)
> > 
> > So Sangmoon, what is it that you really need for this to be useful to you?
> > 
> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
> 
> Well there is a tracepoint to dump the console's content into the tracing buffers
> as well, so technically this existing RCU stall warning information could be extracted
> into a trace as well.
> 
> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to hook on a specific
> event to trigger trace capture, without requiring to parse the console log continuously,
> and a way to know when the stall warning happens in time within the trace time-line.
> 
> That being said, there may be use-cases for extracting more details about the RCU stall
> as tracepoint event fields to make it more convenient, but it does not appear to be
> necessary considered that the console can be saved into trace buffers as well.

OK, I will bite.  ;-)

Can the saving of console into trace buffers be easily enabled and
disabled within the code?  If so, does enabling and disabling nest?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 16:58                     ` Uladzislau Rezki
@ 2021-02-18 17:24                       ` Mathieu Desnoyers
  2021-02-18 18:43                         ` Uladzislau Rezki
  0 siblings, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2021-02-18 17:24 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: paulmck, Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes,
	Google, Josh Triplett, rcu, rostedt

----- On Feb 18, 2021, at 11:58 AM, Uladzislau Rezki urezki@gmail.com wrote:

> On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
>> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
>> 
>> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
>> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
>> >> 
>> >> >> -----Original Message-----
>> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
>> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
>> >> >> Sent: Thursday, February 18, 2021 3:18 AM
>> >> >> 
>> >> >> Hi Sangmoon,
>> >> >> 
>> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
>> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >> >> >>
>> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >> >> >>> The event allows us to trace the RCU stall when
>> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
>> >> >> >>>
>> >> >> >>> The first parameter is the name of RCU flavour like other trace
>> >> >> >>> events. The second one shows us which function detected stalls.
>> >> >> >>>
>> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >> >> >>> handling or task scheduling or something else. Therefore, this event
>> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >> >> >>>
>> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
>> >> >> >>
>> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
>> >> >> >> get the existing information out of the console log?
>> >> >> >>
>> >> >> >> 							Thanx, Paul
>> >> >> >
>> >> >> > This can provide a trigger point for the RCU stall warning.
>> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
>> >> >> > there is a cost of continuing to parse the console log.
>> >> >> > This tracepoint is useful because it is hard to pay these costs
>> >> >> > especially on mobile devices.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Sangmoon
>> >> >> >
>> >> >> 
>> >> >> So, the idea here is to register to these trace events from kernel
>> >> >> module and use that for debugging? Just curious what debugging action
>> >> >> module does on these traces, as they have limited information
>> >> >> about the stall, compared to console stall warnings, which gives a
>> >> >> much more detailed information about stall.
>> >> >> 
>> >> >> 
>> >> >> Thanks
>> >> >> Neeraj
>> >> > 
>> >> > Hi Neeraj,
>> >> > 
>> >> > Yes, a module can log the stall occurence using the trace, although
>> >> > there is no detailed information. If the kernel panic occurs for some
>> >> > reasons, the debugging report generated by the module can include that
>> >> > RCU stall warning has occurred before.
>> >> > 
>> >> > In addition, it's just an idea now, when a trace event happens, the
>> >> > module can store the console log including detailed information, or may
>> >> > also obtain CPU/task information by parsing the console log.
>> >> 
>> >> Adding a new tracepoint is not just about what is extracted by this specific
>> >> tracepoint, but rather how it can be analyzed when combined with all other
>> >> relevant
>> >> tracepoints.
>> >> 
>> >> For instance, if we have this added RCU stall warning added, here is how it can
>> >> be
>> >> used with the upcoming LTTng 2.13, which implements the "event notification"
>> >> triggers
>> >> feature:
>> >> 
>> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
>> >> buffer,
>> >>    enabling the following tracepoints:
>> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
>> >>    ...),
>> >>    - this new RCU stall warning event.
>> >> 
>> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
>> >> sample
>> >>    the kernel stack when the event is hit. This will provide information similar to
>> >>    the stack trace gathered into the console log on OOPS.
>> >> 
>> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
>> >> this
>> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
>> >>    an external user-space process to perform specific actions.
>> >> 
>> >> So you end up with a snapshot containing the sequence of events leading to the
>> >> RCU stall warning, with a kernel stack trace of the context causing the stall
>> >> warning to be emitted.
>> >> 
>> >> I would argue that this information is more complete than just the stack trace
>> >> extracted through the console log.
>> > 
>> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
>> > than a stack trace to the console.  Which is why I am concerned about the
>> > proverbial camel's nose in the tent.  ;-)
>> > 
>> > So Sangmoon, what is it that you really need for this to be useful to you?
>> > 
>> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
>> 
>> Well there is a tracepoint to dump the console's content into the tracing
>> buffers
>> as well, so technically this existing RCU stall warning information could be
>> extracted
>> into a trace as well.
>> 
>> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
>> hook on a specific
>> event to trigger trace capture, without requiring to parse the console log
>> continuously,
>> and a way to know when the stall warning happens in time within the trace
>> time-line.
>> 
>> That being said, there may be use-cases for extracting more details about the
>> RCU stall
>> as tracepoint event fields to make it more convenient, but it does not appear to
>> be
>> necessary considered that the console can be saved into trace buffers as well.
>> 
> Could you please clarify how the kernel ring-buffer can also be routed over
> trace buffer?
> Probably i am not aware of if it is possible on latest kernel.

This can be done by enabling the "printk console" tracepoint. See the call to
trace_console_rcuidle() in kernel/printk/printk.c. The tracepoint is defined in
include/trace/events/printk.h.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 17:00                     ` Paul E. McKenney
@ 2021-02-18 17:56                       ` Mathieu Desnoyers
  2021-02-18 18:13                         ` Paul E. McKenney
  2021-02-18 18:24                       ` Steven Rostedt
  1 sibling, 1 reply; 24+ messages in thread
From: Mathieu Desnoyers @ 2021-02-18 17:56 UTC (permalink / raw)
  To: paulmck
  Cc: Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes, Google,
	Josh Triplett, rcu, rostedt

----- On Feb 18, 2021, at 12:00 PM, paulmck paulmck@kernel.org wrote:

> On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
>> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
>> 
>> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
>> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
>> >> 
>> >> >> -----Original Message-----
>> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
>> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
>> >> >> Sent: Thursday, February 18, 2021 3:18 AM
>> >> >> 
>> >> >> Hi Sangmoon,
>> >> >> 
>> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
>> >> >> >> -----Original Message-----
>> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
>> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
>> >> >> >>
>> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
>> >> >> >>> The event allows us to trace the RCU stall when
>> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
>> >> >> >>>
>> >> >> >>> The first parameter is the name of RCU flavour like other trace
>> >> >> >>> events. The second one shows us which function detected stalls.
>> >> >> >>>
>> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
>> >> >> >>> handling or task scheduling or something else. Therefore, this event
>> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
>> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
>> >> >> >>>
>> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
>> >> >> >>
>> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
>> >> >> >> get the existing information out of the console log?
>> >> >> >>
>> >> >> >> 							Thanx, Paul
>> >> >> >
>> >> >> > This can provide a trigger point for the RCU stall warning.
>> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
>> >> >> > there is a cost of continuing to parse the console log.
>> >> >> > This tracepoint is useful because it is hard to pay these costs
>> >> >> > especially on mobile devices.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Sangmoon
>> >> >> >
>> >> >> 
>> >> >> So, the idea here is to register to these trace events from kernel
>> >> >> module and use that for debugging? Just curious what debugging action
>> >> >> module does on these traces, as they have limited information
>> >> >> about the stall, compared to console stall warnings, which gives a
>> >> >> much more detailed information about stall.
>> >> >> 
>> >> >> 
>> >> >> Thanks
>> >> >> Neeraj
>> >> > 
>> >> > Hi Neeraj,
>> >> > 
>> >> > Yes, a module can log the stall occurence using the trace, although
>> >> > there is no detailed information. If the kernel panic occurs for some
>> >> > reasons, the debugging report generated by the module can include that
>> >> > RCU stall warning has occurred before.
>> >> > 
>> >> > In addition, it's just an idea now, when a trace event happens, the
>> >> > module can store the console log including detailed information, or may
>> >> > also obtain CPU/task information by parsing the console log.
>> >> 
>> >> Adding a new tracepoint is not just about what is extracted by this specific
>> >> tracepoint, but rather how it can be analyzed when combined with all other
>> >> relevant
>> >> tracepoints.
>> >> 
>> >> For instance, if we have this added RCU stall warning added, here is how it can
>> >> be
>> >> used with the upcoming LTTng 2.13, which implements the "event notification"
>> >> triggers
>> >> feature:
>> >> 
>> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
>> >> buffer,
>> >>    enabling the following tracepoints:
>> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
>> >>    ...),
>> >>    - this new RCU stall warning event.
>> >> 
>> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
>> >> sample
>> >>    the kernel stack when the event is hit. This will provide information similar to
>> >>    the stack trace gathered into the console log on OOPS.
>> >> 
>> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
>> >> this
>> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
>> >>    an external user-space process to perform specific actions.
>> >> 
>> >> So you end up with a snapshot containing the sequence of events leading to the
>> >> RCU stall warning, with a kernel stack trace of the context causing the stall
>> >> warning to be emitted.
>> >> 
>> >> I would argue that this information is more complete than just the stack trace
>> >> extracted through the console log.
>> > 
>> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
>> > than a stack trace to the console.  Which is why I am concerned about the
>> > proverbial camel's nose in the tent.  ;-)
>> > 
>> > So Sangmoon, what is it that you really need for this to be useful to you?
>> > 
>> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
>> 
>> Well there is a tracepoint to dump the console's content into the tracing
>> buffers
>> as well, so technically this existing RCU stall warning information could be
>> extracted
>> into a trace as well.
>> 
>> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
>> hook on a specific
>> event to trigger trace capture, without requiring to parse the console log
>> continuously,
>> and a way to know when the stall warning happens in time within the trace
>> time-line.
>> 
>> That being said, there may be use-cases for extracting more details about the
>> RCU stall
>> as tracepoint event fields to make it more convenient, but it does not appear to
>> be
>> necessary considered that the console can be saved into trace buffers as well.
> 
> OK, I will bite.  ;-)
> 
> Can the saving of console into trace buffers be easily enabled and
> disabled within the code? If so, does enabling and disabling nest?

OK I think I get your point: when this tracing facility is not in use, you
don't want to spam the console with those printouts. And when tracing is in
use, you would want that information collected.

Well one way do achieve this would be to use the TRACE_EVENT_FN() family of
tracepoint events for this new rcu stall warning event. It takes register/unregister
callback functions as parameter. See the sys_enter/sys_exit events as examples.
syscall_regfunc and syscall_unregfunc implement the reference counting you are
referring to.

The downside of this approach though in the context of extracting RCU CPU stalls
warnings through the console is that when tracing of those stalls is enabled, the
console gets spammed with those messages as a side-effect, which may be unwanted.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 17:56                       ` Mathieu Desnoyers
@ 2021-02-18 18:13                         ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2021-02-18 18:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Sangmoon Kim, neeraju, Lai Jiangshan, Joel Fernandes, Google,
	Josh Triplett, rcu, rostedt

On Thu, Feb 18, 2021 at 12:56:39PM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 12:00 PM, paulmck paulmck@kernel.org wrote:
> 
> > On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
> >> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
> >> 
> >> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> >> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
> >> >> 
> >> >> >> -----Original Message-----
> >> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
> >> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
> >> >> >> Sent: Thursday, February 18, 2021 3:18 AM
> >> >> >> 
> >> >> >> Hi Sangmoon,
> >> >> >> 
> >> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >> >> >>
> >> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >> >> >>> The event allows us to trace the RCU stall when
> >> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >> >> >>>
> >> >> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >> >> >>> events. The second one shows us which function detected stalls.
> >> >> >> >>>
> >> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >> >> >>>
> >> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >> >> >> >>
> >> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> >> >> get the existing information out of the console log?
> >> >> >> >>
> >> >> >> >> 							Thanx, Paul
> >> >> >> >
> >> >> >> > This can provide a trigger point for the RCU stall warning.
> >> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> >> >> > there is a cost of continuing to parse the console log.
> >> >> >> > This tracepoint is useful because it is hard to pay these costs
> >> >> >> > especially on mobile devices.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Sangmoon
> >> >> >> >
> >> >> >> 
> >> >> >> So, the idea here is to register to these trace events from kernel
> >> >> >> module and use that for debugging? Just curious what debugging action
> >> >> >> module does on these traces, as they have limited information
> >> >> >> about the stall, compared to console stall warnings, which gives a
> >> >> >> much more detailed information about stall.
> >> >> >> 
> >> >> >> 
> >> >> >> Thanks
> >> >> >> Neeraj
> >> >> > 
> >> >> > Hi Neeraj,
> >> >> > 
> >> >> > Yes, a module can log the stall occurence using the trace, although
> >> >> > there is no detailed information. If the kernel panic occurs for some
> >> >> > reasons, the debugging report generated by the module can include that
> >> >> > RCU stall warning has occurred before.
> >> >> > 
> >> >> > In addition, it's just an idea now, when a trace event happens, the
> >> >> > module can store the console log including detailed information, or may
> >> >> > also obtain CPU/task information by parsing the console log.
> >> >> 
> >> >> Adding a new tracepoint is not just about what is extracted by this specific
> >> >> tracepoint, but rather how it can be analyzed when combined with all other
> >> >> relevant
> >> >> tracepoints.
> >> >> 
> >> >> For instance, if we have this added RCU stall warning added, here is how it can
> >> >> be
> >> >> used with the upcoming LTTng 2.13, which implements the "event notification"
> >> >> triggers
> >> >> feature:
> >> >> 
> >> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
> >> >> buffer,
> >> >>    enabling the following tracepoints:
> >> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
> >> >>    ...),
> >> >>    - this new RCU stall warning event.
> >> >> 
> >> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
> >> >> sample
> >> >>    the kernel stack when the event is hit. This will provide information similar to
> >> >>    the stack trace gathered into the console log on OOPS.
> >> >> 
> >> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
> >> >> this
> >> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
> >> >>    an external user-space process to perform specific actions.
> >> >> 
> >> >> So you end up with a snapshot containing the sequence of events leading to the
> >> >> RCU stall warning, with a kernel stack trace of the context causing the stall
> >> >> warning to be emitted.
> >> >> 
> >> >> I would argue that this information is more complete than just the stack trace
> >> >> extracted through the console log.
> >> > 
> >> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
> >> > than a stack trace to the console.  Which is why I am concerned about the
> >> > proverbial camel's nose in the tent.  ;-)
> >> > 
> >> > So Sangmoon, what is it that you really need for this to be useful to you?
> >> > 
> >> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
> >> 
> >> Well there is a tracepoint to dump the console's content into the tracing
> >> buffers
> >> as well, so technically this existing RCU stall warning information could be
> >> extracted
> >> into a trace as well.
> >> 
> >> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
> >> hook on a specific
> >> event to trigger trace capture, without requiring to parse the console log
> >> continuously,
> >> and a way to know when the stall warning happens in time within the trace
> >> time-line.
> >> 
> >> That being said, there may be use-cases for extracting more details about the
> >> RCU stall
> >> as tracepoint event fields to make it more convenient, but it does not appear to
> >> be
> >> necessary considered that the console can be saved into trace buffers as well.
> > 
> > OK, I will bite.  ;-)
> > 
> > Can the saving of console into trace buffers be easily enabled and
> > disabled within the code? If so, does enabling and disabling nest?
> 
> OK I think I get your point: when this tracing facility is not in use, you
> don't want to spam the console with those printouts. And when tracing is in
> use, you would want that information collected.

Actually, I was expecting that the console would continue dumping if it
was enabled.  In other words, I was expecting to duplicate the information
on the console and in the tracing facility.

But I am getting ahead of myself here.  I don't yet know what Sangmoon
is trying to accomplish, long term.

							Thanx, Paul

> Well one way do achieve this would be to use the TRACE_EVENT_FN() family of
> tracepoint events for this new rcu stall warning event. It takes register/unregister
> callback functions as parameter. See the sys_enter/sys_exit events as examples.
> syscall_regfunc and syscall_unregfunc implement the reference counting you are
> referring to.
> 
> The downside of this approach though in the context of extracting RCU CPU stalls
> warnings through the console is that when tracing of those stalls is enabled, the
> console gets spammed with those messages as a side-effect, which may be unwanted.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 17:00                     ` Paul E. McKenney
  2021-02-18 17:56                       ` Mathieu Desnoyers
@ 2021-02-18 18:24                       ` Steven Rostedt
  2021-02-18 18:59                         ` Paul E. McKenney
  1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2021-02-18 18:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mathieu Desnoyers, Sangmoon Kim, neeraju, Lai Jiangshan,
	Joel Fernandes, Google, Josh Triplett, rcu

On Thu, 18 Feb 2021 09:00:12 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> OK, I will bite.  ;-)

Do I turn into a wearwolf?

> 
> Can the saving of console into trace buffers be easily enabled and
> disabled within the code?

Well, we do have this for ftrace:

 echo 'enable_event:printk:console' > events/rcu/rcu_stall_warning/trigger

Would enable the printk console to be traced to the ring buffer once the
stall is triggered, and not before hand.

>  If so, does enabling and disabling nest?

It has a disable as well, but unfortunately its global and does not nest.
We could always implement a percpu trigger that can nest though.

-- Steve

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 17:24                       ` Mathieu Desnoyers
@ 2021-02-18 18:43                         ` Uladzislau Rezki
  0 siblings, 0 replies; 24+ messages in thread
From: Uladzislau Rezki @ 2021-02-18 18:43 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Uladzislau Rezki, paulmck, Sangmoon Kim, neeraju, Lai Jiangshan,
	Joel Fernandes, Google, Josh Triplett, rcu, rostedt

On Thu, Feb 18, 2021 at 12:24:14PM -0500, Mathieu Desnoyers wrote:
> ----- On Feb 18, 2021, at 11:58 AM, Uladzislau Rezki urezki@gmail.com wrote:
> 
> > On Thu, Feb 18, 2021 at 10:50:53AM -0500, Mathieu Desnoyers wrote:
> >> ----- On Feb 18, 2021, at 10:06 AM, paulmck paulmck@kernel.org wrote:
> >> 
> >> > On Thu, Feb 18, 2021 at 09:17:01AM -0500, Mathieu Desnoyers wrote:
> >> >> ----- On Feb 18, 2021, at 7:52 AM, Sangmoon Kim sangmoon.kim@samsung.com wrote:
> >> >> 
> >> >> >> -----Original Message-----
> >> >> >> From: neeraju=codeaurora.org@mg.codeaurora.org
> >> >> >> <neeraju=codeaurora.org@mg.codeaurora.org>
> >> >> >> Sent: Thursday, February 18, 2021 3:18 AM
> >> >> >> 
> >> >> >> Hi Sangmoon,
> >> >> >> 
> >> >> >> On 2/17/2021 7:19 PM, Sangmoon Kim wrote:
> >> >> >> >> -----Original Message-----
> >> >> >> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> >> >> >> Sent: Wednesday, February 17, 2021 2:50 AM
> >> >> >> >>
> >> >> >> >> On Mon, Feb 15, 2021 at 05:53:25PM +0900, Sangmoon Kim wrote:
> >> >> >> >>> The event allows us to trace the RCU stall when
> >> >> >> >>> sysctl_panic_on_rcu_stall is disabled.
> >> >> >> >>>
> >> >> >> >>> The first parameter is the name of RCU flavour like other trace
> >> >> >> >>> events. The second one shows us which function detected stalls.
> >> >> >> >>>
> >> >> >> >>> The RCU stall is mainly caused by external factors such as interrupt
> >> >> >> >>> handling or task scheduling or something else. Therefore, this event
> >> >> >> >>> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> >> >> >> >>> in the RCU stall can use it without CONFIG_RCU_TRACE.
> >> >> >> >>>
> >> >> >> >>> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> >> >> >> >>
> >> >> >> >> The patch looks plausible, but I have to ask...  Why not instead just
> >> >> >> >> get the existing information out of the console log?
> >> >> >> >>
> >> >> >> >> 							Thanx, Paul
> >> >> >> >
> >> >> >> > This can provide a trigger point for the RCU stall warning.
> >> >> >> > If a module in the kernel wants to trace the stall for debugging purposes,
> >> >> >> > there is a cost of continuing to parse the console log.
> >> >> >> > This tracepoint is useful because it is hard to pay these costs
> >> >> >> > especially on mobile devices.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Sangmoon
> >> >> >> >
> >> >> >> 
> >> >> >> So, the idea here is to register to these trace events from kernel
> >> >> >> module and use that for debugging? Just curious what debugging action
> >> >> >> module does on these traces, as they have limited information
> >> >> >> about the stall, compared to console stall warnings, which gives a
> >> >> >> much more detailed information about stall.
> >> >> >> 
> >> >> >> 
> >> >> >> Thanks
> >> >> >> Neeraj
> >> >> > 
> >> >> > Hi Neeraj,
> >> >> > 
> >> >> > Yes, a module can log the stall occurence using the trace, although
> >> >> > there is no detailed information. If the kernel panic occurs for some
> >> >> > reasons, the debugging report generated by the module can include that
> >> >> > RCU stall warning has occurred before.
> >> >> > 
> >> >> > In addition, it's just an idea now, when a trace event happens, the
> >> >> > module can store the console log including detailed information, or may
> >> >> > also obtain CPU/task information by parsing the console log.
> >> >> 
> >> >> Adding a new tracepoint is not just about what is extracted by this specific
> >> >> tracepoint, but rather how it can be analyzed when combined with all other
> >> >> relevant
> >> >> tracepoints.
> >> >> 
> >> >> For instance, if we have this added RCU stall warning added, here is how it can
> >> >> be
> >> >> used with the upcoming LTTng 2.13, which implements the "event notification"
> >> >> triggers
> >> >> feature:
> >> >> 
> >> >> 1) Setup "flight recorder" (snapshot) tracing to trace into a circular ring
> >> >> buffer,
> >> >>    enabling the following tracepoints:
> >> >>    - kernel activity (meaning all other RCU event, scheduling, irq, workqueues,
> >> >>    ...),
> >> >>    - this new RCU stall warning event.
> >> >> 
> >> >> 2) Add a "callstack-kernel" context to the RCU stall warning event. This will
> >> >> sample
> >> >>    the kernel stack when the event is hit. This will provide information similar to
> >> >>    the stack trace gathered into the console log on OOPS.
> >> >> 
> >> >> 3) Enable a trigger waiting on the RCU stall warning tracepoint to be hit. On
> >> >> this
> >> >>    trigger, actions can be associated, such as capturing a snapshot or waking up
> >> >>    an external user-space process to perform specific actions.
> >> >> 
> >> >> So you end up with a snapshot containing the sequence of events leading to the
> >> >> RCU stall warning, with a kernel stack trace of the context causing the stall
> >> >> warning to be emitted.
> >> >> 
> >> >> I would argue that this information is more complete than just the stack trace
> >> >> extracted through the console log.
> >> > 
> >> > I am not so sure about that.  RCU CPU stall warnings dump quite a bit more
> >> > than a stack trace to the console.  Which is why I am concerned about the
> >> > proverbial camel's nose in the tent.  ;-)
> >> > 
> >> > So Sangmoon, what is it that you really need for this to be useful to you?
> >> > 
> >> > Or am I missing your point?  (Either Mathieu's or Sangmoon's.)
> >> 
> >> Well there is a tracepoint to dump the console's content into the tracing
> >> buffers
> >> as well, so technically this existing RCU stall warning information could be
> >> extracted
> >> into a trace as well.
> >> 
> >> AFAIU, what the new "rcu stall warning" tracepoint provides is an easy way to
> >> hook on a specific
> >> event to trigger trace capture, without requiring to parse the console log
> >> continuously,
> >> and a way to know when the stall warning happens in time within the trace
> >> time-line.
> >> 
> >> That being said, there may be use-cases for extracting more details about the
> >> RCU stall
> >> as tracepoint event fields to make it more convenient, but it does not appear to
> >> be
> >> necessary considered that the console can be saved into trace buffers as well.
> >> 
> > Could you please clarify how the kernel ring-buffer can also be routed over
> > trace buffer?
> > Probably i am not aware of if it is possible on latest kernel.
> 
> This can be done by enabling the "printk console" tracepoint. See the call to
> trace_console_rcuidle() in kernel/printk/printk.c. The tracepoint is defined in
> include/trace/events/printk.h.
> 
OK, i see. At least "error" messages get routed:

<snip>
root@seldlx26095:/home/CORPUSERS/23060734# cat /sys/kernel/debug/tracing/trace_pipe 
     kworker/4:0-9693    [004] d..1 627080.409625: console: [627080.402633] cp210x ttyUSB0: failed set request 0x7 status: -19
     kworker/4:0-9693    [004] d..1 627080.409633: console: [627080.402642] cp210x ttyUSB0: failed set request 0x12 status: -19
     kworker/4:0-9693    [004] d..1 627080.409639: console: [627080.402647] cp210x ttyUSB0: failed set request 0x0 status: -19
<snip>

Thanks!

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-18 18:24                       ` Steven Rostedt
@ 2021-02-18 18:59                         ` Paul E. McKenney
       [not found]                           ` <CGME20210222083416epcas1p2c4d51fc7a793729c08e424f13077954a@epcas1p2.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2021-02-18 18:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Sangmoon Kim, neeraju, Lai Jiangshan,
	Joel Fernandes, Google, Josh Triplett, rcu

On Thu, Feb 18, 2021 at 01:24:16PM -0500, Steven Rostedt wrote:
> On Thu, 18 Feb 2021 09:00:12 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > OK, I will bite.  ;-)
> 
> Do I turn into a wearwolf?

Maybe even a wash-n-wear wolf.  ;-)

> > Can the saving of console into trace buffers be easily enabled and
> > disabled within the code?
> 
> Well, we do have this for ftrace:
> 
>  echo 'enable_event:printk:console' > events/rcu/rcu_stall_warning/trigger
> 
> Would enable the printk console to be traced to the ring buffer once the
> stall is triggered, and not before hand.
> 
> >  If so, does enabling and disabling nest?
> 
> It has a disable as well, but unfortunately its global and does not nest.
> We could always implement a percpu trigger that can nest though.

So Sangmoon could have a module react to an RCU CPU stall warning
tracepoint by turning on tracing of console messages.  If there was a
start-stop pair of RCU CPU stall warning tracepoints, he could turn it
off as well.

Not sure if that is what Sangmoon is looking for, but it is a possibility.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
       [not found]                           ` <CGME20210222083416epcas1p2c4d51fc7a793729c08e424f13077954a@epcas1p2.samsung.com>
@ 2021-02-22  8:25                             ` Sangmoon Kim
  2021-02-24  9:18                               ` Neeraj Upadhyay
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-22  8:25 UTC (permalink / raw)
  To: paulmck
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, neeraju, rcu,
	rostedt, sangmoon.kim



> -----Original Message-----
> From: Paul E. McKenney <paulmck@kernel.org>
> Sent: Friday, February 19, 2021 3:59 AM
> 
> On Thu, Feb 18, 2021 at 01:24:16PM -0500, Steven Rostedt wrote:
> > On Thu, 18 Feb 2021 09:00:12 -0800
> > "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >
> > > OK, I will bite.  ;-)
> >
> > Do I turn into a wearwolf?
> 
> Maybe even a wash-n-wear wolf.  ;-)
> 
> > > Can the saving of console into trace buffers be easily enabled and
> > > disabled within the code?
> >
> > Well, we do have this for ftrace:
> >
> >  echo 'enable_event:printk:console' > events/rcu/rcu_stall_warning/trigger
> >
> > Would enable the printk console to be traced to the ring buffer once the
> > stall is triggered, and not before hand.
> >
> > >  If so, does enabling and disabling nest?
> >
> > It has a disable as well, but unfortunately its global and does not nest.
> > We could always implement a percpu trigger that can nest though.
> 
> So Sangmoon could have a module react to an RCU CPU stall warning
> tracepoint by turning on tracing of console messages.  If there was a
> start-stop pair of RCU CPU stall warning tracepoints, he could turn it
> off as well.
> 
> Not sure if that is what Sangmoon is looking for, but it is a possibility.
> 
> 							Thanx, Paul

Thank you. I'm learning the new things through this review.

Copying console messages is not really necessary to me. I wanted to say
this tracepoint can also be used in such way for debugging purpose.

What I need is that a module records the stall happening and report the
stall event when kernel panic occurred. Of course, there is no detailed
information there, but it helps developers to recognize that the stall
has occurred in the system.

The usage scenario is as follows.
There is a kernel module for debugging.
The module registers a probe function using register_trace_rcu_stall_warning().
When a stall warning event occurs, the module records the event into its own buffer.
When a kernel panic occurs, the developers can see the report generated
by the module and can also see the buffer of the module directly through Ramdump.

Thanks,
Sangmoon

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-22  8:25                             ` Sangmoon Kim
@ 2021-02-24  9:18                               ` Neeraj Upadhyay
       [not found]                                 ` <CGME20210224152605epcas1p2170c2d6c0567c47e72559faa3148ddee@epcas1p2.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Neeraj Upadhyay @ 2021-02-24  9:18 UTC (permalink / raw)
  To: Sangmoon Kim, paulmck
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, rcu, rostedt

Hi Sangmoon,


On 2/22/2021 1:55 PM, Sangmoon Kim wrote:
> 
> 
>> -----Original Message-----
>> From: Paul E. McKenney <paulmck@kernel.org>
>> Sent: Friday, February 19, 2021 3:59 AM
>>
>> On Thu, Feb 18, 2021 at 01:24:16PM -0500, Steven Rostedt wrote:
>>> On Thu, 18 Feb 2021 09:00:12 -0800
>>> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>>>
>>>> OK, I will bite.  ;-)
>>>
>>> Do I turn into a wearwolf?
>>
>> Maybe even a wash-n-wear wolf.  ;-)
>>
>>>> Can the saving of console into trace buffers be easily enabled and
>>>> disabled within the code?
>>>
>>> Well, we do have this for ftrace:
>>>
>>>   echo 'enable_event:printk:console' > events/rcu/rcu_stall_warning/trigger
>>>
>>> Would enable the printk console to be traced to the ring buffer once the
>>> stall is triggered, and not before hand.
>>>
>>>>   If so, does enabling and disabling nest?
>>>
>>> It has a disable as well, but unfortunately its global and does not nest.
>>> We could always implement a percpu trigger that can nest though.
>>
>> So Sangmoon could have a module react to an RCU CPU stall warning
>> tracepoint by turning on tracing of console messages.  If there was a
>> start-stop pair of RCU CPU stall warning tracepoints, he could turn it
>> off as well.
>>
>> Not sure if that is what Sangmoon is looking for, but it is a possibility.
>>
>> 							Thanx, Paul
> 
> Thank you. I'm learning the new things through this review.
> 
> Copying console messages is not really necessary to me. I wanted to say
> this tracepoint can also be used in such way for debugging purpose.
> 

In case we want to use this trace event, to capture RCU stall 
information, which comes on console, into trace buffers; why not have 
these events at the start of the functions like print_other_cpu_stall(), 
print_cpu_stall() rather than  at the end ?


Thanks
Neeraj

> What I need is that a module records the stall happening and report the
> stall event when kernel panic occurred. Of course, there is no detailed
> information there, but it helps developers to recognize that the stall
> has occurred in the system.
> 
> The usage scenario is as follows.
> There is a kernel module for debugging.
> The module registers a probe function using register_trace_rcu_stall_warning().
> When a stall warning event occurs, the module records the event into its own buffer.
> When a kernel panic occurs, the developers can see the report generated
> by the module and can also see the buffer of the module directly through Ramdump.
> 
> Thanks,
> Sangmoon
> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] rcu/tree: Add a trace event for RCU stall warnings
       [not found]                                 ` <CGME20210224152605epcas1p2170c2d6c0567c47e72559faa3148ddee@epcas1p2.samsung.com>
@ 2021-02-24 15:17                                   ` Sangmoon Kim
       [not found]                                     ` <CGME20210225041622epcas1p159a1f427882a90a4ca815b31114c52b6@epcas1p1.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-24 15:17 UTC (permalink / raw)
  To: neeraju
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, paulmck, rcu,
	rostedt, sangmoon.kim


> -----Original Message-----
> From: neeraju=codeaurora.org@mg.codeaurora.org <neeraju=codeaurora.org@mg.codeaurora.org>
> Neeraj Upadhyay
> Sent: Wednesday, February 24, 2021 6:19 PM
> 
> Hi Sangmoon,
> 
> 
> On 2/22/2021 1:55 PM, Sangmoon Kim wrote:
> >
> >
> >> -----Original Message-----
> >> From: Paul E. McKenney <paulmck@kernel.org>
> >> Sent: Friday, February 19, 2021 3:59 AM
> >>
> >> On Thu, Feb 18, 2021 at 01:24:16PM -0500, Steven Rostedt wrote:
> >>> On Thu, 18 Feb 2021 09:00:12 -0800
> >>> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> >>>
> >>>> OK, I will bite.  ;-)
> >>>
> >>> Do I turn into a wearwolf?
> >>
> >> Maybe even a wash-n-wear wolf.  ;-)
> >>
> >>>> Can the saving of console into trace buffers be easily enabled and
> >>>> disabled within the code?
> >>>
> >>> Well, we do have this for ftrace:
> >>>
> >>>   echo 'enable_event:printk:console' > events/rcu/rcu_stall_warning/trigger
> >>>
> >>> Would enable the printk console to be traced to the ring buffer once the
> >>> stall is triggered, and not before hand.
> >>>
> >>>>   If so, does enabling and disabling nest?
> >>>
> >>> It has a disable as well, but unfortunately its global and does not nest.
> >>> We could always implement a percpu trigger that can nest though.
> >>
> >> So Sangmoon could have a module react to an RCU CPU stall warning
> >> tracepoint by turning on tracing of console messages.  If there was a
> >> start-stop pair of RCU CPU stall warning tracepoints, he could turn it
> >> off as well.
> >>
> >> Not sure if that is what Sangmoon is looking for, but it is a possibility.
> >>
> >> 							Thanx, Paul
> >
> > Thank you. I'm learning the new things through this review.
> >
> > Copying console messages is not really necessary to me. I wanted to say
> > this tracepoint can also be used in such way for debugging purpose.
> >
> 
> In case we want to use this trace event, to capture RCU stall
> information, which comes on console, into trace buffers; why not have
> these events at the start of the functions like print_other_cpu_stall(),
> print_cpu_stall() rather than  at the end ?

I agree. It is good to be located before printing console message. I'll
revise the patch. Or, should I make a patch with start-end pair?

Thanks,
Sangmoon

> 
> 
> Thanks
> Neeraj
> 
> > What I need is that a module records the stall happening and report the
> > stall event when kernel panic occurred. Of course, there is no detailed
> > information there, but it helps developers to recognize that the stall
> > has occurred in the system.
> >
> > The usage scenario is as follows.
> > There is a kernel module for debugging.
> > The module registers a probe function using register_trace_rcu_stall_warning().
> > When a stall warning event occurs, the module records the event into its own buffer.
> > When a kernel panic occurs, the developers can see the report generated
> > by the module and can also see the buffer of the module directly through Ramdump.
> >
> > Thanks,
> > Sangmoon
> >
> 
> --
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] rcu/tree: Add a trace event for RCU stall warnings
       [not found]                                     ` <CGME20210225041622epcas1p159a1f427882a90a4ca815b31114c52b6@epcas1p1.samsung.com>
@ 2021-02-25  4:06                                       ` Sangmoon Kim
  2021-02-28 19:47                                         ` Uladzislau Rezki
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-02-25  4:06 UTC (permalink / raw)
  To: neeraju
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, paulmck, rcu,
	rostedt, sangmoon.kim

The event allows us to trace the RCU stall when
sysctl_panic_on_rcu_stall is disabled.

The first parameter is the name of RCU flavour like other trace
events. The second one shows us which function detected stalls.

The RCU stall is mainly caused by external factors such as interrupt
handling or task scheduling or something else. Therefore, this event
uses TRACE_EVENT macro, not dedicated one, so that someone interested
in the RCU stall can use it without CONFIG_RCU_TRACE.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
---
Changes in v2:
 - Move the tracepoints before printing out stall messages.

 include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
 kernel/rcu/tree_exp.h      |  1 +
 kernel/rcu/tree_stall.h    |  2 ++
 3 files changed, 31 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 155b5cb43cfd..8476f3161bd0 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
 		  __entry->cpu, __entry->qsevent)
 );
 
+/*
+ * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
+ * and a string identifying which function detected the RCU stall as follows:
+ *
+ *	"StallDetected": Scheduler-tick detects other CPU's stalls.
+ *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
+ *	"ExpeditedStall": Expedited grace period detects stalls.
+ */
+TRACE_EVENT(rcu_stall_warning,
+
+	TP_PROTO(const char *rcuname, const char *msg),
+
+	TP_ARGS(rcuname, msg),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(const char *, msg)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->msg = msg;
+	),
+
+	TP_printk("%s %s",
+		  __entry->rcuname, __entry->msg)
+);
+
 #endif /* #if defined(CONFIG_TREE_RCU) */
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8760b6ead770..6e1ac5b41f69 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
 		if (rcu_stall_is_suppressed())
 			continue;
 		panic_on_rcu_stall();
+		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
 		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rcu_state.name);
 		ndetected = 0;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 70d48c52fabc..ed50056a3c7b 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -488,6 +488,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
 	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
 	rcu_for_each_leaf_node(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -554,6 +555,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
 	print_cpu_stall_info(smp_processor_id());
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] rcu/tree: Add a trace event for RCU stall warnings
  2021-02-25  4:06                                       ` [PATCH v2] " Sangmoon Kim
@ 2021-02-28 19:47                                         ` Uladzislau Rezki
       [not found]                                           ` <CGME20210302120418epcas1p17e7f55d9c380ae864d2a6afab5fd3b1b@epcas1p1.samsung.com>
  0 siblings, 1 reply; 24+ messages in thread
From: Uladzislau Rezki @ 2021-02-28 19:47 UTC (permalink / raw)
  To: Sangmoon Kim, neeraju, paulmck
  Cc: neeraju, jiangshanlai, joel, josh, mathieu.desnoyers, paulmck,
	rcu, rostedt

> The event allows us to trace the RCU stall when
> sysctl_panic_on_rcu_stall is disabled.
> 
> The first parameter is the name of RCU flavour like other trace
> events. The second one shows us which function detected stalls.
> 
> The RCU stall is mainly caused by external factors such as interrupt
> handling or task scheduling or something else. Therefore, this event
> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> in the RCU stall can use it without CONFIG_RCU_TRACE.
> 
> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> ---
> Changes in v2:
>  - Move the tracepoints before printing out stall messages.
> 
>  include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
>  kernel/rcu/tree_exp.h      |  1 +
>  kernel/rcu/tree_stall.h    |  2 ++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 155b5cb43cfd..8476f3161bd0 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
>  		  __entry->cpu, __entry->qsevent)
>  );
>  
> +/*
> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> + * and a string identifying which function detected the RCU stall as follows:
> + *
> + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
> + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
> + *	"ExpeditedStall": Expedited grace period detects stalls.
> + */
> +TRACE_EVENT(rcu_stall_warning,
> +
> +	TP_PROTO(const char *rcuname, const char *msg),
> +
> +	TP_ARGS(rcuname, msg),
> +
> +	TP_STRUCT__entry(
> +		__field(const char *, rcuname)
> +		__field(const char *, msg)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->rcuname = rcuname;
> +		__entry->msg = msg;
> +	),
> +
> +	TP_printk("%s %s",
> +		  __entry->rcuname, __entry->msg)
> +);
> +
>  #endif /* #if defined(CONFIG_TREE_RCU) */
>  
>  /*
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 8760b6ead770..6e1ac5b41f69 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
>  		if (rcu_stall_is_suppressed())
>  			continue;
>  		panic_on_rcu_stall();
> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
>  		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
>  		       rcu_state.name);
>  		ndetected = 0;
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 70d48c52fabc..ed50056a3c7b 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -488,6 +488,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug
>  	 * RCU CPU stall warnings.
>  	 */
> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
>  	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
>  	rcu_for_each_leaf_node(rnp) {
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> @@ -554,6 +555,7 @@ static void print_cpu_stall(unsigned long gps)
>  	 * See Documentation/RCU/stallwarn.rst for info on how to debug
>  	 * RCU CPU stall warnings.
>  	 */
> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
>  	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
>  	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
>  	print_cpu_stall_info(smp_processor_id());
> -- 
> 2.17.1
> 
One small nit, the patch is not cleanelly applied on latest dev
so it should be rebased unless Paul fixes it on his own :)

Anyway to me it looks OK.

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3] rcu/tree: Add a trace event for RCU stall warnings
       [not found]                                           ` <CGME20210302120418epcas1p17e7f55d9c380ae864d2a6afab5fd3b1b@epcas1p1.samsung.com>
@ 2021-03-02 11:55                                             ` Sangmoon Kim
  2021-03-02 17:04                                               ` Neeraj Upadhyay
  0 siblings, 1 reply; 24+ messages in thread
From: Sangmoon Kim @ 2021-03-02 11:55 UTC (permalink / raw)
  To: urezki
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, neeraju, paulmck,
	rcu, rostedt, sangmoon.kim

The event allows us to trace the RCU stall when
sysctl_panic_on_rcu_stall is disabled.

The first parameter is the name of RCU flavour like other trace
events. The second one shows us which function detected stalls.

The RCU stall is mainly caused by external factors such as interrupt
handling or task scheduling or something else. Therefore, this event
uses TRACE_EVENT macro, not dedicated one, so that someone interested
in the RCU stall can use it without CONFIG_RCU_TRACE.

Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
Thank you for reviewing this.

Changes in v2:
 - Move the tracepoints before printing out stall messages.

Changes in v3:
 - Rebase on dev in linux-rcu.git

 include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
 kernel/rcu/tree_exp.h      |  1 +
 kernel/rcu/tree_stall.h    |  2 ++
 3 files changed, 31 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5fc29400e1a2..c7711e9b6900 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
 		  __entry->cpu, __entry->qsevent)
 );
 
+/*
+ * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
+ * and a string identifying which function detected the RCU stall as follows:
+ *
+ *	"StallDetected": Scheduler-tick detects other CPU's stalls.
+ *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
+ *	"ExpeditedStall": Expedited grace period detects stalls.
+ */
+TRACE_EVENT(rcu_stall_warning,
+
+	TP_PROTO(const char *rcuname, const char *msg),
+
+	TP_ARGS(rcuname, msg),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(const char *, msg)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->msg = msg;
+	),
+
+	TP_printk("%s %s",
+		  __entry->rcuname, __entry->msg)
+);
+
 #endif /* #if defined(CONFIG_TREE_RCU) */
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6c6ff06d4ae6..2796084ef85a 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
 		if (rcu_stall_is_suppressed())
 			continue;
 		panic_on_rcu_stall();
+		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
 		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rcu_state.name);
 		ndetected = 0;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 475b26171b20..59b95cc5cbdf 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -536,6 +536,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
 	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
 	rcu_for_each_leaf_node(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -606,6 +607,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
 	print_cpu_stall_info(smp_processor_id());
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] rcu/tree: Add a trace event for RCU stall warnings
  2021-03-02 11:55                                             ` [PATCH v3] " Sangmoon Kim
@ 2021-03-02 17:04                                               ` Neeraj Upadhyay
  2021-03-02 19:20                                                 ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Neeraj Upadhyay @ 2021-03-02 17:04 UTC (permalink / raw)
  To: Sangmoon Kim, urezki
  Cc: jiangshanlai, joel, josh, mathieu.desnoyers, paulmck, rcu, rostedt



On 3/2/2021 5:25 PM, Sangmoon Kim wrote:
> The event allows us to trace the RCU stall when
> sysctl_panic_on_rcu_stall is disabled.
> 
> The first parameter is the name of RCU flavour like other trace
> events. The second one shows us which function detected stalls.
> 
> The RCU stall is mainly caused by external factors such as interrupt
> handling or task scheduling or something else. Therefore, this event
> uses TRACE_EVENT macro, not dedicated one, so that someone interested
> in the RCU stall can use it without CONFIG_RCU_TRACE.
> 
> Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> Thank you for reviewing this.
> 
> Changes in v2:
>   - Move the tracepoints before printing out stall messages.
> 
> Changes in v3:
>   - Rebase on dev in linux-rcu.git
> 
>   include/trace/events/rcu.h | 28 ++++++++++++++++++++++++++++
>   kernel/rcu/tree_exp.h      |  1 +
>   kernel/rcu/tree_stall.h    |  2 ++
>   3 files changed, 31 insertions(+)
> 
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index 5fc29400e1a2..c7711e9b6900 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
>   		  __entry->cpu, __entry->qsevent)
>   );
>   
> +/*
> + * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
> + * and a string identifying which function detected the RCU stall as follows:
> + *
> + *	"StallDetected": Scheduler-tick detects other CPU's stalls.
> + *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
> + *	"ExpeditedStall": Expedited grace period detects stalls.
> + */
> +TRACE_EVENT(rcu_stall_warning,
> +
> +	TP_PROTO(const char *rcuname, const char *msg),
> +
> +	TP_ARGS(rcuname, msg),
> +
> +	TP_STRUCT__entry(
> +		__field(const char *, rcuname)
> +		__field(const char *, msg)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->rcuname = rcuname;
> +		__entry->msg = msg;
> +	),
> +
> +	TP_printk("%s %s",
> +		  __entry->rcuname, __entry->msg)
> +);
> +
>   #endif /* #if defined(CONFIG_TREE_RCU) */
>   
>   /*
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 6c6ff06d4ae6..2796084ef85a 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
>   		if (rcu_stall_is_suppressed())
>   			continue;
>   		panic_on_rcu_stall();
> +		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
>   		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
>   		       rcu_state.name);
>   		ndetected = 0;
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 475b26171b20..59b95cc5cbdf 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -536,6 +536,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>   	 * See Documentation/RCU/stallwarn.rst for info on how to debug
>   	 * RCU CPU stall warnings.
>   	 */
> +	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
>   	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
>   	rcu_for_each_leaf_node(rnp) {
>   		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> @@ -606,6 +607,7 @@ static void print_cpu_stall(unsigned long gps)
>   	 * See Documentation/RCU/stallwarn.rst for info on how to debug
>   	 * RCU CPU stall warnings.
>   	 */
> +	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
>   	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
>   	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
>   	print_cpu_stall_info(smp_processor_id());
> 

Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3] rcu/tree: Add a trace event for RCU stall warnings
  2021-03-02 17:04                                               ` Neeraj Upadhyay
@ 2021-03-02 19:20                                                 ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2021-03-02 19:20 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: Sangmoon Kim, urezki, jiangshanlai, joel, josh,
	mathieu.desnoyers, rcu, rostedt

On Tue, Mar 02, 2021 at 10:34:26PM +0530, Neeraj Upadhyay wrote:
> On 3/2/2021 5:25 PM, Sangmoon Kim wrote:
> > The event allows us to trace the RCU stall when
> > sysctl_panic_on_rcu_stall is disabled.
> > 
> > The first parameter is the name of RCU flavour like other trace
> > events. The second one shows us which function detected stalls.
> > 
> > The RCU stall is mainly caused by external factors such as interrupt
> > handling or task scheduling or something else. Therefore, this event
> > uses TRACE_EVENT macro, not dedicated one, so that someone interested
> > in the RCU stall can use it without CONFIG_RCU_TRACE.
> > 
> > Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
> > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

[ . . . ]

> Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>

Thank you all!  As usual, I wordsmithed the commit log as shown below.
Please let me know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit 4ee0eb7c0cbccaae8e5e3681d852d4e7f50c4378
Author: Sangmoon Kim <sangmoon.kim@samsung.com>
Date:   Tue Mar 2 20:55:15 2021 +0900

    rcu/tree: Add a trace event for RCU CPU stall warnings
    
    This commit adds a trace event which allows tracing the beginnings of RCU
    CPU stall warnings on systems where sysctl_panic_on_rcu_stall is disabled.
    
    The first parameter is the name of RCU flavor like other trace events.
    The second parameter indicates whether this is a stall of an expedited
    grace period, a self-detected stall of a normal grace period, or a stall
    of a normal grace period detected by some CPU other than the one that
    is stalled.
    
    RCU CPU stall warnings are often caused by external-to-RCU issues,
    for example, in interrupt handling or task scheduling.  Therefore,
    this event uses TRACE_EVENT, not TRACE_EVENT_RCU, to avoid requiring
    those interested in tracing RCU CPU stalls to rebuild their kernels
    with CONFIG_RCU_TRACE=y.
    
    Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 5fc2940..c7711e9 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -432,6 +432,34 @@ TRACE_EVENT_RCU(rcu_fqs,
 		  __entry->cpu, __entry->qsevent)
 );
 
+/*
+ * Tracepoint for RCU stall events. Takes a string identifying the RCU flavor
+ * and a string identifying which function detected the RCU stall as follows:
+ *
+ *	"StallDetected": Scheduler-tick detects other CPU's stalls.
+ *	"SelfDetected": Scheduler-tick detects a current CPU's stall.
+ *	"ExpeditedStall": Expedited grace period detects stalls.
+ */
+TRACE_EVENT(rcu_stall_warning,
+
+	TP_PROTO(const char *rcuname, const char *msg),
+
+	TP_ARGS(rcuname, msg),
+
+	TP_STRUCT__entry(
+		__field(const char *, rcuname)
+		__field(const char *, msg)
+	),
+
+	TP_fast_assign(
+		__entry->rcuname = rcuname;
+		__entry->msg = msg;
+	),
+
+	TP_printk("%s %s",
+		  __entry->rcuname, __entry->msg)
+);
+
 #endif /* #if defined(CONFIG_TREE_RCU) */
 
 /*
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6c6ff06..2796084 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -521,6 +521,7 @@ static void synchronize_rcu_expedited_wait(void)
 		if (rcu_stall_is_suppressed())
 			continue;
 		panic_on_rcu_stall();
+		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
 		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
 		       rcu_state.name);
 		ndetected = 0;
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 475b261..59b95cc 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -536,6 +536,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("StallDetected"));
 	pr_err("INFO: %s detected stalls on CPUs/tasks:\n", rcu_state.name);
 	rcu_for_each_leaf_node(rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -606,6 +607,7 @@ static void print_cpu_stall(unsigned long gps)
 	 * See Documentation/RCU/stallwarn.rst for info on how to debug
 	 * RCU CPU stall warnings.
 	 */
+	trace_rcu_stall_warning(rcu_state.name, TPS("SelfDetected"));
 	pr_err("INFO: %s self-detected stall on CPU\n", rcu_state.name);
 	raw_spin_lock_irqsave_rcu_node(rdp->mynode, flags);
 	print_cpu_stall_info(smp_processor_id());

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-03-03  5:00 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20210215090241epcas1p17b1319d38de5a09a9d65a06ebf936866@epcas1p1.samsung.com>
2021-02-15  8:53 ` [PATCH] rcu/tree: Add a trace event for RCU stall warnings Sangmoon Kim
2021-02-16 17:50   ` Paul E. McKenney
     [not found]     ` <CGME20210217135813epcas1p2b3bd79ca06ac847472b048b889100978@epcas1p2.samsung.com>
2021-02-17 13:49       ` Sangmoon Kim
2021-02-17 18:17         ` Neeraj Upadhyay
     [not found]           ` <CGME20210218130056epcas1p21e5b2e873f8c3cb0aa6953473d4c416c@epcas1p2.samsung.com>
2021-02-18 12:52             ` Sangmoon Kim
2021-02-18 14:17               ` Mathieu Desnoyers
2021-02-18 15:06                 ` Paul E. McKenney
2021-02-18 15:50                   ` Mathieu Desnoyers
2021-02-18 16:58                     ` Uladzislau Rezki
2021-02-18 17:24                       ` Mathieu Desnoyers
2021-02-18 18:43                         ` Uladzislau Rezki
2021-02-18 17:00                     ` Paul E. McKenney
2021-02-18 17:56                       ` Mathieu Desnoyers
2021-02-18 18:13                         ` Paul E. McKenney
2021-02-18 18:24                       ` Steven Rostedt
2021-02-18 18:59                         ` Paul E. McKenney
     [not found]                           ` <CGME20210222083416epcas1p2c4d51fc7a793729c08e424f13077954a@epcas1p2.samsung.com>
2021-02-22  8:25                             ` Sangmoon Kim
2021-02-24  9:18                               ` Neeraj Upadhyay
     [not found]                                 ` <CGME20210224152605epcas1p2170c2d6c0567c47e72559faa3148ddee@epcas1p2.samsung.com>
2021-02-24 15:17                                   ` Sangmoon Kim
     [not found]                                     ` <CGME20210225041622epcas1p159a1f427882a90a4ca815b31114c52b6@epcas1p1.samsung.com>
2021-02-25  4:06                                       ` [PATCH v2] " Sangmoon Kim
2021-02-28 19:47                                         ` Uladzislau Rezki
     [not found]                                           ` <CGME20210302120418epcas1p17e7f55d9c380ae864d2a6afab5fd3b1b@epcas1p1.samsung.com>
2021-03-02 11:55                                             ` [PATCH v3] " Sangmoon Kim
2021-03-02 17:04                                               ` Neeraj Upadhyay
2021-03-02 19:20                                                 ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.