linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
@ 2023-04-06  7:57 Leonardo Bras
  2023-04-06  8:15 ` Sebastian Andrzej Siewior
  2023-04-06  9:55 ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Leonardo Bras @ 2023-04-06  7:57 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Leonardo Bras, Thomas Gleixner,
	Yury Norov, Paul E. McKenney, Peter Zijlstra (Intel),
	Sebastian Andrzej Siewior, Nadav Amit, Zhen Lei, Chen Zhongjin
  Cc: linux-kernel, linux-trace-kernel, Marcelo Tosatti

When running RT workloads in isolated CPUs, many cases of deadline misses
are caused by remote CPU requests such as smp_call_function*().

For those cases, having the names of those functions running around the
deadline miss moment could help finding a target for the next improvements.

Add tracepoints for acquiring the function name & argument before entry and
after exiting the called function.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/trace/events/smp.h | 56 ++++++++++++++++++++++++++++++++++++++
 kernel/smp.c               | 11 ++++++++
 2 files changed, 67 insertions(+)
 create mode 100644 include/trace/events/smp.h

diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
new file mode 100644
index 0000000000000..94aae8d71705d
--- /dev/null
+++ b/include/trace/events/smp.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM smp
+
+#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SMP_H
+
+#include <linux/tracepoint.h>
+#include <linux/smp.h>
+
+/*
+ * Tracepoints for a function which is called as an effect of smp_call_function.*
+ */
+TRACE_EVENT(smp_call_function_entry,
+
+	TP_PROTO(smp_call_func_t func, void *info),
+
+	TP_ARGS(func, info),
+
+	TP_STRUCT__entry(
+		__field(void *,	func)
+		__field(void *,	info)
+	),
+
+	TP_fast_assign(
+		__entry->func	= func;
+		__entry->info	= info;
+	),
+
+	TP_printk("function %ps, argument = %p", __entry->func, __entry->info)
+);
+
+
+TRACE_EVENT(smp_call_function_exit,
+
+	TP_PROTO(smp_call_func_t func, void *info),
+
+	TP_ARGS(func, info),
+
+	TP_STRUCT__entry(
+		__field(void *,	func)
+		__field(void *,	info)
+	),
+
+	TP_fast_assign(
+		__entry->func	= func;
+		__entry->info	= info;
+	),
+
+	TP_printk("function %ps with argument = %p", __entry->func, __entry->info)
+);
+
+#endif /* _TRACE_SMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14a..38d8dec28c39c 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -29,6 +29,9 @@
 #include "smpboot.h"
 #include "sched/smp.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/smp.h>
+
 #define CSD_TYPE(_csd)	((_csd)->node.u_flags & CSD_FLAG_TYPE_MASK)
 
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
@@ -517,7 +520,9 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
 		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
+		trace_smp_call_function_entry(func, info);
 		func(info);
+		trace_smp_call_function_exit(func, info);
 		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		return 0;
@@ -627,7 +632,9 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 			}
 
 			csd_lock_record(csd);
+			trace_smp_call_function_entry(func, info);
 			func(info);
+			trace_smp_call_function_exit(func, info);
 			csd_unlock(csd);
 			csd_lock_record(NULL);
 		} else {
@@ -662,7 +669,9 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 
 				csd_lock_record(csd);
 				csd_unlock(csd);
+				trace_smp_call_function_entry(func, info);
 				func(info);
+				trace_smp_call_function_exit(func, info);
 				csd_lock_record(NULL);
 			} else if (type == CSD_TYPE_IRQ_WORK) {
 				irq_work_single(csd);
@@ -975,7 +984,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
+		trace_smp_call_function_entry(func, info);
 		func(info);
+		trace_smp_call_function_exit(func, info);
 		local_irq_restore(flags);
 	}
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06  7:57 [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*() Leonardo Bras
@ 2023-04-06  8:15 ` Sebastian Andrzej Siewior
  2023-04-06  8:42   ` Leonardo Brás
  2023-04-06  9:55 ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-04-06  8:15 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Peter Zijlstra (Intel),
	Nadav Amit, Zhen Lei, Chen Zhongjin, linux-kernel,
	linux-trace-kernel, Marcelo Tosatti

On 2023-04-06 04:57:18 [-0300], Leonardo Bras wrote:
> When running RT workloads in isolated CPUs, many cases of deadline misses
> are caused by remote CPU requests such as smp_call_function*().
> 
> For those cases, having the names of those functions running around the
> deadline miss moment could help finding a target for the next improvements.
> 
> Add tracepoints for acquiring the function name & argument before entry and
> after exiting the called function.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

Check
	https://lore.kernel.org/20230307143558.294354-1-vschneid@redhat.com

Sebastian


* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06  8:15 ` Sebastian Andrzej Siewior
@ 2023-04-06  8:42   ` Leonardo Brás
  2023-04-06 13:49     ` Steven Rostedt
  0 siblings, 1 reply; 18+ messages in thread
From: Leonardo Brás @ 2023-04-06  8:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Peter Zijlstra (Intel),
	Nadav Amit, Zhen Lei, Chen Zhongjin, linux-kernel,
	linux-trace-kernel, Marcelo Tosatti

On Thu, 2023-04-06 at 10:15 +0200, Sebastian Andrzej Siewior wrote:
> On 2023-04-06 04:57:18 [-0300], Leonardo Bras wrote:
> > When running RT workloads in isolated CPUs, many cases of deadline misses
> > are caused by remote CPU requests such as smp_call_function*().
> > 
> > For those cases, having the names of those functions running around the
> > deadline miss moment could help finding a target for the next improvements.
> > 
> > Add tracepoints for acquiring the function name & argument before entry and
> > after exiting the called function.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> 
> Check
> 	https://lore.kernel.org/20230307143558.294354-1-vschneid@redhat.com
> 
> Sebastian
> 

Hello Sebastian, thanks for the heads up!

IIUC Valentin is adding/improving tracepoints that are collected on the
requesting CPU, at the moment of scheduling the IPI, which are also useful.

But in my scenario, it would be nice to actually have the information on the
time window in which the requested function is running on the requested CPU.

This would allow us to check how many and which IPIs took too much time to
complete and caused the RT deadline to be missed.
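For example, a post-processing sketch for spotting which remotely requested
functions overlap a deadline-miss window could look like this (the ftrace line
prefix and exact field layout here are assumptions layered on top of the
TP_printk format in the patch, not tested output):

```python
import re

# Hypothetical ftrace lines carrying the proposed events; the task/CPU/
# flags prefix is the standard ftrace one and the "function %ps" part
# follows the TP_printk format in the patch -- both are assumptions.
LINE_RE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"smp_call_function_(?P<kind>entry|exit):\s+function\s+(?P<func>\w+)"
)

def calls_overlapping(trace_lines, cpu, window_start, window_end):
    """Return (func, entry_ts, exit_ts) for calls on `cpu` whose
    execution overlaps the [window_start, window_end] interval."""
    open_calls = {}  # func -> entry timestamp on the CPU of interest
    hits = []
    for line in trace_lines:
        m = LINE_RE.search(line)
        if not m or int(m["cpu"]) != cpu:
            continue
        ts = float(m["ts"])
        if m["kind"] == "entry":
            open_calls[m["func"]] = ts
        else:
            start = open_calls.pop(m["func"], None)
            if start is not None and start < window_end and ts > window_start:
                hits.append((m["func"], start, ts))
    return hits
```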

(I noticed there is an include/trace/events/ipi.h that could be used instead of
creating a new include/trace/events/smp.h; what is your opinion on this?)

Please let me know if I am missing something.

Thank you!
Leo

 



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06  7:57 [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*() Leonardo Bras
  2023-04-06  8:15 ` Sebastian Andrzej Siewior
@ 2023-04-06  9:55 ` Peter Zijlstra
  2023-04-19  3:45   ` Leonardo Brás
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2023-04-06  9:55 UTC (permalink / raw)
  To: Leonardo Bras
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti

On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> When running RT workloads in isolated CPUs, many cases of deadline misses
> are caused by remote CPU requests such as smp_call_function*().
> 
> For those cases, having the names of those functions running around the
> deadline miss moment could help finding a target for the next improvements.
> 
> Add tracepoints for acquiring the function name & argument before entry and
> after exiting the called function.
> 
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

How are the patches queued there not sufficient?

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core


* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06  8:42   ` Leonardo Brás
@ 2023-04-06 13:49     ` Steven Rostedt
  2023-04-19  4:04       ` Leonardo Brás
  0 siblings, 1 reply; 18+ messages in thread
From: Steven Rostedt @ 2023-04-06 13:49 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Sebastian Andrzej Siewior, Masami Hiramatsu, Thomas Gleixner,
	Yury Norov, Paul E. McKenney, Peter Zijlstra (Intel),
	Nadav Amit, Zhen Lei, Chen Zhongjin, linux-kernel,
	linux-trace-kernel, Marcelo Tosatti

On Thu, 06 Apr 2023 05:42:13 -0300
Leonardo Brás <leobras@redhat.com> wrote:

> Hello Sebastian, thanks for the heads up!
> 
> IIUC Valentin is adding/improving tracepoints that are collected in the
> requesting CPU, at the moment of scheduling the IPI, which are also useful.
> 
> But in my scenario, it would be nice to actually have the information on the
> time window in which the requested function is running on the requested CPU.
> 

Well once you know the functions that are being called (running your test
case with the IPI trace events), then you can get the timings of those same
functions by:

 # cd /sys/kernel/tracing
 # echo '<space-delimited-list-of-functions>' > set_ftrace_filter
 # echo function_graph > current_tracer

Then the function graph will give you those timings.
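For instance, the duration column of that output could be filtered for slow
calls with a short script; the funcgraph line layout assumed here is the usual
one, so treat this as a sketch rather than a stable format contract:

```python
import re

# Sketch: pull per-call durations out of function_graph output. The
# "<cpu>)  <duration> us |  func();" layout (with an optional latency
# marker such as '+' or '!') is assumed from typical funcgraph traces.
GRAPH_RE = re.compile(
    r"\)\s+[+!#*@$]?\s*(?P<us>[\d.]+)\s+us\s+\|\s+(?P<func>\w+)"
)

def slow_calls(graph_lines, threshold_us):
    """Return (func, duration_us) for calls taking >= threshold_us."""
    hits = []
    for line in graph_lines:
        m = GRAPH_RE.search(line)
        if m and float(m["us"]) >= threshold_us:
            hits.append((m["func"], float(m["us"])))
    return hits
```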

-- Steve


* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06  9:55 ` Peter Zijlstra
@ 2023-04-19  3:45   ` Leonardo Brás
  2023-05-03  4:23     ` Leonardo Brás
  2023-05-03 14:59     ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Leonardo Brás @ 2023-04-19  3:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti

On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > When running RT workloads in isolated CPUs, many cases of deadline misses
> > are caused by remote CPU requests such as smp_call_function*().
> > 
> > For those cases, having the names of those functions running around the
> > deadline miss moment could help finding a target for the next improvements.
> > 
> > Add tracepoints for acquiring the function name & argument before entry and
> > after exiting the called function.
> > 
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> 
> How are the patches queued there not sufficient?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
> 

IIUC the last commits add tracepoints that are collected on the
requesting CPU, at the moment of scheduling the IPI, which are also useful in
some scenarios.

In my scenario, it could help a little, since it makes it possible to filter
what all other CPUs are scheduling on the requested CPU. OTOH it could also be
misleading, as the requested CPU could be running something that was scheduled
way before.

The change I propose does exactly what my scenario needs: track exactly which
function was running at a given time on the requested CPU. With this info, we
can check which (if any) remotely requested function was running in a given
time window.

(An unrelated thing I just thought of: we could even use the commits you
pointed to together with my proposed change in order to measure how long it
takes for a requested function to run / complete on the requested CPU.)
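A rough sketch of that measurement, pairing queue-side and exit events naively
by function name (an assumption that only holds while a single sender is
involved; the event tuples here are hypothetical, not a real trace format):

```python
# Events as (timestamp, kind, func) tuples: "queued" from the requesting
# CPU (e.g. the tip/smp/core send tracepoints) and "entry"/"exit" from
# the events proposed in this patch. Pairing purely by function name is
# a simplification; it breaks down with concurrent senders.
def request_latencies(events):
    queued = {}  # func -> queue timestamp
    out = []
    for ts, kind, func in sorted(events):
        if kind == "queued":
            queued[func] = ts
        elif kind == "exit" and func in queued:
            # Latency from queueing on the requester to completion
            # on the target CPU.
            out.append((func, ts - queued.pop(func)))
    return out
```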

Thanks for reviewing!

Leo



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-06 13:49     ` Steven Rostedt
@ 2023-04-19  4:04       ` Leonardo Brás
  0 siblings, 0 replies; 18+ messages in thread
From: Leonardo Brás @ 2023-04-19  4:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sebastian Andrzej Siewior, Masami Hiramatsu, Thomas Gleixner,
	Yury Norov, Paul E. McKenney, Peter Zijlstra (Intel),
	Nadav Amit, Zhen Lei, Chen Zhongjin, linux-kernel,
	linux-trace-kernel, Marcelo Tosatti

On Thu, 2023-04-06 at 09:49 -0400, Steven Rostedt wrote:
> On Thu, 06 Apr 2023 05:42:13 -0300
> Leonardo Brás <leobras@redhat.com> wrote:
> 
> > Hello Sebastian, thanks for the heads up!
> > 
> > IIUC Valentin is adding/improving tracepoints that are collected in the
> > requesting CPU, at the moment of scheduling the IPI, which are also useful.
> > 
> > But in my scenario, it would be nice to actually have the information on the
> > time window in which the requested function is running on the requested CPU.
> > 

Hello Steven,

> 
> Well once you know the functions that are being called (running your test
> case with the IPI trace events), then you can get the timings of those same
> functions by:
> 
>  # cd /sys/kernel/tracing
>  # echo '<space-delimited-list-of-functions>' > set_ftrace_filter
>  # echo function_graph > current_tracer
> 
> Then the function graph will give you those timings.

IIUC, the use scenario of your suggestion would be something like:

1) Run something until we get a deadline miss in given CPU
2) Get the log -> Which functions were scheduled on that CPU in the last N time units
3) Add those functions to the ftrace_filter
4) Run again until a deadline miss again

If that's correct, there are some points worth mentioning:
a) It requires getting the deadline miss again, which may take non-negligible
time to reproduce, slowing down the debug process. The suggested change would
get the functions in the first run.
b) There is no guarantee the next deadline time window will contain (only)
those functions. In case some are missing, we would need to add the new
functions and run the test once more.
c) Those functions could be called outside of the remote scheduling context,
and would dirty our log.

> 
> -- Steve
> 

Thanks for reviewing and contributing, Steve!

Leo



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-19  3:45   ` Leonardo Brás
@ 2023-05-03  4:23     ` Leonardo Brás
  2023-05-03 14:59     ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Leonardo Brás @ 2023-05-03  4:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti

On Wed, 2023-04-19 at 00:45 -0300, Leonardo Brás wrote:
> On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > When running RT workloads in isolated CPUs, many cases of deadline misses
> > > are caused by remote CPU requests such as smp_call_function*().
> > > 
> > > For those cases, having the names of those functions running around the
> > > deadline miss moment could help finding a target for the next improvements.
> > > 
> > > Add tracepoints for acquiring the function name & argument before entry and
> > > after exiting the called function.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > 
> > How are the patches queued there not sufficient?
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core


Hello Peter, 

The trace I proposed will run on the CPU which received the request, not on the
one which originated it. Meaning the trace will run on "cpu" from
smp_call_function*(cpu,...)

IIUC the trace introduced by the commits in tip/smp/core (which you pointed to)
will run on the CPU which calls smp_call_function*(), which is different from
the above.

Did I get it right?
Could you please help me understand how the commits in tip/smp/core are
sufficient for my scenario?

Thanks!
Leo



> > 
> 
> IIUC the last commits add tracepoints that are collected in the
> requesting CPU, at the moment of scheduling the IPI, which are also useful in
> some scenarios.
> 
> On my scenario, it could help a little, since it makes possible to filter what
> all other cpus are scheduling on the requested cpu. OTOH it could also be
> misleading, as the requested cpu could be running something that was scheduled
> way before.
> 
> The change I propose does exactly what my scenario need: track exactly which
> function was running at given time in the requested CPU. With this info, we can
> check which (if any) remotely requested function was running on given time
> window.
> 
> (An unrelated thing I just thought: We could even use the commits you pointed
> together with my proposed change in order to measure how long does it take for a
> requested function to run / complete in the requested cpu)
> 
> Thanks for reviewing!
> 
> Leo



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-04-19  3:45   ` Leonardo Brás
  2023-05-03  4:23     ` Leonardo Brás
@ 2023-05-03 14:59     ` Peter Zijlstra
  2023-05-03 15:53       ` Leonardo Bras Soares Passos
  2023-05-04 11:59       ` Valentin Schneider
  1 sibling, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2023-05-03 14:59 UTC (permalink / raw)
  To: Leonardo Brás
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, vschneid

On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > When running RT workloads in isolated CPUs, many cases of deadline misses
> > > are caused by remote CPU requests such as smp_call_function*().
> > > 
> > > For those cases, having the names of those functions running around the
> > > deadline miss moment could help finding a target for the next improvements.
> > > 
> > > Add tracepoints for acquiring the function name & argument before entry and
> > > after exiting the called function.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > 
> > How are the patches queued there not sufficient?
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
> > 
> 
> IIUC the last commits add tracepoints that are collected in the
> requesting CPU, at the moment of scheduling the IPI, which are also useful in
> some scenarios.
> 
> On my scenario, it could help a little, since it makes possible to filter what
> all other cpus are scheduling on the requested cpu. OTOH it could also be
> misleading, as the requested cpu could be running something that was scheduled
> way before.
> 
> The change I propose does exactly what my scenario needs: track exactly which
> function was running at given time in the requested CPU. With this info, we can
> check which (if any) remotely requested function was running on given time
> window.

I was thinking you could simply (graph)-trace
__flush_smp_call_function_queue() with a max_graph_depth or so (Steve
says that ought to work).

But even that might be too specific, your use case sounds more like what
we have the irq-off latency tracer for, and that thing will immediately
tell you what functions were being run.

> (An unrelated thing I just thought: We could even use the commits you pointed
> together with my proposed change in order to measure how long does it take for a
> requested function to run / complete in the requested cpu)

I don't think you could actually do that; the send tracepoints Valentin
added don't log the csd address, this means you cannot distinguish
two CSDs with the same function sent from different CPUs.

To do this you'd need to add the csd address to the ipi_send
tracepoints and your own (possibly replacing info -- because I don't
think that's too useful).
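A sketch of that pairing, keyed on the (hypothetical) csd address field so that
two in-flight CSDs with the same function stay distinguishable:

```python
# Events as (timestamp, kind, func, csd) tuples; the csd field is the
# hypothetical extension discussed above, not an existing tracepoint
# field. Keying on the csd address keeps two concurrent CSDs carrying
# the same function apart.
def csd_latencies(events):
    sent = {}  # csd address -> send timestamp
    out = []
    for ts, kind, func, csd in sorted(events):
        if kind == "send":
            sent[csd] = ts
        elif kind == "exit" and csd in sent:
            # Per-CSD latency from send to completion on the target.
            out.append((func, csd, ts - sent.pop(csd)))
    return out
```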

Valentin -- is any of this something you'd also find useful?


* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-03 14:59     ` Peter Zijlstra
@ 2023-05-03 15:53       ` Leonardo Bras Soares Passos
  2023-05-04 11:59       ` Valentin Schneider
  1 sibling, 0 replies; 18+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-05-03 15:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, vschneid

On Wed, May 3, 2023 at 12:17 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > On Thu, 2023-04-06 at 11:55 +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 06, 2023 at 04:57:18AM -0300, Leonardo Bras wrote:
> > > > When running RT workloads in isolated CPUs, many cases of deadline misses
> > > > are caused by remote CPU requests such as smp_call_function*().
> > > >
> > > > For those cases, having the names of those functions running around the
> > > > deadline miss moment could help finding a target for the next improvements.
> > > >
> > > > Add tracepoints for acquiring the function name & argument before entry and
> > > > after exiting the called function.
> > > >
> > > > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > >
> > > How are the patches queued there not sufficient?
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=smp/core
> > >
> >
> > IIUC the last commits add tracepoints that are collected in the
> > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > some scenarios.
> >
> > On my scenario, it could help a little, since it makes possible to filter what
> > all other cpus are scheduling on the requested cpu. OTOH it could also be
> > misleading, as the requested cpu could be running something that was scheduled
> > way before.
> >
> > The change I propose does exactly what my scenario needs: track exactly which
> > function was running at given time in the requested CPU. With this info, we can
> > check which (if any) remotely requested function was running on given time
> > window.
>
> I was thinking you could simply (graph)-trace
> __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> says that ought to work).
>
> But even that might be too specific, your use case sounds more like what
> we have the irq-off latency tracer for, and that thing will immediately
> tell you what functions were being run.
>
> > (An unrelated thing I just thought: We could even use the commits you pointed
> > together with my proposed change in order to measure how long does it take for a
> > requested function to run / complete in the requested cpu)
>
> I don't think you could actually do that; the send tracepoints Valentin
> added don't log the csd address, this means you cannot distinguish
> two CSDs with the same function sent from different CPUs.
>
> To do this you'd need to add the csd address to the ipi_send
> tracepoints and your own (possibly replacing info -- because I don't
> think that's too useful).

Sure, I will improve this in my patch.

You think I should rebase it on top of tip/smp/core in order to add it
to the set?

Thanks!
Leo

>
> Valentin -- is any of this something you'd also find useful?
>



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-03 14:59     ` Peter Zijlstra
  2023-05-03 15:53       ` Leonardo Bras Soares Passos
@ 2023-05-04 11:59       ` Valentin Schneider
  2023-05-04 13:34         ` Steven Rostedt
                           ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Valentin Schneider @ 2023-05-04 11:59 UTC (permalink / raw)
  To: Peter Zijlstra, Leonardo Brás
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, Daniel Bristot de Oliveira

+Daniel

On 03/05/23 16:59, Peter Zijlstra wrote:
> On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
>>
>> IIUC the last commits add tracepoints that are collected in the
>> requesting CPU, at the moment of scheduling the IPI, which are also useful in
>> some scenarios.
>>
>> On my scenario, it could help a little, since it makes possible to filter what
>> all other cpus are scheduling on the requested cpu. OTOH it could also be
>> misleading, as the requested cpu could be running something that was scheduled
>> way before.
>>
>> The change I propose does exactly what my scenario needs: track exactly which
>> function was running at given time in the requested CPU. With this info, we can
>> check which (if any) remotely requested function was running on given time
>> window.
>
> I was thinking you could simply (graph)-trace
> __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> says that ought to work).
>
> But even that might be too specific, your use case sounds more like what
> we have the irq-off latency tracer for, and that thing will immediately
> tell you what functions were being run.
>
>> (An unrelated thing I just thought: We could even use the commits you pointed
>> together with my proposed change in order to measure how long does it take for a
>> requested function to run / complete in the requested cpu)
>
> I don't think you could actually do that; the send tracepoints Valentin
> added don't log the csd address, this means you cannot distinguish
> two CSDs with the same function sent from different CPUs.
>
> To do this you'd need to add the csd address to the ipi_send
> tracepoints and your own (possibly replacing info -- because I don't
> think that's too useful).
>
> Valentin -- is any of this something you'd also find useful?

Conceptually yeah, however:

With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over,
you interrupt the userspace task and you eat the full meal that is
NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
evaluating your setup with rteval/rtla/whatever, so having an event at
IPI reception is redundant.

IIUC Leonardo's looking at a setup with isolcpus but not necessarily
NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
isolated task over its deadline/period. I would argue any IPI received
there is at risk of doing that, so it's the same fight as with NOHZ_FULL.

With that said, I suppose this could still be helpful for e.g. osnoise to
hook into and point the finger at which CPU/context sent the problematic
IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
times.

One thing though is that trace_ipi_send_cpu*() is not used solely for
CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().

Something like so...

---
diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
new file mode 100644
index 0000000000000..8fc725a2b45b9
--- /dev/null
+++ b/include/trace/events/smp.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM smp
+
+#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SMP_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(csd_queue_cpu,
+
+	TP_PROTO(const unsigned int cpu,
+		 unsigned long callsite,
+		 smp_call_func_t func,
+		 call_single_data_t *csd),
+
+	TP_ARGS(cpu, callsite, func, csd),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, cpu)
+		__field(void *, callsite)
+		__field(void *, func)
+		__field(void *, csd)
+	),
+
+	TP_fast_assign(
+		__entry->cpu = cpu;
+		__entry->callsite = (void *)callsite;
+		__entry->func = func;
+		__entry->csd  = csd;
+	),
+
+	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
+		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
+);
+
+DECLARE_EVENT_CLASS(csd_function,
+
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+
+	TP_ARGS(func, csd),
+
+	TP_STRUCT__entry(
+		__field(void *,	func)
+		__field(void *,	csd)
+	),
+
+	TP_fast_assign(
+		__entry->func	= func;
+		__entry->csd	= csd;
+	),
+
+	TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_entry,
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+	TP_ARGS(func, csd)
+);
+
+DEFINE_EVENT(csd_function, csd_function_exit,
+	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
+	TP_ARGS(func, csd)
+);
+
+#endif /* _TRACE_SMP_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/smp.c b/kernel/smp.c
index ab3e5dad6cfe9..7d28db303e9bc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -27,6 +27,9 @@
 #include <linux/jump_label.h>
 
 #include <trace/events/ipi.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/smp.h>
+#undef CREATE_TRACE_POINTS
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
 	arch_send_call_function_ipi_mask(mask);
 }
 
+static __always_inline void
+csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
+{
+	trace_csd_function_entry(func, csd);
+	func(info);
+	trace_csd_function_exit(func, csd);
+}
+
 #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
 
 static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
@@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
 	 * executes migration_cpu_stop() on the remote CPU).
 	 */
-	if (trace_ipi_send_cpu_enabled()) {
+	if (trace_csd_queue_cpu_enabled()) {
 		call_single_data_t *csd;
 		smp_call_func_t func;
 
@@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
 		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
 			sched_ttwu_pending : csd->func;
 
-		trace_ipi_send_cpu(cpu, _RET_IP_, func);
+		trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
 	}
 
 	/*
@@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
 		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
-		func(info);
+		csd_do_func(func, info, NULL);
 		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		return 0;
@@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 			}
 
 			csd_lock_record(csd);
-			func(info);
+			csd_do_func(func, info, csd);
 			csd_unlock(csd);
 			csd_lock_record(NULL);
 		} else {
@@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 
 				csd_lock_record(csd);
 				csd_unlock(csd);
-				func(info);
+				csd_do_func(func, info, csd);
 				csd_lock_record(NULL);
 			} else if (type == CSD_TYPE_IRQ_WORK) {
 				irq_work_single(csd);
@@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	/*
 	 * Third; only CSD_TYPE_TTWU is left, issue those.
 	 */
-	if (entry)
-		sched_ttwu_pending(entry);
+	if (entry) {
+		csd = llist_entry(entry, typeof(*csd), node.llist);
+		csd_do_func(sched_ttwu_pending, entry, csd);
+	}
 }
 
 
@@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
 
 /**
  * smp_call_function_single_async() - Run an asynchronous function on a
- * 			         specific CPU.
+ *				 specific CPU.
  * @cpu: The CPU to run on.
  * @csd: Pre-allocated and setup data structure
  *
@@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
-	int nr_cpus = 0, nr_queued = 0;
+	int nr_cpus = 0;
 	bool run_remote = false;
 	bool run_local = false;
 
@@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd->node.src = smp_processor_id();
 			csd->node.dst = cpu;
 #endif
+
+			trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
+
 			if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
 				__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
 				nr_cpus++;
 				last_cpu = cpu;
 			}
-			nr_queued++;
 		}
 
-		/*
-		 * Trace each smp_function_call_*() as an IPI, actual IPIs
-		 * will be traced with func==generic_smp_call_function_single_ipi().
-		 */
-		if (nr_queued)
-			trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
-
 		/*
 		 * Choose the most efficient way to send an IPI. Note that the
 		 * number of CPUs might be zero due to concurrent changes to the
@@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		unsigned long flags;
 
 		local_irq_save(flags);
-		func(info);
+		csd_do_func(func, info, NULL);
 		local_irq_restore(flags);
 	}
 

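The csd_do_func() helper in the hunk above simply brackets the remote callback with entry/exit events. Outside the kernel the same pattern can be pictured as a plain wrapper; the hook names and fake csd argument below are illustrative only, not the kernel API:

```c
/*
 * Userspace sketch of the csd_do_func() pattern: bracket the callback
 * with entry/exit hooks so a tracer can timestamp both sides.
 */
#include <assert.h>
#include <stdio.h>

typedef void (*smp_call_func_t)(void *info);

static int entries, exits;

static void trace_csd_function_entry(smp_call_func_t func, void *csd)
{
	entries++;
	printf("entry func=%p csd=%p\n", (void *)func, csd);
}

static void trace_csd_function_exit(smp_call_func_t func, void *csd)
{
	exits++;
	printf("exit  func=%p csd=%p\n", (void *)func, csd);
}

static void csd_do_func(smp_call_func_t func, void *info, void *csd)
{
	trace_csd_function_entry(func, csd);
	func(info);			/* the actual cross-CPU callback */
	trace_csd_function_exit(func, csd);
}

static void bump(void *info)
{
	(*(int *)info)++;
}
```

Because the two events fire symmetrically around func(), pairing an entry with its exit gives the per-callback runtime on the destination CPU.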

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-04 11:59       ` Valentin Schneider
@ 2023-05-04 13:34         ` Steven Rostedt
  2023-05-04 15:01         ` Peter Zijlstra
  2023-05-10 20:27         ` Leonardo Brás
  2 siblings, 0 replies; 18+ messages in thread
From: Steven Rostedt @ 2023-05-04 13:34 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Leonardo Brás, Masami Hiramatsu,
	Thomas Gleixner, Yury Norov, Paul E. McKenney,
	Sebastian Andrzej Siewior, Nadav Amit, Zhen Lei, Chen Zhongjin,
	linux-kernel, linux-trace-kernel, Marcelo Tosatti,
	Daniel Bristot de Oliveira

On Thu, 04 May 2023 12:59:41 +0100
Valentin Schneider <vschneid@redhat.com> wrote:

> +TRACE_EVENT(csd_queue_cpu,
> +
> +	TP_PROTO(const unsigned int cpu,

Why the "const" ?

-- Steve

> +		 unsigned long callsite,
> +		 smp_call_func_t func,
> +		 call_single_data_t *csd),
> +
> +	TP_ARGS(cpu, callsite, func, csd),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, cpu)
> +		__field(void *, callsite)
> +		__field(void *, func)
> +		__field(void *, csd)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu = cpu;
> +		__entry->callsite = (void *)callsite;
> +		__entry->func = func;
> +		__entry->csd  = csd;
> +	),
> +
> +	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> +		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> +);
> +

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-04 11:59       ` Valentin Schneider
  2023-05-04 13:34         ` Steven Rostedt
@ 2023-05-04 15:01         ` Peter Zijlstra
  2023-05-10 20:27         ` Leonardo Brás
  2 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2023-05-04 15:01 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Leonardo Brás, Steven Rostedt, Masami Hiramatsu,
	Thomas Gleixner, Yury Norov, Paul E. McKenney,
	Sebastian Andrzej Siewior, Nadav Amit, Zhen Lei, Chen Zhongjin,
	linux-kernel, linux-trace-kernel, Marcelo Tosatti,
	Daniel Bristot de Oliveira

On Thu, May 04, 2023 at 12:59:41PM +0100, Valentin Schneider wrote:

> With that said, I suppose this could still be helpful for e.g. osnoise to
> hook into and point the finger at which CPU/context sent the problematic
> IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> times.
> 
> One thing though is that trace_ipi_send_cpu*() is not used solely for
> CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
> 
> Something like so...
> 

Yeah, looks about right I suppose.. it generates more events, but given
we need the csd that's inevitable.
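As a sketch of the delivery-time measurement discussed above: once both csd_queue_cpu and csd_function_entry carry the csd address, trace lines can be paired on it. The "<timestamp>: <event>: ... csd=<hex>" line shape parsed below is an assumption simplified from the patch's TP_printk() strings; real trace_pipe lines carry a longer task/CPU prefix:

```c
/*
 * Pair csd_queue_cpu with csd_function_entry events by csd address to
 * estimate queue-to-execution latency from (simplified) trace lines.
 */
#include <stdio.h>
#include <string.h>

#define MAX_PENDING 64

static struct {
	unsigned long long csd;
	double ts;
} pending[MAX_PENDING];
static int npending;

/*
 * Feed one trace line; returns the queue-to-entry latency in seconds
 * when a pair completes, or -1.0 otherwise.
 */
static double csd_feed_line(const char *line)
{
	const char *c = strstr(line, "csd=");
	unsigned long long csd;
	double ts;
	int i;

	if (!c || sscanf(line, " %lf:", &ts) != 1 ||
	    sscanf(c, "csd=%llx", &csd) != 1)
		return -1.0;

	if (strstr(line, "csd_queue_cpu:")) {
		if (npending < MAX_PENDING) {
			pending[npending].csd = csd;
			pending[npending].ts = ts;
			npending++;
		}
	} else if (strstr(line, "csd_function_entry:")) {
		for (i = 0; i < npending; i++) {
			if (pending[i].csd == csd) {
				double lat = ts - pending[i].ts;

				pending[i] = pending[--npending];
				return lat;
			}
		}
	}
	return -1.0;
}
```

csd_function_exit can be matched the same way against csd_function_entry to get execution time rather than delivery time.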

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-04 11:59       ` Valentin Schneider
  2023-05-04 13:34         ` Steven Rostedt
  2023-05-04 15:01         ` Peter Zijlstra
@ 2023-05-10 20:27         ` Leonardo Brás
  2023-05-10 22:18           ` Leonardo Bras Soares Passos
  2023-05-11  8:13           ` Valentin Schneider
  2 siblings, 2 replies; 18+ messages in thread
From: Leonardo Brás @ 2023-05-10 20:27 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, Daniel Bristot de Oliveira

On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> +Daniel
> 
> On 03/05/23 16:59, Peter Zijlstra wrote:
> > On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > > 
> > > IIUC the last commits add tracepoints that are collected in the
> > > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > > some scenarios.
> > > 
> > > In my scenario, it could help a little, since it makes it possible to filter what
> > > all other cpus are scheduling on the requested cpu. OTOH it could also be
> > > misleading, as the requested cpu could be running something that was scheduled
> > > way before.
> > > 
> > > The change I propose does exactly what my scenario needs: track exactly which
> > > function was running at a given time on the requested CPU. With this info, we can
> > > check which (if any) remotely requested function was running in a given time
> > > window.
> > 
> > I was thinking you could simply (graph)-trace
> > __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> > says that ought to work).
> > 
> > But even that might be too specific, your use case sounds more like what
> > we have the irq-off latency tracer for, and that thing will immediately
> > tell you what functions were being run.
> > 
> > > (An unrelated thing I just thought: We could even use the commits you pointed
> > > together with my proposed change in order to measure how long it takes for a
> > > requested function to run / complete in the requested cpu)
> > 
> > I don't think you could actually do that; the send tracepoints Valentin
> > added don't log the csd address, this means you cannot distinguish
> > two CSDs with the same function send from different CPUs.
> > 
> > To do this you'd need to add the csd address to the ipi_send
> > tracepoints and your own (possibly replacing info -- because I don't
> > think that's too useful).
> > 
> > Valentin -- is any of this something you'd also find useful?
> 
> Conceptually yeah, however:
> 
> With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over,
> you interrupt the userspace task and you eat the full meal that is
> NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
> evaluating your setup with rteval/rtla/whatever, so having an event at
> IPI reception is redundant.
> 
> IIUC Leonardo's looking at a setup with isolcpus but not necessarily
> NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
> isolated task over its deadline/period. I would argue any IPI received
> there is at risk of doing that, so it's the same fight as with NOHZ_FULL.
> 
> With that said, I suppose this could still be helpful for e.g. osnoise to
> hook into and point the finger at which CPU/context sent the problematic
> IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> times.
> 
> One thing though is that trace_ipi_send_cpu*() is not used solely for
> CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
> 
> Something like so...
> 
> ---
> diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> new file mode 100644
> index 0000000000000..8fc725a2b45b9
> --- /dev/null
> +++ b/include/trace/events/smp.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM smp
> +
> +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_SMP_H
> +
> +#include <linux/tracepoint.h>
> +
> +TRACE_EVENT(csd_queue_cpu,
> +
> +	TP_PROTO(const unsigned int cpu,
> +		 unsigned long callsite,
> +		 smp_call_func_t func,
> +		 call_single_data_t *csd),
> +
> +	TP_ARGS(cpu, callsite, func, csd),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, cpu)
> +		__field(void *, callsite)
> +		__field(void *, func)
> +		__field(void *, csd)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->cpu = cpu;
> +		__entry->callsite = (void *)callsite;
> +		__entry->func = func;
> +		__entry->csd  = csd;
> +	),
> +
> +	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> +		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> +);

This is for the caller side, right?

> +
> +DECLARE_EVENT_CLASS(csd_function,
> +
> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> +
> +	TP_ARGS(func, csd),
> +
> +	TP_STRUCT__entry(
> +		__field(void *,	func)
> +		__field(void *,	csd)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->func	= func;
> +		__entry->csd	= csd;
> +	),
> +
> +	TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> +);
> +
> +DEFINE_EVENT(csd_function, csd_function_entry,
> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> +	TP_ARGS(func, csd)
> +);
> +
> +DEFINE_EVENT(csd_function, csd_function_exit,
> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> +	TP_ARGS(func, csd)
> +);

Oh, this is what event_class is for. Thanks for the example :)

> +
> +#endif /* _TRACE_SMP_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/kernel/smp.c b/kernel/smp.c
> index ab3e5dad6cfe9..7d28db303e9bc 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -27,6 +27,9 @@
>  #include <linux/jump_label.h>
>  
>  #include <trace/events/ipi.h>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/smp.h>
> +#undef CREATE_TRACE_POINTS
>  
>  #include "smpboot.h"
>  #include "sched/smp.h"
> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
>  	arch_send_call_function_ipi_mask(mask);
>  }
>  
> +static __always_inline void
> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> +{
> +	trace_csd_function_entry(func, csd);
> +	func(info);
> +	trace_csd_function_exit(func, csd);
> +}
> +

Good one, a helper to avoid calling those traces everywhere. 

Honest question: 
Since info == csd->info and func == csd->func, we could just pass csd, right?
I suppose the suggestion on the 3-argument version is to use the values already
fetched from memory instead of fetching them again. Is that correct?
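Possibly also relevant: on the paths where csd_unlock() runs before the call, the sender may recycle the csd as soon as it is unlocked, so func/info have to be captured while the csd is still owned. A plain-C sketch of that ordering (the fake csd type and names are illustrative only, not kernel code):

```c
/*
 * Sketch of why the callback arguments are captured up front rather
 * than re-read from the csd after "unlock": once ownership is
 * returned, the sender may reuse the structure.
 */
#include <assert.h>
#include <stddef.h>

typedef void (*smp_call_func_t)(void *info);

struct fake_csd {
	smp_call_func_t func;
	void *info;
};

static void run_unlocked(struct fake_csd *csd)
{
	/* Capture the fields while we still own the csd... */
	smp_call_func_t func = csd->func;
	void *info = csd->info;

	/* ...then simulate the sender reusing it right after "unlock". */
	csd->func = NULL;
	csd->info = NULL;

	/* Still safe: we call through the captured values. */
	func(info);
}

static void bump(void *info)
{
	(*(int *)info)++;
}
```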

>  #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
>  
>  static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
> @@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>  	 * even if we haven't sent the smp_call IPI yet (e.g. the stopper
>  	 * executes migration_cpu_stop() on the remote CPU).
>  	 */
> -	if (trace_ipi_send_cpu_enabled()) {
> +	if (trace_csd_queue_cpu_enabled()) {
>  		call_single_data_t *csd;
>  		smp_call_func_t func;
>  
> @@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
>  		func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
>  			sched_ttwu_pending : csd->func;
>  
> -		trace_ipi_send_cpu(cpu, _RET_IP_, func);
> +		trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
>  	}
>  
>  	/*
> @@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
>  		csd_lock_record(csd);
>  		csd_unlock(csd);
>  		local_irq_save(flags);
> -		func(info);
> +		csd_do_func(func, info, NULL);
>  		csd_lock_record(NULL);
>  		local_irq_restore(flags);
>  		return 0;
> @@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
>  			}
>  
>  			csd_lock_record(csd);
> -			func(info);
> +			csd_do_func(func, info, csd);
>  			csd_unlock(csd);
>  			csd_lock_record(NULL);
>  		} else {
> @@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
>  
>  				csd_lock_record(csd);
>  				csd_unlock(csd);
> -				func(info);
> +				csd_do_func(func, info, csd);
>  				csd_lock_record(NULL);
>  			} else if (type == CSD_TYPE_IRQ_WORK) {
>  				irq_work_single(csd);
> @@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
>  	/*
>  	 * Third; only CSD_TYPE_TTWU is left, issue those.
>  	 */
> -	if (entry)
> -		sched_ttwu_pending(entry);
> +	if (entry) {
> +		csd = llist_entry(entry, typeof(*csd), node.llist);
> +		csd_do_func(sched_ttwu_pending, entry, csd);
> +	}
>  }
>  
>  
> @@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
>  
>  /**
>   * smp_call_function_single_async() - Run an asynchronous function on a
> - * 			         specific CPU.
> + *				 specific CPU.
>   * @cpu: The CPU to run on.
>   * @csd: Pre-allocated and setup data structure
>   *
> @@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  	int cpu, last_cpu, this_cpu = smp_processor_id();
>  	struct call_function_data *cfd;
>  	bool wait = scf_flags & SCF_WAIT;
> -	int nr_cpus = 0, nr_queued = 0;
> +	int nr_cpus = 0;
>  	bool run_remote = false;
>  	bool run_local = false;
>  
> @@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  			csd->node.src = smp_processor_id();
>  			csd->node.dst = cpu;
>  #endif
> +
> +			trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> +
>  			if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
>  				__cpumask_set_cpu(cpu, cfd->cpumask_ipi);
>  				nr_cpus++;
>  				last_cpu = cpu;
>  			}
> -			nr_queued++;
>  		}
>  
> -		/*
> -		 * Trace each smp_function_call_*() as an IPI, actual IPIs
> -		 * will be traced with func==generic_smp_call_function_single_ipi().
> -		 */
> -		if (nr_queued)
> -			trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
> -
>  		/*
>  		 * Choose the most efficient way to send an IPI. Note that the
>  		 * number of CPUs might be zero due to concurrent changes to the
> @@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
>  		unsigned long flags;
>  
>  		local_irq_save(flags);
> -		func(info);
> +		csd_do_func(func, info, NULL);
>  		local_irq_restore(flags);
>  	}
>  
> 

I will rebase my patch on top of tip/smp/core and apply the suggested changes.
A v2 will be sent shortly.

Thank you all for reviewing!
Leo



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-10 20:27         ` Leonardo Brás
@ 2023-05-10 22:18           ` Leonardo Bras Soares Passos
  2023-05-10 23:05             ` Leonardo Bras Soares Passos
  2023-05-11  8:13           ` Valentin Schneider
  1 sibling, 1 reply; 18+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-05-10 22:18 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, Daniel Bristot de Oliveira

On Wed, May 10, 2023 at 5:27 PM Leonardo Brás <leobras@redhat.com> wrote:
>
> On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> > +Daniel
> >
> > On 03/05/23 16:59, Peter Zijlstra wrote:
> > > On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > > >
> > > > IIUC the last commits add tracepoints that are collected in the
> > > > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > > > some scenarios.
> > > >
> > > > In my scenario, it could help a little, since it makes it possible to filter what
> > > > all other cpus are scheduling on the requested cpu. OTOH it could also be
> > > > misleading, as the requested cpu could be running something that was scheduled
> > > > way before.
> > > >
> > > > The change I propose does exactly what my scenario needs: track exactly which
> > > > function was running at a given time on the requested CPU. With this info, we can
> > > > check which (if any) remotely requested function was running in a given time
> > > > window.
> > >
> > > I was thinking you could simply (graph)-trace
> > > __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> > > says that ought to work).
> > >
> > > But even that might be too specific, your use case sounds more like what
> > > we have the irq-off latency tracer for, and that thing will immediately
> > > tell you what functions were being run.
> > >
> > > > (An unrelated thing I just thought: We could even use the commits you pointed
> > > > together with my proposed change in order to measure how long it takes for a
> > > > requested function to run / complete in the requested cpu)
> > >
> > > I don't think you could actually do that; the send tracepoints Valentin
> > > added don't log the csd address, this means you cannot distinguish
> > > two CSDs with the same function send from different CPUs.
> > >
> > > To do this you'd need to add the csd address to the ipi_send
> > > tracepoints and your own (possibly replacing info -- because I don't
> > > think that's too useful).
> > >
> > > Valentin -- is any of this something you'd also find useful?
> >
> > Conceptually yeah, however:
> >
> > With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over,
> > you interrupt the userspace task and you eat the full meal that is
> > NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
> > evaluating your setup with rteval/rtla/whatever, so having an event at
> > IPI reception is redundant.
> >
> > IIUC Leonardo's looking at a setup with isolcpus but not necessarily
> > NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
> > isolated task over its deadline/period. I would argue any IPI received
> > there is at risk of doing that, so it's the same fight as with NOHZ_FULL.
> >
> > With that said, I suppose this could still be helpful for e.g. osnoise to
> > hook into and point the finger at which CPU/context sent the problematic
> > IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> > times.
> >
> > One thing though is that trace_ipi_send_cpu*() is not used solely for
> > CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> > split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
> >
> > Something like so...
> >
> > ---
> > diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> > new file mode 100644
> > index 0000000000000..8fc725a2b45b9
> > --- /dev/null
> > +++ b/include/trace/events/smp.h
> > @@ -0,0 +1,69 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM smp
> > +
> > +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_SMP_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +TRACE_EVENT(csd_queue_cpu,
> > +
> > +     TP_PROTO(const unsigned int cpu,
> > +              unsigned long callsite,
> > +              smp_call_func_t func,
> > +              call_single_data_t *csd),
> > +
> > +     TP_ARGS(cpu, callsite, func, csd),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(unsigned int, cpu)
> > +             __field(void *, callsite)
> > +             __field(void *, func)
> > +             __field(void *, csd)
> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->cpu = cpu;
> > +             __entry->callsite = (void *)callsite;
> > +             __entry->func = func;
> > +             __entry->csd  = csd;
> > +     ),
> > +
> > +     TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> > +               __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> > +);
>
> This is for the caller side, right?
>
> > +
> > +DECLARE_EVENT_CLASS(csd_function,
> > +
> > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > +
> > +     TP_ARGS(func, csd),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(void *, func)
> > +             __field(void *, csd)
> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->func   = func;
> > +             __entry->csd    = csd;
> > +     ),
> > +
> > +     TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> > +);
> > +
> > +DEFINE_EVENT(csd_function, csd_function_entry,
> > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > +     TP_ARGS(func, csd)
> > +);
> > +
> > +DEFINE_EVENT(csd_function, csd_function_exit,
> > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > +     TP_ARGS(func, csd)
> > +);
>
> Oh, this is what event_class is for. Thanks for the example :)
>
> > +
> > +#endif /* _TRACE_SMP_H */
> > +
> > +/* This part must be outside protection */
> > +#include <trace/define_trace.h>
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index ab3e5dad6cfe9..7d28db303e9bc 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -27,6 +27,9 @@
> >  #include <linux/jump_label.h>
> >
> >  #include <trace/events/ipi.h>
> > +#define CREATE_TRACE_POINTS
> > +#include <trace/events/smp.h>
> > +#undef CREATE_TRACE_POINTS
> >
> >  #include "smpboot.h"
> >  #include "sched/smp.h"
> > @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> >       arch_send_call_function_ipi_mask(mask);
> >  }
> >
> > +static __always_inline void
> > +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> > +{
> > +     trace_csd_function_entry(func, csd);
> > +     func(info);
> > +     trace_csd_function_exit(func, csd);
> > +}
> > +
>
> Good one, a helper to avoid calling those traces everywhere.
>
> Honest question:
> Since info == csd->info and func == csd->func, we could just pass csd, right?
> I suppose the suggestion on the 3-argument version is to use the values already
> fetched from memory instead of fetching them again. Is that correct?
>
> >  #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
> >
> >  static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
> > @@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> >        * even if we haven't sent the smp_call IPI yet (e.g. the stopper
> >        * executes migration_cpu_stop() on the remote CPU).
> >        */
> > -     if (trace_ipi_send_cpu_enabled()) {
> > +     if (trace_csd_queue_cpu_enabled()) {
> >               call_single_data_t *csd;
> >               smp_call_func_t func;
> >
> > @@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> >               func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
> >                       sched_ttwu_pending : csd->func;
> >
> > -             trace_ipi_send_cpu(cpu, _RET_IP_, func);
> > +             trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> >       }
> >
> >       /*
> > @@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
> >               csd_lock_record(csd);
> >               csd_unlock(csd);
> >               local_irq_save(flags);
> > -             func(info);
> > +             csd_do_func(func, info, NULL);
> >               csd_lock_record(NULL);
> >               local_irq_restore(flags);
> >               return 0;
> > @@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> >                       }
> >
> >                       csd_lock_record(csd);
> > -                     func(info);
> > +                     csd_do_func(func, info, csd);
> >                       csd_unlock(csd);
> >                       csd_lock_record(NULL);
> >               } else {
> > @@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> >
> >                               csd_lock_record(csd);
> >                               csd_unlock(csd);
> > -                             func(info);
> > +                             csd_do_func(func, info, csd);
> >                               csd_lock_record(NULL);
> >                       } else if (type == CSD_TYPE_IRQ_WORK) {
> >                               irq_work_single(csd);
> > @@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> >       /*
> >        * Third; only CSD_TYPE_TTWU is left, issue those.
> >        */
> > -     if (entry)
> > -             sched_ttwu_pending(entry);
> > +     if (entry) {
> > +             csd = llist_entry(entry, typeof(*csd), node.llist);
> > +             csd_do_func(sched_ttwu_pending, entry, csd);
> > +     }
> >  }
> >
> >
> > @@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
> >
> >  /**
> >   * smp_call_function_single_async() - Run an asynchronous function on a
> > - *                            specific CPU.
> > + *                            specific CPU.
> >   * @cpu: The CPU to run on.
> >   * @csd: Pre-allocated and setup data structure
> >   *
> > @@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> >       int cpu, last_cpu, this_cpu = smp_processor_id();
> >       struct call_function_data *cfd;
> >       bool wait = scf_flags & SCF_WAIT;
> > -     int nr_cpus = 0, nr_queued = 0;
> > +     int nr_cpus = 0;
> >       bool run_remote = false;
> >       bool run_local = false;
> >
> > @@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> >                       csd->node.src = smp_processor_id();
> >                       csd->node.dst = cpu;
> >  #endif
> > +
> > +                     trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> > +
> >                       if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
> >                               __cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> >                               nr_cpus++;
> >                               last_cpu = cpu;
> >                       }
> > -                     nr_queued++;
> >               }
> >
> > -             /*
> > -              * Trace each smp_function_call_*() as an IPI, actual IPIs
> > -              * will be traced with func==generic_smp_call_function_single_ipi().
> > -              */
> > -             if (nr_queued)
> > -                     trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
> > -
> >               /*
> >                * Choose the most efficient way to send an IPI. Note that the
> >                * number of CPUs might be zero due to concurrent changes to the
> > @@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> >               unsigned long flags;
> >
> >               local_irq_save(flags);
> > -             func(info);
> > +             csd_do_func(func, info, NULL);
> >               local_irq_restore(flags);
> >       }
> >
> >
>
> I will rebase my patch on top of tip/smp/core and apply the suggested changes.
> A v2 will be sent shortly.

RFC-v2 sent:
https://lore.kernel.org/lkml/20230510221513.93297-1-leobras@redhat.com/

>
> Thank you all for reviewing!
> Leo
>
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-10 22:18           ` Leonardo Bras Soares Passos
@ 2023-05-10 23:05             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 18+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-05-10 23:05 UTC (permalink / raw)
  To: Valentin Schneider, Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, Daniel Bristot de Oliveira

On Wed, May 10, 2023 at 7:18 PM Leonardo Bras Soares Passos
<leobras@redhat.com> wrote:
>
> On Wed, May 10, 2023 at 5:27 PM Leonardo Brás <leobras@redhat.com> wrote:
> >
> > On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> > > +Daniel
> > >
> > > On 03/05/23 16:59, Peter Zijlstra wrote:
> > > > On Wed, Apr 19, 2023 at 12:45:08AM -0300, Leonardo Brás wrote:
> > > > >
> > > > > IIUC the last commits add tracepoints that are collected in the
> > > > > requesting CPU, at the moment of scheduling the IPI, which are also useful in
> > > > > some scenarios.
> > > > >
> > > > > In my scenario, it could help a little, since it makes it possible to filter what
> > > > > all other cpus are scheduling on the requested cpu. OTOH it could also be
> > > > > misleading, as the requested cpu could be running something that was scheduled
> > > > > way before.
> > > > >
> > > > > The change I propose does exactly what my scenario needs: track exactly which
> > > > > function was running at a given time on the requested CPU. With this info, we can
> > > > > check which (if any) remotely requested function was running in a given time
> > > > > window.
> > > >
> > > > I was thinking you could simply (graph)-trace
> > > > __flush_smp_call_function_queue() with a max_graph_depth or so (Steve
> > > > says that ought to work).
> > > >
> > > > But even that might be too specific, your use case sounds more like what
> > > > we have the irq-off latency tracer for, and that thing will immediately
> > > > tell you what functions were being run.
> > > >
> > > > > (An unrelated thing I just thought: We could even use the commits you pointed
> > > > > together with my proposed change in order to measure how long it takes for a
> > > > > requested function to run / complete in the requested cpu)
> > > >
> > > > I don't think you could actually do that; the send tracepoints Valentin
> > > > added don't log the csd address, this means you cannot distinguish
> > > > two CSDs with the same function send from different CPUs.
> > > >
> > > > To do this you'd need to add the csd address to the ipi_send
> > > > tracepoints and your own (possibly replacing info -- because I don't
> > > > think that's too useful).
> > > >
> > > > Valentin -- is any of this something you'd also find useful?
> > >
> > > Conceptually yeah, however:
> > >
> > > With isolcpus + NOHZ_FULL, *any* IPI sent to an isolated CPU is game over,
> > > you interrupt the userspace task and you eat the full meal that is
> > > NOHZ_FULL kernel entry. Pretty much any such IPI will show up when
> > > evaluating your setup with rteval/rtla/whatever, so having an event at
> > > IPI reception is redundant.
> > >
> > > IIUC Leonardo's looking at a setup with isolcpus but not necessarily
> > > NOHZ_FULL, so he wants to look at *which* IPIs (if any) are pushing the
> > > isolated task over its deadline/period. I would argue any IPI received
> > > there is at risk of doing that, so it's the same fight as with NOHZ_FULL.
> > >
> > > With that said, I suppose this could still be helpful for e.g. osnoise to
> > > hook into and point the finger at which CPU/context sent the problematic
> > > IPI. Or more generally, as Leonardo suggested, to measure CSD IPI delivery
> > > times.
> > >
> > > One thing though is that trace_ipi_send_cpu*() is not used solely for
> > > CSD's, cf. irq_work_raise() or smp_send_reschedule(). We might want to
> > > split that into e.g. trace_csd_queue_cpu*() + trace_ipi_send*().
> > >
> > > Something like so...
> > >
> > > ---
> > > diff --git a/include/trace/events/smp.h b/include/trace/events/smp.h
> > > new file mode 100644
> > > index 0000000000000..8fc725a2b45b9
> > > --- /dev/null
> > > +++ b/include/trace/events/smp.h
> > > @@ -0,0 +1,69 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM smp
> > > +
> > > +#if !defined(_TRACE_SMP_H) || defined(TRACE_HEADER_MULTI_READ)
> > > +#define _TRACE_SMP_H
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +TRACE_EVENT(csd_queue_cpu,
> > > +
> > > +     TP_PROTO(const unsigned int cpu,
> > > +              unsigned long callsite,
> > > +              smp_call_func_t func,
> > > +              call_single_data_t *csd),
> > > +
> > > +     TP_ARGS(cpu, callsite, func, csd),
> > > +
> > > +     TP_STRUCT__entry(
> > > +             __field(unsigned int, cpu)
> > > +             __field(void *, callsite)
> > > +             __field(void *, func)
> > > +             __field(void *, csd)
> > > +     ),
> > > +
> > > +     TP_fast_assign(
> > > +             __entry->cpu = cpu;
> > > +             __entry->callsite = (void *)callsite;
> > > +             __entry->func = func;
> > > +             __entry->csd  = csd;
> > > +     ),
> > > +
> > > +     TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> > > +               __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> > > +);
> >
> > This is for the caller side, right?
> >
> > > +
> > > +DECLARE_EVENT_CLASS(csd_function,
> > > +
> > > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > +
> > > +     TP_ARGS(func, csd),
> > > +
> > > +     TP_STRUCT__entry(
> > > +             __field(void *, func)
> > > +             __field(void *, csd)
> > > +     ),
> > > +
> > > +     TP_fast_assign(
> > > +             __entry->func   = func;
> > > +             __entry->csd    = csd;
> > > +     ),
> > > +
> > > +     TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> > > +);
> > > +
> > > +DEFINE_EVENT(csd_function, csd_function_entry,
> > > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > +     TP_ARGS(func, csd)
> > > +);
> > > +
> > > +DEFINE_EVENT(csd_function, csd_function_exit,
> > > +     TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> > > +     TP_ARGS(func, csd)
> > > +);
> >
> > Oh, this is what event_class is for. Thanks for the example :)
> >
> > > +
> > > +#endif /* _TRACE_SMP_H */
> > > +
> > > +/* This part must be outside protection */
> > > +#include <trace/define_trace.h>
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index ab3e5dad6cfe9..7d28db303e9bc 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -27,6 +27,9 @@
> > >  #include <linux/jump_label.h>
> > >
> > >  #include <trace/events/ipi.h>
> > > +#define CREATE_TRACE_POINTS
> > > +#include <trace/events/smp.h>
> > > +#undef CREATE_TRACE_POINTS
> > >
> > >  #include "smpboot.h"
> > >  #include "sched/smp.h"
> > > @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> > >       arch_send_call_function_ipi_mask(mask);
> > >  }
> > >
> > > +static __always_inline void
> > > +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> > > +{
> > > +     trace_csd_function_entry(func, csd);
> > > +     func(info);
> > > +     trace_csd_function_exit(func, csd);
> > > +}
> > > +
> >
> > Good one, a helper to avoid calling those traces everywhere.
> >
> > Honest question:
> > Since info == csd->info and func == csd->func, we could just pass csd, right?
> > I suppose the suggestion on the 3-argument version is to use the values already
> > fetched from memory instead of fetching them again. Is that correct?
> >
> > >  #ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
> > >
> > >  static DEFINE_STATIC_KEY_MAYBE(CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT, csdlock_debug_enabled);
> > > @@ -329,7 +340,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> > >        * even if we haven't sent the smp_call IPI yet (e.g. the stopper
> > >        * executes migration_cpu_stop() on the remote CPU).
> > >        */
> > > -     if (trace_ipi_send_cpu_enabled()) {
> > > +     if (trace_csd_queue_cpu_enabled()) {
> > >               call_single_data_t *csd;
> > >               smp_call_func_t func;
> > >
> > > @@ -337,7 +348,7 @@ void __smp_call_single_queue(int cpu, struct llist_node *node)
> > >               func = CSD_TYPE(csd) == CSD_TYPE_TTWU ?
> > >                       sched_ttwu_pending : csd->func;
> > >
> > > -             trace_ipi_send_cpu(cpu, _RET_IP_, func);
> > > +             trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> > >       }
> > >
> > >       /*
> > > @@ -375,7 +386,7 @@ static int generic_exec_single(int cpu, struct __call_single_data *csd)
> > >               csd_lock_record(csd);
> > >               csd_unlock(csd);
> > >               local_irq_save(flags);
> > > -             func(info);
> > > +             csd_do_func(func, info, NULL);
> > >               csd_lock_record(NULL);
> > >               local_irq_restore(flags);
> > >               return 0;
> > > @@ -477,7 +488,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > >                       }
> > >
> > >                       csd_lock_record(csd);
> > > -                     func(info);
> > > +                     csd_do_func(func, info, csd);
> > >                       csd_unlock(csd);
> > >                       csd_lock_record(NULL);
> > >               } else {
> > > @@ -508,7 +519,7 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > >
> > >                               csd_lock_record(csd);
> > >                               csd_unlock(csd);
> > > -                             func(info);
> > > +                             csd_do_func(func, info, csd);
> > >                               csd_lock_record(NULL);
> > >                       } else if (type == CSD_TYPE_IRQ_WORK) {
> > >                               irq_work_single(csd);
> > > @@ -522,8 +533,10 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
> > >       /*
> > >        * Third; only CSD_TYPE_TTWU is left, issue those.
> > >        */
> > > -     if (entry)
> > > -             sched_ttwu_pending(entry);
> > > +     if (entry) {
> > > +             csd = llist_entry(entry, typeof(*csd), node.llist);
> > > +             csd_do_func(sched_ttwu_pending, entry, csd);
> > > +     }
> > >  }
> > >
> > >
> > > @@ -624,7 +637,7 @@ EXPORT_SYMBOL(smp_call_function_single);
> > >
> > >  /**
> > >   * smp_call_function_single_async() - Run an asynchronous function on a
> > > - *                            specific CPU.
> > > + *                            specific CPU.
> > >   * @cpu: The CPU to run on.
> > >   * @csd: Pre-allocated and setup data structure
> > >   *
> > > @@ -728,7 +741,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > >       int cpu, last_cpu, this_cpu = smp_processor_id();
> > >       struct call_function_data *cfd;
> > >       bool wait = scf_flags & SCF_WAIT;
> > > -     int nr_cpus = 0, nr_queued = 0;
> > > +     int nr_cpus = 0;
> > >       bool run_remote = false;
> > >       bool run_local = false;
> > >
> > > @@ -786,21 +799,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > >                       csd->node.src = smp_processor_id();
> > >                       csd->node.dst = cpu;
> > >  #endif
> > > +
> > > +                     trace_csd_queue_cpu(cpu, _RET_IP_, func, csd);
> > > +
> > >                       if (llist_add(&csd->node.llist, &per_cpu(call_single_queue, cpu))) {
> > >                               __cpumask_set_cpu(cpu, cfd->cpumask_ipi);
> > >                               nr_cpus++;
> > >                               last_cpu = cpu;
> > >                       }
> > > -                     nr_queued++;
> > >               }
> > >
> > > -             /*
> > > -              * Trace each smp_function_call_*() as an IPI, actual IPIs
> > > -              * will be traced with func==generic_smp_call_function_single_ipi().
> > > -              */
> > > -             if (nr_queued)
> > > -                     trace_ipi_send_cpumask(cfd->cpumask, _RET_IP_, func);
> > > -
> > >               /*
> > >                * Choose the most efficient way to send an IPI. Note that the
> > >                * number of CPUs might be zero due to concurrent changes to the
> > > @@ -816,7 +824,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
> > >               unsigned long flags;
> > >
> > >               local_irq_save(flags);
> > > -             func(info);
> > > +             csd_do_func(func, info, NULL);
> > >               local_irq_restore(flags);
> > >       }
> > >
> > >
> >
> > I will rebase my patch on top of tip/smp/core and apply the suggested changes.
> > A v2 will be sent shortly.
>
> RFC-v2 sent:
> https://lore.kernel.org/lkml/20230510221513.93297-1-leobras@redhat.com/
>

Argh, I accidentally sent an unfinished patch, sorry about that.
Please disregard the v2 above, and review v3 instead:
https://lore.kernel.org/lkml/20230510230128.150384-1-leobras@redhat.com/


> >
> > Thank you all for reviewing!
> > Leo
> >
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-10 20:27         ` Leonardo Brás
  2023-05-10 22:18           ` Leonardo Bras Soares Passos
@ 2023-05-11  8:13           ` Valentin Schneider
  2023-05-11  9:25             ` Leonardo Bras Soares Passos
  1 sibling, 1 reply; 18+ messages in thread
From: Valentin Schneider @ 2023-05-11  8:13 UTC (permalink / raw)
  To: Leonardo Brás, Peter Zijlstra
  Cc: Steven Rostedt, Masami Hiramatsu, Thomas Gleixner, Yury Norov,
	Paul E. McKenney, Sebastian Andrzej Siewior, Nadav Amit,
	Zhen Lei, Chen Zhongjin, linux-kernel, linux-trace-kernel,
	Marcelo Tosatti, Daniel Bristot de Oliveira

On 10/05/23 17:27, Leonardo Brás wrote:
> On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
>> +TRACE_EVENT(csd_queue_cpu,
>> +
>> +	TP_PROTO(const unsigned int cpu,
>> +		 unsigned long callsite,
>> +		 smp_call_func_t func,
>> +		 call_single_data_t *csd),
>> +
>> +	TP_ARGS(cpu, callsite, func, csd),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(unsigned int, cpu)
>> +		__field(void *, callsite)
>> +		__field(void *, func)
>> +		__field(void *, csd)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->cpu = cpu;
>> +		__entry->callsite = (void *)callsite;
>> +		__entry->func = func;
>> +		__entry->csd  = csd;
>> +	),
>> +
>> +	TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
>> +		  __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
>> +);
>
> This is for the caller side, right?
>

Yep, see usage lower down.

>> +
>> +DECLARE_EVENT_CLASS(csd_function,
>> +
>> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> +
>> +	TP_ARGS(func, csd),
>> +
>> +	TP_STRUCT__entry(
>> +		__field(void *,	func)
>> +		__field(void *,	csd)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__entry->func	= func;
>> +		__entry->csd	= csd;
>> +	),
>> +
>> +	TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
>> +);
>> +
>> +DEFINE_EVENT(csd_function, csd_function_entry,
>> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> +	TP_ARGS(func, csd)
>> +);
>> +
>> +DEFINE_EVENT(csd_function, csd_function_exit,
>> +	TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
>> +	TP_ARGS(func, csd)
>> +);
>
> Oh, this is what event_class is for. Thanks for the example :)
>
>> +
>> +#endif /* _TRACE_SMP_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>> diff --git a/kernel/smp.c b/kernel/smp.c
>> index ab3e5dad6cfe9..7d28db303e9bc 100644
>> --- a/kernel/smp.c
>> +++ b/kernel/smp.c
>> @@ -27,6 +27,9 @@
>>  #include <linux/jump_label.h>
>>
>>  #include <trace/events/ipi.h>
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/smp.h>
>> +#undef CREATE_TRACE_POINTS
>>
>>  #include "smpboot.h"
>>  #include "sched/smp.h"
>> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
>>      arch_send_call_function_ipi_mask(mask);
>>  }
>>
>> +static __always_inline void
>> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
>> +{
>> +	trace_csd_function_entry(func, csd);
>> +	func(info);
>> +	trace_csd_function_exit(func, csd);
>> +}
>> +
>
> Good one, a helper to avoid calling those traces everywhere.
>
> Honest question:
> Since info == csd->info and func == csd->func, we could just pass csd, right?
> I suppose the suggestion on the 3-argument version is to use the values already
> fetched from memory instead of fetching them again. Is that correct?
>

There's also the special case of CSD_TYPE_TTWU, where there is no csd->func;
instead we have an implicit func mapping to sched_ttwu_pending. I think
it's preferable to directly feed the right things to the TP than to
duplicate the "decoding" logic against the *csd passed as TP argument.



* Re: [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*()
  2023-05-11  8:13           ` Valentin Schneider
@ 2023-05-11  9:25             ` Leonardo Bras Soares Passos
  0 siblings, 0 replies; 18+ messages in thread
From: Leonardo Bras Soares Passos @ 2023-05-11  9:25 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Peter Zijlstra, Steven Rostedt, Masami Hiramatsu,
	Thomas Gleixner, Yury Norov, Paul E. McKenney,
	Sebastian Andrzej Siewior, Nadav Amit, Zhen Lei, Chen Zhongjin,
	linux-kernel, linux-trace-kernel, Marcelo Tosatti,
	Daniel Bristot de Oliveira

On Thu, May 11, 2023 at 5:13 AM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 10/05/23 17:27, Leonardo Brás wrote:
> > On Thu, 2023-05-04 at 12:59 +0100, Valentin Schneider wrote:
> >> +TRACE_EVENT(csd_queue_cpu,
> >> +
> >> +    TP_PROTO(const unsigned int cpu,
> >> +             unsigned long callsite,
> >> +             smp_call_func_t func,
> >> +             call_single_data_t *csd),
> >> +
> >> +    TP_ARGS(cpu, callsite, func, csd),
> >> +
> >> +    TP_STRUCT__entry(
> >> +            __field(unsigned int, cpu)
> >> +            __field(void *, callsite)
> >> +            __field(void *, func)
> >> +            __field(void *, csd)
> >> +    ),
> >> +
> >> +    TP_fast_assign(
> >> +            __entry->cpu = cpu;
> >> +            __entry->callsite = (void *)callsite;
> >> +            __entry->func = func;
> >> +            __entry->csd  = csd;
> >> +    ),
> >> +
> >> +    TP_printk("cpu=%u callsite=%pS func=%pS csd=%p",
> >> +              __entry->cpu, __entry->callsite, __entry->func, __entry->csd)
> >> +);
> >
> > This is for the caller side, right?
> >
>
> Yep, see usage lower down.
>
> >> +
> >> +DECLARE_EVENT_CLASS(csd_function,
> >> +
> >> +    TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> >> +
> >> +    TP_ARGS(func, csd),
> >> +
> >> +    TP_STRUCT__entry(
> >> +            __field(void *, func)
> >> +            __field(void *, csd)
> >> +    ),
> >> +
> >> +    TP_fast_assign(
> >> +            __entry->func   = func;
> >> +            __entry->csd    = csd;
> >> +    ),
> >> +
> >> +    TP_printk("func=%pS csd=%p", __entry->func, __entry->csd)
> >> +);
> >> +
> >> +DEFINE_EVENT(csd_function, csd_function_entry,
> >> +    TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> >> +    TP_ARGS(func, csd)
> >> +);
> >> +
> >> +DEFINE_EVENT(csd_function, csd_function_exit,
> >> +    TP_PROTO(smp_call_func_t func, call_single_data_t *csd),
> >> +    TP_ARGS(func, csd)
> >> +);
> >
> > Oh, this is what event_class is for. Thanks for the example :)
> >
> >> +
> >> +#endif /* _TRACE_SMP_H */
> >> +
> >> +/* This part must be outside protection */
> >> +#include <trace/define_trace.h>
> >> diff --git a/kernel/smp.c b/kernel/smp.c
> >> index ab3e5dad6cfe9..7d28db303e9bc 100644
> >> --- a/kernel/smp.c
> >> +++ b/kernel/smp.c
> >> @@ -27,6 +27,9 @@
> >>  #include <linux/jump_label.h>
> >>
> >>  #include <trace/events/ipi.h>
> >> +#define CREATE_TRACE_POINTS
> >> +#include <trace/events/smp.h>
> >> +#undef CREATE_TRACE_POINTS
> >>
> >>  #include "smpboot.h"
> >>  #include "sched/smp.h"
> >> @@ -121,6 +124,14 @@ send_call_function_ipi_mask(struct cpumask *mask)
> >>      arch_send_call_function_ipi_mask(mask);
> >>  }
> >>
> >> +static __always_inline void
> >> +csd_do_func(smp_call_func_t func, void *info, call_single_data_t *csd)
> >> +{
> >> +    trace_csd_function_entry(func, csd);
> >> +    func(info);
> >> +    trace_csd_function_exit(func, csd);
> >> +}
> >> +
> >
> > Good one, a helper to avoid calling those traces everywhere.
> >
> > Honest question:
> > Since info == csd->info and func == csd->func, we could just pass csd, right?
> > I suppose the suggestion on the 3-argument version is to use the values already
> > fetched from memory instead of fetching them again. Is that correct?
> >
>
> There's also the special case of CSD_TYPE_TTWU, where there is no csd->func;
> instead we have an implicit func mapping to sched_ttwu_pending. I think
> it's preferable to directly feed the right things to the TP than to
> duplicate the "decoding" logic against the *csd passed as TP argument.
>

Quite interesting, thank you for sharing!

I sent a v3, which got a warning from the kernel testing robot. I have
already solved the warning, so please provide feedback on the rest of
the patch.

About the warning:
It is an alignment mismatch between 'struct __call_single_data' as seen
by generic_exec_single() and 'call_single_data_t' as seen by
csd_do_func(): the former is 8-byte aligned, while the latter is 32-byte
aligned according to the typedef.

My first idea was to convert my patch's parameters from
call_single_data_t to 'struct __call_single_data', but then I found
out the 'struct' option allows a csd to be split across two cachelines,
which is usually bad.
Then I decided to send a patchset [1] fixing generic_exec_single() and
its callers. If it's accepted, the warning will go away, and v3 will
merge cleanly.

If not, I will send a v4 changing the parameters.

Thanks for reviewing,
Leo


1. https://lore.kernel.org/lkml/20230511085836.579679-1-leobras@redhat.com



Thread overview: 18+ messages
2023-04-06  7:57 [RFC PATCH 1/1] smp: Add tracepoints for functions called with smp_call_function*() Leonardo Bras
2023-04-06  8:15 ` Sebastian Andrzej Siewior
2023-04-06  8:42   ` Leonardo Brás
2023-04-06 13:49     ` Steven Rostedt
2023-04-19  4:04       ` Leonardo Brás
2023-04-06  9:55 ` Peter Zijlstra
2023-04-19  3:45   ` Leonardo Brás
2023-05-03  4:23     ` Leonardo Brás
2023-05-03 14:59     ` Peter Zijlstra
2023-05-03 15:53       ` Leonardo Bras Soares Passos
2023-05-04 11:59       ` Valentin Schneider
2023-05-04 13:34         ` Steven Rostedt
2023-05-04 15:01         ` Peter Zijlstra
2023-05-10 20:27         ` Leonardo Brás
2023-05-10 22:18           ` Leonardo Bras Soares Passos
2023-05-10 23:05             ` Leonardo Bras Soares Passos
2023-05-11  8:13           ` Valentin Schneider
2023-05-11  9:25             ` Leonardo Bras Soares Passos
