linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] tracepoint: Run tracepoints even after CPU is offline
@ 2018-08-06  3:44 Joel Fernandes (Google)
  2018-08-06 16:44 ` Steven Rostedt
  0 siblings, 1 reply; 2+ messages in thread
From: Joel Fernandes (Google) @ 2018-08-06  3:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Joel Fernandes (Google),
	Ingo Molnar, Steven Rostedt, Masami Hiramatsu, paulmck,
	mathieu.desnoyers, namhyung, peterz

Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
causes a problem for lockdep using tracepoint code. Once a CPU is
offline, tracepoints donot get called, however this causes a big problem
for lockdep probes that need to run so that IRQ annotations are marked
correctly.

An issue is possible where while the CPU is going offline, an interrupt
can come in and then a lockdep assert causes an annotation warning:

[  106.551354] IRQs not enabled as expected
[  106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
                                         tick_nohz_idle_enter+0x99/0xb0
[  106.552964] Modules linked in:
[  106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W

We need tracepoints to run as late as possible. This commit tries to fix
the issue by removing the cpu_online check in tracepoint code that was
introduced in the mentioned commit, however now we run the risk of
running dereferencing probes that aren't RCU protected, which gives an
RCU warning like so on boot up:
[    0.030159] x86: Booting SMP configuration:
[    0.030169] .... node  #0, CPUs:      #1
[    0.001000]
[    0.001000] =============================
[    0.001000] WARNING: suspicious RCU usage
[    0.001000] 4.18.0-rc6+ #42 Not tainted
[    0.001000] -----------------------------
[    0.001000] ./include/trace/events/timer.h:38 suspicious
				rcu_dereference_check() usage!
[    0.001000]
[    0.001000] other info that might help us debug this:
[    0.001000]
[    0.001000]
[    0.001000] RCU used illegally from offline CPU!
[    0.001000] rcu_scheduler_active = 1, debug_locks = 1
[    0.001000] no locks held by swapper/1/0.
[    0.001000]

Any ideas on how we can fix this? Basically we need RCU to work here
even after !cpu_online. I thought of just using SRCU for all tracepoints
however that may mean we can't use tracepoints from NMI..

Tries-to-Fix: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
                             unify their usage")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 include/linux/tracepoint.h | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d9a084c72541..020885714a0f 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -365,19 +365,17 @@ extern void syscall_unregfunc(void);
  * "void *__data, proto" as the callback prototype.
  */
 #define DECLARE_TRACE_NOARGS(name)					\
-	__DECLARE_TRACE(name, void, ,					\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, void, , 1,				\
 			void *__data, __data)
 
 #define DECLARE_TRACE(name, proto, args)				\
-	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,		\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 
 #define DECLARE_TRACE_CONDITION(name, proto, args, cond)		\
 	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
+			PARAMS(cond),					\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 
-- 
2.18.0.597.ga71716f1ad-goog


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [RFC] tracepoint: Run tracepoints even after CPU is offline
  2018-08-06  3:44 [RFC] tracepoint: Run tracepoints even after CPU is offline Joel Fernandes (Google)
@ 2018-08-06 16:44 ` Steven Rostedt
  0 siblings, 0 replies; 2+ messages in thread
From: Steven Rostedt @ 2018-08-06 16:44 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, kernel-team, Ingo Molnar, Masami Hiramatsu,
	paulmck, mathieu.desnoyers, namhyung, peterz

On Sun,  5 Aug 2018 20:44:37 -0700
"Joel Fernandes (Google)" <joel@joelfernandes.org> wrote:

> Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
> causes a problem for lockdep using tracepoint code. Once a CPU is
> offline, tracepoints donot get called, however this causes a big problem
> for lockdep probes that need to run so that IRQ annotations are marked
> correctly.
> 
> An issue is possible where while the CPU is going offline, an interrupt
> can come in and then a lockdep assert causes an annotation warning:
> 
> [  106.551354] IRQs not enabled as expected
> [  106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
>                                          tick_nohz_idle_enter+0x99/0xb0
> [  106.552964] Modules linked in:
> [  106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W
> 
> We need tracepoints to run as late as possible. This commit tries to fix
> the issue by removing the cpu_online check in tracepoint code that was
> introduced in the mentioned commit, however now we run the risk of
> running dereferencing probes that aren't RCU protected, which gives an
> RCU warning like so on boot up:
> [    0.030159] x86: Booting SMP configuration:
> [    0.030169] .... node  #0, CPUs:      #1
> [    0.001000]
> [    0.001000] =============================
> [    0.001000] WARNING: suspicious RCU usage
> [    0.001000] 4.18.0-rc6+ #42 Not tainted
> [    0.001000] -----------------------------
> [    0.001000] ./include/trace/events/timer.h:38 suspicious
> 				rcu_dereference_check() usage!
> [    0.001000]
> [    0.001000] other info that might help us debug this:
> [    0.001000]
> [    0.001000]
> [    0.001000] RCU used illegally from offline CPU!
> [    0.001000] rcu_scheduler_active = 1, debug_locks = 1
> [    0.001000] no locks held by swapper/1/0.
> [    0.001000]
> 
> Any ideas on how we can fix this? Basically we need RCU to work here
> even after !cpu_online. I thought of just using SRCU for all tracepoints
> however that may mean we can't use tracepoints from NMI..
> 
> Tries-to-Fix: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
>                              unify their usage")
> Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  include/linux/tracepoint.h | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index d9a084c72541..020885714a0f 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -365,19 +365,17 @@ extern void syscall_unregfunc(void);
>   * "void *__data, proto" as the callback prototype.
>   */
>  #define DECLARE_TRACE_NOARGS(name)					\
> -	__DECLARE_TRACE(name, void, ,					\
> -			cpu_online(raw_smp_processor_id()),		\
> +	__DECLARE_TRACE(name, void, , 1,				\
>  			void *__data, __data)
>  
>  #define DECLARE_TRACE(name, proto, args)				\
> -	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
> -			cpu_online(raw_smp_processor_id()),		\
> +	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,		\
>  			PARAMS(void *__data, proto),			\
>  			PARAMS(__data, args))
>  
>  #define DECLARE_TRACE_CONDITION(name, proto, args, cond)		\
>  	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
> -			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
> +			PARAMS(cond),					\
>  			PARAMS(void *__data, proto),			\
>  			PARAMS(__data, args))
>  

Actually, I took a look at it too, and came up with this patch. Can you
test it out? The difference is that it doesn't stop the _rcuidle
version of the trace events to not be called by "cpu_online".

Which brings up another question. Can srcu work on cpu offlined CPUs?

-- Steve

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index d9a084c72541..4aba6c807d28 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -211,8 +211,11 @@ extern void syscall_unregfunc(void);
 			__DO_TRACE(&__tracepoint_##name,		\
 				TP_PROTO(data_proto),			\
 				TP_ARGS(data_args),			\
+				cpu_online(raw_smp_processor_id()) &&	\
 				TP_CONDITION(cond), 0);			\
-		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {		\
+		if (IS_ENABLED(CONFIG_LOCKDEP) &&			\
+		    cpu_online(raw_smp_processor_id()) &&		\
+		    (cond)) {						\
 			rcu_read_lock_sched_notrace();			\
 			rcu_dereference_sched(__tracepoint_##name.funcs);\
 			rcu_read_unlock_sched_notrace();		\
@@ -365,19 +368,17 @@ extern void syscall_unregfunc(void);
  * "void *__data, proto" as the callback prototype.
  */
 #define DECLARE_TRACE_NOARGS(name)					\
-	__DECLARE_TRACE(name, void, ,					\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, void, , 1,				\
 			void *__data, __data)
 
 #define DECLARE_TRACE(name, proto, args)				\
-	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()),		\
+	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1,		\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 
 #define DECLARE_TRACE_CONDITION(name, proto, args, cond)		\
 	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),		\
-			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
+			PARAMS(cond),					\
 			PARAMS(void *__data, proto),			\
 			PARAMS(__data, args))
 

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-08-06 16:44 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-06  3:44 [RFC] tracepoint: Run tracepoints even after CPU is offline Joel Fernandes (Google)
2018-08-06 16:44 ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).