linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
@ 2019-07-29  1:07 Eiichi Tsukata
  2019-07-29  4:25 ` Andy Lutomirski
  2019-07-29 15:21 ` Steven Rostedt
  0 siblings, 2 replies; 9+ messages in thread
From: Eiichi Tsukata @ 2019-07-29  1:07 UTC (permalink / raw)
  To: joel, paulmck, tglx, peterz, rostedt, mingo, fweisbec, luto,
	linux-kernel
  Cc: Eiichi Tsukata

If context tracking is enabled, causing page fault in preemptirq
irq_enable or irq_disable events triggers the following RCU EQS warning.

Reproducer:

  // CONFIG_PREEMPTIRQ_EVENTS=y
  // CONFIG_CONTEXT_TRACKING=y
  // CONFIG_RCU_EQS_DEBUG=y
  # echo 1 > events/preemptirq/irq_disable/enable
  # echo 1 > options/userstacktrace

  WARNING: CPU: 0 PID: 2574 at kernel/rcu/tree.c:262 rcu_dynticks_eqs_exit+0x48/0x50
  CPU: 0 PID: 2574 Comm: sh Not tainted 5.3.0-rc1+ #105
  RIP: 0010:rcu_dynticks_eqs_exit+0x48/0x50
  Call Trace:
   rcu_eqs_exit+0x4e/0xd0
   rcu_user_exit+0x13/0x20
   __context_tracking_exit.part.0+0x74/0x120
   context_tracking_exit.part.0+0x28/0x50
   context_tracking_exit+0x1d/0x20
   do_page_fault+0xab/0x1b0
   do_async_page_fault+0x35/0xb0
   async_page_fault+0x3e/0x50
  RIP: 0010:arch_stack_walk_user+0x8e/0x100
   stack_trace_save_user+0x7d/0xa9
   trace_buffer_unlock_commit_regs+0x178/0x220
   trace_event_buffer_commit+0x6b/0x200
   trace_event_raw_event_preemptirq_template+0x7b/0xc0
   trace_hardirqs_off_caller+0xb3/0xf0
   trace_hardirqs_off_thunk+0x1a/0x20
   entry_SYSCALL_64_after_hwframe+0x3e/0xbe

Details of call trace and RCU EQS/Context:

  entry_SYSCALL_64_after_hwframe() EQS: IN, CTX: USER
    trace_irq_disable_rcuidle()
      rcu_irq_enter_irqson()
        rcu_dynticks_eqs_exit() EQS: IN => OUT
      stack_trace_save_user() EQS: OUT, CTX: USER
        page_fault()
          do_page_fault()
            exception_enter() EQS: OUT, CTX: USER
              context_tracking_exit()
                rcu_eqs_exit()
		  rcu_dynticks_eqs_exit() EQS: OUT => OUT? (warning)

trace_irq_disable/enable_rcuidle() are called from user mode in entry
code, and calls rcu_irq_enter_irqson() in __DO_TRACE(). This can cause
the state "RCU ESQ: OUT but CTX: USER", then stack_trace_save_user() can
cause page fault which calls rcu_dynticks_eqs_exit() again leads to hit
the EQS validation warning if CONFIG_RCU_EQS_DEBUG is enabled.

Fix it by calling exception_enter/exit() around
trace_irq_disable/enable_rcuidle() to enter CONTEXT_KERNEL before
tracing code causes page fault. Also makes the timing of state change to
CONTEXT_KERNL earlier to prevent tracing codes from calling
context_tracking_exit() recursively.

Ideally, the problem can be fixed by calling enter_from_user_mode() before
TRACE_IRQS_OFF in entry codes (then we need to tell lockdep that IRQs are
off eariler) and calling prepare_exit_to_usermode() after TRACE_IRQS_ON.
But this patch will be much simpler and limit the most change to tracing
codes.

Fixes: 865e63b04e9b ("tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints")
Signed-off-by: Eiichi Tsukata <devel@etsukata.com>
---
 kernel/context_tracking.c       |  6 +++++-
 kernel/trace/trace_preemptirq.c | 15 +++++++++++++--
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index be01a4d627c9..860eaf9780e5 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -148,6 +148,11 @@ void __context_tracking_exit(enum ctx_state state)
 		return;
 
 	if (__this_cpu_read(context_tracking.state) == state) {
+		/*
+		 * Change state before executing codes which can trigger
+		 * page fault leading unnecessary re-entrance.
+		 */
+		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
 		if (__this_cpu_read(context_tracking.active)) {
 			/*
 			 * We are going to run code that may use RCU. Inform
@@ -159,7 +164,6 @@ void __context_tracking_exit(enum ctx_state state)
 				trace_user_exit(0);
 			}
 		}
-		__this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
 	}
 	context_tracking_recursion_exit();
 }
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 4d8e99fdbbbe..031b51cb94d0 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
+#include <linux/context_tracking.h>
 #include "trace.h"
 
 #define CREATE_TRACE_POINTS
@@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
 
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
+	enum ctx_state prev_state;
+
 	if (this_cpu_read(tracing_irq_cpu)) {
-		if (!in_nmi())
+		if (!in_nmi()) {
+			prev_state = exception_enter();
 			trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+			exception_exit(prev_state);
+		}
 		tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
 		this_cpu_write(tracing_irq_cpu, 0);
 	}
@@ -63,11 +69,16 @@ NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
 
 __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 {
+	enum ctx_state prev_state;
+
 	if (!this_cpu_read(tracing_irq_cpu)) {
 		this_cpu_write(tracing_irq_cpu, 1);
 		tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-		if (!in_nmi())
+		if (!in_nmi()) {
+			prev_state = exception_enter();
 			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+			exception_exit(prev_state);
+		}
 	}
 
 	lockdep_hardirqs_off(CALLER_ADDR0);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-29  1:07 [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events Eiichi Tsukata
@ 2019-07-29  4:25 ` Andy Lutomirski
  2019-07-29 10:29   ` Peter Zijlstra
  2019-07-29 15:21 ` Steven Rostedt
  1 sibling, 1 reply; 9+ messages in thread
From: Andy Lutomirski @ 2019-07-29  4:25 UTC (permalink / raw)
  To: Eiichi Tsukata
  Cc: Joel Fernandes, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Steven Rostedt, Ingo Molnar, Frederic Weisbecker,
	LKML

On Sun, Jul 28, 2019 at 6:08 PM Eiichi Tsukata <devel@etsukata.com> wrote:
>
> If context tracking is enabled, causing page fault in preemptirq
> irq_enable or irq_disable events triggers the following RCU EQS warning.
>

Yuck.

> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index be01a4d627c9..860eaf9780e5 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -148,6 +148,11 @@ void __context_tracking_exit(enum ctx_state state)
>                 return;
>
>         if (__this_cpu_read(context_tracking.state) == state) {
> +               /*
> +                * Change state before executing codes which can trigger
> +                * page fault leading unnecessary re-entrance.
> +                */
> +               __this_cpu_write(context_tracking.state, CONTEXT_KERNEL);

Seems reasonable.

>                 if (__this_cpu_read(context_tracking.active)) {
>                         /*
>                          * We are going to run code that may use RCU. Inform
> @@ -159,7 +164,6 @@ void __context_tracking_exit(enum ctx_state state)
>                                 trace_user_exit(0);
>                         }
>                 }
> -               __this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
>         }
>         context_tracking_recursion_exit();
>  }
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> index 4d8e99fdbbbe..031b51cb94d0 100644
> --- a/kernel/trace/trace_preemptirq.c
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -10,6 +10,7 @@
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
>  #include <linux/kprobes.h>
> +#include <linux/context_tracking.h>
>  #include "trace.h"
>
>  #define CREATE_TRACE_POINTS
> @@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
>
>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
>  {
> +       enum ctx_state prev_state;
> +
>         if (this_cpu_read(tracing_irq_cpu)) {
> -               if (!in_nmi())
> +               if (!in_nmi()) {
> +                       prev_state = exception_enter();
>                         trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> +                       exception_exit(prev_state);
> +               }
>                 tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
>                 this_cpu_write(tracing_irq_cpu, 0);
>         }

This seems a bit distressing.  Now we're going to do a whole bunch of
context tracking transitions for each kernel entry.  Would a better
fix me to change trace_hardirqs_on_caller to skip the trace event if
the previous state was already IRQs on and, more importantly, to skip
tracing IRQs off if IRQs were already off?  The x86 code is very
careful to avoid ever having IRQs on and CONTEXT_USER at the same
time.

--Andy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-29  4:25 ` Andy Lutomirski
@ 2019-07-29 10:29   ` Peter Zijlstra
  2019-07-30  1:50     ` Eiichi Tsukata
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2019-07-29 10:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eiichi Tsukata, Joel Fernandes, Paul E. McKenney,
	Thomas Gleixner, Steven Rostedt, Ingo Molnar,
	Frederic Weisbecker, LKML

On Sun, Jul 28, 2019 at 09:25:58PM -0700, Andy Lutomirski wrote:
> On Sun, Jul 28, 2019 at 6:08 PM Eiichi Tsukata <devel@etsukata.com> wrote:

> > diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> > index 4d8e99fdbbbe..031b51cb94d0 100644
> > --- a/kernel/trace/trace_preemptirq.c
> > +++ b/kernel/trace/trace_preemptirq.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/module.h>
> >  #include <linux/ftrace.h>
> >  #include <linux/kprobes.h>
> > +#include <linux/context_tracking.h>
> >  #include "trace.h"
> >
> >  #define CREATE_TRACE_POINTS
> > @@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
> >
> >  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> >  {
> > +       enum ctx_state prev_state;
> > +
> >         if (this_cpu_read(tracing_irq_cpu)) {
> > -               if (!in_nmi())
> > +               if (!in_nmi()) {
> > +                       prev_state = exception_enter();
> >                         trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> > +                       exception_exit(prev_state);
> > +               }
> >                 tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
> >                 this_cpu_write(tracing_irq_cpu, 0);
> >         }
> 
> This seems a bit distressing.  Now we're going to do a whole bunch of
> context tracking transitions for each kernel entry.  Would a better
> fix me to change trace_hardirqs_on_caller to skip the trace event if
> the previous state was already IRQs on and, more importantly, to skip
> tracing IRQs off if IRQs were already off?  The x86 code is very
> careful to avoid ever having IRQs on and CONTEXT_USER at the same
> time.

I think they already (try to) do that; see 'tracing_irq_cpu'.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-29  1:07 [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events Eiichi Tsukata
  2019-07-29  4:25 ` Andy Lutomirski
@ 2019-07-29 15:21 ` Steven Rostedt
  2019-07-30  2:00   ` Eiichi Tsukata
  1 sibling, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2019-07-29 15:21 UTC (permalink / raw)
  To: Eiichi Tsukata
  Cc: joel, paulmck, tglx, peterz, mingo, fweisbec, luto, linux-kernel

On Mon, 29 Jul 2019 10:07:34 +0900
Eiichi Tsukata <devel@etsukata.com> wrote:

> If context tracking is enabled, causing page fault in preemptirq
> irq_enable or irq_disable events triggers the following RCU EQS warning.
> 
> Reproducer:
> 
>   // CONFIG_PREEMPTIRQ_EVENTS=y
>   // CONFIG_CONTEXT_TRACKING=y
>   // CONFIG_RCU_EQS_DEBUG=y
>   # echo 1 > events/preemptirq/irq_disable/enable
>   # echo 1 > options/userstacktrace

So the problem is only with userstacktrace enabled?


>  }
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> index 4d8e99fdbbbe..031b51cb94d0 100644
> --- a/kernel/trace/trace_preemptirq.c
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -10,6 +10,7 @@
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
>  #include <linux/kprobes.h>
> +#include <linux/context_tracking.h>
>  #include "trace.h"
>  
>  #define CREATE_TRACE_POINTS
> @@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
>  
>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
>  {
> +	enum ctx_state prev_state;
> +
>  	if (this_cpu_read(tracing_irq_cpu)) {
> -		if (!in_nmi())
> +		if (!in_nmi()) {

This is a very high fast path (for tracing irqs off and such). Instead
of adding a check here for a case that is seldom used (userstacktrace
and tracing irqs on/off). Move this to surround the userstack trace
code.

-- Steve


> +			prev_state = exception_enter();
>  			trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
> +			exception_exit(prev_state);
> +		}
>  		tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
>  		this_cpu_write(tracing_irq_cpu, 0);
>  	}
> @@ -63,11 +69,16 @@ NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
>  
>  __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
>  {
> +	enum ctx_state prev_state;
> +
>  	if (!this_cpu_read(tracing_irq_cpu)) {
>  		this_cpu_write(tracing_irq_cpu, 1);
>  		tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
> -		if (!in_nmi())
> +		if (!in_nmi()) {
> +			prev_state = exception_enter();
>  			trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
> +			exception_exit(prev_state);
> +		}
>  	}
>  
>  	lockdep_hardirqs_off(CALLER_ADDR0);


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-29 10:29   ` Peter Zijlstra
@ 2019-07-30  1:50     ` Eiichi Tsukata
  2019-07-30  1:59       ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Eiichi Tsukata @ 2019-07-30  1:50 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Joel Fernandes, Paul E. McKenney, Thomas Gleixner,
	Steven Rostedt, Ingo Molnar, Frederic Weisbecker, LKML

Thanks for comments.

On 2019/07/29 19:29, Peter Zijlstra wrote:
> On Sun, Jul 28, 2019 at 09:25:58PM -0700, Andy Lutomirski wrote:
>> On Sun, Jul 28, 2019 at 6:08 PM Eiichi Tsukata <devel@etsukata.com> wrote:
> 
>>> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
>>> index 4d8e99fdbbbe..031b51cb94d0 100644
>>> --- a/kernel/trace/trace_preemptirq.c
>>> +++ b/kernel/trace/trace_preemptirq.c
>>> @@ -10,6 +10,7 @@
>>>  #include <linux/module.h>
>>>  #include <linux/ftrace.h>
>>>  #include <linux/kprobes.h>
>>> +#include <linux/context_tracking.h>
>>>  #include "trace.h"
>>>
>>>  #define CREATE_TRACE_POINTS
>>> @@ -49,9 +50,14 @@ NOKPROBE_SYMBOL(trace_hardirqs_off);
>>>
>>>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
>>>  {
>>> +       enum ctx_state prev_state;
>>> +
>>>         if (this_cpu_read(tracing_irq_cpu)) {
>>> -               if (!in_nmi())
>>> +               if (!in_nmi()) {
>>> +                       prev_state = exception_enter();
>>>                         trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
>>> +                       exception_exit(prev_state);
>>> +               }
>>>                 tracer_hardirqs_on(CALLER_ADDR0, caller_addr);
>>>                 this_cpu_write(tracing_irq_cpu, 0);
>>>         }
>>
>> This seems a bit distressing.  Now we're going to do a whole bunch of
>> context tracking transitions for each kernel entry.  Would a better>> fix me to change trace_hardirqs_on_caller to skip the trace event if
>> the previous state was already IRQs on and, more importantly, to skip
>> tracing IRQs off if IRQs were already off?  The x86 code is very
>> careful to avoid ever having IRQs on and CONTEXT_USER at the same
>> time.
> 
> I think they already (try to) do that; see 'tracing_irq_cpu'.
> 

Or you mean something like this?
As for trace_hardirqs_off_caller:

diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 4d8e99fdbbbe..d39478bcf0f2 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -66,7 +66,7 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
        if (!this_cpu_read(tracing_irq_cpu)) {
                this_cpu_write(tracing_irq_cpu, 1);
                tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-               if (!in_nmi())
+               if (!in_nmi() && !irqs_disabled())
                        trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
        }

Or

diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 4d8e99fdbbbe..e08c5c6ff2b3 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -66,8 +66,6 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
        if (!this_cpu_read(tracing_irq_cpu)) {
                this_cpu_write(tracing_irq_cpu, 1);
                tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
-               if (!in_nmi())
-                       trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
        }


As for trace_hardirqs_on_caller, it is called when IRQs off and CONTEXT_USER.
So even though we skipped the trace event if the previous state was already IRQs on,
we will fall into the same situation.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-30  1:50     ` Eiichi Tsukata
@ 2019-07-30  1:59       ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2019-07-30  1:59 UTC (permalink / raw)
  To: Eiichi Tsukata
  Cc: Peter Zijlstra, Andy Lutomirski, Joel Fernandes,
	Paul E. McKenney, Thomas Gleixner, Ingo Molnar,
	Frederic Weisbecker, LKML

On Tue, 30 Jul 2019 10:50:36 +0900
Eiichi Tsukata <devel@etsukata.com> wrote:

> > 
> > I think they already (try to) do that; see 'tracing_irq_cpu'.
> >   
> 
> Or you mean something like this?
> As for trace_hardirqs_off_caller:

You missed what Peter said.

> 
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> index 4d8e99fdbbbe..d39478bcf0f2 100644
> --- a/kernel/trace/trace_preemptirq.c
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -66,7 +66,7 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
>         if (!this_cpu_read(tracing_irq_cpu)) {
                             ^^^^^^^^^^^^^^^
The above makes this called only the first time we disable interrupts.

>                 this_cpu_write(tracing_irq_cpu, 1);
>                 tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
> -               if (!in_nmi())
> +               if (!in_nmi() && !irqs_disabled())

This would always be false. This function is always called with irqs_disabled()!

So no, this is not what is meant.

>                         trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
>         }
> 
> Or
> 
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> index 4d8e99fdbbbe..e08c5c6ff2b3 100644
> --- a/kernel/trace/trace_preemptirq.c
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -66,8 +66,6 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
>         if (!this_cpu_read(tracing_irq_cpu)) {
>                 this_cpu_write(tracing_irq_cpu, 1);
>                 tracer_hardirqs_off(CALLER_ADDR0, caller_addr);
> -               if (!in_nmi())
> -                       trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);

And this just removes the tracepoint completely?

-- Steve

>         }
> 
> 
> As for trace_hardirqs_on_caller, it is called when IRQs off and CONTEXT_USER.
> So even though we skipped the trace event if the previous state was already IRQs on,
> we will fall into the same situation.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-29 15:21 ` Steven Rostedt
@ 2019-07-30  2:00   ` Eiichi Tsukata
  2019-07-30  2:15     ` Steven Rostedt
  0 siblings, 1 reply; 9+ messages in thread
From: Eiichi Tsukata @ 2019-07-30  2:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: joel, paulmck, tglx, peterz, mingo, fweisbec, luto, linux-kernel

Thanks for comments.

On 2019/07/30 0:21, Steven Rostedt wrote:
> On Mon, 29 Jul 2019 10:07:34 +0900
> Eiichi Tsukata <devel@etsukata.com> wrote:
> 
>> If context tracking is enabled, causing page fault in preemptirq
>> irq_enable or irq_disable events triggers the following RCU EQS warning.
>>
>> Reproducer:
>>
>>   // CONFIG_PREEMPTIRQ_EVENTS=y
>>   // CONFIG_CONTEXT_TRACKING=y
>>   // CONFIG_RCU_EQS_DEBUG=y
>>   # echo 1 > events/preemptirq/irq_disable/enable
>>   # echo 1 > options/userstacktrace
> 
> So the problem is only with userstacktrace enabled?

It can happen when tracing code causes page fault in preemptirq events.
For example, the following perf command also hit the warning:

  # perf record -e 'preemptirq:irq_enable' -g ls


>>  
>>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
>>  {
>> +	enum ctx_state prev_state;
>> +
>>  	if (this_cpu_read(tracing_irq_cpu)) {
>> -		if (!in_nmi())
>> +		if (!in_nmi()) {
> 
> This is a very high fast path (for tracing irqs off and such). Instead
> of adding a check here for a case that is seldom used (userstacktrace
> and tracing irqs on/off). Move this to surround the userstack trace
> code.
> 
> -- Steve

If the problem was only with userstacktrace, it will be reasonable to
surround only the userstack unwinder. But the situation is similar to
the previous "tracing vs CR2" case. As Peter taught me in
https://lore.kernel.org/lkml/20190708074823.GV3402@hirez.programming.kicks-ass.net/
there are some other codes likely to to user access.
So I surround preemptirq events earlier.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-30  2:00   ` Eiichi Tsukata
@ 2019-07-30  2:15     ` Steven Rostedt
  2019-07-30  4:19       ` Joel Fernandes
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2019-07-30  2:15 UTC (permalink / raw)
  To: Eiichi Tsukata
  Cc: joel, paulmck, tglx, peterz, mingo, fweisbec, luto, linux-kernel

On Tue, 30 Jul 2019 11:00:42 +0900
Eiichi Tsukata <devel@etsukata.com> wrote:

> Thanks for comments.
> 
> On 2019/07/30 0:21, Steven Rostedt wrote:
> > On Mon, 29 Jul 2019 10:07:34 +0900
> > Eiichi Tsukata <devel@etsukata.com> wrote:
> >   
> >> If context tracking is enabled, causing page fault in preemptirq
> >> irq_enable or irq_disable events triggers the following RCU EQS warning.
> >>
> >> Reproducer:
> >>
> >>   // CONFIG_PREEMPTIRQ_EVENTS=y
> >>   // CONFIG_CONTEXT_TRACKING=y
> >>   // CONFIG_RCU_EQS_DEBUG=y
> >>   # echo 1 > events/preemptirq/irq_disable/enable
> >>   # echo 1 > options/userstacktrace  
> > 
> > So the problem is only with userstacktrace enabled?  
> 
> It can happen when tracing code causes page fault in preemptirq events.
> For example, the following perf command also hit the warning:
> 
>   # perf record -e 'preemptirq:irq_enable' -g ls

Again,

That's not a irq trace event issue, that's a stack trace issue.

> 
> 
> >>  
> >>  __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
> >>  {
> >> +	enum ctx_state prev_state;
> >> +
> >>  	if (this_cpu_read(tracing_irq_cpu)) {
> >> -		if (!in_nmi())
> >> +		if (!in_nmi()) {  
> > 
> > This is a very high fast path (for tracing irqs off and such). Instead
> > of adding a check here for a case that is seldom used (userstacktrace
> > and tracing irqs on/off). Move this to surround the userstack trace
> > code.
> > 
> > -- Steve  
> 
> If the problem was only with userstacktrace, it will be reasonable to
> surround only the userstack unwinder. But the situation is similar to
> the previous "tracing vs CR2" case. As Peter taught me in
> https://lore.kernel.org/lkml/20190708074823.GV3402@hirez.programming.kicks-ass.net/
> there are some other codes likely to to user access.
> So I surround preemptirq events earlier.

I disagree. The issue is with the attached callbacks that call
something (a stack unwinder) that can fault.

This is called whenever irqs are disabled. I say we surround the
culprit (the stack unwinder or stack trace) and not punish the irq
enable/disable events.

So NAK on this patch.

-- Steve


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events
  2019-07-30  2:15     ` Steven Rostedt
@ 2019-07-30  4:19       ` Joel Fernandes
  0 siblings, 0 replies; 9+ messages in thread
From: Joel Fernandes @ 2019-07-30  4:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eiichi Tsukata, Paul E. McKenney, Thomas Glexiner, Peter Zilstra,
	Ingo Molnar, Frederic Weisbecker, Luto, LKML

On Mon, Jul 29, 2019 at 10:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
[snip]
> > If the problem was only with userstacktrace, it will be reasonable to
> > surround only the userstack unwinder. But the situation is similar to
> > the previous "tracing vs CR2" case. As Peter taught me in
> > https://lore.kernel.org/lkml/20190708074823.GV3402@hirez.programming.kicks-ass.net/
> > there are some other codes likely to to user access.
> > So I surround preemptirq events earlier.
>
> I disagree. The issue is with the attached callbacks that call
> something (a stack unwinder) that can fault.
>
> This is called whenever irqs are disabled. I say we surround the
> culprit (the stack unwinder or stack trace) and not punish the irq
> enable/disable events.

I agree with everything Steve said.

thanks,

 - Joel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-07-30  4:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29  1:07 [PATCH] tracing: Prevent RCU EQS breakage in preemptirq events Eiichi Tsukata
2019-07-29  4:25 ` Andy Lutomirski
2019-07-29 10:29   ` Peter Zijlstra
2019-07-30  1:50     ` Eiichi Tsukata
2019-07-30  1:59       ` Steven Rostedt
2019-07-29 15:21 ` Steven Rostedt
2019-07-30  2:00   ` Eiichi Tsukata
2019-07-30  2:15     ` Steven Rostedt
2019-07-30  4:19       ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).