All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] kernel/trace: fix watchdog soft lockup
@ 2018-11-28  8:13 Anders Roxell
  2018-11-28 14:09 ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Anders Roxell @ 2018-11-28  8:13 UTC (permalink / raw)
  To: rostedt, mingo; +Cc: akpm, dvyukov, linux-kernel, Anders Roxell, Arnd Bergmann

When building a allmodconfig kernel for arm64 and boot that in qemu,
CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
watchdog expires and prints out a message like this:
'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
Each time the function ftrace_replace_code gets called it stays in that
functions loop for 41424 times.
Rework so that function cond_resched() gets called in the
ftrace_replace_code loop.

Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
 kernel/trace/ftrace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5b4f73e4fd56..3f456921dedf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable)
 
 	do_for_each_ftrace_rec(pg, rec) {
 
+		/* This loop can take minutes when sanitizers are enabled, so
+		 * lets make sure we allow RCU processing.
+		 */
+		cond_resched();
 		if (rec->flags & FTRACE_FL_DISABLED)
 			continue;
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] kernel/trace: fix watchdog soft lockup
  2018-11-28  8:13 [PATCH 2/2] kernel/trace: fix watchdog soft lockup Anders Roxell
@ 2018-11-28 14:09 ` Steven Rostedt
  2018-11-28 22:24   ` Arnd Bergmann
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2018-11-28 14:09 UTC (permalink / raw)
  To: Anders Roxell; +Cc: mingo, akpm, dvyukov, linux-kernel, Arnd Bergmann

On Wed, 28 Nov 2018 09:13:34 +0100
Anders Roxell <anders.roxell@linaro.org> wrote:

> When building a allmodconfig kernel for arm64 and boot that in qemu,
> CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
> watchdog expires and prints out a message like this:
> 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
> Each time the function ftrace_replace_code gets called it stays in that
> functions loop for 41424 times.
> Rework so that function cond_resched() gets called in the
> ftrace_replace_code loop.
> 
> Co-developed-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
>  kernel/trace/ftrace.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 5b4f73e4fd56..3f456921dedf 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable)
>  
>  	do_for_each_ftrace_rec(pg, rec) {
>  
> +		/* This loop can take minutes when sanitizers are enabled, so
> +		 * lets make sure we allow RCU processing.
> +		 */
> +		cond_resched();
>  		if (rec->flags & FTRACE_FL_DISABLED)
>  			continue;
>  

NACK.  On some architectures this code is run from stop machine. We
can't call cond_resched() because it may be called with interrupts
disabled.

This is a weak function. If arm64 has special needs, just copy it in
the arm64 code.

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] kernel/trace: fix watchdog soft lockup
  2018-11-28 14:09 ` Steven Rostedt
@ 2018-11-28 22:24   ` Arnd Bergmann
  2018-11-29  3:24     ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Arnd Bergmann @ 2018-11-28 22:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Anders Roxell, Ingo Molnar, Andrew Morton, Dmitry Vyukov,
	Linux Kernel Mailing List

On Wed, Nov 28, 2018 at 3:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 28 Nov 2018 09:13:34 +0100
> Anders Roxell <anders.roxell@linaro.org> wrote:
>
> > When building a allmodconfig kernel for arm64 and boot that in qemu,
> > CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
> > watchdog expires and prints out a message like this:
> > 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
> > Each time the function ftrace_replace_code gets called it stays in that
> > functions loop for 41424 times.
> > Rework so that function cond_resched() gets called in the
> > ftrace_replace_code loop.
> >
> > Co-developed-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> > ---
> >  kernel/trace/ftrace.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 5b4f73e4fd56..3f456921dedf 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable)
> >
> >       do_for_each_ftrace_rec(pg, rec) {
> >
> > +             /* This loop can take minutes when sanitizers are enabled, so
> > +              * lets make sure we allow RCU processing.
> > +              */
> > +             cond_resched();
> >               if (rec->flags & FTRACE_FL_DISABLED)
> >                       continue;
> >
>
> NACK.  On some architectures this code is run from stop machine. We
> can't call cond_resched() because it may be called with interrupts
> disabled.
>
> This is a weak function. If arm64 has special needs, just copy it in
> the arm64 code.

I think it's currently broken on all architectures that don't already
override it, the problem being that the function is simply too
expensive when all debug options are enabled.

In an ARM64 allmodconfig kernel, there are 41424 records
that we iterate through several times. In an earlier version of the
test, the cond_resched() was only in the loop in
init_trace_selftests(), and I think that is safe and should /mostly/
solve the problem, so maybe Anders can submit that version again.

However, at least trace_selftest_ops() still takes half a minute
to complete in qemu, and that triggers the softlockup watchdog.
trace_selftest_ops() calls ftrace_replace_code() four or five times.

Here is the excerpt with printk times from one of Anders' tests:

[    8.350607] Running postponed tracer tests:
[    8.356045] Testing tracer function:
[   18.932077] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   27.454205] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   27.462594] PASSED
[   27.462954] Testing dynamic ftrace:
[   28.510903] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   28.746934] PASSED
[   28.747469] Testing dynamic ftrace ops #1:
[   32.488427] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   32.501864] (1 0 1 0 0)
[   32.502041] (1 1 2 0 0)
[   50.213914] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   50.219736] (2 1 3 0 1066085)
[   50.220077] (2 2 4 0 1066100)
[   60.580678] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   60.758019] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   60.910501] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   60.918354] PASSED
[   60.919672] Testing dynamic ftrace ops #2:
[   64.680222] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   64.843430] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   81.247068] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   81.250895] (1 0 1 1033119 0)
[   81.251186] (1 1 2 1033134 0)
[   81.343168] (2 1 3 1 3732)
[   81.344492] (2 2 4 118 3849)
[   89.837665] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   89.844371] PASSED
[   89.844719] Testing ftrace recursion:
[   90.890373] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   91.042146] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   91.048475] PASSED
[   91.048806] Testing ftrace recursion safe:
[   92.091174] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   92.242403] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   92.249119] PASSED
[   92.249470] Testing ftrace regs(no arch support):
[   93.293605] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   93.444942] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[   93.451738] PASSED
[   93.452300] Testing tracer nop: PASSED
[   93.453288] Testing tracer irqsoff:
[  104.486368] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[  112.918828] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[  112.925809] PASSED
[  112.926435] Testing tracer function_graph:
[  123.303248] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[  132.599763] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
[  132.607614] PASSED

In particular, the test_probe3 in trace_selftest_ops() takes
around 20 seconds, or 482 microseconds per loop iteration
in ftrace_replace_code().
Do you think there is another bug that makes it slower than
expected, or is that a reasonable time that it could take?

       Arnd

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] kernel/trace: fix watchdog soft lockup
  2018-11-28 22:24   ` Arnd Bergmann
@ 2018-11-29  3:24     ` Steven Rostedt
  0 siblings, 0 replies; 4+ messages in thread
From: Steven Rostedt @ 2018-11-29  3:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Anders Roxell, Ingo Molnar, Andrew Morton, Dmitry Vyukov,
	Linux Kernel Mailing List

On Wed, 28 Nov 2018 23:24:26 +0100
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wed, Nov 28, 2018 at 3:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 28 Nov 2018 09:13:34 +0100
> > Anders Roxell <anders.roxell@linaro.org> wrote:
> >  
> > > When building a allmodconfig kernel for arm64 and boot that in qemu,
> > > CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
> > > watchdog expires and prints out a message like this:
> > > 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
> > > Each time the function ftrace_replace_code gets called it stays in that
> > > functions loop for 41424 times.
> > > Rework so that function cond_resched() gets called in the
> > > ftrace_replace_code loop.
> > >
> > > Co-developed-by: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> > > ---
> > >  kernel/trace/ftrace.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > > index 5b4f73e4fd56..3f456921dedf 100644
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable)
> > >
> > >       do_for_each_ftrace_rec(pg, rec) {
> > >
> > > +             /* This loop can take minutes when sanitizers are enabled, so
> > > +              * lets make sure we allow RCU processing.
> > > +              */
> > > +             cond_resched();
> > >               if (rec->flags & FTRACE_FL_DISABLED)
> > >                       continue;
> > >  
> >
> > NACK.  On some architectures this code is run from stop machine. We
> > can't call cond_resched() because it may be called with interrupts
> > disabled.
> >
> > This is a weak function. If arm64 has special needs, just copy it in
> > the arm64 code.  
> 
> I think it's currently broken on all architectures that don't already
> override it, the problem being that the function is simply too
> expensive when all debug options are enabled.

Would it be possible to add something like touch_nmi_watchdog()?

> 
> In an ARM64 allmodconfig kernel, there are 41424 records
> that we iterate through several times. In an earlier version of the
> test, the cond_resched() was only in the loop in
> init_trace_selftests(), and I think that is safe and should /mostly/
> solve the problem, so maybe Anders can submit that version again.
> 
> However, at least trace_selftest_ops() still takes half a minute
> to complete in qemu, and that triggers the softlockup watchdog.
> trace_selftest_ops() calls ftrace_replace_code() four or five times.

And I don't have a problem with adding cond_resched() there. I'm
concerned about adding it where it can be called with interrupts and/or
preemption disabled.

> 
> Here is the excerpt with printk times from one of Anders' tests:
> 
> [    8.350607] Running postponed tracer tests:
> [    8.356045] Testing tracer function:
> [   18.932077] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   27.454205] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   27.462594] PASSED
> [   27.462954] Testing dynamic ftrace:
> [   28.510903] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   28.746934] PASSED
> [   28.747469] Testing dynamic ftrace ops #1:
> [   32.488427] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   32.501864] (1 0 1 0 0)
> [   32.502041] (1 1 2 0 0)
> [   50.213914] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   50.219736] (2 1 3 0 1066085)
> [   50.220077] (2 2 4 0 1066100)
> [   60.580678] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.758019] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.910501] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.918354] PASSED
> [   60.919672] Testing dynamic ftrace ops #2:
> [   64.680222] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   64.843430] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   81.247068] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   81.250895] (1 0 1 1033119 0)
> [   81.251186] (1 1 2 1033134 0)
> [   81.343168] (2 1 3 1 3732)
> [   81.344492] (2 2 4 118 3849)
> [   89.837665] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   89.844371] PASSED
> [   89.844719] Testing ftrace recursion:
> [   90.890373] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   91.042146] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   91.048475] PASSED
> [   91.048806] Testing ftrace recursion safe:
> [   92.091174] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   92.242403] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   92.249119] PASSED
> [   92.249470] Testing ftrace regs(no arch support):
> [   93.293605] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   93.444942] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   93.451738] PASSED
> [   93.452300] Testing tracer nop: PASSED
> [   93.453288] Testing tracer irqsoff:
> [  104.486368] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  112.918828] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  112.925809] PASSED
> [  112.926435] Testing tracer function_graph:
> [  123.303248] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  132.599763] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  132.607614] PASSED
> 
> In particular, the test_probe3 in trace_selftest_ops() takes
> around 20 seconds, or 482 microseconds per loop iteration
> in ftrace_replace_code().
> Do you think there is another bug that makes it slower than
> expected, or is that a reasonable time that it could take?

Well, if you are doing it under qemu and with all the debug options
set, I could expect it to take that long.

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-11-29  3:24 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-28  8:13 [PATCH 2/2] kernel/trace: fix watchdog soft lockup Anders Roxell
2018-11-28 14:09 ` Steven Rostedt
2018-11-28 22:24   ` Arnd Bergmann
2018-11-29  3:24     ` Steven Rostedt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.