linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Anders Roxell <anders.roxell@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dmitry Vyukov <dvyukov@google.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] kernel/trace: fix watchdog soft lockup
Date: Wed, 28 Nov 2018 22:24:36 -0500	[thread overview]
Message-ID: <20181128222436.2d29ca9a@vmware.local.home> (raw)
In-Reply-To: <CAK8P3a1tONAdbfncqzEYW_Q8LOrprez+8fYsCTTK9+d+O1Mc3w@mail.gmail.com>

On Wed, 28 Nov 2018 23:24:26 +0100
Arnd Bergmann <arnd@arndb.de> wrote:

> On Wed, Nov 28, 2018 at 3:09 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 28 Nov 2018 09:13:34 +0100
> > Anders Roxell <anders.roxell@linaro.org> wrote:
> >  
> > > When building a allmodconfig kernel for arm64 and boot that in qemu,
> > > CONFIG_FTRACE_STARTUP_TEST gets enabled and that takes time so the
> > > watchdog expires and prints out a message like this:
> > > 'watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]'
> > > Each time the function ftrace_replace_code gets called it stays in that
> > > functions loop for 41424 times.
> > > Rework so that function cond_resched() gets called in the
> > > ftrace_replace_code loop.
> > >
> > > Co-developed-by: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> > > ---
> > >  kernel/trace/ftrace.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > > index 5b4f73e4fd56..3f456921dedf 100644
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -2426,6 +2426,10 @@ void __weak ftrace_replace_code(int enable)
> > >
> > >       do_for_each_ftrace_rec(pg, rec) {
> > >
> > > +             /* This loop can take minutes when sanitizers are enabled, so
> > > +              * lets make sure we allow RCU processing.
> > > +              */
> > > +             cond_resched();
> > >               if (rec->flags & FTRACE_FL_DISABLED)
> > >                       continue;
> > >  
> >
> > NACK.  On some architectures this code is run from stop machine. We
> > can't call cond_resched() because it may be called with interrupts
> > disabled.
> >
> > This is a weak function. If arm64 has special needs, just copy it in
> > the arm64 code.  
> 
> I think it's currently broken on all architectures that don't already
> override it, the problem being that the function is simply too
> expensive when all debug options are enabled.

Would it be possible to add something like touch_nmi_watchdog()?

> 
> In an ARM64 allmodconfig kernel, there are 41424 records
> that we iterate through several times. In an earlier version of the
> test, the cond_resched() was only in the loop in
> init_trace_selftests(), and I think that is safe and should /mostly/
> solve the problem, so maybe Anders can submit that version again.
> 
> However, at least trace_selftest_ops() still takes half a minute
> to complete in qemu, and that triggers the softlockup watchdog.
> trace_selftest_ops() calls ftrace_replace_code() four or five times.

And I don't have a problem with adding cond_resched() there. I'm
concerned about adding it where it can be called with interrupts and/or
preemption disabled.

> 
> Here is the excerpt with printk times from one of Anders' tests:
> 
> [    8.350607] Running postponed tracer tests:
> [    8.356045] Testing tracer function:
> [   18.932077] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   27.454205] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   27.462594] PASSED
> [   27.462954] Testing dynamic ftrace:
> [   28.510903] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   28.746934] PASSED
> [   28.747469] Testing dynamic ftrace ops #1:
> [   32.488427] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   32.501864] (1 0 1 0 0)
> [   32.502041] (1 1 2 0 0)
> [   50.213914] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   50.219736] (2 1 3 0 1066085)
> [   50.220077] (2 2 4 0 1066100)
> [   60.580678] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.758019] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.910501] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   60.918354] PASSED
> [   60.919672] Testing dynamic ftrace ops #2:
> [   64.680222] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   64.843430] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   81.247068] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   81.250895] (1 0 1 1033119 0)
> [   81.251186] (1 1 2 1033134 0)
> [   81.343168] (2 1 3 1 3732)
> [   81.344492] (2 2 4 118 3849)
> [   89.837665] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   89.844371] PASSED
> [   89.844719] Testing ftrace recursion:
> [   90.890373] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   91.042146] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   91.048475] PASSED
> [   91.048806] Testing ftrace recursion safe:
> [   92.091174] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   92.242403] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   92.249119] PASSED
> [   92.249470] Testing ftrace regs(no arch support):
> [   93.293605] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   93.444942] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [   93.451738] PASSED
> [   93.452300] Testing tracer nop: PASSED
> [   93.453288] Testing tracer irqsoff:
> [  104.486368] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  112.918828] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  112.925809] PASSED
> [  112.926435] Testing tracer function_graph:
> [  123.303248] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  132.599763] ../kernel/trace/ftrace.c:2441, loop_counter: 41424
> [  132.607614] PASSED
> 
> In particular, the test_probe3 in trace_selftest_ops() takes
> around 20 seconds, or 482 microseconds per loop iteration
> in ftrace_replace_code().
> Do you think there is another bug that makes it slower than
> expected, or is that a reasonable time that it could take?

Well, if you are doing it under qemu and with all the debug options
set, I could expect it to take that long.

-- Steve

      reply	other threads:[~2018-11-29  3:24 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28  8:13 [PATCH 2/2] kernel/trace: fix watchdog soft lockup Anders Roxell
2018-11-28 14:09 ` Steven Rostedt
2018-11-28 22:24   ` Arnd Bergmann
2018-11-29  3:24     ` Steven Rostedt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181128222436.2d29ca9a@vmware.local.home \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=anders.roxell@linaro.org \
    --cc=arnd@arndb.de \
    --cc=dvyukov@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).