From: Daniel Thompson <daniel.thompson@linaro.org>
To: Doug Anderson <dianders@chromium.org>
Cc: Jason Wessel <jason.wessel@windriver.com>,
kgdb-bugreport@lists.sourceforge.net,
Christophe Leroy <christophe.leroy@c-s.fr>,
LKML <linux-kernel@vger.kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>
Subject: Re: [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master
Date: Tue, 27 Aug 2019 08:24:38 +0100 [thread overview]
Message-ID: <20190827072438.fwgggambxp34onid@holly.lan> (raw)
In-Reply-To: <CAD=FV=Wh4M_A01gsWYBXSdgs0Gg4LAGDZ8X+5=4j=nuxiJ8kNA@mail.gmail.com>
On Mon, Aug 26, 2019 at 03:25:43PM -0700, Doug Anderson wrote:
> Jason / Daniel,
>
> On Wed, Jul 31, 2019 at 11:38 AM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
> > give you the right info. Specifically on many architectures
> > (including arm64, where I tested) you can't dump the stack of a
> > "running" process that isn't the process running on the current CPU.
> > This can be seen by this:
> >
> > echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > # wait 2 seconds
> > <sysrq>g
> >
> > Here's what I see now on rk3399-gru-kevin. I see the stack crawl for
> > the CPU that handled the sysrq but everything else just shows me stuck
> > in __switch_to() which is bogus:
> >
> > ======
> >
> > [0]kdb> btc
> > btc: cpu status: Currently on cpu 0
> > Available cpus: 0, 1-3(I), 4, 5(I)
> > Stack traceback for pid 0
> > 0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0
> > Call trace:
> > dump_backtrace+0x0/0x138
> > ...
> > kgdb_compiled_brk_fn+0x34/0x44
> > ...
> > sysrq_handle_dbg+0x34/0x5c
> > Stack traceback for pid 0
> > 0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1
> > Call trace:
> > __switch_to+0x1e4/0x240
> > 0xffffffc0f65616c0
> > Stack traceback for pid 0
> > 0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2
> > Call trace:
> > __switch_to+0x1e4/0x240
> > 0xffffffc0f65806c0
> > Stack traceback for pid 0
> > 0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
> > Call trace:
> > __switch_to+0x1e4/0x240
> > 0xffffffc0f659f6c0
> > Stack traceback for pid 1474
> > 0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash
> > Call trace:
> > __switch_to+0x1e4/0x240
> > __schedule+0x464/0x618
> > 0xffffffc0dde8b040
> > Stack traceback for pid 0
> > 0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5
> > Call trace:
> > __switch_to+0x1e4/0x240
> > 0xffffffc0f65dd6c0
> >
> > ===
> >
> > The problem is that 'btc' eventually boils down to
> > show_stack(task_struct, NULL);
> >
> > ...and show_stack() doesn't work for "running" CPUs because their
> > registers haven't been stashed.
> >
> > On x86 things might work better (I haven't tested) because kdb has a
> > special case for x86 in kdb_show_stack() where it passes the stack
> > pointer to show_stack(). This wouldn't work on arm64 where the stack
> > crawling function seems needs the "fp" and "pc", not the "sp" which is
> > presumably why arm64's show_stack() function totally ignores the "sp"
> > parameter.
> >
> > NOTE: we _can_ get a good stack dump for all the cpus if we manually
> > switch each one to the kdb master and do a back trace. AKA:
> > cpu 4
> > bt
> > ...will give the expected trace. That's because now arm64's
> > dump_backtrace will now see that "tsk == current" and go through a
> > different path.
> >
> > In this patch I fix the problems by catching a request to stack crawl
> > a task that's running on a CPU and then I ask that CPU to do the stack
> > crawl.
> >
> > NOTE: this will (presumably) change what stack crawls are printed for
> > x86 machines. Now kdb functions will show up in the stack crawl.
> > Presumably this is OK but if it's not we can go back and add a special
> > case for x86 again.
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> > Changes in v2:
> > - Totally new approach; now arch agnostic.
> >
> > kernel/debug/debug_core.c | 5 +++++
> > kernel/debug/debug_core.h | 1 +
> > kernel/debug/kdb/kdb_bt.c | 44 ++++++++++++++++++++++++++++++---------
> > 3 files changed, 40 insertions(+), 10 deletions(-)
>
> Did either of you have thoughts on this patch?
Hi Doug
Sorry about this. It got backlogged during a recent holiday... it's still
on the list.
I took a quick look a week or so ago but at this point I haven't yet
tested out the behaviour on x86 and I wanted to do a closer review to
check I am happy with the barriering.
Daniel.
>
> -Doug
next prev parent reply other threads:[~2019-08-27 7:24 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-31 18:37 [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master Douglas Anderson
2019-08-26 22:25 ` Doug Anderson
2019-08-27 7:24 ` Daniel Thompson [this message]
2019-08-30 14:52 ` Daniel Thompson
2019-09-25 20:03 ` Doug Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190827072438.fwgggambxp34onid@holly.lan \
--to=daniel.thompson@linaro.org \
--cc=catalin.marinas@arm.com \
--cc=christophe.leroy@c-s.fr \
--cc=dianders@chromium.org \
--cc=jason.wessel@windriver.com \
--cc=kgdb-bugreport@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).