All of lore.kernel.org
 help / color / mirror / Atom feed
From: Doug Anderson <dianders@chromium.org>
To: Jason Wessel <jason.wessel@windriver.com>,
	Daniel Thompson <daniel.thompson@linaro.org>
Cc: kgdb-bugreport@lists.sourceforge.net,
	Christophe Leroy <christophe.leroy@c-s.fr>,
	LKML <linux-kernel@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master
Date: Mon, 26 Aug 2019 15:25:43 -0700	[thread overview]
Message-ID: <CAD=FV=Wh4M_A01gsWYBXSdgs0Gg4LAGDZ8X+5=4j=nuxiJ8kNA@mail.gmail.com> (raw)
In-Reply-To: <20190731183732.178134-1-dianders@chromium.org>

Jason / Daniel,

On Wed, Jul 31, 2019 at 11:38 AM Douglas Anderson <dianders@chromium.org> wrote:
>
> In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily
> give you the right info.  Specifically on many architectures
> (including arm64, where I tested) you can't dump the stack of a
> "running" process that isn't the process running on the current CPU.
> This can be seen by this:
>
>  echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
>  # wait 2 seconds
>  <sysrq>g
>
> Here's what I see now on rk3399-gru-kevin.  I see the stack crawl for
> the CPU that handled the sysrq but everything else just shows me stuck
> in __switch_to() which is bogus:
>
> ======
>
> [0]kdb> btc
> btc: cpu status: Currently on cpu 0
> Available cpus: 0, 1-3(I), 4, 5(I)
> Stack traceback for pid 0
> 0xffffff801101a9c0        0        0  1    0   R  0xffffff801101b3b0 *swapper/0
> Call trace:
>  dump_backtrace+0x0/0x138
>  ...
>  kgdb_compiled_brk_fn+0x34/0x44
>  ...
>  sysrq_handle_dbg+0x34/0x5c
> Stack traceback for pid 0
> 0xffffffc0f175a040        0        0  1    1   I  0xffffffc0f175aa30  swapper/1
> Call trace:
>  __switch_to+0x1e4/0x240
>  0xffffffc0f65616c0
> Stack traceback for pid 0
> 0xffffffc0f175d040        0        0  1    2   I  0xffffffc0f175da30  swapper/2
> Call trace:
>  __switch_to+0x1e4/0x240
>  0xffffffc0f65806c0
> Stack traceback for pid 0
> 0xffffffc0f175b040        0        0  1    3   I  0xffffffc0f175ba30  swapper/3
> Call trace:
>  __switch_to+0x1e4/0x240
>  0xffffffc0f659f6c0
> Stack traceback for pid 1474
> 0xffffffc0dde8b040     1474      727  1    4   R  0xffffffc0dde8ba30  bash
> Call trace:
>  __switch_to+0x1e4/0x240
>  __schedule+0x464/0x618
>  0xffffffc0dde8b040
> Stack traceback for pid 0
> 0xffffffc0f17b0040        0        0  1    5   I  0xffffffc0f17b0a30  swapper/5
> Call trace:
>  __switch_to+0x1e4/0x240
>  0xffffffc0f65dd6c0
>
> ===
>
> The problem is that 'btc' eventually boils down to
>   show_stack(task_struct, NULL);
>
> ...and show_stack() doesn't work for "running" CPUs because their
> registers haven't been stashed.
>
> On x86 things might work better (I haven't tested) because kdb has a
> special case for x86 in kdb_show_stack() where it passes the stack
> pointer to show_stack().  This wouldn't work on arm64 where the stack
> crawling function seems needs the "fp" and "pc", not the "sp" which is
> presumably why arm64's show_stack() function totally ignores the "sp"
> parameter.
>
> NOTE: we _can_ get a good stack dump for all the cpus if we manually
> switch each one to the kdb master and do a back trace.  AKA:
>   cpu 4
>   bt
> ...will give the expected trace.  That's because now arm64's
> dump_backtrace will now see that "tsk == current" and go through a
> different path.
>
> In this patch I fix the problems by catching a request to stack crawl
> a task that's running on a CPU and then I ask that CPU to do the stack
> crawl.
>
> NOTE: this will (presumably) change what stack crawls are printed for
> x86 machines.  Now kdb functions will show up in the stack crawl.
> Presumably this is OK but if it's not we can go back and add a special
> case for x86 again.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
> Changes in v2:
> - Totally new approach; now arch agnostic.
>
>  kernel/debug/debug_core.c |  5 +++++
>  kernel/debug/debug_core.h |  1 +
>  kernel/debug/kdb/kdb_bt.c | 44 ++++++++++++++++++++++++++++++---------
>  3 files changed, 40 insertions(+), 10 deletions(-)

Did either of you have thoughts on this patch?

-Doug

  reply	other threads:[~2019-08-26 22:25 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-31 18:37 [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master Douglas Anderson
2019-08-26 22:25 ` Doug Anderson [this message]
2019-08-27  7:24   ` Daniel Thompson
2019-08-30 14:52 ` Daniel Thompson
2019-09-25 20:03   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD=FV=Wh4M_A01gsWYBXSdgs0Gg4LAGDZ8X+5=4j=nuxiJ8kNA@mail.gmail.com' \
    --to=dianders@chromium.org \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@c-s.fr \
    --cc=daniel.thompson@linaro.org \
    --cc=jason.wessel@windriver.com \
    --cc=kgdb-bugreport@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.