From: Daniel Thompson <firstname.lastname@example.org> To: Doug Anderson <email@example.com> Cc: Jason Wessel <firstname.lastname@example.org>, email@example.com, Christophe Leroy <firstname.lastname@example.org>, LKML <email@example.com>, Catalin Marinas <firstname.lastname@example.org>, Will Deacon <email@example.com> Subject: Re: [PATCH v2] kdb: Fix stack crawling on 'running' CPUs that aren't the master Date: Tue, 27 Aug 2019 08:24:38 +0100 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <CAD=FV=Wh4M_A01gsWYBXSdgs0Gg4LAGDZ8X+5=4j=nuxiJ8kNA@mail.gmail.com> On Mon, Aug 26, 2019 at 03:25:43PM -0700, Doug Anderson wrote: > Jason / Daniel, > > On Wed, Jul 31, 2019 at 11:38 AM Douglas Anderson <email@example.com> wrote: > > > > In kdb when you do 'btc' (back trace on CPU) it doesn't necessarily > > give you the right info. Specifically on many architectures > > (including arm64, where I tested) you can't dump the stack of a > > "running" process that isn't the process running on the current CPU. > > This can be seen by this: > > > > echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT > > # wait 2 seconds > > <sysrq>g > > > > Here's what I see now on rk3399-gru-kevin. I see the stack crawl for > > the CPU that handled the sysrq but everything else just shows me stuck > > in __switch_to() which is bogus: > > > > ====== > > > > kdb> btc > > btc: cpu status: Currently on cpu 0 > > Available cpus: 0, 1-3(I), 4, 5(I) > > Stack traceback for pid 0 > > 0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0 > > Call trace: > > dump_backtrace+0x0/0x138 > > ... > > kgdb_compiled_brk_fn+0x34/0x44 > > ... > > sysrq_handle_dbg+0x34/0x5c > > Stack traceback for pid 0 > > 0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1 > > Call trace: > > __switch_to+0x1e4/0x240 > > 0xffffffc0f65616c0 > > Stack traceback for pid 0 > > 0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2 > > Call trace: > > __switch_to+0x1e4/0x240 > > 0xffffffc0f65806c0 > > Stack traceback for pid 0 > > 0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3 > > Call trace: > > __switch_to+0x1e4/0x240 > > 0xffffffc0f659f6c0 > > Stack traceback for pid 1474 > > 0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash > > Call trace: > > __switch_to+0x1e4/0x240 > > __schedule+0x464/0x618 > > 0xffffffc0dde8b040 > > Stack traceback for pid 0 > > 0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5 > > Call trace: > > __switch_to+0x1e4/0x240 > > 0xffffffc0f65dd6c0 > > > > === > > > > The problem is that 'btc' eventually boils down to > > show_stack(task_struct, NULL); > > > > ...and show_stack() doesn't work for "running" CPUs because their > > registers haven't been stashed. > > > > On x86 things might work better (I haven't tested) because kdb has a > > special case for x86 in kdb_show_stack() where it passes the stack > > pointer to show_stack(). This wouldn't work on arm64 where the stack > > crawling function seems needs the "fp" and "pc", not the "sp" which is > > presumably why arm64's show_stack() function totally ignores the "sp" > > parameter. > > > > NOTE: we _can_ get a good stack dump for all the cpus if we manually > > switch each one to the kdb master and do a back trace. AKA: > > cpu 4 > > bt > > ...will give the expected trace. That's because now arm64's > > dump_backtrace will now see that "tsk == current" and go through a > > different path. > > > > In this patch I fix the problems by catching a request to stack crawl > > a task that's running on a CPU and then I ask that CPU to do the stack > > crawl. > > > > NOTE: this will (presumably) change what stack crawls are printed for > > x86 machines. Now kdb functions will show up in the stack crawl. > > Presumably this is OK but if it's not we can go back and add a special > > case for x86 again. > > > > Signed-off-by: Douglas Anderson <firstname.lastname@example.org> > > --- > > > > Changes in v2: > > - Totally new approach; now arch agnostic. > > > > kernel/debug/debug_core.c | 5 +++++ > > kernel/debug/debug_core.h | 1 + > > kernel/debug/kdb/kdb_bt.c | 44 ++++++++++++++++++++++++++++++--------- > > 3 files changed, 40 insertions(+), 10 deletions(-) > > Did either of you have thoughts on this patch? Hi Doug Sorry about this. It got backlogged during a recent holiday... it's still on the list. I took a quick look a week or so ago but at this point I haven't yet tested out the behaviour on x86 and I wanted to do a closer review to check I am happy with the barriering. Daniel. > > -Doug
next prev parent reply other threads:[~2019-08-27 7:24 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-07-31 18:37 Douglas Anderson 2019-08-26 22:25 ` Doug Anderson 2019-08-27 7:24 ` Daniel Thompson [this message] 2019-08-30 14:52 ` Daniel Thompson 2019-09-25 20:03 ` Doug Anderson
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH v2] kdb: Fix stack crawling on '\''running'\'' CPUs that aren'\''t the master' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.