linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Brian Gerst <brgerst@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Byungchul Park <byungchul.park@lge.com>
Subject: Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack
Date: Fri, 22 Jul 2016 17:20:54 -0500	[thread overview]
Message-ID: <20160722222054.qlxyvbh3edfft37i@treble> (raw)
In-Reply-To: <CALCETrUzOnyj0s2Tv4M8WDxiyXBvXHnBv3mfRvqWUNJWgYiZgA@mail.gmail.com>

On Fri, Jul 22, 2016 at 02:46:10PM -0700, Andy Lutomirski wrote:
> On Fri, Jul 22, 2016 at 8:57 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jul 21, 2016 at 10:13:03PM -0700, Andy Lutomirski wrote:
> >> On Thu, Jul 21, 2016 at 8:30 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Thu, Jul 21, 2016 at 03:32:32PM -0700, Andy Lutomirski wrote:
> >> >> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> > Now that we can find pt_regs registers in the middle of the stack due to
> >> >> > an interrupt or exception, we can print them.  Here's what it looks
> >> >> > like:
> >> >> >
> >> >> >    ...
> >> >> >    [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
> >> >> >    [<ffffffff8189f558>] async_page_fault+0x28/0x30
> >> >> >   RIP: 0010:[<ffffffff814529e2>]  [<ffffffff814529e2>] __clear_user+0x42/0x70
> >> >> >   RSP: 0018:ffff88007876fd38  EFLAGS: 00010202
> >> >> >   RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
> >> >> >   RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
> >> >> >   RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000
> >> >> >   R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
> >> >> >   R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200
> >> >> >    [<ffffffff814529e2>] ? __clear_user+0x42/0x70
> >> >> >    [<ffffffff814529c3>] ? __clear_user+0x23/0x70
> >> >> >    [<ffffffff81452a7b>] clear_user+0x2b/0x40
> >> >> >    ...
> >> >>
> >> >> This looks wrong.  Here are some theories:
> >> >>
> >> >> (a) __clear_user is a reliable address that is indicated by RIP: ....
> >> >> Then it's found again as an unreliable address as "?
> >> >> __clear_user+0x42/0x70" by scanning the stack.  "?
> >> >> __clear_user+0x23/0x70" is a genuine leftover artifact on the stack.
> >> >> In this case, shouldn't "? __clear_user+0x42/0x70" have been
> >> >> suppressed because it matched a reliable address?
> >> >>
> >> >> (b) You actually intended for all the addresses to be printed, in
> >> >> which case "? __clear_user+0x42/0x70" should have been
> >> >> "__clear_user+0x42/0x70" and you have a bug.  In this case, it's
> >> >> plausible that your state machine got a bit lost leading to "?
> >> >> __clear_user+0x23/0x70" as well (i.e. it's not just an artifact --
> >> >> it's a real frame and you didn't find it).
> >> >>
> >> >> (c) Something else and I'm confused.
> >> >
> >> > So there's a subtle difference between addresses reported by regs->ip
> >> > and normal return addresses.  For example:
> >> >
> >> >    ...
> >> >    [<ffffffff8189ff4d>] smp_apic_timer_interrupt+0x3d/0x50
> >> >    [<ffffffff8189de6e>] apic_timer_interrupt+0x9e/0xb0
> >> >   RIP: 0010:[<ffffffff8129b350>]  [<ffffffff8129b350>] path_init+0x0/0x750
> >> >   RSP: 0018:ffff880036a3fd80  EFLAGS: 00000296
> >> >   RAX: ffff88003691aa40 RBX: ffff880036a3ff08 RCX: ffff880036a3ff08
> >> >   RDX: ffff880036a3ff08 RSI: 0000000000000041 RDI: ffff880036a3fdb0
> >> >   RBP: ffff880036a3fda0 R08: 0000000000000000 R09: 0000000000000010
> >> >   R10: 8080808080808080 R11: fefefefefefefeff R12: ffff880036a3fdb0
> >> >   R13: 0000000000000001 R14: ffff880036a3ff08 R15: 0000000000000000
> >> >    <EOI>
> >> >    [<ffffffff8129b350>] ? lookup_fast+0x3d0/0x3d0
> >> >    [<ffffffff8129c81b>] ? path_lookupat+0x1b/0x120
> >> >    [<ffffffff8129ddd1>] filename_lookup+0xb1/0x180
> >> >    ...
> >> >
> >> > In this case the irq hit right after path_lookupat() called into
> >> > path_init().  So the "path_init+0x0" printed by __show_regs() is right.
> >> >
> >> > Note the backtrace reports the same address, but it instead describes it
> >> > as "lookup_fast+0x3d0", which is the end of lookup_fast().  That's
> >> > because normally, such an address after a call instruction at the end of
> >> > a function would indicate a tail call (e.g., to a noreturn function).
> >> > If that were the case, printing "path_init+0x0" would be completely
> >> > wrong, because path_init() just happens to be the function located
> >> > immediately after lookup_fast().
> >> >
> >> > Maybe I could add some special logic to say: "if this return address was
> >> > from a call, use printk_stack_address(); else if it was from a pt_regs,
> >> > use printk_address()."  (The former prints the preceding function, the
> >> > latter prints the current function.)  Then we could remove the question
> >> > mark.
> >> >
> >> > There's also the question of whether or not the address should be
> >> > printed again, after it's already been printed by __show_regs().  I
> >> > don't have a strong opinion either way.
> >> >
> >>
> >> IIRC we don't show the actual faulting function in the call trace, so
> >> we probably shouldn't duplicate the entry after the show_regs.
> >
> > Just to clarify, that's true today for cases where the stack dump starts
> > from a handler which has regs.  It starts dumping based on regs->ip and
> > regs->bp, so the regs themselves aren't dumped.
> >
> > But for cases where regs are in the middle of the stack, they aren't
> > detected today, and you'll still see the value of regs->ip dumped with a
> > question mark.
> >
> > That said, with this patch, now that regs in the middle of the stack
> > *are* being printed, I can't think of a good reason to print the return
> > address twice: both in regs and the stack trace.  So removing it from
> > the stack trace is fine with me.
> >
> >> That being said, I'm still confused by the question marks.  What
> >> exactly is going on?  Is the code really doing the right thing wrt
> >> resuming the unwind?  Is there a git tree with these patches applied
> >> somewhere so I can look at it easily in context?
> >
> > show_trace_log_lvl() is doing two things in parallel: scanning all
> > kernel text addresses on the stack while simultaneously using the
> > unwinder to walk the frame pointers.  Only those scanned addresses which
> > are also found by the unwinder are printed without question marks.
> >
> > The pt_regs aren't in a frame of their own; they're just data inside of
> > a bigger frame.  (You may recall that you objected to my proposal to put
> > them in their own frame :-))  So that's why the address stored in
> > regs->ip was printed with a question mark: it's not in the header of a
> > real frame; it's just data.
> 
> It wasn't the separate frame part I was objecting to -- it was their
> encoding on the stack.  Maybe they should unwind as though they're a
> separate frame.  For example, the unwind API could give the frame that
> returns to apic_timer_interrupt+0x9e/0xb0 and then the next frame
> could literally list regs->ip as its retaddr (and maybe that frame or
> even the following one should be the one with non-NULL
> unwind_get_entry_regs).

Having the unwinder treat the pt_regs as a "fake" frame is problematic:

- As I described above, you can't treat regs->ip as a normal return
  value anyway.

- Also, for exceptions and nested interrupts, the regs are stored on the
  interrupting stack.  But for non-nested interrupts, they're stored on
  the thread stack.  So the regs aren't always on the same stack as the
  corresponding encoded pt_regs pointer.  Another issue is that there's
  not always a frame after the regs.  For those reasons, creating a
  "fake" frame abstraction in the state machine is quite a bit trickier
  than just dealing with those details in the only place that cares
  about them: show_trace_log_lvl().

> In some sense, the regs belong to the frame that got interrupted, not
> the frame that did the interrupting.  But maybe that's backwards -- if
> we have DWARF, then the regs correspond to the regs at the time of a
> call, and those regs are reasonably likely to contain the arguments to
> the called function.
> 
> But regardless of which way this goes, it seems quite awkward to me
> that regs->ip never shows up as the return addr of any frame as
> exposed by the unwind API.

Again, regs->ip is special.  It's not a call return address and we
shouldn't force it to be.

-- 
Josh

  reply	other threads:[~2016-07-22 22:20 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-21 21:21 [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 01/19] x86/dumpstack: remove show_trace() Josh Poimboeuf
2016-07-21 21:49   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 02/19] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-07-21 21:53   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 03/19] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-07-21 21:56   ` Andy Lutomirski
2016-07-22  1:41     ` Josh Poimboeuf
2016-07-22  2:29       ` Andy Lutomirski
2016-07-22  3:08       ` Brian Gerst
2016-07-21 21:21 ` [PATCH 04/19] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 05/19] x86/dumpstack: fix function graph tracing stack dump reliability issues Josh Poimboeuf
2016-07-29 22:55   ` Steven Rostedt
2016-07-30  0:50     ` Josh Poimboeuf
2016-07-30  2:20       ` Steven Rostedt
2016-07-30 13:51         ` Josh Poimboeuf
2016-08-01 14:28           ` Steven Rostedt
2016-08-01 15:36             ` Josh Poimboeuf
2016-08-02 21:00               ` Josh Poimboeuf
2016-08-02 21:16                 ` Steven Rostedt
2016-08-02 22:13                   ` Josh Poimboeuf
2016-08-02 23:16                     ` Steven Rostedt
2016-08-03  1:56                       ` Josh Poimboeuf
2016-08-03  2:30                         ` Steven Rostedt
2016-08-03  2:50                           ` Josh Poimboeuf
2016-08-03  2:59                             ` Steven Rostedt
2016-08-03  3:12                               ` Josh Poimboeuf
2016-08-03  3:18                                 ` Steven Rostedt
2016-08-03  3:21                                   ` Steven Rostedt
2016-08-03  3:31                                     ` Josh Poimboeuf
2016-08-03  3:45                                       ` Steven Rostedt
2016-08-03 14:13                                         ` Josh Poimboeuf
2016-08-03  3:30                                   ` Josh Poimboeuf
2016-08-01 15:59     ` Josh Poimboeuf
2016-08-01 16:05       ` Steven Rostedt
2016-08-01 16:19         ` Josh Poimboeuf
2016-08-01 16:24     ` Josh Poimboeuf
2016-08-01 16:56       ` Steven Rostedt
2016-07-21 21:21 ` [PATCH 06/19] x86/dumpstack: remove extra brackets around "EOE" Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 07/19] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
2016-07-21 22:01   ` Andy Lutomirski
2016-07-22  1:48     ` Josh Poimboeuf
2016-07-22  8:24       ` Ingo Molnar
2016-07-21 21:21 ` [PATCH 08/19] x86/dumpstack: don't disable preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 09/19] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
2016-07-21 22:05   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 10/19] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
2016-07-22 23:26   ` Andy Lutomirski
2016-07-22 23:52     ` Andy Lutomirski
2016-07-23 13:09       ` Josh Poimboeuf
2016-07-22 23:54     ` Josh Poimboeuf
2016-07-23  0:15       ` Andy Lutomirski
2016-07-23 14:04         ` Josh Poimboeuf
2016-07-26  0:09           ` Andy Lutomirski
2016-07-26 16:26             ` Josh Poimboeuf
2016-07-26 17:51               ` Steven Rostedt
2016-07-26 18:56                 ` Josh Poimboeuf
2016-07-26 20:59               ` Andy Lutomirski
2016-07-26 22:24                 ` Josh Poimboeuf
2016-07-26 22:31                   ` Steven Rostedt
2016-07-26 22:37                   ` Andy Lutomirski
2016-07-26 16:47             ` Josh Poimboeuf
2016-07-26 17:49               ` Brian Gerst
2016-07-26 18:59                 ` Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 11/19] x86/dumptrace: add new unwind interface and implementations Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 12/19] perf/x86: convert perf_callchain_kernel() to the new unwinder Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 13/19] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 14/19] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 15/19] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
2016-07-21 21:49   ` Byungchul Park
2016-07-22  1:38     ` Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 16/19] x86/dumpstack: remove dump_trace() Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 17/19] x86/entry/dumpstack: encode pt_regs pointer in frame pointer Josh Poimboeuf
2016-07-21 22:27   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 18/19] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
2016-07-21 22:32   ` Andy Lutomirski
2016-07-22  3:30     ` Josh Poimboeuf
2016-07-22  5:13       ` Andy Lutomirski
2016-07-22 15:57         ` Josh Poimboeuf
2016-07-22 21:46           ` Andy Lutomirski
2016-07-22 22:20             ` Josh Poimboeuf [this message]
2016-07-22 23:18               ` Andy Lutomirski
2016-07-22 23:30                 ` Josh Poimboeuf
2016-07-22 23:39                   ` Andy Lutomirski
2016-07-23  0:00                     ` Josh Poimboeuf
2016-07-23  0:22 ` [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code Linus Torvalds
2016-07-23  0:31   ` Andy Lutomirski
2016-07-23  5:35     ` Josh Poimboeuf
2016-07-23  5:39       ` Linus Torvalds
2016-07-23 12:53         ` Josh Poimboeuf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160722222054.qlxyvbh3edfft37i@treble \
    --to=jpoimboe@redhat.com \
    --cc=brgerst@gmail.com \
    --cc=byungchul.park@lge.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).