All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josh Poimboeuf <jpoimboe@redhat.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	X86 ML <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Brian Gerst <brgerst@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Byungchul Park <byungchul.park@lge.com>
Subject: Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack
Date: Fri, 22 Jul 2016 19:00:53 -0500	[thread overview]
Message-ID: <20160723000053.2wufllifxj62rqw6@treble> (raw)
In-Reply-To: <CALCETrW=dTCt0csqeDAHQt=K_wUPiN=ORFhGmb0cKSzZizU+qQ@mail.gmail.com>

On Fri, Jul 22, 2016 at 04:39:00PM -0700, Andy Lutomirski wrote:
> On Fri, Jul 22, 2016 at 4:30 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Fri, Jul 22, 2016 at 04:18:04PM -0700, Andy Lutomirski wrote:
> >> On Fri, Jul 22, 2016 at 3:20 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Fri, Jul 22, 2016 at 02:46:10PM -0700, Andy Lutomirski wrote:
> >> >> On Fri, Jul 22, 2016 at 8:57 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> > On Thu, Jul 21, 2016 at 10:13:03PM -0700, Andy Lutomirski wrote:
> >> >> >> On Thu, Jul 21, 2016 at 8:30 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> >> > On Thu, Jul 21, 2016 at 03:32:32PM -0700, Andy Lutomirski wrote:
> >> >> >> >> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> >> >> >> > Now that we can find pt_regs registers in the middle of the stack due to
> >> >> >> >> > an interrupt or exception, we can print them.  Here's what it looks
> >> >> >> >> > like:
> >> >> >> >> >
> >> >> >> >> >    ...
> >> >> >> >> >    [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
> >> >> >> >> >    [<ffffffff8189f558>] async_page_fault+0x28/0x30
> >> >> >> >> >   RIP: 0010:[<ffffffff814529e2>]  [<ffffffff814529e2>] __clear_user+0x42/0x70
> >> >> >> >> >   RSP: 0018:ffff88007876fd38  EFLAGS: 00010202
> >> >> >> >> >   RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
> >> >> >> >> >   RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
> >> >> >> >> >   RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000
> >> >> >> >> >   R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
> >> >> >> >> >   R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200
> >> >> >> >> >    [<ffffffff814529e2>] ? __clear_user+0x42/0x70
> >> >> >> >> >    [<ffffffff814529c3>] ? __clear_user+0x23/0x70
> >> >> >> >> >    [<ffffffff81452a7b>] clear_user+0x2b/0x40
> >> >> >> >> >    ...
> >> >> >> >>
> >> >> >> >> This looks wrong.  Here are some theories:
> >> >> >> >>
> >> >> >> >> (a) __clear_user is a reliable address that is indicated by RIP: ....
> >> >> >> >> Then it's found again as an unreliable address as "?
> >> >> >> >> __clear_user+0x42/0x70" by scanning the stack.  "?
> >> >> >> >> __clear_user+0x23/0x70" is a genuine leftover artifact on the stack.
> >> >> >> >> In this case, shouldn't "? __clear_user+0x42/0x70" have been
> >> >> >> >> suppressed because it matched a reliable address?
> >> >> >> >>
> >> >> >> >> (b) You actually intended for all the addresses to be printed, in
> >> >> >> >> which case "? __clear_user+0x42/0x70" should have been
> >> >> >> >> "__clear_user+0x42/0x70" and you have a bug.  In this case, it's
> >> >> >> >> plausible that your state machine got a bit lost leading to "?
> >> >> >> >> __clear_user+0x23/0x70" as well (i.e. it's not just an artifact --
> >> >> >> >> it's a real frame and you didn't find it).
> >> >> >> >>
> >> >> >> >> (c) Something else and I'm confused.
> >> >> >> >
> >> >> >> > So there's a subtle difference between addresses reported by regs->ip
> >> >> >> > and normal return addresses.  For example:
> >> >> >> >
> >> >> >> >    ...
> >> >> >> >    [<ffffffff8189ff4d>] smp_apic_timer_interrupt+0x3d/0x50
> >> >> >> >    [<ffffffff8189de6e>] apic_timer_interrupt+0x9e/0xb0
> >> >> >> >   RIP: 0010:[<ffffffff8129b350>]  [<ffffffff8129b350>] path_init+0x0/0x750
> >> >> >> >   RSP: 0018:ffff880036a3fd80  EFLAGS: 00000296
> >> >> >> >   RAX: ffff88003691aa40 RBX: ffff880036a3ff08 RCX: ffff880036a3ff08
> >> >> >> >   RDX: ffff880036a3ff08 RSI: 0000000000000041 RDI: ffff880036a3fdb0
> >> >> >> >   RBP: ffff880036a3fda0 R08: 0000000000000000 R09: 0000000000000010
> >> >> >> >   R10: 8080808080808080 R11: fefefefefefefeff R12: ffff880036a3fdb0
> >> >> >> >   R13: 0000000000000001 R14: ffff880036a3ff08 R15: 0000000000000000
> >> >> >> >    <EOI>
> >> >> >> >    [<ffffffff8129b350>] ? lookup_fast+0x3d0/0x3d0
> >> >> >> >    [<ffffffff8129c81b>] ? path_lookupat+0x1b/0x120
> >> >> >> >    [<ffffffff8129ddd1>] filename_lookup+0xb1/0x180
> >> >> >> >    ...
> >> >> >> >
> >> >> >> > In this case the irq hit right after path_lookupat() called into
> >> >> >> > path_init().  So the "path_init+0x0" printed by __show_regs() is right.
> >> >> >> >
> >> >> >> > Note the backtrace reports the same address, but it instead describes it
> >> >> >> > as "lookup_fast+0x3d0", which is the end of lookup_fast().  That's
> >> >> >> > because normally, such an address after a call instruction at the end of
> >> >> >> > a function would indicate a tail call (e.g., to a noreturn function).
> >> >> >> > If that were the case, printing "path_init+0x0" would be completely
> >> >> >> > wrong, because path_init() just happens to be the function located
> >> >> >> > immediately after lookup_fast().
> >> >> >> >
> >> >> >> > Maybe I could add some special logic to say: "if this return address was
> >> >> >> > from a call, use printk_stack_address(); else if it was from a pt_regs,
> >> >> >> > use printk_address()."  (The former prints the preceding function, the
> >> >> >> > latter prints the current function.)  Then we could remove the question
> >> >> >> > mark.
> >> >> >> >
> >> >> >> > There's also the question of whether or not the address should be
> >> >> >> > printed again, after it's already been printed by __show_regs().  I
> >> >> >> > don't have a strong opinion either way.
> >> >> >> >
> >> >> >>
> >> >> >> IIRC we don't show the actual faulting function in the call trace, so
> >> >> >> we probably shouldn't duplicate the entry after the show_regs.
> >> >> >
> >> >> > Just to clarify, that's true today for cases where the stack dump starts
> >> >> > from a handler which has regs.  It starts dumping based on regs->ip and
> >> >> > regs->bp, so the regs themselves aren't dumped.
> >> >> >
> >> >> > But for cases where regs are in the middle of the stack, they aren't
> >> >> > detected today, and you'll still see the value of regs->ip dumped with a
> >> >> > question mark.
> >> >> >
> >> >> > That said, with this patch, now that regs in the middle of the stack
> >> >> > *are* being printed, I can't think of a good reason to print the return
> >> >> > address twice: both in regs and the stack trace.  So removing it from
> >> >> > the stack trace is fine with me.
> >> >> >
> >> >> >> That being said, I'm still confused by the question marks.  What
> >> >> >> exactly is going on?  Is the code really doing the right thing wrt
> >> >> >> resuming the unwind?  Is there a git tree with these patches applied
> >> >> >> somewhere so I can look at it easily in context?
> >> >> >
> >> >> > show_trace_log_lvl() is doing two things in parallel: scanning all
> >> >> > kernel text addresses on the stack while simultaneously using the
> >> >> > unwinder to walk the frame pointers.  Only those scanned addresses which
> >> >> > are also found by the unwinder are printed without question marks.
> >> >> >
> >> >> > The pt_regs aren't in a frame of their own; they're just data inside of
> >> >> > a bigger frame.  (You may recall that you objected to my proposal to put
> >> >> > them in their own frame :-))  So that's why the address stored in
> >> >> > regs->ip was printed with a question mark: it's not in the header of a
> >> >> > real frame; it's just data.
> >> >>
> >> >> It wasn't the separate frame part I was objecting to -- it was their
> >> >> encoding on the stack.  Maybe they should unwind as though they're a
> >> >> separate frame.  For example, the unwind API could give the frame that
> >> >> returns to apic_timer_interrupt+0x9e/0xb0 and then the next frame
> >> >> could literally list regs->ip as its retaddr (and maybe that frame or
> >> >> even the following one should be the one with non-NULL
> >> >> unwind_get_entry_regs).
> >> >
> >> > Having the unwinder treat the pt_regs as a "fake" frame is problematic:
> >> >
> >> > - As I described above, you can't treat regs->ip as a normal return
> >> >   value anyway.
> >> >
> >> > - Also, for exceptions and nested interrupts, the regs are stored on the
> >> >   interrupting stack.  But for non-nested interrupts, they're stored on
> >> >   the thread stack.  So the regs aren't always on the same stack as the
> >> >   corresponding encoded pt_regs pointer.  Another issue is that there's
> >> >   not always a frame after the regs.  For those reasons, creating a
> >> >   "fake" frame abstraction in the state machine is quite a bit trickier
> >> >   than just dealing with those details in the only place that cares
> >> >   about them: show_trace_log_lvl().
> >> >
> >> >> In some sense, the regs belong to the frame that got interrupted, not
> >> >> the frame that did the interrupting.  But maybe that's backwards -- if
> >> >> we have DWARF, then the regs correspond to the regs at the time of a
> >> >> call, and those regs are reasonably likely to contain the arguments to
> >> >> the called function.
> >> >>
> >> >> But regardless of which way this goes, it seems quite awkward to me
> >> >> that regs->ip never shows up as the return addr of any frame as
> >> >> exposed by the unwind API.
> >> >
> >> > Again, regs->ip is special.  It's not a call return address and we
> >> > shouldn't force it to be.
> >>
> >> This is only mostly true.  If the exception was a trap, then it is
> >> (e.g. if a function ends in int3, then regs->ip will be off the end).
> >> But that's just me being pedantic.
> >>
> >> More relevantly, regs->ip is a reliable address indicating a function
> >> that will be returned to if we ever return, and both
> >> show_trace_log_lvl() and the livepatch stuff should interpret it as
> >> such.
> >
> > Actually livepatch doesn't care; once it sees that there are regs, it
> > will bail because the stack is unreliable.
> 
> Would it be better for livepatch not to bail some day?

Not until we have a DWARF unwinder.

> >> Whether this means the unwinder should change or
> >> show_trace_log_lvl() should change isn't a big deal, but I think one
> >> of them should change so we get this right.
> >
> > I have no problem doing so, but can you clarify what you mean?  Earlier
> > you said:
> >
> >   "IIRC we don't show the actual faulting function in the call trace, so
> >   we probably shouldn't duplicate the entry after the show_regs."
> >
> > Maybe I'm misunderstanding, but that seems to contradict what you're
> > saying now.  So which is it?  Do you want the RIP address printed twice
> > (both in the regs printout and in the stack trace)?  Or not?
> 
> I don't have a stong preference as to how many times it's printed.
> But I think we need to get rid of the question mark.  I think that
> means there are two options:
> 
> a) Teach show_stack_log_lvl() that regs->ip is a "reliable" entry and
> print it again.  That will get confused if it's the first instruction
> in a function, so maybe it's not so great.

I proposed a fix for this above, so that it would print regs->ip one
way and a normal return address another way, to avoid the confusion:

> >> >> >> > Maybe I could add some special logic to say: "if this return address was
> >> >> >> > from a call, use printk_stack_address(); else if it was from a pt_regs,
> >> >> >> > use printk_address()."  (The former prints the preceding function, the
> >> >> >> > latter prints the current function.)  Then we could remove the question
> >> >> >> > mark.

> b) Teach show_stack_log_lvl() that regs->ip is a thing that we just
> printed (via show_regs) and skip the ? entry.
> 
> Option b probably makes more sense.  I think I'm starting to
> understand all this, but maybe I'm still missing something.

I also think b is a good option.  I'll do it for v2 unless others
disagree.

-- 
Josh

  reply	other threads:[~2016-07-23  0:01 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-21 21:21 [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 01/19] x86/dumpstack: remove show_trace() Josh Poimboeuf
2016-07-21 21:49   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 02/19] x86/dumpstack: add get_stack_pointer() and get_frame_pointer() Josh Poimboeuf
2016-07-21 21:53   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 03/19] x86/dumpstack: remove unnecessary stack pointer arguments Josh Poimboeuf
2016-07-21 21:56   ` Andy Lutomirski
2016-07-22  1:41     ` Josh Poimboeuf
2016-07-22  2:29       ` Andy Lutomirski
2016-07-22  3:08       ` Brian Gerst
2016-07-21 21:21 ` [PATCH 04/19] x86/dumpstack: make printk_stack_address() more generally useful Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 05/19] x86/dumpstack: fix function graph tracing stack dump reliability issues Josh Poimboeuf
2016-07-29 22:55   ` Steven Rostedt
2016-07-30  0:50     ` Josh Poimboeuf
2016-07-30  2:20       ` Steven Rostedt
2016-07-30 13:51         ` Josh Poimboeuf
2016-08-01 14:28           ` Steven Rostedt
2016-08-01 15:36             ` Josh Poimboeuf
2016-08-02 21:00               ` Josh Poimboeuf
2016-08-02 21:16                 ` Steven Rostedt
2016-08-02 22:13                   ` Josh Poimboeuf
2016-08-02 23:16                     ` Steven Rostedt
2016-08-03  1:56                       ` Josh Poimboeuf
2016-08-03  2:30                         ` Steven Rostedt
2016-08-03  2:50                           ` Josh Poimboeuf
2016-08-03  2:59                             ` Steven Rostedt
2016-08-03  3:12                               ` Josh Poimboeuf
2016-08-03  3:18                                 ` Steven Rostedt
2016-08-03  3:21                                   ` Steven Rostedt
2016-08-03  3:31                                     ` Josh Poimboeuf
2016-08-03  3:45                                       ` Steven Rostedt
2016-08-03 14:13                                         ` Josh Poimboeuf
2016-08-03  3:30                                   ` Josh Poimboeuf
2016-08-01 15:59     ` Josh Poimboeuf
2016-08-01 16:05       ` Steven Rostedt
2016-08-01 16:19         ` Josh Poimboeuf
2016-08-01 16:24     ` Josh Poimboeuf
2016-08-01 16:56       ` Steven Rostedt
2016-07-21 21:21 ` [PATCH 06/19] x86/dumpstack: remove extra brackets around "EOE" Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 07/19] x86/dumpstack: add IRQ_USABLE_STACK_SIZE define Josh Poimboeuf
2016-07-21 22:01   ` Andy Lutomirski
2016-07-22  1:48     ` Josh Poimboeuf
2016-07-22  8:24       ` Ingo Molnar
2016-07-21 21:21 ` [PATCH 08/19] x86/dumpstack: don't disable preemption in show_stack_log_lvl() and dump_trace() Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 09/19] x86/dumpstack: simplify in_exception_stack() Josh Poimboeuf
2016-07-21 22:05   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 10/19] x86/dumpstack: add get_stack_info() interface Josh Poimboeuf
2016-07-22 23:26   ` Andy Lutomirski
2016-07-22 23:52     ` Andy Lutomirski
2016-07-23 13:09       ` Josh Poimboeuf
2016-07-22 23:54     ` Josh Poimboeuf
2016-07-23  0:15       ` Andy Lutomirski
2016-07-23 14:04         ` Josh Poimboeuf
2016-07-26  0:09           ` Andy Lutomirski
2016-07-26 16:26             ` Josh Poimboeuf
2016-07-26 17:51               ` Steven Rostedt
2016-07-26 18:56                 ` Josh Poimboeuf
2016-07-26 20:59               ` Andy Lutomirski
2016-07-26 22:24                 ` Josh Poimboeuf
2016-07-26 22:31                   ` Steven Rostedt
2016-07-26 22:37                   ` Andy Lutomirski
2016-07-26 16:47             ` Josh Poimboeuf
2016-07-26 17:49               ` Brian Gerst
2016-07-26 18:59                 ` Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 11/19] x86/dumptrace: add new unwind interface and implementations Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 12/19] perf/x86: convert perf_callchain_kernel() to the new unwinder Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 13/19] x86/stacktrace: convert save_stack_trace_*() " Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 14/19] oprofile/x86: convert x86_backtrace() " Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 15/19] x86/dumpstack: convert show_trace_log_lvl() " Josh Poimboeuf
2016-07-21 21:49   ` Byungchul Park
2016-07-22  1:38     ` Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 16/19] x86/dumpstack: remove dump_trace() Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 17/19] x86/entry/dumpstack: encode pt_regs pointer in frame pointer Josh Poimboeuf
2016-07-21 22:27   ` Andy Lutomirski
2016-07-21 21:21 ` [PATCH 18/19] x86/dumpstack: print stack identifier on its own line Josh Poimboeuf
2016-07-21 21:21 ` [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack Josh Poimboeuf
2016-07-21 22:32   ` Andy Lutomirski
2016-07-22  3:30     ` Josh Poimboeuf
2016-07-22  5:13       ` Andy Lutomirski
2016-07-22 15:57         ` Josh Poimboeuf
2016-07-22 21:46           ` Andy Lutomirski
2016-07-22 22:20             ` Josh Poimboeuf
2016-07-22 23:18               ` Andy Lutomirski
2016-07-22 23:30                 ` Josh Poimboeuf
2016-07-22 23:39                   ` Andy Lutomirski
2016-07-23  0:00                     ` Josh Poimboeuf [this message]
2016-07-23  0:22 ` [PATCH 00/19] x86/dumpstack: rewrite x86 stack dump code Linus Torvalds
2016-07-23  0:31   ` Andy Lutomirski
2016-07-23  5:35     ` Josh Poimboeuf
2016-07-23  5:39       ` Linus Torvalds
2016-07-23 12:53         ` Josh Poimboeuf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160723000053.2wufllifxj62rqw6@treble \
    --to=jpoimboe@redhat.com \
    --cc=brgerst@gmail.com \
    --cc=byungchul.park@lge.com \
    --cc=fweisbec@gmail.com \
    --cc=hpa@zytor.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.