From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752148AbcGVFNZ (ORCPT ); Fri, 22 Jul 2016 01:13:25 -0400 Received: from mail-vk0-f47.google.com ([209.85.213.47]:33797 "EHLO mail-vk0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750920AbcGVFNY (ORCPT ); Fri, 22 Jul 2016 01:13:24 -0400 MIME-Version: 1.0 In-Reply-To: <20160722033008.w24xxgvvlab5xvbf@treble> References: <23efe18b84aaa45d55b917d880a588c11091b788.1469136008.git.jpoimboe@redhat.com> <20160722033008.w24xxgvvlab5xvbf@treble> From: Andy Lutomirski Date: Thu, 21 Jul 2016 22:13:03 -0700 Message-ID: Subject: Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack To: Josh Poimboeuf Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , X86 ML , "linux-kernel@vger.kernel.org" , Linus Torvalds , Steven Rostedt , Brian Gerst , Kees Cook , Peter Zijlstra , Frederic Weisbecker , Byungchul Park Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 21, 2016 at 8:30 PM, Josh Poimboeuf wrote: > On Thu, Jul 21, 2016 at 03:32:32PM -0700, Andy Lutomirski wrote: >> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf wrote: >> > Now that we can find pt_regs registers in the middle of the stack due to >> > an interrupt or exception, we can print them. Here's what it looks >> > like: >> > >> > ... >> > [] do_async_page_fault+0x2c/0xa0 >> > [] async_page_fault+0x28/0x30 >> > RIP: 0010:[] [] __clear_user+0x42/0x70 >> > RSP: 0018:ffff88007876fd38 EFLAGS: 00010202 >> > RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138 >> > RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640 >> > RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000 >> > R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640 >> > R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200 >> > [] ? __clear_user+0x42/0x70 >> > [] ? __clear_user+0x23/0x70 >> > [] clear_user+0x2b/0x40 >> > ... >> >> This looks wrong. Here are some theories: >> >> (a) __clear_user is a reliable address that is indicated by RIP: .... >> Then it's found again as an unreliable address as "? >> __clear_user+0x42/0x70" by scanning the stack. "? >> __clear_user+0x23/0x70" is a genuine leftover artifact on the stack. >> In this case, shouldn't "? __clear_user+0x42/0x70" have been >> suppressed because it matched a reliable address? >> >> (b) You actually intended for all the addresses to be printed, in >> which case "? __clear_user+0x42/0x70" should have been >> "__clear_user+0x42/0x70" and you have a bug. In this case, it's >> plausible that your state machine got a bit lost leading to "? >> __clear_user+0x23/0x70" as well (i.e. it's not just an artifact -- >> it's a real frame and you didn't find it). >> >> (c) Something else and I'm confused. > > So there's a subtle difference between addresses reported by regs->ip > and normal return addresses. For example: > > ... > [] smp_apic_timer_interrupt+0x3d/0x50 > [] apic_timer_interrupt+0x9e/0xb0 > RIP: 0010:[] [] path_init+0x0/0x750 > RSP: 0018:ffff880036a3fd80 EFLAGS: 00000296 > RAX: ffff88003691aa40 RBX: ffff880036a3ff08 RCX: ffff880036a3ff08 > RDX: ffff880036a3ff08 RSI: 0000000000000041 RDI: ffff880036a3fdb0 > RBP: ffff880036a3fda0 R08: 0000000000000000 R09: 0000000000000010 > R10: 8080808080808080 R11: fefefefefefefeff R12: ffff880036a3fdb0 > R13: 0000000000000001 R14: ffff880036a3ff08 R15: 0000000000000000 > > [] ? lookup_fast+0x3d0/0x3d0 > [] ? path_lookupat+0x1b/0x120 > [] filename_lookup+0xb1/0x180 > ... > > In this case the irq hit right after path_lookupat() called into > path_init(). So the "path_init+0x0" printed by __show_regs() is right. > > Note the backtrace reports the same address, but it instead describes it > as "lookup_fast+0x3d0", which is the end of lookup_fast(). That's > because normally, such an address after a call instruction at the end of > a function would indicate a tail call (e.g., to a noreturn function). > If that were the case, printing "path_init+0x0" would be completely > wrong, because path_init() just happens to be the function located > immediately after lookup_fast(). > > Maybe I could add some special logic to say: "if this return address was > from a call, use printk_stack_address(); else if it was from a pt_regs, > use printk_address()." (The former prints the preceding function, the > latter prints the current function.) Then we could remove the question > mark. > > There's also the question of whether or not the address should be > printed again, after it's already been printed by __show_regs(). I > don't have a strong opinion either way. > IIRC we don't show the actual faulting function in the call trace, so we probably shouldn't duplicate the entry after the show_regs. That being said, I'm still confused by the question marks. What exactly is going on? Is the code really doing the right thing wrt resuming the unwind? Is there a git tree with these patches applied somewhere so I can look at it easily in context? --Andy