From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756004AbcHBWOS (ORCPT ); Tue, 2 Aug 2016 18:14:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55576 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753994AbcHBWOE (ORCPT ); Tue, 2 Aug 2016 18:14:04 -0400 Date: Tue, 2 Aug 2016 17:13:59 -0500 From: Josh Poimboeuf To: Steven Rostedt Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, Andy Lutomirski , Linus Torvalds , Brian Gerst , Kees Cook , Peter Zijlstra , Frederic Weisbecker , Byungchul Park Subject: Re: [PATCH 05/19] x86/dumpstack: fix function graph tracing stack dump reliability issues Message-ID: <20160802221359.tkb3mkjzkq3petrq@treble> References: <20160729185521.62a5cd2a@gandalf.local.home> <20160730005059.5krpl2xsvqfbnntj@treble> <20160729222036.340b51ce@grimm.local.home> <20160730135125.qqw5qz24szlc2crz@treble> <20160801102821.1d6261a9@gandalf.local.home> <20160801153633.c7sa2rclkqwbdagd@treble> <20160802210011.k6li4mnyvky5jtev@treble> <20160802171610.09156c90@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20160802171610.09156c90@gandalf.local.home> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 02 Aug 2016 22:14:03 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 02, 2016 at 05:16:10PM -0400, Steven Rostedt wrote: > On Tue, 2 Aug 2016 16:00:11 -0500 > Josh Poimboeuf wrote: > > [] nmi_raise_cpu_backtrace+0x1b/0x20 > > > > The ret_stack is out of sync with the stack dump because the stack dump > > was started with the regs from the NMI, instead of being started from > > the current frame. > > > > So I guess there are a couple of ways to fix it: > > > > a) keep track of the return address pointer like we discussed above; > > > > or > > > > b) have the unwinder count the # of skipped frames which refer to > > 'return_to_handler', and pass that as the initial index value to > > ftrace_graph_ret_addr(). > > > > Option a) would be much cleaner. But to fix it for both mcount and > > fentry, we couldn't override 'fp' so I guess we'd need to add a new > > field to ftrace_ret_stack. > > Actually, what about calling ftrace_graph_ret_addr() to figure out the > next stack conversion only if reliable or CONFIG_FRAME_POINTER is not > enabled? > > unsigned long real_addr = addr; > > [...] > > if (!IS_ENABLED(CONFIG_FRAME_POINTER) || reliable) > real_addr = ftrace_graph_ret_addr(task, graph, addr); > if (addr != real_addr) > ops->address(data, addr, 0); > ops->address(data, real_addr, reliable); > > Then we only need the fp use case when FRAME_POINTER is not set. As > mcount forces FRAME_POINTER, we only need to worry about the fentry > case. Hm, I'm confused. First, I don't see where mcount forces FRAME_POINTER. Second, I don't see why that even matters. If mcount and frame pointers are enabled, then the 'fp' field of ftrace_ret_stack is needed for the gcc sanity check, right? So we couldn't override 'fp', and the old "stateful index" version of ftrace_graph_ret_addr() would have to be used in the code above for reliable addresses, and we'd still have the same out-of-sync bug. Or am I missing something? -- Josh