From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757221AbcHBVAU (ORCPT ); Tue, 2 Aug 2016 17:00:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60628 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757107AbcHBVAP (ORCPT ); Tue, 2 Aug 2016 17:00:15 -0400 Date: Tue, 2 Aug 2016 16:00:11 -0500 From: Josh Poimboeuf To: Steven Rostedt Cc: Thomas Gleixner , Ingo Molnar , "H . Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, Andy Lutomirski , Linus Torvalds , Brian Gerst , Kees Cook , Peter Zijlstra , Frederic Weisbecker , Byungchul Park Subject: Re: [PATCH 05/19] x86/dumpstack: fix function graph tracing stack dump reliability issues Message-ID: <20160802210011.k6li4mnyvky5jtev@treble> References: <20160729185521.62a5cd2a@gandalf.local.home> <20160730005059.5krpl2xsvqfbnntj@treble> <20160729222036.340b51ce@grimm.local.home> <20160730135125.qqw5qz24szlc2crz@treble> <20160801102821.1d6261a9@gandalf.local.home> <20160801153633.c7sa2rclkqwbdagd@treble> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20160801153633.c7sa2rclkqwbdagd@treble> User-Agent: Mutt/1.6.0.1 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 02 Aug 2016 21:00:14 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 01, 2016 at 10:36:33AM -0500, Josh Poimboeuf wrote: > On Mon, Aug 01, 2016 at 10:28:21AM -0400, Steven Rostedt wrote: > > On Sat, 30 Jul 2016 08:51:25 -0500 > > Josh Poimboeuf wrote: > > > > > On Fri, Jul 29, 2016 at 10:20:36PM -0400, Steven Rostedt wrote: > > > > On Fri, 29 Jul 2016 19:50:59 -0500 > > > > Josh Poimboeuf wrote: > > > > > > > > > BTW, it would be really nice if ftrace_graph_ret_addr() were idempotent > > > > > so we could get the "real" return address without having to pass in a > > > > > state variable. > > > > > > > > > > For example we could add an "unsigned long *retp" pointer to > > > > > ftrace_ret_stack, which points to the return address on the stack. Then > > > > > we could get rid of the index state variable in ftrace_graph_ret_addr, > > > > > and also then there would never be a chance of the stack dump getting > > > > > out of sync with the ret_stack. > > > > > > > > > > What do you think? > > > > > > > > > > > > > I don't want to extend ret_stack as that is allocated 50 of these > > > > structures for every task. That said, we have the "fp" field that's > > > > used to check for frame pointer corruption when mcount is used. With > > > > CC_USING_FENTRY, that field is ignored. Perhaps we could overload that > > > > field for this. > > > > > > In that case, I guess we would need two versions of > > > ftrace_graph_ret_addr(), with the current implementation still needed > > > for mcount+HAVE_FUNCTION_GRAPH_FP_TEST. > > > > How hard would it be in that case? > > Well, it would be easy enough, but then the caller would still need to > pass in the state variable. So maybe it's not worth the trouble. I did some stack trace testing on mainline with function graph tracing. As it turns out, print_ftrace_graph_addr() is already buggy today if the caller of dump_trace() specifies a stack pointer or a pt_regs (which is usually done in order to skip some irrelevant stack frames in the trace). For example, here's a stack trace based on NMI regs: $ echo 1 > /proc/sys/kernel/sysrq $ echo l > /proc/sysrq-trigger ... Call Trace: [] ? __x2apic_send_IPI_dest.constprop.4+0x31/0x40 [] __x2apic_send_IPI_mask+0x95/0xe0 [] ? irq_force_complete_move+0xf0/0xf0 [] x2apic_send_IPI_mask+0x13/0x20 [] nmi_raise_cpu_backtrace+0x1b/0x20 [] nmi_trigger_all_cpu_backtrace+0xc6/0xf0 [] arch_trigger_all_cpu_backtrace+0x19/0x20 [] sysrq_handle_showallcpus+0x13/0x20 [] __handle_sysrq+0x138/0x220 [] ? __handle_sysrq+0x5/0x220 [] write_sysrq_trigger+0x51/0x60 [] proc_reg_write+0x42/0x70 [] __vfs_write+0x37/0x140 [] ? update_fast_ctr+0x51/0x80 [] ? percpu_down_read+0x57/0xa0 [] ? __sb_start_write+0xb4/0xf0 [] ? __sb_start_write+0xb4/0xf0 [] vfs_write+0xb8/0x1a0 [] SyS_write+0x58/0xc0 [] entry_SYSCALL_64_fastpath+0x1f/0xbd And here's the same trace with function graph tracing: $ echo function_graph > /sys/kernel/debug/tracing/current_tracer $ echo l > /proc/sysrq-trigger ... Call Trace: [] ? __x2apic_send_IPI_dest.constprop.4+0x31/0x40 [] ftrace_graph_caller+0xa8/0xa8 [] print_context_stack+0xfc/0x100 [] ? irq_force_complete_move+0xf0/0xf0 [] ftrace_graph_caller+0xa8/0xa8 [] dump_trace+0x12b/0x350 [] ftrace_graph_caller+0xa8/0xa8 [] show_trace_log_lvl+0x4b/0x60 [] ftrace_graph_caller+0xa8/0xa8 [] show_stack_log_lvl+0x136/0x1d0 [] arch_trigger_all_cpu_backtrace+0x19/0x20 [] ftrace_graph_caller+0xa8/0xa8 [] show_regs+0xa8/0x1b0 [] ftrace_graph_caller+0xa8/0xa8 [] nmi_cpu_backtrace+0x46/0x60 [] ? __handle_sysrq+0x5/0x220 [] ftrace_graph_caller+0xa8/0xa8 [] nmi_handle+0xbf/0x2f0 [] ftrace_graph_caller+0xa8/0xa8 [] default_do_nmi+0x73/0x180 [] ftrace_graph_caller+0xa8/0xa8 [] do_nmi+0x119/0x170 [] ? ftrace_return_to_handler+0x9d/0x110 [] ? __vfs_write+0x5/0x140 [] ? __vfs_write+0x5/0x140 [] ftrace_graph_caller+0xa8/0xa8 [] __x2apic_send_IPI_mask+0x95/0xe0 [] ftrace_graph_caller+0xa8/0xa8 [] x2apic_send_IPI_mask+0x13/0x20 [] ftrace_graph_caller+0xa8/0xa8 [] nmi_raise_cpu_backtrace+0x1b/0x20 The ret_stack is out of sync with the stack dump because the stack dump was started with the regs from the NMI, instead of being started from the current frame. So I guess there are a couple of ways to fix it: a) keep track of the return address pointer like we discussed above; or b) have the unwinder count the # of skipped frames which refer to 'return_to_handler', and pass that as the initial index value to ftrace_graph_ret_addr(). Option a) would be much cleaner. But to fix it for both mcount and fentry, we couldn't override 'fp' so I guess we'd need to add a new field to ftrace_ret_stack. Option b) is uglier, but I could probably make it work with the new unwinder. -- Josh