From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758073AbaAJThk (ORCPT ); Fri, 10 Jan 2014 14:37:40 -0500
Received: from g6t0187.atlanta.hp.com ([15.193.32.64]:24834 "EHLO g6t0187.atlanta.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751483AbaAJThi (ORCPT ); Fri, 10 Jan 2014 14:37:38 -0500
Message-ID: <52D04BE6.9050907@hp.com>
Date: Fri, 10 Jan 2014 14:37:10 -0500
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: Peter Zijlstra
CC: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List, Aswin Chandramouleeswaran, Scott J Norton, Linus Torvalds
Subject: Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
References: <52D011C9.7000209@hp.com> <20140110165822.GI7572@laptop.programming.kicks-ass.net> <20140110170223.GD8224@laptop.programming.kicks-ass.net>
In-Reply-To: <20140110170223.GD8224@laptop.programming.kicks-ass.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/10/2014 12:02 PM, Peter Zijlstra wrote:
> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>>> Peter,
>>>
>>> Call Trace:
>>> [] dump_stack+0x49/0x62
>>> [] warn_slowpath_common+0x8c/0xc0
>>> [] warn_slowpath_null+0x1a/0x20
>>> [] force_sig_info+0x131/0x140
>>> [] force_sig_info_fault+0x5f/0x70
>>> [] ? search_exception_tables+0x2a/0x50
>>> [] ? fixup_exception+0x1d/0x70
>>> [] no_context+0x159/0x1f0
>>> [] __bad_area_nosemaphore+0x12d/0x230
>>> [] ? __bad_area_nosemaphore+0x12d/0x230
>>> [] bad_area_nosemaphore+0x13/0x20
>>> [] __do_page_fault+0x362/0x480
>>> [] ? __do_page_fault+0x362/0x480
>>> [] do_page_fault+0xe/0x10
>>> [] page_fault+0x22/0x30
>>> [] ? bad_to_user+0x5e/0x66b
>>> [] copy_from_user_nmi+0x76/0x90
>>> [] perf_callchain_user+0xd0/0x360
>>> [] perf_callchain+0x1af/0x1f0
>>> [] perf_prepare_sample+0x2f3/0x3a0
>>> [] __perf_event_overflow+0x10f/0x220
>>> [] perf_event_overflow+0x14/0x20
>>> [] intel_pmu_handle_irq+0x1de/0x3c0
>>> [] ? emulate_vsyscall+0x144/0x390
>>> [] perf_event_nmi_handler+0x34/0x60
>>> [] nmi_handle+0x8a/0x170
>>> [] default_do_nmi+0x68/0x210
>>> [] do_nmi+0x90/0xe0
>>> [] end_repeat_nmi+0x1e/0x2e
>>> [] ? emulate_vsyscall+0x144/0x390
>>> [] ? emulate_vsyscall+0x144/0x390
>>> [] ? emulate_vsyscall+0x144/0x390
>>> <> [] __bad_area_nosemaphore+0x21d/0x230
>>> [] bad_area_nosemaphore+0x13/0x20
>>> [] __do_page_fault+0x362/0x480
>>> [] ? vm_mmap_pgoff+0xbc/0xe0
>>> [] do_page_fault+0xe/0x10
>>> [] page_fault+0x22/0x30
>>> ---[ end trace 037bf09d279751ec ]---
>>>
>>> So this is a double page faults. Looking at relevant changes in
>>> 3.13 kernel, I spotted the following one patch that modified the
>>> perf_callchain_user() function shown up in the stack trace above:
>>>
>> Hurm, that's an expected double fault, not something we should take the
>> process down for.
>>
>> I'll have to look at how all that works for a bit.
> How easily can you reproduce this? Could you test something like the
> below, which would allow us to take double faults from NMI context.

The error is readily reproducible in my current setup.

> ---
>  arch/x86/mm/fault.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 9ff85bb8dd69..18c498d4274d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -641,7 +641,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>
>  	/* Are we prepared to handle this kernel fault? */
>  	if (fixup_exception(regs)) {
> -		if (current_thread_info()->sig_on_uaccess_error && signal) {
> +		if (!in_nmi() && current_thread_info()->sig_on_uaccess_error && signal) {
>  			tsk->thread.trap_nr = X86_TRAP_PF;
>  			tsk->thread.error_code = error_code | PF_USER;
>  			tsk->thread.cr2 = address;

Yes, this change fixed the error that I got. I no longer see SIGSEGV
when I run the test. I did try to back out your "perf: Fix
arch_perf_out_copy_user default" patch, but it didn't fix the problem.

Thanks for the quick turnaround!

-Longman
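
P.S. For anyone reading along, the effect of the one-line change can be
sketched in user-space C. This is only an illustration, not kernel code:
struct fault_state and the helper names are made up for the example, and
the three booleans merely stand in for in_nmi(),
current_thread_info()->sig_on_uaccess_error and the "signal" argument of
no_context().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state no_context() looks at. */
struct fault_state {
	bool in_nmi;               /* stand-in for in_nmi() */
	bool sig_on_uaccess_error; /* stand-in for the thread_info flag */
	bool signal;               /* stand-in for the 'signal' argument */
};

/* Condition before the patch: NMI context is not considered. */
static bool would_signal_before(const struct fault_state *s)
{
	return s->sig_on_uaccess_error && s->signal;
}

/* Condition after the patch: never signal for a fault taken in NMI. */
static bool would_signal_after(const struct fault_state *s)
{
	return !s->in_nmi && s->sig_on_uaccess_error && s->signal;
}

/* Self-check: the two conditions differ only when in_nmi is set. */
static void check_nmi_guard(void)
{
	struct fault_state nmi = { true, true, true };
	struct fault_state task = { false, true, true };

	assert(would_signal_before(&nmi) && !would_signal_after(&nmi));
	assert(would_signal_before(&task) && would_signal_after(&task));
}
```

With the flag set and a fault fixed up from NMI context (the
copy_from_user_nmi() case in the trace), the old condition would go on
to raise the signal while the new one skips it; outside NMI context the
two conditions agree.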