From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1758041AbaAJToK (ORCPT ); Fri, 10 Jan 2014 14:44:10 -0500
Received: from g4t0016.houston.hp.com ([15.201.24.19]:8481 "EHLO
    g4t0016.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1751172AbaAJToI (ORCPT ); Fri, 10 Jan 2014 14:44:08 -0500
Message-ID: <52D04D6D.9010504@hp.com>
Date: Fri, 10 Jan 2014 14:43:41 -0500
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: Andy Lutomirski
CC: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
    Linux Kernel Mailing List, Aswin Chandramouleeswaran,
    Scott J Norton, Linus Torvalds
Subject: Re: SIGSEGV when using "perf record -g" with 3.13-rc* kernel
References: <52D011C9.7000209@hp.com>
    <20140110165822.GI7572@laptop.programming.kicks-ass.net>
    <20140110170223.GD8224@laptop.programming.kicks-ass.net>
    <20140110174141.GE8224@laptop.programming.kicks-ass.net>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/10/2014 01:54 PM, Andy Lutomirski wrote:
> On Fri, Jan 10, 2014 at 9:41 AM, Peter Zijlstra wrote:
>> On Fri, Jan 10, 2014 at 06:02:23PM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 10, 2014 at 05:58:22PM +0100, Peter Zijlstra wrote:
>>>> On Fri, Jan 10, 2014 at 10:29:13AM -0500, Waiman Long wrote:
>>>>> Peter,
>>>>>
>>>>> Call Trace:
>>>>> [] dump_stack+0x49/0x62
>>>>> [] warn_slowpath_common+0x8c/0xc0
>>>>> [] warn_slowpath_null+0x1a/0x20
>>>>> [] force_sig_info+0x131/0x140
>>>>> [] force_sig_info_fault+0x5f/0x70
>>>>> [] ? search_exception_tables+0x2a/0x50
>>>>> [] ? fixup_exception+0x1d/0x70
>>>>> [] no_context+0x159/0x1f0
>>>>> [] __bad_area_nosemaphore+0x12d/0x230
>>>>> [] ? __bad_area_nosemaphore+0x12d/0x230
>>>>> [] bad_area_nosemaphore+0x13/0x20
>>>>> [] __do_page_fault+0x362/0x480
>>>>> [] ? __do_page_fault+0x362/0x480
>>>>> [] do_page_fault+0xe/0x10
>>>>> [] page_fault+0x22/0x30
>>>>> [] ? bad_to_user+0x5e/0x66b
>>>>> [] copy_from_user_nmi+0x76/0x90
>>>>> [] perf_callchain_user+0xd0/0x360
>>>>> [] perf_callchain+0x1af/0x1f0
>>>>> [] perf_prepare_sample+0x2f3/0x3a0
>>>>> [] __perf_event_overflow+0x10f/0x220
>>>>> [] perf_event_overflow+0x14/0x20
>>>>> [] intel_pmu_handle_irq+0x1de/0x3c0
>>>>> [] ? emulate_vsyscall+0x144/0x390
>>>>> [] perf_event_nmi_handler+0x34/0x60
>>>>> [] nmi_handle+0x8a/0x170
>>>>> [] default_do_nmi+0x68/0x210
>>>>> [] do_nmi+0x90/0xe0
>>>>> [] end_repeat_nmi+0x1e/0x2e
>>>>> [] ? emulate_vsyscall+0x144/0x390
>>>>> [] ? emulate_vsyscall+0x144/0x390
>>>>> [] ? emulate_vsyscall+0x144/0x390
>>>>> <> [] __bad_area_nosemaphore+0x21d/0x230
>>>>> [] bad_area_nosemaphore+0x13/0x20
>>>>> [] __do_page_fault+0x362/0x480
>>>>> [] ? vm_mmap_pgoff+0xbc/0xe0
>>>>> [] do_page_fault+0xe/0x10
>>>>> [] page_fault+0x22/0x30
>>>>> ---[ end trace 037bf09d279751ec ]---
>>>>>
>>>>> So this is a double page fault. Looking at the relevant changes in
>>>>> the 3.13 kernel, I spotted the following patch that modified the
>>>>> perf_callchain_user() function that shows up in the stack trace above:
>>>>>
>>>> Hurm, that's an expected double fault, not something we should take the
>>>> process down for.
>>>>
>>>> I'll have to look at how all that works for a bit.
>> Andy introduced all this in 4fc3490114bb ("x86-64: Set siginfo and
>> context on vsyscall emulation faults").
>>
>> It looks like your initial userspace fault hit the magic button and ended
>> up in emulate_vsyscall. Right at that point we trigger a PMI, which
>> tries to do a stack-trace. That stack-trace also stumbles into unmapped
>> memory (might be the same) and faults again.
>>
>> Now at that point, we usually just give up on the callchain and proceed
>> like normal, however because of this double fault emulate-vsyscall
>> SIGSEGV magic you lose.
>>
>> So the below might well be a valid fix.. Anybody? Andy?
> Yuck -- when I wrote that thing, I hadn't imagined that an interrupt
> (there's nothing particularly special about NMIs here, I think) would
> try to access user memory. The fix below looks okay, but IMO it needs
> a big fat comment explaining what's going on.
>
> Is there a way to ask whether the previous entry into the kernel came
> from user space? The valid "sig_on_uaccess_error" case happens when
> the current fault was triggered by a fault from userspace. The
> invalid case (and any invalid case from, say, an int3 that a
> tracepoint stuck in there) would be a page fault triggered by a fault
> handler that in turn started in kernel space (in particular, in
> emulate_vsyscall).

The processes that got the SIGSEGV were all running shell scripts. I am
not totally sure that they were running in user space when they got the
PMIs, but that is likely the case.

-Longman