From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753466AbcDUTkL (ORCPT ); Thu, 21 Apr 2016 15:40:11 -0400
Received: from mail-ob0-f169.google.com ([209.85.214.169]:35410 "EHLO
        mail-ob0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752506AbcDUTkH (ORCPT ); Thu, 21 Apr 2016 15:40:07 -0400
MIME-Version: 1.0
In-Reply-To: <20160420190507.GF3408@twins.programming.kicks-ass.net>
References: <57064E6C.2030202@virtuozzo.com> <5707B70F.9080402@virtuozzo.com>
        <5707D9F1.3090102@virtuozzo.com> <570E79EF.7030408@virtuozzo.com>
        <20160420110402.GY3408@twins.programming.kicks-ass.net>
        <20160420190507.GF3408@twins.programming.kicks-ass.net>
From: Andy Lutomirski
Date: Thu, 21 Apr 2016 12:39:42 -0700
Message-ID:
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to
        change compatible mode
To: Peter Zijlstra
Cc: Dmitry Safonov , Thomas Gleixner , Shuah Khan , Ingo Molnar ,
        Dave Hansen , Borislav Petkov , khorenko@virtuozzo.com, X86 ML ,
        Andrew Morton , xemul@virtuozzo.com, linux-kselftest@vger.kernel.org,
        Cyrill Gorcunov , Dmitry Safonov <0x7f454c46@gmail.com>,
        "linux-kernel@vger.kernel.org" , "H. Peter Anvin"
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 20, 2016 at 12:05 PM, Peter Zijlstra wrote:
> On Wed, Apr 20, 2016 at 08:40:23AM -0700, Andy Lutomirski wrote:
>> Do LBR, PEBS, and similar report user regs or do they merely want to
>> know the instruction format?  If the latter, I could whip up a tiny
>> function to do just that (like perf_get_regs_user but just for ABI --
>> it would be simpler).
>
> Just the instruction format, nothing else.
>
>> >> Peter, I got lost in the code that calls this.  Are regs coming from
>> >> the overflow interrupt's regs, current_pt_regs(), or
>> >> perf_get_regs_user?
>> >
>> > So get_perf_callchain() will get regs from:
>> >
>> >  - interrupt/NMI regs
>> >  - perf_arch_fetch_caller_regs()
>> >
>> > And when user && !user_mode(), we'll use:
>> >
>> >  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())
>>
>> Could you point me to this bit of the code?
>
> kernel/events/callchain.c:198

But that only applies to the callchain code, right?  AFAICS the PEBS
code is invoked through the x86_pmu NMI handler and always gets the
IRQ regs.  Except for this case:

static inline void intel_pmu_drain_pebs_buffer(void)
{
	struct pt_regs regs;

	x86_pmu.drain_pebs(&regs);
}

which seems a bit confused.

I don't suppose we could arrange to pass something consistent into the
PEBS handlers...

Or is the PEBS code being called from the callchain code somehow?  I
haven't dug in to the LBR code much.

>>
>> One call to perf_get_user_regs per interrupt shouldn't be too bad --
>> certainly much better than one per PEBS record.  One call to get user
>> ABI per overflow would be even less bad, but at that point, folding it
>> in to the PEBS code wouldn't be so bad either.
>
> Right; although note that the whole fixup_ip() thing requires a single
> record per interrupt (for we need the LBR state for each record in order
> to rewind).

So do earlier PEBS events not get rewound?  Or do we just program the
thing to only ever give us one event at a time?

>
> Also, HSW+ PEBS doesn't do the fixup anymore.
>
>> If I'm understanding this right (a big, big if), if we get a PEBS
>> overflow while running in user mode, we'll dump out the user regs (and
>> call perf_get_regs_user) and all the PEBS entries (subject to
>> exclude_kernel and with all the decoding magic).  So, in that case, we
>> call perf_get_user_regs.
>
> We only dump user regs if PERF_SAMPLE_REGS_USER, and in case we hit
> userspace with the interrupt we use the interrupt regs; see
> perf_sample_regs_user().
>
>> If we get a PEBS overflow while running in kernel mode, we'll report
>> the kernel regs (if !exclude_kernel) and report the PEBS data as well.
>> If any of those records are in user mode, then, ideally, we'd invoke
>> perf_get_regs_user or similar *once* to get the ABI.  Although, if we
>> can get the user ABI efficiently enough, then maybe we don't care if
>> we call it once per PEBS record.
>
> Right, if we interrupt kernel mode, we'll call perf_get_regs_user() if
> PERF_SAMPLE_REGS_USER (| PERF_SAMPLE_STACK_USER).

But not in get_perf_callchain.  So we'll show the correct user *regs*
but not the current user callchain under some conditions, AFAICS.

>
> The problem here is that the overflow stuff is designed for a single
> 'event' per interrupt, so passing it data for multiple events is
> somewhat icky.

It also seems that there's a certain amount of confusion as to exactly
what "regs" means in various contexts.  Or at least I'm confused by
it.

--Andy