From: Andy Lutomirski
Date: Thu, 21 Apr 2016 16:46:14 -0700
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode
To: Peter Zijlstra
Cc: Dmitry Safonov, Thomas Gleixner, Shuah Khan, Ingo Molnar, Dave Hansen,
 Borislav Petkov, khorenko@virtuozzo.com, X86 ML, Andrew Morton,
 xemul@virtuozzo.com, linux-kselftest@vger.kernel.org, Cyrill Gorcunov,
 Dmitry Safonov <0x7f454c46@gmail.com>, "linux-kernel@vger.kernel.org",
 "H. Peter Anvin"
References: <5707B70F.9080402@virtuozzo.com> <5707D9F1.3090102@virtuozzo.com>
 <570E79EF.7030408@virtuozzo.com>
 <20160420110402.GY3408@twins.programming.kicks-ass.net>
 <20160420190507.GF3408@twins.programming.kicks-ass.net>
 <20160421201241.GM3430@twins.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 21, 2016 at 4:27 PM, Andy Lutomirski wrote:
> On Thu, Apr 21, 2016 at 1:12 PM, Peter Zijlstra wrote:
>> On Thu, Apr 21, 2016 at 12:39:42PM -0700, Andy Lutomirski wrote:
>>> On Wed, Apr 20, 2016 at 12:05 PM, Peter Zijlstra wrote:
>>> > On Wed, Apr 20, 2016 at 08:40:23AM -0700, Andy Lutomirski wrote:
>>
>>> >> >> Peter, I got lost in the code that calls this. Are regs coming from
>>> >> >> the overflow interrupt's regs, current_pt_regs(), or
>>> >> >> perf_get_regs_user?
>>> >> >
>>> >> > So get_perf_callchain() will get regs from:
>>> >> >
>>> >> >  - interrupt/NMI regs
>>> >> >  - perf_arch_fetch_caller_regs()
>>> >> >
>>> >> > And when user && !user_mode(), we'll use:
>>> >> >
>>> >> >  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())
>>> >>
>>> >> Could you point me to this bit of the code?
>>> >
>>> > kernel/events/callchain.c:198
>>>
>>> But that only applies to the callchain code, right?
>>
>> Yes, which is what I thought you were after..
>>
>>> AFAICS the PEBS
>>> code is invoked through the x86_pmu NMI handler and always gets the
>>> IRQ regs. Except for this case:
>>>
>>> static inline void intel_pmu_drain_pebs_buffer(void)
>>> {
>>> 	struct pt_regs regs;
>>>
>>> 	x86_pmu.drain_pebs(&regs);
>>> }
>>>
>>> which seems a bit confused.
>>
>> Yes, so that only gets used with 'large' pebs, which requires no other
>> flags than PERF_FREERUNNING_FLAGS, which precludes the regs set from
>> being used.
>>
>> Could definitely use a comment.
>>
>>> I don't suppose we could arrange to pass something consistent into the
>>> PEBS handlers...
>>>
>>> Or is the PEBS code being called from the callchain code somehow?
>>
>> No. I think we were/are slightly talking past one another.
>>
>>> >> One call to perf_get_user_regs per interrupt shouldn't be too bad --
>>> >> certainly much better than one per PEBS record. One call to get user
>>> >> ABI per overflow would be even less bad, but at that point, folding it
>>> >> in to the PEBS code wouldn't be so bad either.
>>> >
>>> > Right; although note that the whole fixup_ip() thing requires a single
>>> > record per interrupt (for we need the LBR state for each record in order
>>> > to rewind).
>>>
>>> So do earlier PEBS events not get rewound? Or do we just program the
>>> thing to only ever give us one event at a time?
>>
>> The latter; we program PEBS such that it can hold but a single record
>> and thereby assure we get an interrupt for each record.
>>
>>> > The problem here is that the overflow stuff is designed for a single
>>> > 'event' per interrupt, so passing it data for multiple events is
>>> > somewhat icky.
>>>
>>> It also seems that there's a certain amount of confusion as to exactly
>>> what "regs" means in various contexts. Or at least I'm confused by
>>> it.
>>
>> Yes, there's too much regs.
>>
>> Typically 'regs' is the 'interrupt'/'event' regs, that is the register
>> set at eventing time. For sampling hardware PMUs this is NMI/IRQ like
>> things, for software events this ends up being
>> perf_arch_fetch_caller_regs().
>>
>> Then there's PERF_SAMPLE_REGS_USER|PERF_SAMPLE_STACK_USER, which, for
>> each event with it set, use perf_get_regs_user() to dump the thing into
>> our ringbuffer as part of the event record.
>>
>> And then there's the callchain code, which first unwinds kernel space if
>> the 'interrupt'/'event' reg set points into the kernel, and then uses
>> task_pt_regs() (which I think we agree should be perf_get_regs_user())
>> to obtain the user regs to continue with the user stack unwind.
>>
>> Finally there's PERF_SAMPLE_REGS_INTR, which dumps whatever
>> 'interrupt/event' regs we get into the ringbuffer sample record.
>>
>>
>> Did that help? Or did I confuse you moar?
>>
>
> I think I'm starting to get it. What if we rearrange slightly, like this:
>

I started fiddling to see what's involved, then I got to this:

	if (sample_type & PERF_SAMPLE_REGS_INTR) {
		u64 abi = data->regs_intr.abi;
		/*
		 * If there are no regs to dump, notice it through
		 * first u64 being zero (PERF_SAMPLE_REGS_ABI_NONE).
		 */
		perf_output_put(handle, abi);

		if (abi) {
			u64 mask = event->attr.sample_regs_intr;

			perf_output_sample_regs(handle,
						data->regs_intr.regs,
						mask);
		}
	}

regs_intr.abi comes from perf_regs_abi(current), which, on x86_64 or
arm64, may indicate 32-bit regs, but the actual regs are always
64-bit.

Am I just confused or is this a bug?

--Andy
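
To make the suspected mismatch concrete, here is a tiny userspace model of
it. This is only a sketch of the behaviour described above, not kernel code:
every model_* name and struct is made up for illustration, and the only
things taken from the kernel are the three ABI values (NONE = 0, 32 = 1,
64 = 2), which mirror the uapi enum. It assumes, as the question does, that
the ABI label is derived from the current task while the interrupt regs are
always captured at 64-bit width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror enum perf_sample_regs_abi from the uapi header. */
enum sample_regs_abi {
	SAMPLE_REGS_ABI_NONE = 0,
	SAMPLE_REGS_ABI_32   = 1,
	SAMPLE_REGS_ABI_64   = 2,
};

/* Hypothetical stand-in for a task: only tracks compat (32-bit) mode. */
struct model_task {
	bool compat;
};

/* Hypothetical stand-in for the interrupt-regs part of a sample. */
struct model_regs_intr {
	uint64_t abi;		/* what the record claims */
	unsigned int reg_bits;	/* width of the regs actually captured */
};

/* Assumption from the mail: the ABI label follows the task, not the regs. */
static uint64_t model_abi_from_task(const struct model_task *t)
{
	return t->compat ? SAMPLE_REGS_ABI_32 : SAMPLE_REGS_ABI_64;
}

/* Assumption from the mail: interrupt regs are always the 64-bit set. */
static struct model_regs_intr model_sample_intr(const struct model_task *t)
{
	struct model_regs_intr s = {
		.abi = model_abi_from_task(t),
		.reg_bits = 64,
	};
	return s;
}

/* True when the ABI label agrees with the width of the dumped regs. */
static bool model_abi_matches_regs(const struct model_regs_intr *s)
{
	switch (s->abi) {
	case SAMPLE_REGS_ABI_NONE:
		return true;	/* nothing is dumped, nothing to mislabel */
	case SAMPLE_REGS_ABI_32:
		return s->reg_bits == 32;
	case SAMPLE_REGS_ABI_64:
		return s->reg_bits == 64;
	default:
		return false;
	}
}

/* Walks both cases; the compat case is the mismatch being asked about. */
static void model_demo(void)
{
	struct model_task native = { .compat = false };
	struct model_task compat = { .compat = true };
	struct model_regs_intr a = model_sample_intr(&native);
	struct model_regs_intr b = model_sample_intr(&compat);

	assert(model_abi_matches_regs(&a));	/* 64-bit label, 64-bit regs */
	assert(!model_abi_matches_regs(&b));	/* 32-bit label, 64-bit regs */
}
```

Under those assumptions the native task comes out consistent and the compat
task ends up with a 32-bit label on a 64-bit register dump. Whether the real
code path behaves this way is exactly the open question.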