From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751957AbcDTTFZ (ORCPT ); Wed, 20 Apr 2016 15:05:25 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:39244 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750850AbcDTTFX (ORCPT );
	Wed, 20 Apr 2016 15:05:23 -0400
Date: Wed, 20 Apr 2016 21:05:07 +0200
From: Peter Zijlstra
To: Andy Lutomirski
Cc: Dmitry Safonov , Thomas Gleixner , Shuah Khan , Ingo Molnar ,
	Dave Hansen , Borislav Petkov , khorenko@virtuozzo.com, X86 ML ,
	Andrew Morton , xemul@virtuozzo.com, linux-kselftest@vger.kernel.org,
	Cyrill Gorcunov , Dmitry Safonov <0x7f454c46@gmail.com>,
	"linux-kernel@vger.kernel.org" , "H. Peter Anvin"
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to
	change compatible mode
Message-ID: <20160420190507.GF3408@twins.programming.kicks-ass.net>
References: <57064E6C.2030202@virtuozzo.com> <5707B70F.9080402@virtuozzo.com>
	<5707D9F1.3090102@virtuozzo.com> <570E79EF.7030408@virtuozzo.com>
	<20160420110402.GY3408@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 20, 2016 at 08:40:23AM -0700, Andy Lutomirski wrote:

> Do LBR, PEBS, and similar report user regs or do they merely want to
> know the instruction format?  If the latter, I could whip up a tiny
> function to do just that (like perf_get_regs_user but just for ABI --
> it would be simpler).

Just the instruction format, nothing else.

> >> Peter, I got lost in the code that calls this.  Are regs coming from
> >> the overflow interrupt's regs, current_pt_regs(), or
> >> perf_get_regs_user?
> >
> > So get_perf_callchain() will get regs from:
> >
> >  - interrupt/NMI regs
> >  - perf_arch_fetch_caller_regs()
> >
> > And when user && !user_mode(), we'll use:
> >
> >  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())
>
> Could you point me to this bit of the code?

kernel/events/callchain.c:198

> > to call perf_callchain_user(), which then ends up calling
> > perf_callchain_user32() which is expected to NO-OP for 64bit userspace.
> >
> >> If it's the perf_get_regs_user, then this should be okay, but passing
> >> in the ABI field directly would be even nicer.  If they're coming from
> >> the overflow interrupt's regs or current_pt_regs(), could we change
> >> that?
> >>
> >> It might also be nice to make sure that we call perf_get_regs_user
> >> exactly once per overflow interrupt -- i.e. we could push it into the
> >> main code rather than the regs sampling code.
> >
> > The risk there is that we might not need the user regs at all to handle
> > the overflow thingy, so doing it unconditionally would be unwanted.
>
> One call to perf_get_user_regs per interrupt shouldn't be too bad --
> certainly much better than one per PEBS record.  One call to get user
> ABI per overflow would be even less bad, but at that point, folding it
> in to the PEBS code wouldn't be so bad either.

Right; although note that the whole fixup_ip() thing requires a single
record per interrupt (for we need the LBR state for each record in order
to rewind).

Also, HSW+ PEBS doesn't do the fixup anymore.

> If I'm understanding this right (a big, big if), if we get a PEBS
> overflow while running in user mode, we'll dump out the user regs (and
> call perf_get_regs_user) and all the PEBS entries (subject to
> exclude_kernel and with all the decoding magic).  So, in that case, we
> call perf_get_user_regs.

We only dump user regs if PERF_SAMPLE_REGS_USER, and in case we hit
userspace with the interrupt we use the interrupt regs; see
perf_sample_regs_user().
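
For illustration only, the register choice described above can be
modelled in plain user-space C. This is a rough sketch, not the kernel
code: struct regs_model, struct task_model, the SAMPLE_* flags and
pick_user_regs() are hypothetical stand-ins, and the has_user_context
check (for kernel threads) is an added assumption that the thread does
not spell out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf sample_type bits; not kernel values. */
#define SAMPLE_REGS_USER  (1u << 0)
#define SAMPLE_STACK_USER (1u << 1)

/* Hypothetical model of a register snapshot. */
struct regs_model {
	bool user_mode;	/* true if the snapshot was taken in user mode */
	uint64_t ip;
};

/* Hypothetical model of the interrupted task. */
struct task_model {
	bool has_user_context;			/* assumption: false for kernel threads */
	struct regs_model saved_user_regs;	/* regs saved at kernel entry */
};

/*
 * Return the regs that describe user state for one sample, or NULL when
 * user regs were not requested or do not exist.
 */
static const struct regs_model *
pick_user_regs(uint32_t sample_type, const struct regs_model *irq_regs,
	       const struct task_model *task)
{
	/* Nothing asked for user state: skip the work entirely. */
	if (!(sample_type & (SAMPLE_REGS_USER | SAMPLE_STACK_USER)))
		return NULL;

	/* The interrupt hit user space: the interrupt regs are the user regs. */
	if (irq_regs->user_mode)
		return irq_regs;

	/* The interrupt hit kernel mode: fall back to the saved entry regs. */
	if (task->has_user_context)
		return &task->saved_user_regs;

	return NULL;
}
```

The early return keeps the common case, where no user state was
requested, free of any lookup.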

> If we get a PEBS overflow while running in kernel mode, we'll report
> the kernel regs (if !exclude_kernel) and report the PEBS data as well.
> If any of those records are in user mode, then, ideally, we'd invoke
> perf_get_regs_user or similar *once* to get the ABI.  Although, if we
> can get the user ABI efficiently enough, then maybe we don't care if
> we call it once per PEBS record.

Right, if we interrupt kernel mode, we'll call perf_get_regs_user() if
PERF_SAMPLE_REGS_USER (| PERF_SAMPLE_STACK_USER).

The problem here is that the overflow stuff is designed for a single
'event' per interrupt, so passing it data for multiple events is
somewhat icky.
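
To make the "once per overflow" idea concrete, here is a small
user-space sketch of a lazily cached ABI lookup shared by all records
of one interrupt. Every name in it (abi_model, overflow_ctx,
lookup_user_abi(), user_abi_once(), tag_records()) is hypothetical and
none of it is existing perf code. It only shows the caching shape and
does not address the single-event-per-interrupt design issue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ABI tags; not the kernel's PERF_SAMPLE_REGS_ABI_* values. */
enum abi_model { ABI_NONE = 0, ABI_32 = 1, ABI_64 = 2 };

/* Hypothetical per-interrupt context, reset for every overflow. */
struct overflow_ctx {
	bool abi_valid;			/* has the lookup been done yet? */
	enum abi_model abi;		/* cached result */
	unsigned int lookups;		/* how many real lookups were made */
	enum abi_model task_abi;	/* what the (modelled) lookup returns */
};

/* Hypothetical model of one PEBS record. */
struct record_model {
	bool user_mode;
};

/* Stand-in for the potentially expensive user ABI query. */
static enum abi_model lookup_user_abi(struct overflow_ctx *ctx)
{
	ctx->lookups++;
	return ctx->task_abi;
}

/* Query the ABI on first use only, then serve the cached value. */
static enum abi_model user_abi_once(struct overflow_ctx *ctx)
{
	if (!ctx->abi_valid) {
		ctx->abi = lookup_user_abi(ctx);
		ctx->abi_valid = true;
	}
	return ctx->abi;
}

/*
 * Tag each record with the user ABI (ABI_NONE for kernel records) and
 * return the number of user-mode records seen.
 */
static size_t tag_records(struct overflow_ctx *ctx,
			  const struct record_model *recs, size_t n,
			  enum abi_model *abi_out)
{
	size_t users = 0;

	for (size_t i = 0; i < n; i++) {
		if (!recs[i].user_mode) {
			abi_out[i] = ABI_NONE;
			continue;
		}
		abi_out[i] = user_abi_once(ctx);
		users++;
	}
	return users;
}
```

Because the lookup is lazy, an interrupt whose records are all in
kernel mode never pays for it, which matches the concern about not
doing the work unconditionally.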