From mboxrd@z Thu Jan 1 00:00:00 1970 From: Boris Ostrovsky Subject: Re: [PATCH v10 09/20] x86/VPMU: Add public xenpmu.h Date: Mon, 15 Sep 2014 20:49:15 -0400 Message-ID: <5417890B.7070101@oracle.com> References: <54115FCA0200007800033A6C@mail.emea.novell.com> <5411A97C.7090100@oracle.com> <5411D40F0200007800033FDA@mail.emea.novell.com> <5411BF12.6090600@oracle.com> <5411E2F502000078000340BA@mail.emea.novell.com> <5411D31B.90403@oracle.com> <5412B3DE02000078000344A0@mail.emea.novell.com> <54130169.9000006@oracle.com> <5413219402000078000348D8@mail.emea.novell.com> <54130ED8.1000707@oracle.com> <20140915115635.GA4831@laptop.dumpdata.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20140915115635.GA4831@laptop.dumpdata.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Konrad Rzeszutek Wilk Cc: kevin.tian@intel.com, keir@xen.org, Jan Beulich , jun.nakajima@intel.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, eddie.dong@intel.com, Aravind.Gopalakrishnan@amd.com, suravee.suthikulpanit@amd.com List-Id: xen-devel@lists.xenproject.org On 09/15/2014 07:56 AM, Konrad Rzeszutek Wilk wrote: > On Fri, Sep 12, 2014 at 11:18:48AM -0400, Boris Ostrovsky wrote: >> On 09/12/2014 10:38 AM, Jan Beulich wrote: >>>>>> On 12.09.14 at 16:21, wrote: >>>> On 09/12/2014 02:50 AM, Jan Beulich wrote: >>>>> Okay, so you're tailoring the hypervisor interface to Linux. That's >>>>> not what we generally aim for, and hence I continue to think this >>>>> isn't the right approach. >>>> But we know that at least one consumer of this interface (Linux/perf) >>>> does want to have full GPR set. >>> You said they use a struct pt_regs somewhere - that's not the same >>> as struct cpu_user_regs (and the overlap of registers being in both >>> is not much more than a coincidence), and you didn't clarify yet how >>> much of this information is _actually_ getting consumed (i.e. this may >>> well just happen to be a suitable container for conveying a smaller >>> set). >> I don't know how all registers are consumed in Linux code (except for RIP >> and CS) and I am not sure it's all that important: we know that we need to >> pass GPRs to perf infrastructure which, for the most part, is a black box >> (especially given the insane amount of code churn that happens in perf). >> >> If and when we find out that something that perf wants is not provided in >> pt_regs (converted from cpu_user_regs) then we will have to deal with it. >> But it's not much different from what would happen if we use specific >> registers in the shared structure. >> >>>> So why not provide it? Especially since >>>> (I suspect that) doing memcpy may be faster than copying only selected >>>> fields. >>>> >>>> If you are categorically against this I can certainly rework this and >>>> pass RIP, RSP, CS and EFLAGS only. >>> My problem with this is that (a) it exposes an arbitrary subset of >>> registers and (b) in an incompatible way (a 32-bit profiling domain >>> should very well be told all 64-bit registers of a 64-bit guest it >>> profiles). Since b would need addressing by a custom structure >>> anyway, dealing with a at the same time seems reasonable to me. >> Note that (b) is not supported --- 32-bit domains can only profile >> themselves and not the hypervisor (or other domains). > And that is due to the tools limitation of not being able to understand > 64-bit values right?. Right. Specifically, in perf case it will be very surprised to see 64-bit samples (in fact, it won't see them as 64-bit since registers that it sees are 32-bit). > >>> But of course it would help if other interested parties would voice >>> their view of this... > My thinking (with the release manager hat) is thishat: > > 1). Adding in the support for 32-bit consumer is nice but won't be > used that often (64-bit consumer with 32-bit 'producer' - as in > 32-bit guests - will surely be present). Profiling hypervisor from 32-bit dom0 is not planned at all. This will require new toolset. > > 2). Having the right ABI for these perf counters is hard across > two set of vendors (and what if we want to expand this for ARM?). Intel/AMD, I think/hope, is taken care of. ARM is now stubbed out, most of the code is in x86-specific directory. -boris > > 3). The one 'generic' profiling tool that worked across all OSes > was 'oprofile' which has hit the dustpan. We still support > it with the oprofile hypercalls. Doing something generic > might not be the best way going forward. > > 4). The ABI should have means to look in the future in case things > change (and then there is a method to provide the version information > in case the ABI alters). Future version of the Intel > or AMD performance counters architecture might expand the > amount of registers they expose. Or they might not. > Either way, a new minor rev would have to be increased to > benefit from this. > > 5). Exposing only CS, RIP, and EFLAGS seems to be tailoring the > interface for Linux perf use case. > > 6). Exposing all of the GPRs, while not needed for the current > set of Linux patches, could allow other tools to use this > (Intel's tracing tool for example - which uses the perf > ABI). But surely there are tools on FreeBSD, NetBSD that > do profiling. > > > If I understand Jan's concerns correctly, the question is - > if we are not using all of the GPRs right now in the tools > why even bother expanding it? And if the new version of > the architecture does expose it - and we have the support > - then we can expand the page structure? (and of course > rev the major). > > While Boris's view is that the condition above will happen - > we will expand the registers/use case - so why do the > intermediate step of expanding the page structure - when we can > make the structure bigger now (and rev the minor). > > Is that about right?