* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
@ 2011-04-22  8:47 Stephane Eranian
  2011-04-22  9:23 ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Stephane Eranian @ 2011-04-22  8:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma

On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> This needs to be a *lot* more user friendly. Users do not want to type in
>> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> era really.
>>
>> Unless there's proper generalized and human usable support i'm leaning
>> towards turning off the offcore user-space accessible raw bits for now, and
>> use them only kernel-internally, for the cache events.
>
Generic cache events are a myth. They are not usable. I keep getting questions
from users because nobody knows what they are actually counting, thus nobody
knows how to interpret the counts. You cannot really hide the micro-architecture
if you want to make any sensible measurements.

I agree that perf's usability is poor when you have to pass hex values
for events. That is why I have a user-level library that maps event
strings to event codes for perf. Arun Sharma posted a patch a while ago
to connect this library with perf; so far it seems to have been ignored:
    perf stat -e offcore_response_0:dmd_data_rd foo
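For reference, here is a minimal sketch of what such a mapping boils down
to (the table and names are illustrative, *not* my library's actual API;
the Nehalem encoding is the r1b7:20ff pair from Andi's example quoted
below):

#include <stdint.h>
#include <string.h>

struct event_map {
	const char	*name;
	uint64_t	config;		/* perf_event_attr.config */
	uint64_t	config1;	/* extra offcore MSR mask */
};

static const struct event_map nhm_events[] = {
	/* OFFCORE_RESPONSE_0 with the REMOTE_DRAM_0 mask (r1b7:20ff) */
	{ "offcore_response_0:remote_dram", 0x1b7, 0x20ff },
};

static int encode_event(const char *str, uint64_t *config, uint64_t *config1)
{
	size_t i;

	for (i = 0; i < sizeof(nhm_events) / sizeof(nhm_events[0]); i++) {
		if (!strcmp(str, nhm_events[i].name)) {
			*config  = nhm_events[i].config;
			*config1 = nhm_events[i].config1;
			return 0;
		}
	}
	return -1;	/* unknown event string */
}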


> I'm about to push out the patch attached below - it lays out the arguments in
> detail. I don't think we have time to fix this properly for .39 - but memory
> profiling could be a nice feature for v2.6.40.
>
You will not be able to do any reasonable memory profiling using
offcore response events. Don't expect a profile to point to the missing
loads. If you're lucky, it will point to the use instruction.
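To illustrate with a hypothetical fragment (not from any real app): the
miss is incurred by the load, but on an out-of-order CPU the pipeline
stalls - and samples accumulate - at the dependent 'use' instruction:

long use_after_load(long *big_array, long idx, long sum)
{
	long v = big_array[idx];	/* the load that actually misses */

	/* ... independent work the CPU overlaps with the miss ... */

	return sum + v;			/* the 'use' where samples land */
}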


> --------------------->
> From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 22 Apr 2011 08:44:38 +0200
> Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers
>
> Andi Kleen pointed out that the Intel offcore support patches were merged
> without user-space tool support to the functionality:
>
>  |
>  | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
>  | user space bits were not. This made it impossible to set the extra mask
>  | and actually do the OFFCORE profiling
>  |
>
> Andi submitted a preliminary patch for user-space support, as an
> extension to perf's raw event syntax:
>
>  |
>  | Some raw events -- like the Intel OFFCORE events -- support additional
>  | parameters. These can be appended after a ':'.
>  |
>  | For example on a multi socket Intel Nehalem:
>  |
>  |    perf stat -e r1b7:20ff -a sleep 1
>  |
>  | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
>  | that measures any access to DRAM on another socket.
>  |
>
> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.
>
> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.
>
> We already have such generalization in place for CPU cache events,
> and it's all very extensible.
>
> "Offcore" events measure general DRAM access patterns along various
> parameters. They are particularly useful in NUMA systems.
>
> We want to support them via generalized DRAM events: either as the
> fourth level of cache (after the last-level cache), or as a separate
> generalization category.
>
> That way user-space support would be very obvious, memory access
> profiling could be done via self-explanatory commands like:
>
>  perf record -e dram ./myapp
>  perf record -e dram-remote ./myapp
>
> ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
> accesses.
>
> These generalized events would work on all CPUs and architectures that
> have comparable PMU features.
>
> ( Note, these are just examples: the actual implementation could have more
>  sophistication and more parameters - as long as they center around
>  similarly simple use cases. )
>
> Now we do not want to revert *all* of the current offcore bits, as they
> are still somewhat useful for generic last-level-cache events, implemented
> in this commit:
>
>  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
>
> But we definitely do not yet want to expose the unstructured raw events
> to user-space, until better generalization and usability is implemented
> for these hardware event features.
>
> ( Note: after generalization has been implemented raw offcore events can be
>  supported as well: there can always be an odd event that is marginally
>  useful but not useful enough to generalize. DRAM profiling is definitely
>  *not* such a category so generalization must be done first. )
>
> Furthermore, PERF_TYPE_RAW access to these registers was not intended
> to go upstream without proper support - it was a side-effect of the above
> e994d7d23a0b commit, not mentioned in the changelog.
>
> As v2.6.39 is nearing release we go for the simplest approach: disable
> the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
> kernel and becomes an ABI.
>
> Once proper structure is implemented for these hardware events and users
> are offered usable solutions we can revisit this issue.
>
> Reported-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/cpu/perf_event.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index eed3673a..632e5dc 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
>                        return -EOPNOTSUPP;
>        }
>
> +       /*
> +        * Do not allow config1 (extended registers) to propagate,
> +        * there's no sane user-space generalization yet:
> +        */
>        if (attr->type == PERF_TYPE_RAW)
> -               return x86_pmu_extra_regs(event->attr.config, event);
> +               return 0;
>
>        if (attr->type == PERF_TYPE_HW_CACHE)
>                return set_ext_hw_attr(hwc, event);
>
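For completeness, a minimal sketch of the raw usage this patch shuts off -
assuming the config1 field added by the offcore series and the r1b7:20ff
encoding from the changelog above:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_offcore_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size    = sizeof(attr);
	attr.type    = PERF_TYPE_RAW;
	attr.config  = 0x1b7;	/* OFFCORE_RESPONSE_0 */
	attr.config1 = 0x20ff;	/* extra MSR mask, e.g. REMOTE_DRAM_0 */

	/* count for the calling task, on any CPU */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}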


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:47 [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Stephane Eranian
@ 2011-04-22  9:23 ` Ingo Molnar
  2011-04-22  9:41   ` Stephane Eranian
  0 siblings, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22  9:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma


* Stephane Eranian <eranian@google.com> wrote:

> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> >> This needs to be a *lot* more user friendly. Users do not want to type in
> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
> >> era really.
> >>
> >> Unless there's proper generalized and human usable support i'm leaning
> >> towards turning off the offcore user-space accessible raw bits for now, and
> >> use them only kernel-internally, for the cache events.
>
> Generic cache events are a myth. They are not usable. I keep getting 
> questions from users because nobody knows what they are actually counting, 
> thus nobody knows how to interpret the counts. You cannot really hide the 
> micro-architecture if you want to make any sensible measurements.

Well:

 aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
 Time: 0.125
 Time: 0.136
 Time: 0.180
 Time: 0.103
 Time: 0.097
 Time: 0.125
 Time: 0.104
 Time: 0.125
 Time: 0.114
 Time: 0.158

 Performance counter stats for './hackbench 10' (10 runs):

     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
       843,957,634 L1-dcache-loads            ( +-   1.295% )
       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
         6,328,938 LLC-misses                 ( +-   3.969% )

        0.146160287  seconds time elapsed   ( +-   5.851% )

It's certainly useful if you want to get ballpark figures about cache behavior 
of an app and want to do comparisons.

There are inconsistencies in our generic cache events - but that's not really a 
reason to obscure their usage behind nonsensical microarchitecture-specific 
details.

But i'm definitely in favor of making these generalized events more consistent 
across different CPU types. Can you list examples of inconsistencies that we 
should resolve? (and which you possibly consider impossible to resolve, right?)

Thanks,

	Ingo


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  9:23 ` Ingo Molnar
@ 2011-04-22  9:41   ` Stephane Eranian
  2011-04-22 10:52     ` [generalized cache events] " Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Stephane Eranian @ 2011-04-22  9:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma

On Fri, Apr 22, 2011 at 11:23 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> >> This needs to be a *lot* more user friendly. Users do not want to type in
>> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> >> era really.
>> >>
>> >> Unless there's proper generalized and human usable support i'm leaning
>> >> towards turning off the offcore user-space accessible raw bits for now, and
>> >> use them only kernel-internally, for the cache events.
>>
>> Generic cache events are a myth. They are not usable. I keep getting
>> questions from users because nobody knows what they are actually counting,
>> thus nobody knows how to interpret the counts. You cannot really hide the
>> micro-architecture if you want to make any sensible measurements.
>
> Well:
>
>  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
>  Time: 0.125
>  Time: 0.136
>  Time: 0.180
>  Time: 0.103
>  Time: 0.097
>  Time: 0.125
>  Time: 0.104
>  Time: 0.125
>  Time: 0.114
>  Time: 0.158
>
>  Performance counter stats for './hackbench 10' (10 runs):
>
>     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
>       843,957,634 L1-dcache-loads            ( +-   1.295% )
>       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
>         6,328,938 LLC-misses                 ( +-   3.969% )
>
>        0.146160287  seconds time elapsed   ( +-   5.851% )
>
> It's certainly useful if you want to get ballpark figures about cache behavior
> of an app and want to do comparisons.
>
What can you conclude from the above counts?
Are they good or bad? If they are bad, how do you go about fixing the app?

> There are inconsistencies in our generic cache events - but that's not really a
> reason to obscure their usage behind nonsensical microarchitecture-specific
> details.
>
The actual events are a reflection of the micro-architecture. They indirectly
describe how it works. It is not clear to me that you can really improve your
app without some exposure to the micro-architecture.

So if you want to have generic events, I am fine with this, but you should not
block access to actual events pretending they are useless. Some people are
certainly interested in using them and learning about the micro-architecture
of their processor.


> But i'm definitely in favor of making these generalized events more consistent
> across different CPU types. Can you list examples of inconsistencies that we
> should resolve? (and which you possibly consider impossible to resolve, right?)
>
To make generic events more uniform across processors, one would have to
have precise definitions of what they are supposed to count. Once we have
that, we stand a better chance of finding consistent mappings for each
processor. I have not yet seen such definitions.


* [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  9:41   ` Stephane Eranian
@ 2011-04-22 10:52     ` Ingo Molnar
  2011-04-22 12:04       ` Stephane Eranian
  2011-04-22 16:50       ` arun
  0 siblings, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 10:52 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> >> Generic cache events are a myth. They are not usable. [...]
> >
> > Well:
> >
> >  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
> >  Time: 0.125
> >  Time: 0.136
> >  Time: 0.180
> >  Time: 0.103
> >  Time: 0.097
> >  Time: 0.125
> >  Time: 0.104
> >  Time: 0.125
> >  Time: 0.114
> >  Time: 0.158
> >
> >  Performance counter stats for './hackbench 10' (10 runs):
> >
> >     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
> >       843,957,634 L1-dcache-loads            ( +-   1.295% )
> >       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
> >         6,328,938 LLC-misses                 ( +-   3.969% )
> >
> >        0.146160287  seconds time elapsed   ( +-   5.851% )
> >
> > It's certainly useful if you want to get ballpark figures about cache behavior
> > of an app and want to do comparisons.
> >
> What can you conclude from the above counts?
> Are they good or bad? If they are bad, how do you go about fixing the app?

So let me give you a simplified example.

Say i'm a developer and i have an app with such code:

#define THOUSAND 1000

static char array[THOUSAND][THOUSAND];

int init_array(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++) {
		for (j = 0; j < THOUSAND; j++) {
			array[j][i]++;
		}
	}

	return 0;
}

Pretty common stuff, right?

Using the generalized cache events i can run:

 $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array

 Performance counter stats for './array' (10 runs):

         6,719,130 cycles:u                   ( +-   0.662% )
         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )

        0.003802098  seconds time elapsed   ( +-  13.395% )

I consider that this is 'bad', because for almost every dcache-load there's a 
dcache-miss - a 99% L1 cache miss rate!
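
( A quick sanity check of why the ratio is near 100% - assuming a
  64-byte cache line and a ~32 KB L1 here: the inner loop walks
  array[j][i], so consecutive accesses are THOUSAND = 1000 bytes apart.
  Every access touches a different cache line, and the ~1 MB working
  set is evicted long before any line is reused. )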

Then i think a bit, notice something, apply this performance optimization:

diff --git a/array.c b/array.c
index 4758d9a..d3f7037 100644
--- a/array.c
+++ b/array.c
@@ -9,7 +9,7 @@ int init_array(void)
 
 	for (i = 0; i < THOUSAND; i++) {
 		for (j = 0; j < THOUSAND; j++) {
-			array[j][i]++;
+			array[i][j]++;
 		}
 	}
 
I re-run perf-stat:

 $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array

 Performance counter stats for './array' (10 runs):

         2,395,407 cycles:u                   ( +-   0.365% )
         5,084,788 instructions:u           #      2.123 IPC     ( +-   0.000% )
         1,035,731 l1-dcache-loads:u          ( +-   0.006% )
             3,955 l1-dcache-load-misses:u    ( +-   4.872% )

        0.001806438  seconds time elapsed   ( +-   3.831% )

And i'm happy that indeed the l1-dcache misses are now super-low and that the 
app got much faster as well - the cycle count is a third of what it was before 
the optimization!
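
Spelled out - every number needed is right there in the two perf stat
outputs:

  miss rate, before:     1,003,604 / 1,037,032  ~ 96.8%
  miss rate, after:          3,955 / 1,035,731  ~  0.4%
  cycles, after/before:  2,395,407 / 6,719,130  ~  0.36x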

Note that:

 - I got absolute numbers in the right ballpark figure: i got a million loads as 
   expected (the array has 1 million elements), and 1 million cache-misses in 
   the 'bad' case.

 - I did not care which specific Intel CPU model this was running on

 - I did not care about *any* microarchitectural details - i only knew it's a 
   reasonably modern CPU with caching

 - I did not care how i could get access to L1 load and miss events. The events 
   were named obviously and it just worked.

So no, kernel driven generalization and sane tooling is not at all a 'myth' 
today, really.

So this is the general direction in which we want to move on. If you know about 
problems with existing generalization definitions then lets *fix* them, not 
pretend that generalizations and sane workflows are impossible ...

Thanks,

	Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 10:52     ` [generalized cache events] " Ingo Molnar
@ 2011-04-22 12:04       ` Stephane Eranian
  2011-04-22 13:18         ` Ingo Molnar
  2011-04-22 16:51         ` Andi Kleen
  2011-04-22 16:50       ` arun
  1 sibling, 2 replies; 80+ messages in thread
From: Stephane Eranian @ 2011-04-22 12:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Fri, Apr 22, 2011 at 12:52 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> >> Generic cache events are a myth. They are not usable. [...]
>> >
>> > Well:
>> >
>> >  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
>> >  Time: 0.125
>> >  Time: 0.136
>> >  Time: 0.180
>> >  Time: 0.103
>> >  Time: 0.097
>> >  Time: 0.125
>> >  Time: 0.104
>> >  Time: 0.125
>> >  Time: 0.114
>> >  Time: 0.158
>> >
>> >  Performance counter stats for './hackbench 10' (10 runs):
>> >
>> >     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
>> >       843,957,634 L1-dcache-loads            ( +-   1.295% )
>> >       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
>> >         6,328,938 LLC-misses                 ( +-   3.969% )
>> >
>> >        0.146160287  seconds time elapsed   ( +-   5.851% )
>> >
>> > It's certainly useful if you want to get ballpark figures about cache behavior
>> > of an app and want to do comparisons.
>> >
>> What can you conclude from the above counts?
>> Are they good or bad? If they are bad, how do you go about fixing the app?
>
> So let me give you a simplified example.
>
> Say i'm a developer and i have an app with such code:
>
> #define THOUSAND 1000
>
> static char array[THOUSAND][THOUSAND];
>
> int init_array(void)
> {
>        int i, j;
>
>        for (i = 0; i < THOUSAND; i++) {
>                for (j = 0; j < THOUSAND; j++) {
>                        array[j][i]++;
>                }
>        }
>
>        return 0;
> }
>
> Pretty common stuff, right?
>
> Using the generalized cache events i can run:
>
>  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
>
>  Performance counter stats for './array' (10 runs):
>
>         6,719,130 cycles:u                   ( +-   0.662% )
>         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
>
>        0.003802098  seconds time elapsed   ( +-  13.395% )
>
> I consider that this is 'bad', because for almost every dcache-load there's a
> dcache-miss - a 99% L1 cache miss rate!
>
> Then i think a bit, notice something, apply this performance optimization:
>
I don't think this example is really representative of the kind of problems
people face, it is just too small and obvious. So I would not generalize on it.

If you are happy with generalized cache events then, as I said, I am fine with
it. But the API should ALWAYS allow users access to raw events when they
need finer grain analysis.

> diff --git a/array.c b/array.c
> index 4758d9a..d3f7037 100644
> --- a/array.c
> +++ b/array.c
> @@ -9,7 +9,7 @@ int init_array(void)
>
>        for (i = 0; i < THOUSAND; i++) {
>                for (j = 0; j < THOUSAND; j++) {
> -                       array[j][i]++;
> +                       array[i][j]++;
>                }
>        }
>
> I re-run perf-stat:
>
>  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
>
>  Performance counter stats for './array' (10 runs):
>
>         2,395,407 cycles:u                   ( +-   0.365% )
>         5,084,788 instructions:u           #      2.123 IPC     ( +-   0.000% )
>         1,035,731 l1-dcache-loads:u          ( +-   0.006% )
>             3,955 l1-dcache-load-misses:u    ( +-   4.872% )
>
>  - I got absolute numbers in the right ballpark figure: i got a million loads as
>   expected (the array has 1 million elements), and 1 million cache-misses in
>   the 'bad' case.
>
>  - I did not care which specific Intel CPU model this was running on
>
>  - I did not care about *any* microarchitectural details - i only knew it's a
>   reasonably modern CPU with caching
>
>  - I did not care how i could get access to L1 load and miss events. The events
>   were named obviously and it just worked.
>
> So no, kernel driven generalization and sane tooling is not at all a 'myth'
> today, really.
>
> So this is the general direction in which we want to move on. If you know about
> problems with existing generalization definitions then lets *fix* them, not
> pretend that generalizations and sane workflows are impossible ...
>
Again, to fix them, you need to give us definitions for what you expect those
events to count. Otherwise we cannot make forward progress.

Let me give just one simple example: cycles

What is your definition of the generic cycle event?

There are various flavors:
  - count halted, unhalted cycles?
  - impacted by frequency scaling?

LLC-misses:
  - what is considered the LLC?
  - does it include code, data or both?
  - does it include demand, hw prefetch?
  - is it to local or remote DRAM?

Once you have a clear and precise definition, then we can look at the actual
events and figure out a mapping.


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 12:04       ` Stephane Eranian
@ 2011-04-22 13:18         ` Ingo Molnar
  2011-04-22 20:31           ` Stephane Eranian
  2011-04-22 16:51         ` Andi Kleen
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 13:18 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> > Say i'm a developer and i have an app with such code:
> >
> > #define THOUSAND 1000
> >
> > static char array[THOUSAND][THOUSAND];
> >
> > int init_array(void)
> > {
> >        int i, j;
> >
> >        for (i = 0; i < THOUSAND; i++) {
> >                for (j = 0; j < THOUSAND; j++) {
> >                        array[j][i]++;
> >                }
> >        }
> >
> >        return 0;
> > }
> >
> > Pretty common stuff, right?
> >
> > Using the generalized cache events i can run:
> >
> >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> >
> >  Performance counter stats for './array' (10 runs):
> >
> >         6,719,130 cycles:u                   ( +-   0.662% )
> >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> >
> >        0.003802098  seconds time elapsed   ( +-  13.395% )
> >
> > I consider that this is 'bad', because for almost every dcache-load there's a
> > dcache-miss - a 99% L1 cache miss rate!
> >
> > Then i think a bit, notice something, apply this performance optimization:
>
> I don't think this example is really representative of the kind of problems 
> people face, it is just too small and obvious. [...]

Well, the overwhelming majority of performance problems are 'small and obvious' 
- once a tool roughly pinpoints their existence and location!

And you have not offered a counter example either so you have not really 
demonstrated what you consider a 'real' example and why you consider 
generalized cache events inadequate.

> [...] So I would not generalize on it.

To the contrary, it demonstrates the most fundamental concept of cache 
profiling: looking at the hits/misses ratios and identifying hotspots.

That concept can be applied pretty nicely to all sorts of applications.

Interestingly, the exact hardware event doesn't even *matter* for most problems, as 
long as it *correlates* with the conceptual entity we want to measure.

So what we need are hardware events that correlate with:

 - loads done
 - stores done
 - load misses suffered
 - store misses suffered
 - branches done
 - branches missed
 - instructions executed

It is the *ratio* that matters in most cases: before-change versus 
after-change, hits versus misses, etc.

Yes, there will be imprecisions, CPU quirks, limitations and speculation 
effects - but as long as we keep our eyes on the ball, generalizations are 
useful for solving practical problems.

> If you are happy with generalized cache events then, as I said, I am fine 
> with it. But the API should ALWAYS allow users access to raw events when they 
> need finer grain analysis.

Well, that's a pretty far cry from calling it a 'myth' :-)

So my point is (outlined in detail in the common changelog) that we need sane 
generalized remote DRAM events *first* - before we think about exposing the 
'rest' of the offcore PMU as raw events.

> > diff --git a/array.c b/array.c
> > index 4758d9a..d3f7037 100644
> > --- a/array.c
> > +++ b/array.c
> > @@ -9,7 +9,7 @@ int init_array(void)
> >
> >        for (i = 0; i < THOUSAND; i++) {
> >                for (j = 0; j < THOUSAND; j++) {
> > -                       array[j][i]++;
> > +                       array[i][j]++;
> >                }
> >        }
> >
> > I re-run perf-stat:
> >
> >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> >
> >  Performance counter stats for './array' (10 runs):
> >
> >         2,395,407 cycles:u                   ( +-   0.365% )
> >         5,084,788 instructions:u           #      2.123 IPC     ( +-   0.000% )
> >         1,035,731 l1-dcache-loads:u          ( +-   0.006% )
> >             3,955 l1-dcache-load-misses:u    ( +-   4.872% )
> >
> >  - I got absolute numbers in the right ballpark figure: i got a million loads as
> >   expected (the array has 1 million elements), and 1 million cache-misses in
> >   the 'bad' case.
> >
> >  - I did not care which specific Intel CPU model this was running on
> >
> >  - I did not care about *any* microarchitectural details - i only knew it's a
> >   reasonably modern CPU with caching
> >
> >  - I did not care how i could get access to L1 load and miss events. The events
> >   were named obviously and it just worked.
> >
> > So no, kernel driven generalization and sane tooling is not at all a 'myth'
> > today, really.
> >
> > So this is the general direction in which we want to move on. If you know about
> > problems with existing generalization definitions then lets *fix* them, not
> > pretend that generalizations and sane workflows are impossible ...
>
> Again, to fix them, you need to give us definitions for what you expect those 
> events to count. Otherwise we cannot make forward progress.

No, we do not 'need' to give exact definitions. This whole topic is more 
analogous to physics than to mathematics. See my description above about how 
ratios and high level structure matters more than absolute values and 
definitions.

Yes, if we can, then 'loads' and 'stores' should correspond to the number of 
loads a program flow does, which you get if you look at the assembly code. 
'Instructions' should correspond to the number of instructions executed.

If the CPU cannot do it, it's not a huge deal in practice - we will cope and 
hopefully it will all be fixed in future CPU versions.

That said, most CPUs i have access to get the fundamentals right, so 
it's not like we have huge problems in practice. Key CPU statistics are 
available.

> Let me give just one simple example: cycles
>
> What is your definition of the generic cycle event?
> 
> There are various flavors:
>   - count halted, unhalted cycles?

Again i think you are getting lost in too much detail.

For typical developers halted versus unhalted is mostly an uninteresting 
distinction, as people tend to just type 'perf record ./myapp', which is per 
workload profiling so it excludes idle time. So it would give the same result 
to them regardless of whether it's halted or unhalted cycles.

( This simple example already shows the idiocy of the hardware names, calling 
  cycles events "CPU_CLK_UNHALTED.REF". In most cases the developer does *not*
  care about those distinctions so the defaults should not be complicated with
  them. )

>   - impacted by frequency scaling?

The best default for developers is a frequency scaling invariant result - i.e. 
one that is not against a reference clock but against the real CPU clock.

( Even that one will not be completely invariant due to the frequency-scaling
  dependent cost of misses and bus ops, etc. )

But profiling against a reference frequency makes sense as well, especially for 
system-wide profiling - this is the hardware equivalent of the cpu-clock / 
elapsed time metric. We could implement the cpu-clock using reference cycles 
events for example.

> LLC-misses:
>   - what is considered the LLC?

The last level cache is whichever cache sits before DRAM.

>   - does it include code, data or both?

Both if possible as they tend to be unified caches anyway.

>   - does it include demand, hw prefetch?

Do you mean for the LLC-prefetch events? What would be your suggestion, which 
is the most useful metric? Prefetches are not directly done by program logic so 
this is borderline. We wanted to include them for completeness - and the metric 
should probably include 'all activities that program flow has not caused 
directly and which may be sucking up system resources' - i.e. including hw 
prefetch.

>   - is it to local or remote DRAM?

The current definitions should include both.

Measuring remote DRAM access is of course useful - that is the original point 
of this thread. It should be done as an additional layer, basically local RAM 
is yet another cache level - but we can take other generalized approach as 
well, if they make more sense.

Thanks,

	Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 10:52     ` [generalized cache events] " Ingo Molnar
  2011-04-22 12:04       ` Stephane Eranian
@ 2011-04-22 16:50       ` arun
  2011-04-22 17:00         ` Andi Kleen
  2011-04-22 20:30         ` Ingo Molnar
  1 sibling, 2 replies; 80+ messages in thread
From: arun @ 2011-04-22 16:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
> 
> Using the generalized cache events i can run:
> 
>  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> 
>  Performance counter stats for './array' (10 runs):
> 
>          6,719,130 cycles:u                   ( +-   0.662% )
>          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> 
>         0.003802098  seconds time elapsed   ( +-  13.395% )
> 
> I consider that this is 'bad', because for almost every dcache-load there's a 
> dcache-miss - a 99% L1 cache miss rate!

One could argue that all you need is cycles and instructions. If there is an
expensive load, you'll see that the load instruction takes many cycles and
you can infer that it's a cache miss.

Questions app developers typically ask me:

* If I fix all my top 5 L3 misses how much faster will my app go?
* Am I bottlenecked on memory bandwidth?
* I have 4 L3 misses every 1000 instructions and 15 branch mispredicts per
  1000 instructions. Which one should I focus on?

It's hard to answer some of these without access to all events.
While your approach of having generic events for commonly used counters
might be useful for some use cases, I don't see why exposing all vendor
defined events is harmful.

A clear statement on the last point would be helpful.

 -Arun


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 12:04       ` Stephane Eranian
  2011-04-22 13:18         ` Ingo Molnar
@ 2011-04-22 16:51         ` Andi Kleen
  2011-04-22 19:57           ` Ingo Molnar
  2011-04-26  9:25           ` Peter Zijlstra
  1 sibling, 2 replies; 80+ messages in thread
From: Andi Kleen @ 2011-04-22 16:51 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

> Once you have a clear and precise definition, then we can look at the actual
> events and figure out a mapping.

It's unclear this can even be done. Linux runs on a wide variety of
micro-architectures, with all kinds of cache architectures.

Micro-architectures are so different that I suspect a "generic"
definition would need to be so vague as to be useless.

This in general seems to be the problem of the current cache events.

Overall for any interesting analysis you need to go CPU specific.
Abstracted performance analysis is a contradiction in terms.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:50       ` arun
@ 2011-04-22 17:00         ` Andi Kleen
  2011-04-22 20:30         ` Ingo Molnar
  1 sibling, 0 replies; 80+ messages in thread
From: Andi Kleen @ 2011-04-22 17:00 UTC (permalink / raw)
  To: arun
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

> One could argue that all you need is cycles and instructions. If there is an
> expensive load, you'll see that the load instruction takes many cycles and
> you can infer that it's a cache miss.

That only works when you can actually recognize which of the recent load
instructions caused the problem. Skid on out-of-order CPUs often makes it
very hard to identify which load caused the long stall (or whether they
all stalled).

There's a way around that using advanced events, of course - but not with
cycles.

> I don't see why exposing all vendor defined events is harmful

Simple: without vendor events you cannot answer a lot of questions.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:51         ` Andi Kleen
@ 2011-04-22 19:57           ` Ingo Molnar
  2011-04-26  9:25           ` Peter Zijlstra
  1 sibling, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 19:57 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Andi Kleen <ak@linux.intel.com> wrote:

> > Once you have a clear and precise definition, then we can look at the actual
> > events and figure out a mapping.
> 
> It's unclear this can be even done. Linux runs on a wide variety of micro
> architectures, with all kinds of cache architectures.
> 
> Micro architectures are so different. I suspect a "generic" definition would
> need to be so vague as to be useless.

Not really. I gave a very specific example which solved a common and real 
problem, using L1-loads and L1-load-misses events.

> This in general seems to be the problem of the current cache events.
> 
> Overall for any interesting analysis you need to go CPU specific. Abstracted 
> performance analysis is a contradiction in terms.

Nothing of what i did in that example was CPU or microarchitecture specific.

Really, you are making this more complex than it really is. Just check the 
cache profiling example i gave, it works just fine today.

Thanks,

	Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:50       ` arun
  2011-04-22 17:00         ` Andi Kleen
@ 2011-04-22 20:30         ` Ingo Molnar
  2011-04-22 20:32           ` Ingo Molnar
  2011-04-23 20:14           ` [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES Ingo Molnar
  1 sibling, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 20:30 UTC (permalink / raw)
  To: arun
  Cc: Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* arun@sharma-home.net <arun@sharma-home.net> wrote:

> On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
> > 
> > Using the generalized cache events i can run:
> > 
> >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > 
> >  Performance counter stats for './array' (10 runs):
> > 
> >          6,719,130 cycles:u                   ( +-   0.662% )
> >          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> >          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> >          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> > 
> >         0.003802098  seconds time elapsed   ( +-  13.395% )
> > 
> > I consider that this is 'bad', because for almost every dcache-load there's a 
> > dcache-miss - a 99% L1 cache miss rate!
> 
> One could argue that all you need is cycles and instructions. [...]

Yes, and note that with instructions events we even have skid-less PEBS 
profiling so seeing the precise .

> [...] If there is an expensive load, you'll see that the load instruction 
> takes many cycles and you can infer that it's a cache miss.
> 
> Questions app developers typically ask me:
> 
> * If I fix all my top 5 L3 misses how much faster will my app go?

This has come up: we could add a 'stalled/idle-cycles' generic event - i.e. 
cycles spent without performing useful work in the pipelines. (Resource-stall 
events on Intel CPUs.)

Then you would profile L3 misses (there's a generic event for that), plus 
stalls, and the answer to your question would be the percentage of hits you get 
in the stalled-cycles profile, multiplied by the stalled-cycles/cycles ratio.
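
Sketched as commands - assuming we add the generic stalled-cycles event
under that name:

 perf stat   -e cycles -e stalled-cycles -e LLC-misses ./myapp
 perf record -e stalled-cycles ./myapp

The ratio from perf stat gives the stall share; the profile from perf
record shows where those stalls hit.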

> * Am I bottlenecked on memory bandwidth?

This would be a variant of the measurement above: say the top 90% of L3 misses 
profile-correlated with stalled-cycles, relative to total-cycles. If you get 
'90% of L3 misses cause a 1% wall-time slowdown' then you are not memory 
bottlenecked. If the answer is '35% slowdown' then you are memory bottlenecked.

> * I have 4 L3 misses every 1000 instructions and 15 branch mispredicts per
>   1000 instructions. Which one should I focus on?

AFAICS this would be another variant of stalled-cycles measurements: you create 
a stalled-cycles profile and check whether the top hits are branches or memory 
loads.

> It's hard to answer some of these without access to all events.

I'm curious, how would you measure these properties - do you have some 
different events in mind?

> While your approach of having generic events for commonly used counters might 
> be useful for some use cases, I don't see why exposing all vendor defined 
> events is harmful.
> 
> A clear statement on the last point would be helpful.

Well, the thing is, i think users are helped most if we add useful,
high-level PMU features - not just an opaque raw event pass-through
engine. The problem with low-level raw ABIs is that the tool space
fragments into a zillion small hacks and there's no good concentration of
know-how. I'd like the art of performance measurement to be generalized,
as well as it can be.

We had this discussion in the big perf-counters flamewars 2+ years ago, where 
one side wanted raw events, while we wanted intelligent kernel-side 
abstractions and generalizations. I think the abstraction and generalization 
angle worked out very well in practice - but we are having this discussion 
again and again :-)

As i stated in my prior mails, i'm not against raw events as a rare
exception channel - that increases utility. I'm against what was attempted
here: an extension to raw events as the *primary* channel for DRAM
measurement features. That is just sloppy and *reduces* utility.

I'm very simple-minded: when i see reduced utility i become sad :)

Thanks,

	Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 13:18         ` Ingo Molnar
@ 2011-04-22 20:31           ` Stephane Eranian
  2011-04-22 20:47             ` Ingo Molnar
  2011-04-22 21:03             ` Ingo Molnar
  0 siblings, 2 replies; 80+ messages in thread
From: Stephane Eranian @ 2011-04-22 20:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
> > > Say i'm a developer and i have an app with such code:
> > >
> > > #define THOUSAND 1000
> > >
> > > static char array[THOUSAND][THOUSAND];
> > >
> > > int init_array(void)
> > > {
> > >        int i, j;
> > >
> > >        for (i = 0; i < THOUSAND; i++) {
> > >                for (j = 0; j < THOUSAND; j++) {
> > >                        array[j][i]++;
> > >                }
> > >        }
> > >
> > >        return 0;
> > > }
> > >
> > > Pretty common stuff, right?
> > >
> > > Using the generalized cache events i can run:
> > >
> > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > >
> > >  Performance counter stats for './array' (10 runs):
> > >
> > >         6,719,130 cycles:u                   ( +-   0.662% )
> > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> > >
> > >        0.003802098  seconds time elapsed   ( +-  13.395% )
> > >
> > > I consider that this is 'bad', because for almost every dcache-load there's a
> > > dcache-miss - a 99% L1 cache miss rate!
> > >
> > > Then i think a bit, notice something, apply this performance optimization:
> >
> > I don't think this example is really representative of the kind of problems
> > people face, it is just too small and obvious. [...]
>
> Well, the overwhelming majority of performance problems are 'small and obvious'

Problems are not simple. Most serious applications these days are huge,
hundreds of MB of text, if not GB.

In your artificial example, you knew the answer before you started the
measurement. Most of the time, applications are assembled out of hundreds
of libraries, so no single developer knows all the code. Thus, the
performance analyst is faced with a black box most of the time.

Let's go back to your example.
Performance counter stats for './array' (10 runs):

         6,719,130 cycles:u                   ( +-   0.662% )
         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )

Looking at this, I don't think you can pinpoint which function has a
problem, or even whether there is a real problem. You need to evaluate the
penalty; once you know that, you can estimate the potential gain from
fixing the code. Arun rightly pointed that out in his answer. How do you
know the penalty if you don't decompose some more?

Seeing cache misses does not by itself mean they are harmful. They could
be generated by the HW prefetchers and be harmless to your program. Do
L1d-cache-load misses include HW prefetches? Imagine they do on Intel
Nehalem. Do you guarantee they do on AMD Shanghai as well? If not, then
how do I compare?


As I said, if you are only interested in generic events, that's fine with
me - as long as you don't prevent others from accessing raw events.

As people have pointed out, I don't see why raw events would be harmful to
users. If you don't want to learn about them, just stick with the generic
events.


> - once a tool roughly pinpoints their existence and location!
>
> And you have not offered a counter example either so you have not really
> demonstrated what you consider a 'real' example and why you consider
> generalized cache events inadequate.
>
> > [...] So I would not generalize on it.
>
> To the contrary, it demonstrates the most fundamental concept of cache
> profiling: looking at the hits/misses ratios and identifying hotspots.
>
> That concept can be applied pretty nicely to all sorts of applications.
>
> Interestingly, the exact hardware event doesn't even *matter* for most problems, as
> long as it *correlates* with the conceptual entity we want to measure.
>
> So what we need are hardware events that correlate with:
>
>  - loads done
>  - stores done
>  - load misses suffered
>  - store misses suffered
>  - branches done
>  - branches missed
>  - instructions executed
>
> It is the *ratio* that matters in most cases: before-change versus
> after-change, hits versus misses, etc.
>
> Yes, there will be imprecisions, CPU quirks, limitations and speculation
> effects - but as long as we keep our eyes on the ball, generalizations are
> useful for solving practical problems.
>
> > If you are happy with generalized cache events then, as I said, I am fine
> > with it. But the API should ALWAYS allow users access to raw events when they
> > need finer grain analysis.
>
> Well, that's a pretty far cry from calling it a 'myth' :-)
>
> So my point is (outlined in detail in the common changelog) that we need sane
> generalized remote DRAM events *first* - before we think about exposing the
> 'rest' of the offcore PMU as raw events.
>
> > > diff --git a/array.c b/array.c
> > > index 4758d9a..d3f7037 100644
> > > --- a/array.c
> > > +++ b/array.c
> > > @@ -9,7 +9,7 @@ int init_array(void)
> > >
> > >        for (i = 0; i < THOUSAND; i++) {
> > >                for (j = 0; j < THOUSAND; j++) {
> > > -                       array[j][i]++;
> > > +                       array[i][j]++;
> > >                }
> > >        }
> > >
> > > I re-run perf-stat:
> > >
> > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > >
> > >  Performance counter stats for './array' (10 runs):
> > >
> > >         2,395,407 cycles:u                   ( +-   0.365% )
> > >         5,084,788 instructions:u           #      2.123 IPC     ( +-   0.000% )
> > >         1,035,731 l1-dcache-loads:u          ( +-   0.006% )
> > >             3,955 l1-dcache-load-misses:u    ( +-   4.872% )
> > >
> > >  - I got absolute numbers in the right ballpark figure: i got a million loads as
> > >   expected (the array has 1 million elements), and 1 million cache-misses in
> > >   the 'bad' case.
> > >
> > >  - I did not care which specific Intel CPU model this was running on
> > >
> > >  - I did not care about *any* microarchitectural details - i only knew it's a
> > >   reasonably modern CPU with caching
> > >
> > >  - I did not care how i could get access to L1 load and miss events. The events
> > >   were named obviously and it just worked.
> > >
> > > So no, kernel driven generalization and sane tooling is not at all a 'myth'
> > > today, really.
> > >
> > > So this is the general direction in which we want to move on. If you know about
> > > problems with existing generalization definitions then lets *fix* them, not
> > > pretend that generalizations and sane workflows are impossible ...
> >
> > Again, to fix them, you need to give us definitions for what you expect those
> > events to count. Otherwise we cannot make forward progress.
>
> No, we do not 'need' to give exact definitions. This whole topic is more
> analogous to physics than to mathematics. See my description above about how
> ratios and high level structure matters more than absolute values and
> definitions.
>
> Yes, if we can, then 'loads' and 'stores' should correspond to the number of
> loads a program flow does, which you get if you look at the assembly code.
> 'Instructions' should correspond to the number of instructions executed.
>
> If the CPU cannot do it, it's not a huge deal in practice - we will cope and
> hopefully it will all be fixed in future CPU versions.
>
> That said, most CPUs i have access to get the fundamentals right, so
> it's not like we have huge problems in practice. Key CPU statistics are
> available.
>
> > Let me give just one simple example: cycles
> >
> > What is your definition of the generic cycle event?
> >
> > There are various flavors:
> >   - count halted, unhalted cycles?
>
> Again i think you are getting lost in too much detail.
>
> For typical developers halted versus unhalted is mostly an uninteresting
> distinction, as people tend to just type 'perf record ./myapp', which is per
> workload profiling so it excludes idle time. So it would give the same result
> to them regardless of whether it's halted or unhalted cycles.
>
> ( This simple example already shows the idiocy of the hardware names, calling
>  cycles events "CPU_CLK_UNHALTED.REF". In most cases the developer does *not*
>  care about those distinctions so the defaults should not be complicated with
>  them. )
>
> >   - impacted by frequency scaling?
>
> The best default for developers is a frequency scaling invariant result - i.e.
> one that is not against a reference clock but against the real CPU clock.
>
> ( Even that one will not be completely invariant due to the frequency-scaling
>  dependent cost of misses and bus ops, etc. )
>
> But profiling against a reference frequency makes sense as well, especially for
> system-wide profiling - this is the hardware equivalent of the cpu-clock /
> elapsed time metric. We could implement the cpu-clock using reference cycles
> events for example.
>
> > LLC-misses:
> >   - what is considered the LLC?
>
> The last level cache is whichever cache sits before DRAM.
>
> >   - does it include code, data or both?
>
> Both if possible as they tend to be unified caches anyway.
>
> >   - does it include demand, hw prefetch?
>
> Do you mean for the LLC-prefetch events? What would be your suggestion, which
> is the most useful metric? Prefetches are not directly done by program logic so
> this is borderline. We wanted to include them for completeness - and the metric
> should probably include 'all activities that program flow has not caused
> directly and which may be sucking up system resources' - i.e. including hw
> prefetch.
>
> >   - is it to local or remote DRAM?
>
> The current definitions should include both.
>
> Measuring remote DRAM access is of course useful - that is the original point
> of this thread. It should be done as an additional layer, basically local RAM
> is yet another cache level - but we can take other generalized approach as
> well, if they make more sense.
>
> Thanks,
>
>        Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 20:30         ` Ingo Molnar
@ 2011-04-22 20:32           ` Ingo Molnar
  2011-04-23  0:03             ` Andi Kleen
  2011-04-23 20:14           ` [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 20:32 UTC (permalink / raw)
  To: arun
  Cc: Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> * arun@sharma-home.net <arun@sharma-home.net> wrote:
> 
> > On Fri, Apr 22, 2011 at 12:52:11PM +0200, Ingo Molnar wrote:
> > > 
> > > Using the generalized cache events i can run:
> > > 
> > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > > 
> > >  Performance counter stats for './array' (10 runs):
> > > 
> > >          6,719,130 cycles:u                   ( +-   0.662% )
> > >          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> > >          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> > >          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> > > 
> > >         0.003802098  seconds time elapsed   ( +-  13.395% )
> > > 
> > > I consider that this is 'bad', because for almost every dcache-load there's a 
> > > dcache-miss - a 99% L1 cache miss rate!
> > 
> > One could argue that all you need is cycles and instructions. [...]
> 
> Yes, and note that with instructions events we even have skid-less PEBS 
> profiling so seeing the precise .
                                  - location of instructions is possible.

[ An email gremlin ate that part of the sentence. ]

Thanks,

	Ingo


* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 20:31           ` Stephane Eranian
@ 2011-04-22 20:47             ` Ingo Molnar
  2011-04-23 12:13               ` Stephane Eranian
  2011-04-22 21:03             ` Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 20:47 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Stephane Eranian <eranian@google.com> wrote:
> >
> > > > Say i'm a developer and i have an app with such code:
> > > >
> > > > #define THOUSAND 1000
> > > >
> > > > static char array[THOUSAND][THOUSAND];
> > > >
> > > > int init_array(void)
> > > > {
> > > >        int i, j;
> > > >
> > > >        for (i = 0; i < THOUSAND; i++) {
> > > >                for (j = 0; j < THOUSAND; j++) {
> > > >                        array[j][i]++;
> > > >                }
> > > >        }
> > > >
> > > >        return 0;
> > > > }
> > > >
> > > > Pretty common stuff, right?
> > > >
> > > > Using the generalized cache events i can run:
> > > >
> > > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> > > >
> > > >  Performance counter stats for './array' (10 runs):
> > > >
> > > >         6,719,130 cycles:u                   ( +-   0.662% )
> > > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> > > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> > > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> > > >
> > > >        0.003802098  seconds time elapsed   ( +-  13.395% )
> > > >
> > > > I consider that this is 'bad', because for almost every dcache-load there's a
> > > > dcache-miss - a 99% L1 cache miss rate!
> > > >
> > > > Then i think a bit, notice something, apply this performance optimization:
> > >
> > > I don't think this example is really representative of the kind of problems
> > > people face, it is just too small and obvious. [...]
> >
> > Well, the overwhelming majority of performance problems are 'small and obvious'
> 
> Problems are not simple. Most serious applications these days are huge, 
> hundreds of MB of text, if not GB.
> 
> In your artificial example, you knew the answer before you started the 
> measurement.
>
> Most of the time, applications are assembled out of hundreds of libraries, so 
> no single developer knows all the code. Thus, the performance analyst is 
> faced with a black box most of the time.

I isolated out an example and assumed that you'd agree that identifying hot 
spots is trivial with generic cache events.

My assumption was wrong so let me show you how trivial it really is.

Here's an example with *two* problematic functions (but it could have hundreds, 
it does not matter):

-------------------------------->
#define THOUSAND 1000

static char array1[THOUSAND][THOUSAND];

static char array2[THOUSAND][THOUSAND];

void func1(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array1[i][j]++;
}

void func2(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array2[j][i]++;
}

int main(void)
{
	for (;;) {
		func1();
		func2();
	}

	return 0;
}
<--------------------------------

We do not know which one has the cache-misses problem, func1() or func2(), it's 
all a black box, right?

Using generic cache events you simply type this:

 $ perf top -e l1-dcache-load-misses -e l1-dcache-loads

And you get such output:

   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses:u/l1-dcache-loads:u],  (all, 16 CPUs)
-------------------------------------------------------------------------------------------------------

   weight    samples  pcnt funct DSO
   ______    _______ _____ _____ ______________________

      1.9       6184 98.8% func2 /home/mingo/opt/array2
      0.0         69  1.1% func1 /home/mingo/opt/array2

It has pinpointed the problem in func2 *very* precisely.

Obviously this can be used to analyze larger apps as well, with thousands of 
functions, to pinpoint cachemiss problems in specific functions.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 20:31           ` Stephane Eranian
  2011-04-22 20:47             ` Ingo Molnar
@ 2011-04-22 21:03             ` Ingo Molnar
  2011-04-23 12:27               ` Stephane Eranian
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 21:03 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> Let's go back to your example.
> Performance counter stats for './array' (10 runs):
> 
>          6,719,130 cycles:u                   ( +-   0.662% )
>          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> 
> Looking at this I don't think you can pinpoint which function has a problem 
> [...]

In my previous mail i showed how to pinpoint specific functions. You bring up 
an interesting question, cost/benefit analysis:

> [...] and whether or not there is a real problem. You need to evaluate the 
> penalty. Once you know that you can estimate any potential gain from fixing 
> the code. Arun pointed that out rightfully in his answer. How do you know the 
> penalty if you don't decompose some more?

We can measure that even with today's tooling - which doesn't do cost/benefit 
analysis out of the box. In my previous example i showed the cachemisses profile:

   weight    samples  pcnt funct DSO
   ______    _______ _____ _____ ______________________

      1.9       6184 98.8% func2 /home/mingo/opt/array2
      0.0         69  1.1% func1 /home/mingo/opt/array2

and here's the cycles profile:

             samples  pcnt funct DSO
             _______ _____ _____ ______________________

             2555.00 67.4% func2 /home/mingo/opt/array2
             1220.00 32.2% func1 /home/mingo/opt/array2

So, given that there was no other big miss sources:

 $ perf stat -a -e branch-misses:u -e l1-dcache-load-misses:u -e l1-dcache-store-misses:u -e l1-icache-load-misses:u sleep 1

 Performance counter stats for 'sleep 1':

            70,674 branch-misses:u         
       347,992,027 l1-dcache-load-misses:u 
             1,779 l1-dcache-store-misses:u
             8,007 l1-icache-load-misses:u 

        1.000982021  seconds time elapsed

I can tell you that by fixing the cache-misses in that function, the code will 
be roughly 33% faster.
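
(Spelling out the arithmetic implied: the cycles profile has func2 at 67.4%
and func1 at 32.2%. If fixing the misses makes func2 roughly as cheap as
func1, total cost drops to about 2 * 32.2% = ~64% of the original - i.e.
roughly a third faster, consistent with both the 33% prediction here and the
37% measurement below.)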

So i fixed the bug, and before it 100 iterations of func1+func2 took 300 msecs:

 $ perf stat -e cpu-clock --repeat 10 ./array2

 Performance counter stats for './array2' (10 runs):

        298.405074 cpu-clock                  ( +-   1.823% )

After the fix it took 190 msecs:

 $ perf stat -e cpu-clock --repeat 10 ./array2

 Performance counter stats for './array2' (10 runs):

        189.409569 cpu-clock                  ( +-   0.019% )

        0.190007596  seconds time elapsed   ( +-   0.025% )

Which is 63% of the original runtime - i.e. 37% faster. And no, i first did the 
calculation, then did the measurement of the optimized code.
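
(The fix itself is not quoted anywhere in the thread - presumably it is the
classic loop interchange in func2(), swapping the indices so the traversal
becomes row-major:

void func2(void)
{
	int i, j;

	for (i = 0; i < THOUSAND; i++)
		for (j = 0; j < THOUSAND; j++)
			array2[i][j]++;	/* was array2[j][i]++ */
}

which walks each cache line sequentially instead of striding by THOUSAND
bytes per access.)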

Now it would be nice to automate such analysis some more within perf - but i 
think i have established the principle well enough that we can use generic 
cache events for such measurements.

Also, we could certainly add more generic events - a stalled-cycles event would 
certainly be useful for example, to collect all (or at least most) 'harmful 
delays' the execution flow can experience. Want to take a stab at that patch?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 20:32           ` Ingo Molnar
@ 2011-04-23  0:03             ` Andi Kleen
  2011-04-23  7:50               ` Peter Zijlstra
  2011-04-23  8:02               ` Ingo Molnar
  0 siblings, 2 replies; 80+ messages in thread
From: Andi Kleen @ 2011-04-23  0:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

> > Yes, and note that with instructions events we even have skid-less PEBS 
> > profiling so seeing the precise .
>                                   - location of instructions is possible.

It was better when it was eaten. PEBS does not actually eliminate
skid, unfortunately. The interrupt still occurs later, so the
instruction location is off.

PEBS merely gives you more information.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  0:03             ` Andi Kleen
@ 2011-04-23  7:50               ` Peter Zijlstra
  2011-04-23 12:06                 ` Stephane Eranian
  2011-04-24  2:19                 ` Andi Kleen
  2011-04-23  8:02               ` Ingo Molnar
  1 sibling, 2 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-23  7:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, arun, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

On Fri, 2011-04-22 at 17:03 -0700, Andi Kleen wrote:
> > > Yes, and note that with instructions events we even have skid-less PEBS 
> > > profiling so seeing the precise .
> >                                   - location of instructions is possible.
> 
> It was better when it was eaten. PEBS does not actually eliminate
> skid, unfortunately. The interrupt still occurs later, so the
> instruction location is off.
> 
> PEBS merely gives you more information.

You're so skilled at not actually saying anything useful. Are you
perchance referring to the fact that the IP reported in the PEBS data is
exactly _one_ instruction off? Something that is demonstrated to be
fixable?

Or are you defining skid differently and not telling us your definition?

What are you saying?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  0:03             ` Andi Kleen
  2011-04-23  7:50               ` Peter Zijlstra
@ 2011-04-23  8:02               ` Ingo Molnar
  1 sibling, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-23  8:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Andi Kleen <ak@linux.intel.com> wrote:

> > > Yes, and note that with instructions events we even have skid-less PEBS 
> > > profiling so seeing the precise .
> >                                   - location of instructions is possible.
> 
> It was better when it was eaten. PEBS does not actually eliminate
> skid, unfortunately. The interrupt still occurs later, so the
> instruction location is off.
> 
> PEBS merely gives you more information.

Have you actually tried perf's PEBS support feature? Try:

  perf record -e instructions:pp ./myapp

(the ':pp' postfix stands for 'precise' and activates PEBS+LBR tricks.)

Look at the perf report --tui annotated assembly output (or check 'perf 
annotate' directly) and see how precise and skid-less the hits are. Works 
pretty well on Nehalem.

Here's a cache-bound loop with skid (profiled with '-e instructions'):

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   12.08 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
   87.92 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
    0.00 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

Those 'ADD' instruction hits are bogus: 99% of the cost in this function is in 
the INCB, but the PMU NMI often skids to the next (few) instructions.

Profiled with "-e instructions:pp" we get:

         :	0000000000400390 <main>:
    0.00 :	  400390:       31 c0                   xor    %eax,%eax
    0.00 :	  400392:       eb 22                   jmp    4003b6 <main+0x26>
   85.33 :	  400394:       fe 84 10 50 08 60 00    incb   0x600850(%rax,%rdx,1)
    0.00 :	  40039b:       48 81 c2 10 27 00 00    add    $0x2710,%rdx
   14.67 :	  4003a2:       48 81 fa 00 e1 f5 05    cmp    $0x5f5e100,%rdx
    0.00 :	  4003a9:       75 e9                   jne    400394 <main+0x4>
    0.00 :	  4003ab:       48 ff c0                inc    %rax
    0.00 :	  4003ae:       48 3d 10 27 00 00       cmp    $0x2710,%rax
    0.00 :	  4003b4:       74 04                   je     4003ba <main+0x2a>
    0.00 :	  4003b6:       31 d2                   xor    %edx,%edx
    0.00 :	  4003b8:       eb da                   jmp    400394 <main+0x4>
    0.00 :	  4003ba:       31 c0                   xor    %eax,%eax

The INCB has the most hits as expected - but we also learn that there's 
something about the CMP.
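
(The C source behind that disassembly is not quoted in the mail; judging from
the constants - 0x2710 = 10000, 0x5f5e100 = 10^8, and the 0x2710 stride in
%rdx - it is presumably a column-major walk over a 10000x10000 char array,
something like:

#define N 10000

static char array[N][N];

int main(void)
{
	int i, j;

	for (i = 0; i < N; i++)
		for (j = 0; j < N; j++)
			array[j][i]++;	/* 10000-byte stride: misses on almost every access */

	return 0;
}
)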

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  7:50               ` Peter Zijlstra
@ 2011-04-23 12:06                 ` Stephane Eranian
  2011-04-23 12:36                   ` Ingo Molnar
                                     ` (2 more replies)
  2011-04-24  2:19                 ` Andi Kleen
  1 sibling, 3 replies; 80+ messages in thread
From: Stephane Eranian @ 2011-04-23 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

On Sat, Apr 23, 2011 at 9:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2011-04-22 at 17:03 -0700, Andi Kleen wrote:
>> > > Yes, and note that with instructions events we even have skid-less PEBS
>> > > profiling so seeing the precise .
>> >                                   - location of instructions is possible.
>>
>> It was better when it was eaten. PEBS does not actually eliminate
>> skid, unfortunately. The interrupt still occurs later, so the
>> instruction location is off.
>>
>> PEBS merely gives you more information.
>
> You're so skilled at not actually saying anything useful. Are you
> perchance referring to the fact that the IP reported in the PEBS data is
> exactly _one_ instruction off? Something that is demonstrated to be
> fixable?
>
> Or are you defining skid differently and not telling us your definition?
>

PEBS is guaranteed to return an IP that is just after AN instruction that
caused the event. However, that instruction is NOT the one at the end
of your period. Let's take an example with INST_RETIRED, period=100000.
Then, the IP you get is NOT after the 100,000th retired instruction. It's an
instruction that is N cycles after that one. There is internal skid due to the
way PEBS is implemented.

That is what Andi is referring to. The issue causes bias and thus impacts
the quality of the samples. On SNB, there is a new INST_RETIRED:PREC_DIST
event. PREC_DIST = precise distribution. It tries to correct for the PEBS
skid on INST_RETIRED (look at Vol3b).

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 20:47             ` Ingo Molnar
@ 2011-04-23 12:13               ` Stephane Eranian
  2011-04-23 12:49                 ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Stephane Eranian @ 2011-04-23 12:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Fri, Apr 22, 2011 at 10:47 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Stephane Eranian <eranian@google.com> wrote:
>> >
>> > > > Say i'm a developer and i have an app with such code:
>> > > >
>> > > > #define THOUSAND 1000
>> > > >
>> > > > static char array[THOUSAND][THOUSAND];
>> > > >
>> > > > int init_array(void)
>> > > > {
>> > > >        int i, j;
>> > > >
>> > > >        for (i = 0; i < THOUSAND; i++) {
>> > > >                for (j = 0; j < THOUSAND; j++) {
>> > > >                        array[j][i]++;
>> > > >                }
>> > > >        }
>> > > >
>> > > >        return 0;
>> > > > }
>> > > >
>> > > > Pretty common stuff, right?
>> > > >
>> > > > Using the generalized cache events i can run:
>> > > >
>> > > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
>> > > >
>> > > >  Performance counter stats for './array' (10 runs):
>> > > >
>> > > >         6,719,130 cycles:u                   ( +-   0.662% )
>> > > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>> > > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>> > > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
>> > > >
>> > > >        0.003802098  seconds time elapsed   ( +-  13.395% )
>> > > >
>> > > > I consider that this is 'bad', because for almost every dcache-load there's a
>> > > > dcache-miss - a 99% L1 cache miss rate!
>> > > >
>> > > > Then i think a bit, notice something, apply this performance optimization:
>> > >
>> > > I don't think this example is really representative of the kind of problems
>> > > people face, it is just too small and obvious. [...]
>> >
>> > Well, the overwhelming majority of performance problems are 'small and obvious'
>>
>> Problems are not simple. Most serious applications these days are huge,
>> hundreds of MB of text, if not GB.
>>
>> In your artificial example, you knew the answer before you started the
>> measurement.
>>
>> Most of the time, applications are assembled out of hundreds of libraries, so
>> no single developer knows all the code. Thus, the performance analyst is
>> faced with a black box most of the time.
>
> I isolated out an example and assumed that you'd agree that identifying hot
> spots is trivial with generic cache events.
>
> My assumption was wrong so let me show you how trivial it really is.
>
> Here's an example with *two* problematic functions (but it could have hundreds,
> it does not matter):
>
> -------------------------------->
> #define THOUSAND 1000
>
> static char array1[THOUSAND][THOUSAND];
>
> static char array2[THOUSAND][THOUSAND];
>
> void func1(void)
> {
>        int i, j;
>
>        for (i = 0; i < THOUSAND; i++)
>                for (j = 0; j < THOUSAND; j++)
>                        array1[i][j]++;
> }
>
> void func2(void)
> {
>        int i, j;
>
>        for (i = 0; i < THOUSAND; i++)
>                for (j = 0; j < THOUSAND; j++)
>                        array2[j][i]++;
> }
>
> int main(void)
> {
>        for (;;) {
>                func1();
>                func2();
>        }
>
>        return 0;
> }
> <--------------------------------
>
> We do not know which one has the cache-misses problem, func1() or func2(), it's
> all a black box, right?
>
> Using generic cache events you simply type this:
>
>  $ perf top -e l1-dcache-load-misses -e l1-dcache-loads
>
> And you get such output:
>
>   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses:u/l1-dcache-loads:u],  (all, 16 CPUs)
> -------------------------------------------------------------------------------------------------------
>
>   weight    samples  pcnt funct DSO
>   ______    _______ _____ _____ ______________________
>
>      1.9       6184 98.8% func2 /home/mingo/opt/array2
>      0.0         69  1.1% func1 /home/mingo/opt/array2
>
> It has pinpointed the problem in func2 *very* precisely.
>
> Obviously this can be used to analyze larger apps as well, with thousands of
> functions, to pinpoint cachemiss problems in specific functions.
>
No, it does not.

As I said before, your example is just too trivial to be representative. You
keep thinking that what you see in the profile pinpoints exactly the instruction
or even the function where the problem always occurs. This is not always
the case. There is skid, and it can be very big; the IP you get may not even
be in the same function where the load was issued.

You cannot generalize based on this example.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:03             ` Ingo Molnar
@ 2011-04-23 12:27               ` Stephane Eranian
  0 siblings, 0 replies; 80+ messages in thread
From: Stephane Eranian @ 2011-04-23 12:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Fri, Apr 22, 2011 at 11:03 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> Let's go back to your example.
>> Performance counter stats for './array' (10 runs):
>>
>>          6,719,130 cycles:u                   ( +-   0.662% )
>>          5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
>>          1,037,032 l1-dcache-loads:u          ( +-   0.009% )
>>          1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
>>
>> Looking at this I don't think you can pinpoint which function has a problem
>> [...]
>
> In my previous mail i showed how to pinpoint specific functions. You bring up
> an interesting question, cost/benefit analysis:
>
>> [...] and whether or not there is a real problem. You need to evaluate the
>> penalty. Once you know that you can estimate any potential gain from fixing
>> the code. Arun pointed that out rightfully in his answer. How do you know the
>> penalty if you don't decompose some more?
>
> We can measure that even with today's tooling - which doesn't do cost/benefit
> analysis out of the box. In my previous example i showed the cachemisses profile:
>
>   weight    samples  pcnt funct DSO
>   ______    _______ _____ _____ ______________________
>
>      1.9       6184 98.8% func2 /home/mingo/opt/array2
>      0.0         69  1.1% func1 /home/mingo/opt/array2
>
> and here's the cycles profile:
>
>             samples  pcnt funct DSO
>             _______ _____ _____ ______________________
>
>             2555.00 67.4% func2 /home/mingo/opt/array2
>             1220.00 32.2% func1 /home/mingo/opt/array2
>
> So, given that there was no other big miss sources:
>
>  $ perf stat -a -e branch-misses:u -e l1-dcache-load-misses:u -e l1-dcache-store-misses:u -e l1-icache-load-misses:u sleep 1
>
>  Performance counter stats for 'sleep 1':
>
>            70,674 branch-misses:u
>       347,992,027 l1-dcache-load-misses:u
>             1,779 l1-dcache-store-misses:u
>             8,007 l1-icache-load-misses:u
>
>        1.000982021  seconds time elapsed
>
> I can tell you that by fixing the cache-misses in that function, the code will
> be roughly 33% faster.
>
33% based on what? l1d-load-misses? The fact that in the same program you have a
problematic function, func2(), and its fixed counterpart, func1(), plus you know
both do the same thing? How often do you think this happens in real life?

Now, imagine you don't have func1(). Tell me how much of an impact (cycles)
you think func2() is having on the overall execution of a program, especially
if it is far more complex than your toy example above?

Your arguments would carry more weight if you were to derive them from real-life
applications.


> So i fixed the bug, and before it 100 iterations of func1+func2 took 300 msecs:
>
>  $ perf stat -e cpu-clock --repeat 10 ./array2
>
>  Performance counter stats for './array2' (10 runs):
>
>        298.405074 cpu-clock                  ( +-   1.823% )
>
> After the fix it took 190 msecs:
>
>  $ perf stat -e cpu-clock --repeat 10 ./array2
>
>  Performance counter stats for './array2' (10 runs):
>
>        189.409569 cpu-clock                  ( +-   0.019% )
>
>        0.190007596  seconds time elapsed   ( +-   0.025% )
>
> Which is 63% of the original runtime - i.e. 37% faster. And no, i first did the
> calculation, then did the measurement of the optimized code.
>
> Now it would be nice to automate such analysis some more within perf - but i
> think i have established the principle well enough that we can use generic
> cache events for such measurements.
>
> Also, we could certainly add more generic events - a stalled-cycles event would
> certainly be useful for example, to collect all (or at least most) 'harmful
> delays' the execution flow can experience. Want to take a stab at that patch?
>
> Thanks,
>
>        Ingo
>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 12:06                 ` Stephane Eranian
@ 2011-04-23 12:36                   ` Ingo Molnar
  2011-04-23 13:16                   ` Peter Zijlstra
  2011-04-24  2:15                   ` Andi Kleen
  2 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-23 12:36 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Andi Kleen, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> On Sat, Apr 23, 2011 at 9:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, 2011-04-22 at 17:03 -0700, Andi Kleen wrote:
> >> > > Yes, and note that with instructions events we even have skid-less PEBS
> >> > > profiling so seeing the precise .
> >> >                                   - location of instructions is possible.
> >>
> >> It was better when it was eaten. PEBS does not actually eliminate
> >> skid, unfortunately. The interrupt still occurs later, so the
> >> instruction location is off.
> >>
> >> PEBS merely gives you more information.
> >
> > You're so skilled at not actually saying anything useful. Are you
> > perchance referring to the fact that the IP reported in the PEBS data is
> > exactly _one_ instruction off? Something that is demonstrated to be
> > fixable?
> >
> > Or are you defining skid differently and not telling us your definition?
> >
> 
> PEBS is guaranteed to return an IP that is just after AN instruction that 
> caused the event. However, that instruction is NOT the one at the end of your 
> period. Let's take an example with INST_RETIRED, period=100000. Then, the IP 
> you get is NOT after the 100,000th retired instruction. It's an instruction 
> that is N cycles after that one. There is internal skid due to the way PEBS 
> is implemented.

You are really misapplying the common-sense definition of 'skid'.

Skid refers to the instruction causing a profiler hit being mis-identified. 
Google 'x86 pmu skid' and read the third entry: your own prior posting ;-)

What you are referring to here is not really classic skid but a small, mostly 
constant skew in the period length with some very small amount of variability. 
It's thus mostly immaterial - at most a second- or third-order effect at
typical sampling frequencies.
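
(Back-of-the-envelope, under the assumptions above: with a period of 100,000
events and a PEBS arming delay on the order of tens of cycles, the skew in the
effective period is a small fraction of a percent - well below normal sampling
noise.)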

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 12:13               ` Stephane Eranian
@ 2011-04-23 12:49                 ` Ingo Molnar
  0 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-23 12:49 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> On Fri, Apr 22, 2011 at 10:47 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Stephane Eranian <eranian@google.com> wrote:
> >
> >> On Fri, Apr 22, 2011 at 3:18 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >> >
> >> > * Stephane Eranian <eranian@google.com> wrote:
> >> >
> >> > > > Say i'm a developer and i have an app with such code:
> >> > > >
> >> > > > #define THOUSAND 1000
> >> > > >
> >> > > > static char array[THOUSAND][THOUSAND];
> >> > > >
> >> > > > int init_array(void)
> >> > > > {
> >> > > >        int i, j;
> >> > > >
> >> > > >        for (i = 0; i < THOUSAND; i++) {
> >> > > >                for (j = 0; j < THOUSAND; j++) {
> >> > > >                        array[j][i]++;
> >> > > >                }
> >> > > >        }
> >> > > >
> >> > > >        return 0;
> >> > > > }
> >> > > >
> >> > > > Pretty common stuff, right?
> >> > > >
> >> > > > Using the generalized cache events i can run:
> >> > > >
> >> > > >  $ perf stat --repeat 10 -e cycles:u -e instructions:u -e l1-dcache-loads:u -e l1-dcache-load-misses:u ./array
> >> > > >
> >> > > >  Performance counter stats for './array' (10 runs):
> >> > > >
> >> > > >         6,719,130 cycles:u                   ( +-   0.662% )
> >> > > >         5,084,792 instructions:u           #      0.757 IPC     ( +-   0.000% )
> >> > > >         1,037,032 l1-dcache-loads:u          ( +-   0.009% )
> >> > > >         1,003,604 l1-dcache-load-misses:u    ( +-   0.003% )
> >> > > >
> >> > > >        0.003802098  seconds time elapsed   ( +-  13.395% )
> >> > > >
> >> > > > I consider that this is 'bad', because for almost every dcache-load there's a
> >> > > > dcache-miss - a 99% L1 cache miss rate!
> >> > > >
> >> > > > Then i think a bit, notice something, apply this performance optimization:
> >> > >
> >> > > I don't think this example is really representative of the kind of problems
> >> > > people face, it is just too small and obvious. [...]
> >> >
> >> > Well, the overwhelming majority of performance problems are 'small and obvious'
> >>
> >> Problems are not simple. Most serious applications these days are huge,
> >> hundreds of MB of text, if not GB.
> >>
> >> In your artificial example, you knew the answer before you started the
> >> measurement.
> >>
> >> Most of the time, applications are assembled out of hundreds of libraries, so
> >> no single developer knows all the code. Thus, the performance analyst is
> >> faced with a black box most of the time.
> >
> > I isolated out an example and assumed that you'd agree that identifying hot
> > spots is trivial with generic cache events.
> >
> > My assumption was wrong so let me show you how trivial it really is.
> >
> > Here's an example with *two* problematic functions (but it could have hundreds,
> > it does not matter):
> >
> > -------------------------------->
> > #define THOUSAND 1000
> >
> > static char array1[THOUSAND][THOUSAND];
> >
> > static char array2[THOUSAND][THOUSAND];
> >
> > void func1(void)
> > {
> >        int i, j;
> >
> >        for (i = 0; i < THOUSAND; i++)
> >                for (j = 0; j < THOUSAND; j++)
> >                        array1[i][j]++;
> > }
> >
> > void func2(void)
> > {
> >        int i, j;
> >
> >        for (i = 0; i < THOUSAND; i++)
> >                for (j = 0; j < THOUSAND; j++)
> >                        array2[j][i]++;
> > }
> >
> > int main(void)
> > {
> >        for (;;) {
> >                func1();
> >                func2();
> >        }
> >
> >        return 0;
> > }
> > <--------------------------------
> >
> > We do not know which one has the cache-misses problem, func1() or func2(), it's
> > all a black box, right?
> >
> > Using generic cache events you simply type this:
> >
> >  $ perf top -e l1-dcache-load-misses -e l1-dcache-loads
> >
> > And you get such output:
> >
> >   PerfTop:    1923 irqs/sec  kernel: 0.0%  exact:  0.0% [l1-dcache-load-misses:u/l1-dcache-loads:u],  (all, 16 CPUs)
> > -------------------------------------------------------------------------------------------------------
> >
> >   weight    samples  pcnt funct DSO
> >   ______    _______ _____ _____ ______________________
> >
> >      1.9       6184 98.8% func2 /home/mingo/opt/array2
> >      0.0         69  1.1% func1 /home/mingo/opt/array2
> >
> > It has pinpointed the problem in func2 *very* precisely.
> >
> > Obviously this can be used to analyze larger apps as well, with thousands 
> > of functions, to pinpoint cachemiss problems in specific functions.
>
> No, it does not.

The thing is, you will need to come up with more convincing and concrete 
arguments than a blanket, unsupported "No, it does not" claim.

I *just showed* you an example which you claimed just two mails ago is 
impossible to analyze. I showed an example with two functions and claimed that the 
same thing works with 3 or more functions as well: perf top will happily 
display the ones with the highest cachemiss ratio, regardless of how many there 
are.

> As I said before, your example is just too trivial to be representative. You 
> keep thinking that what you see in the profile pinpoints exactly the 
> instruction or even the function where the problem always occurs. This is not 
> always the case. There is skid, and it can be very big; the IP you get may 
> not even be in the same function where the load was issued.

So now you claim a narrow special case (most of the hot-spot overhead skidding 
out of a function) as a counter-proof?

Sometimes skid causes problems - in practice it rarely does, and i do a lot of 
profiling.

Also, i'd expect PEBS to be extended in the future to more and more events - 
including cachemiss events. That will solve this kind of skidding in a pretty 
natural way.

Also, let's analyze your narrow special case: if a function is indeed 
"invisible" to profiling because most overhead skids out of it then there's 
little you can do with raw events to begin with ...

You really need to specifically demonstrate how raw events help your example.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 12:06                 ` Stephane Eranian
  2011-04-23 12:36                   ` Ingo Molnar
@ 2011-04-23 13:16                   ` Peter Zijlstra
  2011-04-25 18:48                     ` Stephane Eranian
  2011-04-25 19:40                     ` Andi Kleen
  2011-04-24  2:15                   ` Andi Kleen
  2 siblings, 2 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-23 13:16 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, Ingo Molnar, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

On Sat, 2011-04-23 at 14:06 +0200, Stephane Eranian wrote:
> On Sat, Apr 23, 2011 at 9:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, 2011-04-22 at 17:03 -0700, Andi Kleen wrote:
> >> > > Yes, and note that with instructions events we even have skid-less PEBS
> >> > > profiling so seeing the precise .
> >> >                                   - location of instructions is possible.
> >>
> >> It was better when it was eaten. PEBS does not actually eliminate
> >> skid, unfortunately. The interrupt still occurs later, so the
> >> instruction location is off.
> >>
> >> PEBS merely gives you more information.
> >
> > You're so skilled at not actually saying anything useful. Are you
> > perchance referring to the fact that the IP reported in the PEBS data is
> > exactly _one_ instruction off? Something that is demonstrated to be
> > fixable?
> >
> > Or are you defining skid differently and not telling us your definition?
> >
> 
> PEBS is guaranteed to return an IP that is just after AN instruction that
> caused the event. However, that instruction is NOT the one at the end
> of your period. Let's take an example with INST_RETIRED, period=100000.
> Then, the IP you get is NOT after the 100,000th retired instruction. It's an
> instruction that is N cycles after that one. There is internal skid due to the
> way PEBS is implemented.
> 
> That is what Andi is referring to. The issue causes bias and thus impacts
> the quality of the samples. On SNB, there is a new INST_RETIRED:PREC_DIST
> event. PREC_DIST = precise distribution. It tries to correct for the PEBS
> skid on INST_RETIRED (look at Vol3b).

Sure, but who cares? So your period isn't exactly what you specified,
but the effective period will have an average and a fairly small stdev
(assuming the initial period is much larger than the relatively few
cycles it takes to arm the PEBS assist), therefore you still get a
fairly uniform spread.

I don't much get the obsession with precision here, it's all a statistics
game anyway.

And while you keep saying the examples are too trivial and Andi keeps
spouting vague non-statements, neither of you actually provides anything
sensible to the discussion.

So stop f*cking whining and start talking sense or stop talking
altogether.

I mean, you were in the room where Intel presented their research on
event correlations based on pathological micro-benches. That clearly
shows that exact event definitions simply don't matter.

Similarly, all this precision wanking isn't _that_ important; the big
fish clearly stand out. It's only when you start shaving off the last few
cycles that all that really comes in handy - before that it's mostly: ooh,
thinking is hard, let's go shopping.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-22 20:30         ` Ingo Molnar
  2011-04-22 20:32           ` Ingo Molnar
@ 2011-04-23 20:14           ` Ingo Molnar
  2011-04-24  6:16             ` Arun Sharma
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-23 20:14 UTC (permalink / raw)
  To: arun
  Cc: Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] If there is an expensive load, you'll see that the load instruction 
> > takes many cycles and you can infer that it's a cache miss.
> > 
> > Questions app developers typically ask me:
> > 
> > * If I fix all my top 5 L3 misses how much faster will my app go?
> 
> This has come up: we could add a 'stalled/idle-cycles' generic event - i.e. 
> cycles spent without performing useful work in the pipelines. (Resource-stall 
> events on Intel CPUs.)

How about something like the patch below?

	Ingo
---
Subject: perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
From: Ingo Molnar <mingo@elte.hu>

The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate the
cycles during which the CPU does nothing useful, because it is stalled
on a cache-miss or some other condition.

Note: this is still incomplete and will only work on Intel Nehalem
      CPUs for now; the intel_perfmon_event_map[] needs to be
      properly split between the major models.

Also update 'perf stat' to print:

           611,527 cycles
           400,553 instructions             # ( 0.7 instructions per cycle )
            77,809 stalled-cycles           # ( 12.7% of all cycles )

        0.000610987  seconds time elapsed

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    2 ++
 include/linux/perf_event.h             |    1 +
 tools/perf/builtin-stat.c              |   11 +++++++++--
 tools/perf/util/parse-events.c         |    1 +
 tools/perf/util/python.c               |    1 +
 5 files changed, 14 insertions(+), 2 deletions(-)

Index: linux/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux/arch/x86/kernel/cpu/perf_event_intel.c
@@ -34,6 +34,8 @@ static const u64 intel_perfmon_event_map
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c4,
   [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c5,
   [PERF_COUNT_HW_BUS_CYCLES]		= 0x013c,
+  [PERF_COUNT_HW_STALLED_CYCLES]	= 0xffa2, /* 0xff: All reasons, 0xa2: Resource stalls */
+
 };
 
 static struct event_constraint intel_core_event_constraints[] =
Index: linux/include/linux/perf_event.h
===================================================================
--- linux.orig/include/linux/perf_event.h
+++ linux/include/linux/perf_event.h
@@ -52,6 +52,7 @@ enum perf_hw_id {
 	PERF_COUNT_HW_BRANCH_INSTRUCTIONS	= 4,
 	PERF_COUNT_HW_BRANCH_MISSES		= 5,
 	PERF_COUNT_HW_BUS_CYCLES		= 6,
+	PERF_COUNT_HW_STALLED_CYCLES		= 7,
 
 	PERF_COUNT_HW_MAX,			/* non-ABI */
 };
Index: linux/tools/perf/builtin-stat.c
===================================================================
--- linux.orig/tools/perf/builtin-stat.c
+++ linux/tools/perf/builtin-stat.c
@@ -442,7 +442,7 @@ static void abs_printout(int cpu, struct
 		if (total)
 			ratio = avg / total;
 
-		fprintf(stderr, " # %10.3f IPC  ", ratio);
+		fprintf(stderr, " # ( %3.1f instructions per cycle )", ratio);
 	} else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES) &&
 			runtime_branches_stats[cpu].n != 0) {
 		total = avg_stats(&runtime_branches_stats[cpu]);
@@ -450,7 +450,7 @@ static void abs_printout(int cpu, struct
 		if (total)
 			ratio = avg * 100 / total;
 
-		fprintf(stderr, " # %10.3f %%    ", ratio);
+		fprintf(stderr, " # %10.3f %%", ratio);
 
 	} else if (runtime_nsecs_stats[cpu].n != 0) {
 		total = avg_stats(&runtime_nsecs_stats[cpu]);
@@ -459,6 +459,13 @@ static void abs_printout(int cpu, struct
 			ratio = 1000.0 * avg / total;
 
 		fprintf(stderr, " # %10.3f M/sec", ratio);
+	} else if (perf_evsel__match(evsel, HARDWARE, HW_STALLED_CYCLES)) {
+		total = avg_stats(&runtime_cycles_stats[cpu]);
+
+		if (total)
+			ratio = avg / total * 100.0;
+
+		fprintf(stderr, " # (%5.1f%% of all cycles )", ratio);
 	}
 }
 
Index: linux/tools/perf/util/parse-events.c
===================================================================
--- linux.orig/tools/perf/util/parse-events.c
+++ linux/tools/perf/util/parse-events.c
@@ -38,6 +38,7 @@ static struct event_symbol event_symbols
   { CHW(BRANCH_INSTRUCTIONS),	"branch-instructions",	"branches"	},
   { CHW(BRANCH_MISSES),		"branch-misses",	""		},
   { CHW(BUS_CYCLES),		"bus-cycles",		""		},
+  { CHW(STALLED_CYCLES),	"stalled-cycles",	""		},
 
   { CSW(CPU_CLOCK),		"cpu-clock",		""		},
   { CSW(TASK_CLOCK),		"task-clock",		""		},
Index: linux/tools/perf/util/python.c
===================================================================
--- linux.orig/tools/perf/util/python.c
+++ linux/tools/perf/util/python.c
@@ -798,6 +798,7 @@ static struct {
 	{ "COUNT_HW_BRANCH_INSTRUCTIONS", PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
 	{ "COUNT_HW_BRANCH_MISSES",	  PERF_COUNT_HW_BRANCH_MISSES },
 	{ "COUNT_HW_BUS_CYCLES",	  PERF_COUNT_HW_BUS_CYCLES },
+	{ "COUNT_HW_STALLED_CYCLES",	  PERF_COUNT_HW_STALLED_CYCLES },
 	{ "COUNT_HW_CACHE_L1D",		  PERF_COUNT_HW_CACHE_L1D },
 	{ "COUNT_HW_CACHE_L1I",		  PERF_COUNT_HW_CACHE_L1I },
 	{ "COUNT_HW_CACHE_LL",	  	  PERF_COUNT_HW_CACHE_LL },
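
With the patch applied, the new event should be selectable by its generic name,
e.g.:

  perf stat -e cycles,instructions,stalled-cycles ./myapp

since the parse-events.c hunk above maps the "stalled-cycles" string to
PERF_COUNT_HW_STALLED_CYCLES.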

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 12:06                 ` Stephane Eranian
  2011-04-23 12:36                   ` Ingo Molnar
  2011-04-23 13:16                   ` Peter Zijlstra
@ 2011-04-24  2:15                   ` Andi Kleen
  2 siblings, 0 replies; 80+ messages in thread
From: Andi Kleen @ 2011-04-24  2:15 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Ingo Molnar, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

> 
> That is what Andi is referring to. The issue causes bias and thus impacts
> the quality of the samples. On SNB, there is a new INST_RETIRED:PREC_DIST
> event. PREC_DIST = precise distribution. It tries to correct for the PEBS
> skid on INST_RETIRED (look at Vol3b).

Unfortunately even PDIST doesn't completely fix the problem; it
only makes it somewhat better. Also it's only statistical,
so you won't get a guaranteed answer for every sample.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  7:50               ` Peter Zijlstra
  2011-04-23 12:06                 ` Stephane Eranian
@ 2011-04-24  2:19                 ` Andi Kleen
  2011-04-25 17:41                   ` Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-24  2:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, arun, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

> You're so skilled at not actually saying anything useful. Are you
> perchance referring to the fact that the IP reported in the PEBS data is
> exactly _one_ instruction off? Something that is demonstrated to be
> fixable?

It's one instruction off the instruction that was retired when the PEBS
interrupt was ready, but not one instruction off the instruction
that caused the event. There's still skid in triggering the interrupt.

The main good thing about PEBS is that you can get some information
about the state of the instruction, just not the EIP.
For example with the memory latency event you can actually get
the address and memory cache state (as Lin Ming's patchkit implements).
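
For reference - a from-memory sketch of the Nehalem PEBS sample layout, not
something quoted in this thread, so details may be off - the extra information
lives in the last three fields of the record the hardware writes out:

struct pebs_record_nhm {
	u64 flags, ip;
	u64 ax, bx, cx, dx;
	u64 si, di, bp, sp;
	u64 r8, r9, r10, r11;
	u64 r12, r13, r14, r15;
	u64 status, dla, dse, lat;	/* data linear address, data source, latency */
};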

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-23 20:14           ` [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES Ingo Molnar
@ 2011-04-24  6:16             ` Arun Sharma
  2011-04-25 17:37               ` Ingo Molnar
                                 ` (3 more replies)
  0 siblings, 4 replies; 80+ messages in thread
From: Arun Sharma @ 2011-04-24  6:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Sat, Apr 23, 2011 at 10:14:09PM +0200, Ingo Molnar wrote:
> 
> The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate the
> cycles during which the CPU does nothing useful, because it is stalled
> on a cache-miss or some other condition.

Conceptually looks fine. I'd prefer a more precise name such as:
PERF_COUNT_EXECUTION_STALLED_CYCLES (to differentiate from frontend or
retirement stalls).

In the example below:

==> foo.c <==

foo()
{
}


bar()
{
}

==> test.c <==
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define FNV_PRIME_32 16777619
#define FNV_OFFSET_32 2166136261U
uint32_t hash1(const char *s)
{
    uint32_t hash = FNV_OFFSET_32, i;
    for(i = 0; i < 4; i++)
    {
        hash = hash ^ (s[i]); // xor next byte into the bottom of the hash
        hash = hash * FNV_PRIME_32; // Multiply by prime number found to work well
    }
    return hash;
} 

#define FNV_PRIME_WEAK_32 100
#define FNV_OFFSET_WEAK_32 200

uint32_t hash2(const char *s)
{
    uint32_t hash = FNV_OFFSET_WEAK_32, i;
    for(i = 0; i < 4; i++)
    {
        hash = hash ^ (s[i]); // xor next byte into the bottom of the hash
        hash = hash * FNV_PRIME_WEAK_32; // Multiply by prime number found to work well
    }
    return hash;
}

int main()
{
	int r = random();

	while(1) {
		r++;
#ifdef HARD
		if (hash1((const char *) &r) & 0x500)
#else
		if (hash2((const char *) &r) & 0x500)
#endif
			foo();
		else
			bar();
	}
}
==> Makefile <==

all:
	gcc -O2 test.c foo.c -UHARD -o test.easy
	gcc -O2 test.c foo.c -DHARD -o test.hard
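
(The intent, presumably: hash1() mixes its input well, so in test.hard the
'& 0x500' branch is effectively random and mispredicts constantly, while
hash2()'s weak constants leave the branch outcome in test.easy following a
pattern the predictor can learn - same loop structure, very different
branch-miss rates.)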


# perf stat -e cycles,instructions ./test.hard
^C
 Performance counter stats for './test.hard':

     3,742,855,848 cycles                  
     4,179,896,309 instructions             #      1.117 IPC  

        1.754804730  seconds time elapsed


# perf stat -e cycles,instructions ./test.easy
^C
 Performance counter stats for './test.easy':

     3,932,887,528 cycles                  
     8,994,416,316 instructions             #      2.287 IPC  

        1.843832827  seconds time elapsed

i.e. fixing the branch mispredicts could result in
a nearly 2x speedup for the program.

Looking at:

# perf stat -e  cycles,instructions,branch-misses,cache-misses,RESOURCE_STALLS:ANY ./test.hard
^C
 Performance counter stats for './test.hard':

     3,413,713,048 cycles                    (scaled from 69.93%)
     3,772,611,782 instructions             #      1.105 IPC    (scaled from 80.01%)
        51,113,340 branch-misses             (scaled from 80.01%)
            12,370 cache-misses              (scaled from 80.02%)
        26,656,983 RESOURCE_STALLS:ANY       (scaled from 69.99%)

        1.626595305  seconds time elapsed

it's hard to spot the opportunity. On the other hand:

# ./analyze.py
Percent idle: 27%
        Retirement Stalls: 82%
        Backend Stalls: 0%
        Frontend Stalls: 62%
        Instruction Starvation: 62%
        icache stalls: 0%

does give me a signal about where to look. The script below is
a quick and dirty hack. I haven't really validated it with 
many workloads. I'm posting it here anyway hoping that it'd
result in better kernel support for these types of analyses.

Even if we cover this with various generic PERF_COUNT_*STALL events,
we'll still have a need for other events:

* Things that give info about instruction mixes.

  Ratio of {loads, stores, floating point, branches, conditional branches}
  to total instructions.

* Activity related to micro architecture specific caches

  People using -funroll-loops may have a significant performance opportunity.
  But it's hard to spot bottlenecks in the instruction decoder.

* Monitoring traffic on HyperTransport/QPI links

Like you observe, most people will not look at these events, so
focusing on getting the common events right makes sense. But I
still like access to all events (either via a mapping file or
a library such as libpfm4). Hiding them in "perf list" sounds
like a reasonable way of keeping complexity out.

 -Arun

PS: branch-misses:pp was spot on for the example above.

#!/usr/bin/env python

from optparse import OptionParser
from itertools import izip, chain, repeat
from subprocess import Popen, PIPE
import re, struct

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

counter_re = re.compile(r'\s+(?P<count>\d+)\s+(?P<event>\S+)')
def sample(events):
    cmd = 'perf stat --no-big-num -a'
    ncounters = 4
    groups = grouper(ncounters, events)
    for g in groups:
        # filter out the padding values grouper() added
        g = [e for e in g if e]
        cmd += ' -e ' + ','.join(g)
    cmd += ' -- sleep ' + str(options.time)
    process = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
    out, err = process.communicate()
    status = process.poll()
    if status:
        # string exceptions are long gone - raise a real exception object
        raise RuntimeError("perf failed: " + err)
    # perf stat prints the counts on stderr
    ret = {}
    for line in err.split('\n'):
        m = counter_re.match(line)
        if not m:
            continue
        ret[m.group('event')] = long(m.group('count'))
    return ret

def measure_cycles():
    # disable C-states
    f = open("/dev/cpu_dma_latency", "wb")
    f.write(struct.pack("i", 0))
    f.flush()
    saved = options.time
    options.time = 1 # one sec is sufficient to measure clock
    cycles = sample(["cycles"])['cycles']
    cycles /= options.time
    f.close()
    options.time = saved
    return cycles

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option("-t", "--time", dest="time", default=1,
                      help="How long to sample events")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose", default=True,
                      help="don't print status messages to stdout")
    
    (options, args) = parser.parse_args()
    cycles_per_sec = measure_cycles()
    c = sample(["cycles", "instructions", "UOPS_ISSUED:ANY:c=1", "UOPS_ISSUED:ANY:c=1:t=1",
                "RESOURCE_STALLS:ANY", "UOPS_RETIRED:ANY:c=1:t=1",
		"UOPS_EXECUTED:PORT015:t=1", "UOPS_EXECUTED:PORT234_CORE",
		"UOPS_ISSUED:ANY:t=1", "UOPS_ISSUED:FUSED:t=1", "UOPS_RETIRED:ANY:t=1",
	        "L1I:CYCLES_STALLED"])
    cycles = c["cycles"] * 1.0
    cycles_no_uops_issued = cycles - c["UOPS_ISSUED:ANY:c=1:t=1"]
    cycles_no_uops_retired = cycles - c["UOPS_RETIRED:ANY:c=1:t=1"]
    backend_stall_cycles = c["RESOURCE_STALLS:ANY"]
    icache_stall_cycles = c["L1I:CYCLES_STALLED"]

    # Cycle stall accounting
    print "Percent idle: %d%%" % ((1 - cycles/(int(options.time) * cycles_per_sec)) * 100)
    print "\tRetirement Stalls: %d%%" % ((cycles_no_uops_retired / cycles) * 100)
    print "\tBackend Stalls: %d%%" % ((backend_stall_cycles / cycles) * 100)
    print "\tFrontend Stalls: %d%%" % ((cycles_no_uops_issued / cycles) * 100)
    print "\tInstruction Starvation: %d%%" % (((cycles_no_uops_issued - backend_stall_cycles)/cycles) * 100)
    print "\ticache stalls: %d%%" % ((icache_stall_cycles/cycles) * 100)

    # Wasted work
    uops_executed = c["UOPS_EXECUTED:PORT015:t=1"] + c["UOPS_EXECUTED:PORT234_CORE"]
    uops_retired = c["UOPS_RETIRED:ANY:t=1"]
    uops_issued = c["UOPS_ISSUED:ANY:t=1"] + c["UOPS_ISSUED:FUSED:t=1"]

    print "\tPercentage useless uops: %d%%" % ((uops_executed - uops_retired) * 100.0/uops_retired)
    print "\tPercentage useless uops issued: %d%%" % ((uops_issued - uops_retired) * 100.0/uops_retired)
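
(A note on running it: event strings like UOPS_ISSUED:ANY:c=1 and
RESOURCE_STALLS:ANY are libpfm-style names, so the script assumes a perf
binary that can resolve them - e.g. via the libpfm4 mapping mentioned above;
a stock perf only understands its built-in generic names and raw hex events.
It also needs root: it samples system-wide with -a and writes
/dev/cpu_dma_latency to keep C-states disabled while measuring. Typical
invocation: ./analyze.py -t 10.)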

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-24  6:16             ` Arun Sharma
@ 2011-04-25 17:37               ` Ingo Molnar
  2011-04-26  9:25               ` Peter Zijlstra
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 17:37 UTC (permalink / raw)
  To: Arun Sharma
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Linus Torvalds,
	Andrew Morton


* Arun Sharma <asharma@fb.com> wrote:

> On Sat, Apr 23, 2011 at 10:14:09PM +0200, Ingo Molnar wrote:
> > 
> > The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate the
> > cycles during which the CPU does nothing useful, because it is stalled
> > on a cache-miss or some other condition.
> 
> Conceptually looks fine. I'd prefer a more precise name such as: 
> PERF_COUNT_EXECUTION_STALLED_CYCLES (to differentiate from frontend or 
> retirement stalls).

Ok.

Your script:

> # ./analyze.py
> Percent idle: 27%
>         Retirement Stalls: 82%
>         Backend Stalls: 0%
>         Frontend Stalls: 62%
>         Instruction Starvation: 62%
>         icache stalls: 0%
> 
> does give me a signal about where to look. The script below is
> a quick and dirty hack. I haven't really validated it with 
> many workloads. I'm posting it here anyway hoping that it'd
> result in better kernel support for these types of analyses.

This is pretty useful IMO.

The frontend/backend characterisation is pretty generic - most modern CPUs 
share that and have similar events.

So we could try to generalize these and get most of the statistics your script 
outputs.

> Even if we cover this with various generic PERF_COUNT_*STALL events,
> we'll still have a need for other events:
> 
> * Things that give info about instruction mixes.
> 
>   Ratio of {loads, stores, floating point, branches, conditional branches}
>   to total instructions.

We have this at least partially covered, but yeah, we stopped short of covering 
all instruction types so complete ratios cannot be built yet.

> * Activity related to micro architecture specific caches
> 
>   People using -funroll-loops may have a significant performance opportunity.
>   But it's hard to spot bottlenecks in the instruction decoder.
> 
> * Monitoring traffic on Hypertransport/QPI links

Cross-node accesses ought to be covered by Peter's RFC patch. In terms of 
isolating cross-CPU cache accesses i suspect we could do that too if it really 
matters to analysis in practice.

Basically the way to go about it is through testcases like the ones you wrote - 
they demonstrate the utility of a given type of event - and that justifies 
generalization as well.

> Like you observe, most people will not look at these events, so
> focusing on getting the common events right makes sense. But I
> still like access to all events (either via a mapping file or
> a library such as libpfm4). Hiding them in "perf list" sounds
> like a reasonable way of keeping complexity out.

Yes. We have access to raw events for relatively obscure (or too CPU-dependent) 
events - but what we do not want to do is extend that space without, in 
essence, adding *any* generic events. If something like offcore or uncore PMU 
support is useful enough to be in the kernel, then it should also be useful 
enough to gain generic events.

> PS: branch-misses:pp was spot on for the example above.

heh :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-24  2:19                 ` Andi Kleen
@ 2011-04-25 17:41                   ` Ingo Molnar
  2011-04-25 18:00                     ` Dehao Chen
       [not found]                     ` <BANLkTiks31-pMJe4zCKrppsrA1d6KanJFA@mail.gmail.com>
  0 siblings, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 17:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, arun, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton


* Andi Kleen <ak@linux.intel.com> wrote:

> > You're so skilled at not actually saying anything useful. Are you
> > perchance referring to the fact that the IP reported in the PEBS data is
> > exactly _one_ instruction off? Something that is demonstrated to be
> > fixable?
> 
> It's one instruction off the instruction that was retired when the PEBS 
> interrupt was ready, but not one instruction off the instruction that caused 
> the event. There's still skid in triggering the interrupt.

Peter answered this in the other mail:

 |
 | Sure, but who cares? So your period isn't exactly what you specified, but 
 | the effective period will have an average and a fairly small stdev (assuming 
 | the initial period is much larger than the relatively few cycles it takes to 
 | arm the PEBS assist), therefore you still get a fairly uniform spread.
 |

... and the resulting low level of noise in the average period length is what 
matters. The instruction itself will still be one of the hotspot instructions, 
statistically.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:41                   ` Ingo Molnar
@ 2011-04-25 18:00                     ` Dehao Chen
       [not found]                     ` <BANLkTiks31-pMJe4zCKrppsrA1d6KanJFA@mail.gmail.com>
  1 sibling, 0 replies; 80+ messages in thread
From: Dehao Chen @ 2011-04-25 18:00 UTC (permalink / raw)
  To: linux-kernel

On Tue, Apr 26, 2011 at 1:41 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andi Kleen <ak@linux.intel.com> wrote:
>
> > > You're so skilled at not actually saying anything useful. Are you
> > > perchance referring to the fact that the IP reported in the PEBS data is
> > > exactly _one_ instruction off? Something that is demonstrated to be
> > > fixable?
> >
> > It's one instruction off the instruction that was retired when the PEBS
> > interrupt was ready, but not one instruction off the instruction that caused
> > the event. There's still skid in triggering the interrupt.
>
> Peter answered this in the other mail:
>
>  |
>  | Sure, but who cares? So your period isn't exactly what you specified, but
>  | the effective period will have an average and a fairly small stdev (assuming
>  | the initial period is much larger than the relatively few cycles it takes to
>  | arm the PEBS assist), therefore you still get a fairly uniform spread.
>  |
>
> ... and the resulting low level of noise in the average period length is what
> matters. The instruction itself will still be one of the hotspot instructions,
> statistically.

Not true. This skid will lead to aggregation and shadow effects on
certain instructions. To make things worse, these effects are
deterministic and cannot be removed either by sampling multiple times or
by averaging across instructions within a basic block. As a result, some
actual "hot spots" are not sampled at all. Simply try to collect a
basic-block-level CPI and you'll get a very misleading profile.

Dehao

>
> Thanks,
>
>        Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
       [not found]                     ` <BANLkTiks31-pMJe4zCKrppsrA1d6KanJFA@mail.gmail.com>
@ 2011-04-25 18:05                       ` Ingo Molnar
  2011-04-25 18:39                         ` Stephane Eranian
  0 siblings, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 18:05 UTC (permalink / raw)
  To: Dehao Chen
  Cc: Andi Kleen, Peter Zijlstra, arun, Stephane Eranian,
	Arnaldo Carvalho de Melo, linux-kernel, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Dehao Chen <danielcdh@gmail.com> wrote:

> > ... and the resulting low level of noise in the average period length is 
> > what matters. The instruction itself will still be one of the hotspot 
> > instructions, statistically.
> 
> Not true. This skid will lead to aggregation and shadow effects on certain 
> instructions. To make things worse, these effects are deterministic and 
> cannot be removed either by sampling multiple times or by averaging across 
> instructions within a basic block. As a result, some actual "hot spots" are 
> not sampled at all. Simply try to collect a basic-block-level CPI and you'll 
> get a very misleading profile.

This certainly does not match the results i'm seeing on real applications, 
using "-e instructions:pp" PEBS+LBR profiling. How do you explain that? Also, 
can you demonstrate your claim with a real example?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 18:05                       ` Ingo Molnar
@ 2011-04-25 18:39                         ` Stephane Eranian
  2011-04-25 19:45                           ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Stephane Eranian @ 2011-04-25 18:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dehao Chen, Andi Kleen, Peter Zijlstra, arun,
	Arnaldo Carvalho de Melo, linux-kernel, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton

On Mon, Apr 25, 2011 at 8:05 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Dehao Chen <danielcdh@gmail.com> wrote:
>
>> > ... and the resulting low level of noise in the average period length is
>> > what matters. The instruction itself will still be one of the hotspot
>> > instructions, statistically.
>>
>> Not true. This skid will lead to aggregation and shadow effects on certain
>> instructions. To make things worse, these effects are deterministic
>> and cannot be removed either by sampling multiple times or by averaging
>> across instructions within a basic block. As a result, some actual "hot spots"
>> are not sampled at all. Simply try to collect a basic-block-level
>> CPI, and you'll get a very misleading profile.
>
> This certainly does not match the results i'm seeing on real applications,
> using "-e instructions:pp" PEBS+LBR profiling. How do you explain that? Also,
> can you demonstrate your claim with a real example?
>

LBR removes the off-by-1 IP problem, but it does not remove the shadow effect,
i.e., the blind spot of N cycles caused by the PEBS arming mechanism.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 13:16                   ` Peter Zijlstra
@ 2011-04-25 18:48                     ` Stephane Eranian
  2011-04-25 19:40                     ` Andi Kleen
  1 sibling, 0 replies; 80+ messages in thread
From: Stephane Eranian @ 2011-04-25 18:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

On Sat, Apr 23, 2011 at 3:16 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sat, 2011-04-23 at 14:06 +0200, Stephane Eranian wrote:
>> On Sat, Apr 23, 2011 at 9:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, 2011-04-22 at 17:03 -0700, Andi Kleen wrote:
>> >> > > Yes, and note that with instructions events we even have skid-less PEBS
>> >> > > profiling, so seeing the precise location of instructions is possible.
>> >>
>> >> It was better when it was eaten. PEBS does not actually eliminate
>> >> skid, unfortunately. The interrupt still occurs later, so the
>> >> instruction location is off.
>> >>
>> >> PEBS merely gives you more information.
>> >
>> > You're so skilled at not actually saying anything useful. Are you
>> > perchance referring to the fact that the IP reported in the PEBS data is
>> > exactly _one_ instruction off? Something that is demonstrated to be
>> > fixable?
>> >
>> > Or are you defining skid differently and not telling us your definition?
>> >
>>
>> PEBS is guaranteed to return an IP that is just after AN instruction that
>> caused the event. However, that instruction is NOT the one at the end
>> of your period. Let's take an example with INST_RETIRED, period=100000.
>> Then, the IP you get is NOT after the 100,000th retired instruction. It's an
>> instruction that is N cycles after that one. There is internal skid due to the
>> way PEBS is implemented.
>>
>> That is what Andi is referring to. The issue causes bias and thus impacts
>> the quality of the samples. On SNB, there is a new INST_RETIRED:PREC_DIST
>> event. PREC_DIST=precise distribution. It tries to correct for the skid
>> on this event on INST_RETIRED with PEBS (look at Vol3b).
>
> Sure, but who cares? So your period isn't exactly what you specified,
> but the effective period will have an average and a fairly small stdev
> (assuming the initial period is much larger than the relatively few
> cycles it takes to arm the PEBS assist), therefore you still get a
> fairly uniform spread.
>
> I don't much get the obsession with precision here, its all a statistics
> game anyway.
>

The particular example I am thinking about came from compiler people
I work with who would like to use PEBS to do statistical basic block profiling.
They do care about the correctness of the profile: otherwise, a skewed profile
may cause wrong attribution of "hotness" of basic blocks and mislead the
compiler when it tries to reorder blocks on the critical path. Compiler people
can validate a statistical profile because they have a reference profile
obtained via instrumentation of each basic block.

> And while you keep saying the examples are too trivial and Andi keeps
> sprouting vague non-statements, neither of you actually provides anything
> sensible to the discussion.
>
> So stop f*cking whining and start talking sense or stop talking all
> together.
>
> I mean, you were in the room where Intel presented their research on
> event correlations based on pathological micro-benches. That clearly
> shows that exact event definitions simply don't matter.
>
Yes, and I get a different reading of the presentation. He never mentioned
generic events. He never even used them - I mean the Intel generic events.
Instead he used very focused Atom-specific events.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23 13:16                   ` Peter Zijlstra
  2011-04-25 18:48                     ` Stephane Eranian
@ 2011-04-25 19:40                     ` Andi Kleen
  2011-04-25 19:55                       ` Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-25 19:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Ingo Molnar, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

> Sure, but who cares? So your period isn't exactly what you specified,
> but the effective period will have an average and a fairly small stdev
> (assuming the initial period is much larger than the relatively few
> cycles it takes to arm the PEBS assist), therefore you still get a
> fairly uniform spread.

Unfortunately the skid is not uniform and not necessarily random, 
and it is difficult to correct for in a standard way.

> I don't much get the obsession with precision here, its all a statistics
> game anyway.

If you want to make your code faster it's often important to figure
out what exactly is slow.

One example of this we had recently in the kernel: 

a function accesses three global objects. Scalability tanks when the test is 
run with more CPUs. Now the hit is near the three memory accesses. Which one
is actually bouncing cache lines?

The CPU executes them all in parallel, so it's hard to tell: it's
all within the out-of-order reordering window.

PEBS (e.g. the memory latency event) can give you some information about
which memory access is to blame with the right events, but it's not 
using the RIP.

The generic events won't help with that, because they're still RIP
based, which is not accurate.

> Similarly all this precision wanking isn't _that_ important, the big
> fish clearly stand out, it's only when you start shaving off the last few
> cycles that all that really comes in handy, before that it's mostly: ooh
> thinking is hard, let's go shopping.

I wish it was that easy.

In the example above it's about scaling or not scaling, which is
definitely not the last cycle, but more a life-and-death 
"is the workload feasible on this machine or not" question.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 18:39                         ` Stephane Eranian
@ 2011-04-25 19:45                           ` Ingo Molnar
  0 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 19:45 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Dehao Chen, Andi Kleen, Peter Zijlstra, arun,
	Arnaldo Carvalho de Melo, linux-kernel, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, eranian, Arun Sharma,
	Linus Torvalds, Andrew Morton


* Stephane Eranian <eranian@google.com> wrote:

> > This certainly does not match the results i'm seeing on real applications, 
> > using "-e instructions:pp" PEBS+LBR profiling. How do you explain that? 
> > Also, can you demonstrate your claim with a real example?
> 
> LBR removes the off-by-1 IP problem, it does not remove the shadow effect, 
> i.e., that blind spot of N cycles caused by the PEBS arming mechanism.

I really think you are grasping at straws here - unless you are able to 
demonstrate clear problems - which you have failed to do so far. The pure act
of profiling probably disturbs a typical workload statistically more than a
few cycles of skew in the period.

I could imagine artifacts with really short periods and artificially short and 
dominant hotpaths - but in those cases the skew does not matter much in 
practice: a short and dominant hotpath is pinpointed very easily ...

So i really think it's a non-issue in practice - but you can certainly prove me 
wrong by demonstrating whatever problems you suspect.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 19:40                     ` Andi Kleen
@ 2011-04-25 19:55                       ` Ingo Molnar
  0 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 19:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Stephane Eranian, arun, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton


* Andi Kleen <ak@linux.intel.com> wrote:

> One example of this we had recently in the kernel: 
> 
> a function accesses three global objects. Scalability tanks when the test is 
> run with more CPUs. Now the hit is near the three memory accesses. Which one
> is actually bouncing cache lines?

that's not an example - you are still only giving vague, untestable, 
unverifiable references. You need to give us something specific and 
reproducible - preferably a testcase.

Peter and I are doing lots of scalability work in the core kernel, and for most 
problems i've met it was enough if we knew the function name - the scalability 
problem is typically very obvious from that point on - and an annotated profile 
makes it even more obvious.

I've never met a situation like the one you describe, where it was not possible 
to disambiguate a real SMP bounce - and i've been fixing SMP bounces in the 
kernel for over ten years.

So you really will have to back up your point with an accurate, reproducible 
testcase - vague statements like the ones you are making i do not accept at 
face value, sorry.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-24  6:16             ` Arun Sharma
  2011-04-25 17:37               ` Ingo Molnar
@ 2011-04-26  9:25               ` Peter Zijlstra
  2011-04-26 14:00               ` Ingo Molnar
  2011-04-27 11:11               ` Ingo Molnar
  3 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:25 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Ingo Molnar, arun, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Linus Torvalds, Andrew Morton

On Sat, 2011-04-23 at 23:16 -0700, Arun Sharma wrote:
> Conceptually looks fine. I'd prefer a more precise name such as:
> PERF_COUNT_EXECUTION_STALLED_CYCLES (to differentiate from frontend or
> retirement stalls). 

Very nice example!! This is the stuff we want people to do, but instead
of focusing on the raw event aspect, put in a little more effort and see
what it takes to make it work across the board.

None of the things you mention are very specific to Intel; afaik the
concepts you listed - Retirement, Frontend (instruction
decode/uop-issue), Backend (uop execution), I-cache (instruction fetch) -
map to pretty much all hardware I know of (PMU coverage of these aspects
aside).

So in fact you propose these concepts, and that is the kind of feedback
perf wants and needs.

The thing that set off this whole discussion is that most people don't
seem to believe in concepts and stick to their very narrow HPC "every
last cycle matters, therefore we need absolute events" mentality.

That too is a form of vendor lock-in: once you're so dependent on a
particular platform, the cost of switching increases dramatically.
Furthermore, very few people are actually interested in it.

That is not to say we should not enable those people, but the current
state of affairs seems to be that some people are only interested in
enabling that and simply don't care (and don't want to care) about
cross-platform performance analysis and useful abstractions.

We'd very much like to make the addition of high-level concepts - the
means for the greater public to use these capabilities - the cost of
entry for supporting low-level capabilities.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [generalized cache events] Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:51         ` Andi Kleen
  2011-04-22 19:57           ` Ingo Molnar
@ 2011-04-26  9:25           ` Peter Zijlstra
  1 sibling, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, eranian, Arun Sharma, Linus Torvalds,
	Andrew Morton

On Fri, 2011-04-22 at 09:51 -0700, Andi Kleen wrote:
> 
> Micro architectures are so different. I suspect a "generic" definition would
> need to be so vague as to be useless.
> 
> This in general seems to be the problem of the current cache events.
> 
> Overall for any interesting analysis you need to go CPU specific.
> Abstracted performance analysis is a contradiction in terms.

It might help if you'd talk to your own research department before
making statements like that; they make you look silly.

Intel research has shown - as a side effect of applying machine learning
principles to provide machine-aided optimization (i.e. clippy-style
guides for VTune) - that you don't actually need exact event definitions.

They create simple micro-kernels (not our kind of kernels, but more like
the excellent example Arun provided) that trigger a pathological case
and a perfect counter-case, run them over _all_ possible events and do
correlation analysis.

The explicit example given was branch misses on an atom, and they found
(to nobody's great surprise) BR_INST_RETIRED.MISPRED to be the best
correlating event. But that's not the important part.

The important part is that all it needs is a strong correlation - and it
could even be a combination of events; that would just make the analysis a
bit more complex.

Anyway, given a sufficiently large set of these pathological cases, you
can train a neural net for your target hardware and then reverse the
situation, run it over an unknown program and have it create suggestions
-> yay clippy!

So given a set of pathological cases and hardware with decent PMU
coverage you can train this thing and get useful results. Exact event
definitions be damned -- it doesn't care.

http://sites.google.com/site/fhpm2010/program/baugh_fhpm2010.pptx?attredirects=0
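
For concreteness, the correlation step could look something like this - a
minimal sketch in the Python 2 style of the analyze.py script earlier in
this thread, assuming per-event counts have already been collected for a
few runs of a pathological micro-kernel and its counter-case; the event
list and all numbers are made up for illustration:

def pearson(xs, ys):
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = lambda vs, m: sum((v - m) ** 2 for v in vs)
    return cov / ((var(xs, mx) * var(ys, my)) ** 0.5)

labels = [1, 1, 1, 0, 0, 0]   # pathological runs first, counter-case last
counts = {
    "BR_INST_RETIRED.MISPRED": [9e6, 8e6, 9e6, 1e4, 2e4, 1e4],
    "UOPS_ISSUED.ANY":         [3.0e9, 3.1e9, 3.0e9, 3.1e9, 3.0e9, 3.1e9],
}

# Rank candidate events by how strongly they track the pathology.
for event in sorted(counts, key=lambda e: -abs(pearson(counts[e], labels))):
    print "%-24s  r = %+.2f" % (event, pearson(counts[event], labels))

Given enough such labelled cases, the same ranking (or a model trained on
top of it) is what drives the "clippy"-style suggestions described above.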

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-24  6:16             ` Arun Sharma
  2011-04-25 17:37               ` Ingo Molnar
  2011-04-26  9:25               ` Peter Zijlstra
@ 2011-04-26 14:00               ` Ingo Molnar
  2011-04-27 11:11               ` Ingo Molnar
  3 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-26 14:00 UTC (permalink / raw)
  To: Arun Sharma
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Linus Torvalds,
	Andrew Morton


* Arun Sharma <asharma@fb.com> wrote:

> On Sat, Apr 23, 2011 at 10:14:09PM +0200, Ingo Molnar wrote:
> > 
> > The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate
> > cycles the CPU does nothing useful, because it is stalled on a
> > cache-miss or some other condition.
> 
> Conceptually looks fine. I'd prefer a more precise name such as:
> PERF_COUNT_EXECUTION_STALLED_CYCLES (to differentiate from frontend or
> retirement stalls).

How about this naming convention:

  PERF_COUNT_HW_STALLED_CYCLES                              # execution
  PERF_COUNT_HW_STALLED_CYCLES_FRONTEND                     # frontend
  PERF_COUNT_HW_STALLED_CYCLES_ICACHE_MISS                  # icache

So STALLED_CYCLES would be the most general metric, the one that shows the 
real impact on the application. The other events would then help disambiguate 
this metric some more.
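
To illustrate the intended hierarchy, here is a minimal sketch in the
Python 2 style of Arun's script - purely illustrative: of these tool-level
names only "stalled-cycles" exists in the patch below, the two more
specific names are hypothetical, and the counts are made up:

def stall_breakdown(c):
    cycles = float(c["cycles"])
    print "Stalled:         %5.1f%%" % (c["stalled-cycles"] * 100.0 / cycles)
    print "  frontend:      %5.1f%%" % (c["stalled-cycles-frontend"] * 100.0 / cycles)
    print "  icache misses: %5.1f%%" % (c["stalled-cycles-icache"] * 100.0 / cycles)

stall_breakdown({"cycles": 31470886, "stalled-cycles": 9825068,
                 "stalled-cycles-frontend": 6500000,
                 "stalled-cycles-icache": 300000})

Each line narrows down the previous one: the general event gives the
overall impact, and the frontend and icache events attribute successively
more specific causes.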

Below is the updated patch - this version makes the backend stalls event 
properly per-model (with the Nehalem table filled in).

What do you think?

Thanks,

	Ingo

--------------------->
Subject: perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
From: Ingo Molnar <mingo@elte.hu>
Date: Sun Apr 24 08:18:31 CEST 2011

The new PERF_COUNT_HW_STALLED_CYCLES event tries to approximate
cycles the CPU does nothing useful, because it is stalled on a
cache-miss or some other condition.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   42 +++++++++++++++++++++++++--------
 include/linux/perf_event.h             |    1 
 tools/perf/util/parse-events.c         |    1 
 tools/perf/util/python.c               |    1 
 4 files changed, 36 insertions(+), 9 deletions(-)

Index: linux/arch/x86/kernel/cpu/perf_event_intel.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux/arch/x86/kernel/cpu/perf_event_intel.c
@@ -36,6 +36,23 @@ static const u64 intel_perfmon_event_map
   [PERF_COUNT_HW_BUS_CYCLES]		= 0x013c,
 };
 
+/*
+ * Other generic events, Nehalem:
+ */
+static const u64 intel_nhm_event_map[] =
+{
+  /* Arch-perfmon events: */
+  [PERF_COUNT_HW_CPU_CYCLES]		= 0x003c,
+  [PERF_COUNT_HW_INSTRUCTIONS]		= 0x00c0,
+  [PERF_COUNT_HW_CACHE_REFERENCES]	= 0x4f2e,
+  [PERF_COUNT_HW_CACHE_MISSES]		= 0x412e,
+  [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= 0x00c4,
+  [PERF_COUNT_HW_BRANCH_MISSES]		= 0x00c5,
+  [PERF_COUNT_HW_BUS_CYCLES]		= 0x013c,
+
+  [PERF_COUNT_HW_STALLED_CYCLES]	= 0xffa2, /* 0xff: All reasons, 0xa2: Resource stalls */
+};
+
 static struct event_constraint intel_core_event_constraints[] =
 {
 	INTEL_EVENT_CONSTRAINT(0x11, 0x2), /* FP_ASSIST */
@@ -150,6 +167,12 @@ static u64 intel_pmu_event_map(int hw_ev
 	return intel_perfmon_event_map[hw_event];
 }
 
+static u64 intel_pmu_nhm_event_map(int hw_event)
+{
+	return intel_nhm_event_map[hw_event];
+}
+
+
 static __initconst const u64 snb_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -1400,18 +1423,19 @@ static __init int intel_pmu_init(void)
 	case 26: /* 45 nm nehalem, "Bloomfield" */
 	case 30: /* 45 nm nehalem, "Lynnfield" */
 	case 46: /* 45 nm nehalem-ex, "Beckton" */
-		memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
-		       sizeof(hw_cache_event_ids));
-		memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
-		       sizeof(hw_cache_extra_regs));
+		memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids, sizeof(hw_cache_event_ids));
+		memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 
 		intel_pmu_lbr_init_nhm();
 
-		x86_pmu.event_constraints = intel_nehalem_event_constraints;
-		x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
-		x86_pmu.percore_constraints = intel_nehalem_percore_constraints;
-		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
-		x86_pmu.extra_regs = intel_nehalem_extra_regs;
+		x86_pmu.event_constraints	= intel_nehalem_event_constraints;
+		x86_pmu.pebs_constraints	= intel_nehalem_pebs_event_constraints;
+		x86_pmu.percore_constraints	= intel_nehalem_percore_constraints;
+		x86_pmu.enable_all		= intel_pmu_nhm_enable_all;
+		x86_pmu.extra_regs		= intel_nehalem_extra_regs;
+		x86_pmu.event_map		= intel_pmu_nhm_event_map;
+		x86_pmu.max_events		= ARRAY_SIZE(intel_nhm_event_map);
+
 		pr_cont("Nehalem events, ");
 		break;
 
Index: linux/include/linux/perf_event.h
===================================================================
--- linux.orig/include/linux/perf_event.h
+++ linux/include/linux/perf_event.h
@@ -52,6 +52,7 @@ enum perf_hw_id {
 	PERF_COUNT_HW_BRANCH_INSTRUCTIONS	= 4,
 	PERF_COUNT_HW_BRANCH_MISSES		= 5,
 	PERF_COUNT_HW_BUS_CYCLES		= 6,
+	PERF_COUNT_HW_STALLED_CYCLES		= 7,
 
 	PERF_COUNT_HW_MAX,			/* non-ABI */
 };
Index: linux/tools/perf/util/parse-events.c
===================================================================
--- linux.orig/tools/perf/util/parse-events.c
+++ linux/tools/perf/util/parse-events.c
@@ -38,6 +38,7 @@ static struct event_symbol event_symbols
   { CHW(BRANCH_INSTRUCTIONS),	"branch-instructions",	"branches"	},
   { CHW(BRANCH_MISSES),		"branch-misses",	""		},
   { CHW(BUS_CYCLES),		"bus-cycles",		""		},
+  { CHW(STALLED_CYCLES),	"stalled-cycles",	""		},
 
   { CSW(CPU_CLOCK),		"cpu-clock",		""		},
   { CSW(TASK_CLOCK),		"task-clock",		""		},
Index: linux/tools/perf/util/python.c
===================================================================
--- linux.orig/tools/perf/util/python.c
+++ linux/tools/perf/util/python.c
@@ -798,6 +798,7 @@ static struct {
 	{ "COUNT_HW_BRANCH_INSTRUCTIONS", PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
 	{ "COUNT_HW_BRANCH_MISSES",	  PERF_COUNT_HW_BRANCH_MISSES },
 	{ "COUNT_HW_BUS_CYCLES",	  PERF_COUNT_HW_BUS_CYCLES },
+	{ "COUNT_HW_STALLED_CYCLES",	  PERF_COUNT_HW_STALLED_CYCLES },
 	{ "COUNT_HW_CACHE_L1D",		  PERF_COUNT_HW_CACHE_L1D },
 	{ "COUNT_HW_CACHE_L1I",		  PERF_COUNT_HW_CACHE_L1I },
 	{ "COUNT_HW_CACHE_LL",	  	  PERF_COUNT_HW_CACHE_LL },

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-24  6:16             ` Arun Sharma
                                 ` (2 preceding siblings ...)
  2011-04-26 14:00               ` Ingo Molnar
@ 2011-04-27 11:11               ` Ingo Molnar
  2011-04-27 14:47                 ` Arun Sharma
  3 siblings, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-27 11:11 UTC (permalink / raw)
  To: Arun Sharma
  Cc: arun, Stephane Eranian, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Linus Torvalds,
	Andrew Morton


* Arun Sharma <asharma@fb.com> wrote:

>     cycles = c["cycles"] * 1.0
>     cycles_no_uops_issued = cycles - c["UOPS_ISSUED:ANY:c=1:t=1"]
>     cycles_no_uops_retired = cycles - c["UOPS_RETIRED:ANY:c=1:t=1"]
>     backend_stall_cycles = c["RESOURCE_STALLS:ANY"]
>     icache_stall_cycles = c["L1I:CYCLES_STALLED"]
> 
>     # Cycle stall accounting
>     print "Percent idle: %d%%" % ((1 - cycles/(int(options.time) * cycles_per_sec)) * 100)
>     print "\tRetirement Stalls: %d%%" % ((cycles_no_uops_retired / cycles) * 100)
>     print "\tBackend Stalls: %d%%" % ((backend_stall_cycles / cycles) * 100)
>     print "\tFrontend Stalls: %d%%" % ((cycles_no_uops_issued / cycles) * 100)
>     print "\tInstruction Starvation: %d%%" % (((cycles_no_uops_issued - backend_stall_cycles)/cycles) * 100)
>     print "\ticache stalls: %d%%" % ((icache_stall_cycles/cycles) * 100)
> 
>     # Wasted work
>     uops_executed = c["UOPS_EXECUTED:PORT015:t=1"] + c["UOPS_EXECUTED:PORT234_CORE"]
>     uops_retired = c["UOPS_RETIRED:ANY:t=1"]
>     uops_issued = c["UOPS_ISSUED:ANY:t=1"] + c["UOPS_ISSUED:FUSED:t=1"]
> 
>     print "\tPercentage useless uops: %d%%" % ((uops_executed - uops_retired) * 100.0/uops_retired)
>     print "\tPercentage useless uops issued: %d%%" % ((uops_issued - uops_retired) * 100.0/uops_retired)

Just an update: i started working on generalizing these events.

As a first step i'd like to introduce stall statistics in default 'perf stat' 
output, then as a second step offer more detailed modi of analysis (like your 
script).

As for the first, 'overview' step, i'd like to use one or two numbers only, to 
give people a general ballpark figure about how good the CPU is performing for 
a given workload.

Wouldn't UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 be in general a pretty good, 
primary "stall" indicator? This is similar to the "cycles-uops_executed" value 
in your script (UOPS_EXECUTED:PORT015:t=1 and UOPS_EXECUTED:PORT234_CORE 
based): it counts cycles when there's no execution at all - not even 
a speculative one.

This would cover a wide variety of 'stall' reasons: external latency, 
stalling on lack of parallelism in the incoming instruction stream, and most 
other stall reasons. So it would measure everything that moves the CPU away 
from 100% utilization.

Secondly, the 'speculative waste' proportion is probably pretty well captured 
via branch-misprediction counts - those are the primary source of filling the 
pipeline with useless work.

So in the most high-level view we could already print useful information via 
the introduction of a single new generic event:

   PERF_COUNT_HW_CPU_CYCLES_BUSY

and 'idle cycles' are "cycles-busy_cycles".
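
In script form the derivation is trivial - a sketch, again in the Python 2
style of analyze.py; the busy-cycles reading is hypothetical since the
event is only being proposed here, while the other numbers are taken from
the first example run below:

cycles       = 31470886
busy_cycles  = cycles - 9825068   # assumed PERF_COUNT_HW_CPU_CYCLES_BUSY reading
instructions = 27868090

idle_cycles = cycles - busy_cycles
print "%.2f%% of all cycles are idle"  % (idle_cycles * 100.0 / cycles)
print "%.2f  stalled cycles per insn" % (idle_cycles * 1.0 / instructions)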

I have implemented preliminary support for this, and this is what the new 
'overview' output currently looks like. First, here's the output from a "bad" 
testcase (lots of branch-misses):

 $ perf stat ./branches 20 1

 Performance counter stats for './branches 20 1' (10 runs):

          9.829903 task-clock               #    0.972 CPUs utilized            ( +-  0.07% )
                 0 context-switches         #    0.000 M/sec                    ( +-  0.00% )
                 0 CPU-migrations           #    0.000 M/sec                    ( +-  0.00% )
               111 page-faults              #    0.011 M/sec                    ( +-  0.09% )
        31,470,886 cycles                   #    3.202 GHz                      ( +-  0.06% )
         9,825,068 stalled-cycles           #   31.22% of all cycles are idle   ( +- 13.89% )
        27,868,090 instructions             #    0.89  insns per cycle        
                                            #    0.35  stalled cycles per insn  ( +-  0.02% )
         4,313,661 branches                 #  438.830 M/sec                    ( +-  0.02% )
         1,068,668 branch-misses            #   24.77% of all branches          ( +-  0.01% )

        0.010117520  seconds time elapsed  ( +-  0.14% )

The two important values are the "31.22% of all cycles are idle" and the 
"24.77% of all branches" missed figures - both are high and indicative of 
trouble.

The fixed testcase shows:

 Performance counter stats for './branches 20 0' (100 runs):

          4.417987 task-clock               #    0.948 CPUs utilized            ( +-  0.10% )
                 0 context-switches         #    0.000 M/sec                    ( +-  0.00% )
                 0 CPU-migrations           #    0.000 M/sec                    ( +-  0.00% )
               111 page-faults              #    0.025 M/sec                    ( +-  0.02% )
        14,135,368 cycles                   #    3.200 GHz                      ( +-  0.10% )
         1,939,275 stalled-cycles           #   13.72% of all cycles are idle   ( +-  4.99% )
        27,846,610 instructions             #    1.97  insns per cycle        
                                            #    0.07  stalled cycles per insn  ( +-  0.00% )
         4,309,228 branches                 #  975.383 M/sec                    ( +-  0.00% )
             3,992 branch-misses            #    0.09% of all branches          ( +-  0.26% )

        0.004660164  seconds time elapsed  ( +-  0.15% )

Both stall values are much lower and the instructions per cycle value doubled.

Here's another testcase, one that fills the pipeline near-perfectly:

 $ perf stat ./fill_1b

 Performance counter stats for './fill_1b':

       1874.601174 task-clock               #    0.998 CPUs utilized          
                 1 context-switches         #    0.000 M/sec                  
                 0 CPU-migrations           #    0.000 M/sec                  
               107 page-faults              #    0.000 M/sec                  
     6,009,321,149 cycles                   #    3.206 GHz                    
       212,795,827 stalled-cycles           #    3.54% of all cycles are idle 
    18,007,646,574 instructions             #    3.00  insns per cycle        
                                            #    0.01  stalled cycles per insn
     1,001,527,311 branches                 #  534.262 M/sec                  
            16,988 branch-misses            #    0.00% of all branches        

        1.878558311  seconds time elapsed

Here too both counts are very low.

The next step is to provide the tools to further analyze why the CPU is not 
utilized perfectly. I have implemented some preliminary code for that too, 
using generic cache events:

 $ perf stat --repeat 10 --detailed ./array-bad 

 Performance counter stats for './array-bad' (10 runs):

         50.552646 task-clock               #    0.992 CPUs utilized            ( +-  0.04% )
                 0 context-switches         #    0.000 M/sec                    ( +-  0.00% )
                 0 CPU-migrations           #    0.000 M/sec                    ( +-  0.00% )
             1,877 page-faults              #    0.037 M/sec                    ( +-  0.01% )
       142,802,193 cycles                   #    2.825 GHz                      ( +-  0.18% )  (22.55%)
        88,622,411 stalled-cycles           #   62.06% of all cycles are idle   ( +-  0.22% )  (34.97%)
        45,381,755 instructions             #    0.32  insns per cycle        
                                            #    1.95  stalled cycles per insn  ( +-  0.11% )  (46.94%)
         7,725,207 branches                 #  152.815 M/sec                    ( +-  0.05% )  (58.44%)
            29,788 branch-misses            #    0.39% of all branches          ( +-  1.06% )  (69.46%)
         8,421,969 L1-dcache-loads          #  166.598 M/sec                    ( +-  0.37% )  (70.06%)
         7,868,389 L1-dcache-load-misses    #   93.43% of all L1-dcache hits    ( +-  0.13% )  (58.28%)
         4,553,490 LLC-loads                #   90.074 M/sec                    ( +-  0.31% )  (44.49%)
         1,764,277 LLC-load-misses          #   34.900 M/sec                    ( +-  0.21% )  (9.98%)

        0.050973462  seconds time elapsed  ( +-  0.05% )

The --detailed flag is what activates wider counting. The "93.43% of all 
L1-dcache hits" is a giveaway indicator that this particular workload is 
primarily data-access limited and that much of it escapes into RAM as well.

Is this the direction you'd like to see perf stat to move into? Any comments, 
suggestions?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-27 11:11               ` Ingo Molnar
@ 2011-04-27 14:47                 ` Arun Sharma
  2011-04-27 15:48                   ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Arun Sharma @ 2011-04-27 14:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arun Sharma, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Peter Zijlstra, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	eranian, Linus Torvalds, Andrew Morton

On Wed, Apr 27, 2011 at 4:11 AM, Ingo Molnar <mingo@elte.hu> wrote:
> As for the first, 'overview' step, i'd like to use one or two numbers only, to
> give people a general ballpark figure about how good the CPU is performing for
> a given workload.
>
> Wouldn't UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 be in general a pretty good,
> primary "stall" indicator? This is similar to the "cycles-uops_executed" value
> in your script (UOPS_EXECUTED:PORT015:t=1 and UOPS_EXECUTED:PORT234_CORE
> based): it counts cycles when there's no execution at all - not even
> speculative one.

If we're going to pick one stall indicator, why not pick cycles where
no uops are retiring?

cycles_no_uops_retired = cycles - c["UOPS_RETIRED:ANY:c=1:t=1"]

In the presence of C-states and some halted cycles, I found that I
couldn't measure it via UOPS_RETIRED:ANY:c=1:i=1 because it counts
halted cycles too and could be greater than (unhalted) cycles.

The other issue I had to deal with was the UOPS_RETIRED > UOPS_EXECUTED 
condition. I believe this is caused by what AMD calls the sideband stack 
optimizer and Intel calls dedicated stack manager (i.e. UOPS executed
outside the main pipeline). A recursive fibonacci(30) is a good test
case for reproducing this.
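
A defensive variant of the earlier stall calculation, sketched under the
same assumptions (counts as produced by something like analyze.py's
sample() helper):

def stalled_cycles(c):
    # Counter anomalies like the two described above can drive this
    # difference negative, so clamp it at zero.
    return max(0, c["cycles"] - c["UOPS_RETIRED:ANY:c=1:t=1"])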

>
> Is this the direction you'd like to see perf stat to move into? Any comments,
> suggestions?
>

Looks like a step in the right direction. Thanks.

 -Arun

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-27 14:47                 ` Arun Sharma
@ 2011-04-27 15:48                   ` Ingo Molnar
  2011-04-27 16:27                     ` Ingo Molnar
  2011-04-27 19:03                     ` Arun Sharma
  0 siblings, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-27 15:48 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Arun Sharma, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Peter Zijlstra, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	eranian, Linus Torvalds, Andrew Morton


* Arun Sharma <arun@sharma-home.net> wrote:

> On Wed, Apr 27, 2011 at 4:11 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > As for the first, 'overview' step, i'd like to use one or two numbers only, to
> > give people a general ballpark figure about how good the CPU is performing for
> > a given workload.
> >
> > Wouldn't UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 be in general a pretty good,
> > primary "stall" indicator? This is similar to the "cycles-uops_executed" value
> > in your script (UOPS_EXECUTED:PORT015:t=1 and UOPS_EXECUTED:PORT234_CORE
> > based): it counts cycles when there's no execution at all - not even
> > speculative one.
> 
> If we're going to pick one stall indicator, [...]

Well, one stall indicator for the 'general overview' stage, plus branch misses.

Other stages can also have all sorts of details, including various subsets of 
stall reasons. (and stalls of different units of the CPU)

We'll see how far it can be pushed.

> [...] why not pick cycles where no uops are retiring?
> 
> cycles_no_uops_retired = cycles - c["UOPS_RETIRED:ANY:c=1:t=1"]
>
> In the presence of C-states and some halted cycles, I found that I couldn't 
> measure it via UOPS_RETIRED:ANY:c=1:i=1 because it counts halted cycles too 
> and could be greater than (unhalted) cycles.

Agreed, good point.

You are right that it is more robust to pick a 'the CPU was busy on our 
behalf' metric instead of a 'CPU is idle' metric, because that way 'HLT' as a 
special type of idling around does not have to be identified.

HLT is not an issue for the default 'perf stat' behavior (because it only 
measures task execution, never the idle thread or other tasks not involved 
with the workload), but for per-CPU and system-wide (--all) measurements it 
matters.

I'll flip it around.

> The other issue I had to deal with was UOPS_RETIRED > UOPS_EXECUTED 
> condition. I believe this is caused by what AMD calls sideband stack 
> optimizer and Intel calls dedicated stack manager (i.e. UOPS executed outside 
> the main pipeline). A recursive fibonacci(30) is a good test case for 
> reproducing this.

So the PORT015+234 sum is not precise? The definition seems to be rather firm:

  Counts number of Uops executed that were issued on port 2, 3, or 4.
  Counts number of Uops executed that were issued on port 0, 1, or 5.

Wouldnt that include all uops?

> > Is this the direction you'd like to see perf stat to move into? Any 
> > comments, suggestions?
> 
> Looks like a step in the right direction. Thanks.

Ok, great - will keep you updated. I doubt the defaults can ever beat truly 
expert use of PMU events: there will always be fine details that a generic 
approach will miss. But i'd be happy if we got 70% of the way ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-27 15:48                   ` Ingo Molnar
@ 2011-04-27 16:27                     ` Ingo Molnar
  2011-04-27 19:05                       ` Arun Sharma
  2011-04-27 19:03                     ` Arun Sharma
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-27 16:27 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Arun Sharma, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Peter Zijlstra, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	eranian, Linus Torvalds, Andrew Morton


* Ingo Molnar <mingo@elte.hu> wrote:

> > [...] why not pick cycles where no uops are retiring?
> > 
> > cycles_no_uops_retired = cycles - c["UOPS_RETIRED:ANY:c=1:t=1"]
> >
> > In the presence of C-states and some halted cycles, I found that I couldn't 
> > measure it via UOPS_RETIRED:ANY:c=1:i=1 because it counts halted cycles too 
> > and could be greater than (unhalted) cycles.
> 
> Agreed, good point.
> 
> You are right that it is more robust to pick 'the CPU was busy on our behalf' 
> metric instead of a 'CPU is idle' metric, because that way 'HLT' as a special 
> type of idling around does not have to be identified.

Sidenote, there's one advantage of the idle event: it's more meaningful to 
profile idle cycles - and it's easy to ignore the HLT loop in the profile 
output (we already do).

That way we get a 'hidden overhead' profile: a profile of frequently executed 
code which executes in the CPU in a suboptimal way.

So we should probably offer both events.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-27 15:48                   ` Ingo Molnar
  2011-04-27 16:27                     ` Ingo Molnar
@ 2011-04-27 19:03                     ` Arun Sharma
  1 sibling, 0 replies; 80+ messages in thread
From: Arun Sharma @ 2011-04-27 19:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arun Sharma, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Peter Zijlstra, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	eranian, Linus Torvalds, Andrew Morton

On Wed, Apr 27, 2011 at 8:48 AM, Ingo Molnar <mingo@elte.hu> wrote:

>
>> The other issue I had to deal with was UOPS_RETIRED > UOPS_EXECUTED
>> condition. I believe this is caused by what AMD calls sideband stack
>> optimizer and Intel calls dedicated stack manager (i.e. UOPS executed outside
>> the main pipeline). A recursive fibonacci(30) is a good test case for
>> reproducing this.
>
> So the PORT015+234 sum is not precise? The definition seems to be rather firm:
>
>  Counts number of Uops executed that where issued on port 2, 3, or 4.
>  Counts number of Uops executed that where issued on port 0, 1, or 5.
>

There is some work done outside of the main out-of-order engine for
power optimization reasons.

It is described as a dedicated stack engine here:
http://www.intel.com/technology/itj/2003/volume07issue02/art03_pentiumm/vol7iss2_art03.pdf

However, I can't seem to be able to reproduce this behavior using a
micro benchmark right now:

# cat foo.s
.text
        .global main
main:
1:
        push %rax
        push %rbx
        push %rcx
        push %rdx
        pop  %rax
        pop  %rbx
        pop  %rcx
        pop  %rdx
        jmp  1b

 Performance counter stats for './foo':

     7,755,881,073 UOPS_ISSUED:ANY:t=1       (scaled from 79.98%)
    10,569,957,988 UOPS_RETIRED:ANY:t=1      (scaled from 79.96%)
     9,155,400,383 UOPS_EXECUTED:PORT234_CORE  (scaled from 80.02%)
     2,594,206,312 UOPS_EXECUTED:PORT015:t=1  (scaled from 80.02%)

Perhaps I was thinking of UOPS_ISSUED < UOPS_RETIRED.

In general, UOPS_RETIRED (or instruction retirement in general) is the
"source of truth" in an otherwise crazy world and might be more
interesting as a generalized event that works on multiple
architectures.

 -Arun

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES
  2011-04-27 16:27                     ` Ingo Molnar
@ 2011-04-27 19:05                       ` Arun Sharma
  0 siblings, 0 replies; 80+ messages in thread
From: Arun Sharma @ 2011-04-27 19:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arun Sharma, Stephane Eranian, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Peter Zijlstra, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	eranian, Linus Torvalds, Andrew Morton

On Wed, Apr 27, 2011 at 9:27 AM, Ingo Molnar <mingo@elte.hu> wrote:

>
> That way we get a 'hidden overhead' profile: a profile of frequently executed
> code which executes in the CPU in a suboptimal way.
>
> So we should probably offer both events.
>

Yes - certainly.

 -Arun

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
  2011-04-29  2:28                     ` Andi Kleen
@ 2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-29 19:32 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Wed, 27 Apr 2011, Ingo Molnar wrote:
> 
> > Secondly, you are still quite wrong even with your revised opinion. Being able 
> > to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> > instructions counts/events, and the kernel helping that kind of approach is not 
> > 'abstraction to the extreme', it's called 'common sense'.
> 
> by your logic I should be able to delete a file by saying
>
>   echo "delete /tmp/tempfile" > /dev/sdc1

> because using unlink() is too low of an abstraction and confusing to the 
> user.

Erm, unlink() does not pass magic hexa constants to the disk controller.

unlink() is a high level interface that works across a vast range of disk 
controllers, disks, network mounted filesystems, in-RAM filesystems, in-ROM 
filesystems, clustered filesystems and other mediums.

Just like that we can tell perf to count 'cycles', 'branches' or 
'branch-misses' - all of these are relatively high level concepts (in the scope 
of CPUs) that work across a vast range of CPU types and models.

Similarly, for offcore we want to introduce the concept of 'node local' versus 
'remote' memory - perhaps with some events for inter-CPU traffic as well - 
because that probably covers most of the NUMA related memory profiling needs.

Raw events are to perf what ioctls are to the VFS: small details that nobody 
felt were worth generalizing. My point in this discussion is that we do not offer new 
filesystems that support *only* ioctl calls ... Is this simple concept so hard 
to understand?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
@ 2011-04-29  2:28                     ` Andi Kleen
  2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 80+ messages in thread
From: Andi Kleen @ 2011-04-29  2:28 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

> I know it is fun to reinvent the wheel, but you ignored decades of 
> experience in dealing with perf-counters when you ran off and invented 
> perf_events.  It will bite you eventually.

s/eventually//

A good example of that is that perf events counted completely bogus 
generalized LLC cache events for several releases (before the offcore patches 
went in).

And BTW they have now completely changed again with Peter's changes, counting
something quite different.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
@ 2011-04-28 23:30                     ` Thomas Gleixner
  2011-04-29  2:28                     ` Andi Kleen
  2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 80+ messages in thread
From: Thomas Gleixner @ 2011-04-28 23:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Peter Zijlstra

Vince,

On Thu, 28 Apr 2011, Vince Weaver wrote:

> On Wed, 27 Apr 2011, Ingo Molnar wrote:
> 
> > Secondly, you are still quite wrong even with your revised opinion. Being able 
> > to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> > instructions counts/events, and the kernel helping that kind of approach is not 
> > 'abstraction to the extreme', it's called 'common sense'.
> 
> by your logic I should be able to delete a file by saying
>   echo "delete /tmp/tempfile" > /dev/sdc1
> because using unlink() is too low of an abstraction and confusing to the 
> user.

Your definition of 'common sense' seems to be rather backwards.

> > The fact that perfmon and oprofile work via magic vendor-specific event string 
> > incantations is one of the many design failures of those projects - not a 
> > virtue.
> 
> Well we disagree.  I think one of perf_events' biggest failings (among 
> many) is that these generalized event definitions are shoved into the 

Put the failings on the table if you think there are any real
ones.

The generalized event definitions are debatable, but Ingo's argument
that they pass the common-sense test is definitely a strong enough
one to keep them.

The problem at hand which ignited this flame war is definitely
borderline, and I don't agree with Ingo that it should not be made
available right now in the raw form. That's a hardware enablement
feature which can be useful even if tools/perf has no support for it
and we have no generalized event for it. Those are two different
stories. perf has always allowed the use of raw events, and I don't see a
reason why we should not do that in this case if it enables a subset
of the perf userbase to make use of it.

> kernel.  At the least it bloats the kernel via an option commonly turned on by 

Well, compared to the perfmon kernel bloat proposed back then, that's
really nothing you should whine about.

> vendors.  At worst it gives users a false sense of security in thinking 
> these counters are (a) portable across architectures and (b) actually 
> measure what they say they do.

Again, in the common sense approach they actually do what they
say. 

For real experts like you there are still the raw events to get the
real thing which is meaningful for those who understand what 'cycles'
and 'instructions' really mean. Cough, cough....

> I know it is fun to reinvent the wheel, but you ignored decades of 
> experience in dealing with perf-counters when you ran off and invented 
> perf_events.  It will bite you eventually.

Stop this whining already. I thoroughly reviewed the outcome of
"decades of experience" and I still shudder when I get reminded of
that exercise.

Yes, we invented perf_events because the proposed perfmon kernel
patches were an outright horror: a cobbled-together experience dump
along with a nice bunch of unfixable security holes, locking
issues and permission problems, plus a completely nonobvious userspace
interface. In short, a complete design failure.

So perf_events was not a reinvention of the wheel. It was a sane
design decision to make performance counters available _AND_ useful
for a broad audience and a broad range of use cases.

If the only substantial complaint about perf you can bring up is the
detail of generalized events, then we can agree that we disagree and
stop wasting electrons right now.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-27  6:52                 ` Ingo Molnar
@ 2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
                                       ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Vince Weaver @ 2011-04-28 22:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Wed, 27 Apr 2011, Ingo Molnar wrote:

> Secondly, you are still quite wrong even with your revised opinion. Being able 
> to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> instructions counts/events, and the kernel helping that kind of approach is not 
> 'abstraction to the extreme', it's called 'common sense'.

by your logic I should be able to delete a file by saying
  echo "delete /tmp/tempfile" > /dev/sdc1
because using unlink() is too low of an abstraction and confusing to the 
user.

> The fact that perfmon and oprofile works via magic vendor-specific event string 
> incantations is one of the many design failures of those projects - not a 
> virtue.

Well, we disagree.  I think one of perf_events' biggest failings (among 
many) is that these generalized event definitions are shoved into the 
kernel.  At best it bloats the kernel via a config option commonly turned on by 
vendors.  At worst it gives users a false sense of security in thinking 
that these counters A) are portable across architectures and B) actually 
measure what they say they do.

I know it is fun to reinvent the wheel, but you ignored decades of 
experience in dealing with perf-counters when you ran off and invented 
perf_events.  It will bite you eventually.

Vince

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-27  6:43             ` Ingo Molnar
@ 2011-04-28 22:10               ` Vince Weaver
  0 siblings, 0 replies; 80+ messages in thread
From: Vince Weaver @ 2011-04-28 22:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Wed, 27 Apr 2011, Ingo Molnar wrote:

> 
> Erm, that assumes you already know that magic incantation. Most of the users 
> who want to do measurements and profiling do not know that. So there's little 
> difference between:
> 
>  - someone shows them the 0x53cf28 magic code
>  - someone shows them the L2_IFETCH:BOTH_CORES magic symbol
> 
> So while hexa values have like 10% utility, the stupid, vendor-specific 
> event names you are pushing here have like 15% utility.
> 
> In perf we are aiming for 100% utility, where someone who knows something about 
> CPUs and can type 'cycles', 'instructions' or 'branches' will get the obvious 
> result.
> 
> This is not a difficult usability concept really.

yes, and this functionality belongs in the perf tool itself (or some other 
user tool, like libpfm4, or PAPI).  Not in the kernel.

How much larger are you willing to make the kernel to hold your 
generalized events?  PAPI has at least 128 that people have found useful 
enough to add over the years.  There are probably more.

I notice the kernel doesn't have any FP or SSE/Vector counts yet.  Or 
uops.  Or hw-interrupt counts.  Fused multiply-add?  
How about GPU counters (PAPI is starting to support these)?  Network 
counters?  Infiniband?

You're being lazy and pushing "perf" functionality into the kernel.  It 
belongs in userspace.  

It's not the kernel's job to make things easy for users.  Its job is to 
make things possible, and get out of the way.

It's already bad enough that your generalized events can change from 
kernel version to kernel version without warning.  By being in the kernel, 
aren't they a stable ABI that can't be changed?

Vince

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:51               ` Vince Weaver
@ 2011-04-27  6:52                 ` Ingo Molnar
  2011-04-28 22:16                   ` Vince Weaver
  0 siblings, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-27  6:52 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Tue, 26 Apr 2011, Ingo Molnar wrote:
> 
> > The kernel development process is in essence an abstraction engine, and if 
> > you expect something else you'll probably be facing a lot of frustrating 
> > episodes in the future as well where others try to abstract out meaningful 
> > generalizations.
> 
> yes, but you are taking abstraction to the extreme.

Firstly, that claim is a far cry from your original claim:

   ' How do you "generalize" a functionality like writing a value to an auxiliary
     MSR register? '

... so i guess you conceded the point at least partially, without actually 
openly and honestly conceding the point?

Secondly, you are still quite wrong even with your revised opinion. Being able 
to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
instructions counts/events, and the kernel helping that kind of approach is not 
'abstraction to the extreme', it's called 'common sense'.

The fact that perfmon and oprofile works via magic vendor-specific event string 
incantations is one of the many design failures of those projects - not a 
virtue.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:33           ` Vince Weaver
  2011-04-26 21:19             ` Cyrill Gorcunov
@ 2011-04-27  6:43             ` Ingo Molnar
  2011-04-28 22:10               ` Vince Weaver
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-27  6:43 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Tue, 26 Apr 2011, Peter Zijlstra wrote:
> 
> > > That's why people use libpfm4.  or PAPI.  And they do.
> > 
> > And how is typing in hex numbers different from typing in model specific
> > event names? 
> 
> Really... quick, tell me what event 0x53cf28 corresponds to on a core2.
> 
> Now if I said L2_IFETCH:BOTH_CORES you know several things about what it is.

Erm, that assumes you already know that magic incantation. Most of the users 
who want to do measurements and profiling do not know that. So there's little 
difference between:

 - someone shows them the 0x53cf28 magic code
 - someone shows them the L2_IFETCH:BOTH_CORES magic symbol

So while hexa values have like 10% utility, the stupid, vendor-specific 
event names you are pushing here have like 15% utility.

In perf we are aiming for 100% utility, where someone who knows something about 
CPUs and can type 'cycles', 'instructions' or 'branches' will get the obvious 
result.

This is not a difficult usability concept really.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 21:25               ` Don Zickus
@ 2011-04-26 21:33                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 80+ messages in thread
From: Cyrill Gorcunov @ 2011-04-26 21:33 UTC (permalink / raw)
  To: Don Zickus
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On 04/27/2011 01:25 AM, Don Zickus wrote:
> On Wed, Apr 27, 2011 at 01:19:07AM +0400, Cyrill Gorcunov wrote:
>>   Vince, I've not read the whole thread so I have no idea what it is all about, but if you
>> have some p4 machines and some will to help -- mind testing the patch below?
>> It should fix the nmi-watchdog vs. cycles conflict. It's an utterly raw RFC (and I know there
>> is a nit I should update) but it still might be interesting to see the results.
>> Untested.
>> -- 
>> perf, x86: P4 PMU -- Introduce alternate events v3
> 
> Unfortunately it just panic'd for me when I ran
> 
> perf record grep -r don /
> 
> Thoughts?
> 
> Cheers,
> Don
> 
> redfish.lab.bos.redhat.com login: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000008
> IP: [<ffffffff8101ff60>] p4_pmu_schedule_events+0xb0/0x4c0
> PGD 2c603067 PUD 2d617067 PMD 0 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/system/cpu/online
> CPU 2 
> Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log
> uinput ppdev e1000 parport_pc parport sg dcdbas pcspkr snd_intel8x0
> snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sn]
> 
> Pid: 1734, comm: grep Not tainted 2.6.39-rc3usb3-latest+ #339 Dell Inc.
> Precision WorkStation 470    /0P7996
> RIP: 0010:[<ffffffff8101ff60>]  [<ffffffff8101ff60>]
> p4_pmu_schedule_events+0xb0/0x4c0
> RSP: 0018:ffff88003fb03b18  EFLAGS: 00010016
> RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ffff88003c30de00 RSI: 0000000000000004 RDI: 000000000000000f
> RBP: ffff88003fb03bb8 R08: 0000000000000001 R09: 0000000000000001
> R10: 000000000000006d R11: ffff88003acb4ae8 R12: ffff88002d490c00
> R13: ffff88003fb03b78 R14: 0000000000000001 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff88003fb00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 000000002d728000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process grep (pid: 1734, threadinfo ffff88002d648000, task
> ffff88003acb4240)
> Stack:
>  ffff880000000014 ffff88003acb4b10 b00002030803c000 0000000000000003
>  0000000200000001 ffff88003fb03bc8 0000000100000002 ffff88003fb03bcc
>  0000000181a24ee0 ffff88003fb0cd48 0000000000000008 0000000000000000
> Call Trace:
>  <IRQ> 
>  [<ffffffff8101b9e1>] ? x86_pmu_add+0xb1/0x170
>  [<ffffffff8101b8bf>] x86_pmu_commit_txn+0x5f/0xb0
>  [<ffffffff810ff0c4>] ? perf_event_update_userpage+0xa4/0xe0
>  [<ffffffff810ff020>] ? perf_output_end+0x60/0x60
>  [<ffffffff81100dca>] group_sched_in+0x8a/0x160
>  [<ffffffff8110100b>] ctx_sched_in+0x16b/0x1d0
>  [<ffffffff811017ce>] perf_event_task_tick+0x1de/0x260
>  [<ffffffff8104fc1e>] scheduler_tick+0xde/0x2b0
>  [<ffffffff81096e20>] ? tick_nohz_handler+0x100/0x100
> 

Ouch, I bet p4_config_get_bind returned NULL and here we are. Weird,
it seems I've missed something. Don, I'll continue tomorrow, OK? (kinda
sleepy already).

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 21:19             ` Cyrill Gorcunov
@ 2011-04-26 21:25               ` Don Zickus
  2011-04-26 21:33                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 80+ messages in thread
From: Don Zickus @ 2011-04-26 21:25 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Wed, Apr 27, 2011 at 01:19:07AM +0400, Cyrill Gorcunov wrote:
>   Vince, I've not read the whole thread so I have no idea what it is all about, but if you
> have some p4 machines and some will to help -- mind testing the patch below?
> It should fix the nmi-watchdog vs. cycles conflict. It's an utterly raw RFC (and I know there
> is a nit I should update) but it still might be interesting to see the results.
> Untested.
> -- 
> perf, x86: P4 PMU -- Introduce alternate events v3

Unfortunately it just panic'd for me when I ran

perf record grep -r don /

Thoughts?

Cheers,
Don

redfish.lab.bos.redhat.com login: BUG: unable to handle kernel NULL
pointer dereference at 0000000000000008
IP: [<ffffffff8101ff60>] p4_pmu_schedule_events+0xb0/0x4c0
PGD 2c603067 PUD 2d617067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/online
CPU 2 
Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log
uinput ppdev e1000 parport_pc parport sg dcdbas pcspkr snd_intel8x0
snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sn]

Pid: 1734, comm: grep Not tainted 2.6.39-rc3usb3-latest+ #339 Dell Inc.
Precision WorkStation 470    /0P7996
RIP: 0010:[<ffffffff8101ff60>]  [<ffffffff8101ff60>]
p4_pmu_schedule_events+0xb0/0x4c0
RSP: 0018:ffff88003fb03b18  EFLAGS: 00010016
RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88003c30de00 RSI: 0000000000000004 RDI: 000000000000000f
RBP: ffff88003fb03bb8 R08: 0000000000000001 R09: 0000000000000001
R10: 000000000000006d R11: ffff88003acb4ae8 R12: ffff88002d490c00
R13: ffff88003fb03b78 R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88003fb00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000002d728000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process grep (pid: 1734, threadinfo ffff88002d648000, task
ffff88003acb4240)
Stack:
 ffff880000000014 ffff88003acb4b10 b00002030803c000 0000000000000003
 0000000200000001 ffff88003fb03bc8 0000000100000002 ffff88003fb03bcc
 0000000181a24ee0 ffff88003fb0cd48 0000000000000008 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff8101b9e1>] ? x86_pmu_add+0xb1/0x170
 [<ffffffff8101b8bf>] x86_pmu_commit_txn+0x5f/0xb0
 [<ffffffff810ff0c4>] ? perf_event_update_userpage+0xa4/0xe0
 [<ffffffff810ff020>] ? perf_output_end+0x60/0x60
 [<ffffffff81100dca>] group_sched_in+0x8a/0x160
 [<ffffffff8110100b>] ctx_sched_in+0x16b/0x1d0
 [<ffffffff811017ce>] perf_event_task_tick+0x1de/0x260
 [<ffffffff8104fc1e>] scheduler_tick+0xde/0x2b0
 [<ffffffff81096e20>] ? tick_nohz_handler+0x100/0x100


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:33           ` Vince Weaver
@ 2011-04-26 21:19             ` Cyrill Gorcunov
  2011-04-26 21:25               ` Don Zickus
  2011-04-27  6:43             ` Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Cyrill Gorcunov @ 2011-04-26 21:19 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Don Zickus

On 04/27/2011 12:33 AM, Vince Weaver wrote:
...
> 
> I spent a lot of time trying to fix P4 support back in the 2.6.35 days.
> I only have so much time to spend on this stuff. 
> 
> When people complain about p4 support, I direct them to Cyrill et al.  I 
> can't force them to become kernel developers.  Usually they want immediate 
> results, which they can get with perfctr.
> 

  Vince, I've not read the whole thread so I have no idea what it is all about, but if you
have some p4 machines and some will to help -- mind testing the patch below?
It should fix the nmi-watchdog vs. cycles conflict. It's an utterly raw RFC (and I know there
is a nit I should update) but it still might be interesting to see the results.
Untested.
-- 
perf, x86: P4 PMU -- Introduce alternate events v3

Alternate events are used to increase perf subsystem counter usage.
The general idea is to find an "alternate" event (if there is one)
which counts the same quantity as the former event but uses a
different counter, allowing both to run simultaneously.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/include/asm/perf_event_p4.h |    6 ++
 arch/x86/kernel/cpu/perf_event_p4.c  |   74 ++++++++++++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 2 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -36,6 +36,10 @@
 #define P4_ESCR_T1_OS		0x00000002U
 #define P4_ESCR_T1_USR		0x00000001U

+#define P4_ESCR_USR_MASK			\
+	(P4_ESCR_T0_OS | P4_ESCR_T0_USR |	\
+	P4_ESCR_T1_OS | P4_ESCR_T1_USR)
+
 #define P4_ESCR_EVENT(v)	((v) << P4_ESCR_EVENT_SHIFT)
 #define P4_ESCR_EMASK(v)	((v) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_TAG(v)		((v) << P4_ESCR_TAG_SHIFT)
@@ -839,5 +843,7 @@ enum P4_PEBS_METRIC {
  *       31:                    reserved (HT thread)
  */

+#define P4_INVALID_CONFIG	(u64)~0
+
 #endif /* PERF_EVENT_P4_H */

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -609,6 +609,31 @@ static u64 p4_general_events[PERF_COUNT_
 	p4_config_pack_cccr(P4_CCCR_EDGE | P4_CCCR_COMPARE),
 };

+/*
+ * Alternate events allow us to find a substitute for an event if
+ * it's already borrowed, so they may be considered event aliases.
+ */
+struct p4_alt_event {
+	unsigned int	event;
+	u64		config;
+} p4_alternate_events[]= {
+	{
+		.event		= P4_EVENT_GLOBAL_POWER_EVENTS,
+		.config		=
+			p4_config_pack_escr(P4_ESCR_EVENT(P4_EVENT_EXECUTION_EVENT)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS0)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS1)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS2)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS3)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS0)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS1)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS2)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS3))		|
+			p4_config_pack_cccr(P4_CCCR_THRESHOLD(15) | P4_CCCR_COMPLEMENT		|
+				P4_CCCR_COMPARE),
+	},
+};
+
 static struct p4_event_bind *p4_config_get_bind(u64 config)
 {
 	unsigned int evnt = p4_config_unpack_event(config);
@@ -620,6 +645,18 @@ static struct p4_event_bind *p4_config_g
 	return bind;
 }

+static u64 p4_find_alternate_config(unsigned int evnt)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(p4_alternate_events); i++) {
+		if (evnt == p4_alternate_events[i].event)
+			return p4_alternate_events[i].config;
+	}
+
+	return P4_INVALID_CONFIG;
+}
+
 static u64 p4_pmu_event_map(int hw_event)
 {
 	struct p4_event_bind *bind;
@@ -1133,8 +1170,41 @@ static int p4_pmu_schedule_events(struct
 		}

 		cntr_idx = p4_next_cntr(thread, used_mask, bind);
-		if (cntr_idx == -1 || test_bit(escr_idx, escr_mask))
-			goto done;
+		if (cntr_idx == -1 || test_bit(escr_idx, escr_mask)) {
+
+			/*
+			 * The former event was already accepted to run,
+			 * and the only way to succeed here is to use
+			 * an alternate event.
+			 */
+			const u64 usr_mask = p4_config_pack_escr(P4_ESCR_USR_MASK);
+			u64 alt_config;
+			unsigned int event;
+
+			event		= p4_config_unpack_event(hwc->config);
+			alt_config	= p4_find_alternate_config(event);
+
+			if (alt_config == P4_INVALID_CONFIG)
+				goto done;
+
+			bind = p4_config_get_bind(alt_config);
+			escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
+			if (unlikely(escr_idx == -1))
+				goto done;
+
+			cntr_idx = p4_next_cntr(thread, used_mask, bind);
+			if (cntr_idx == -1 || test_bit(escr_idx, escr_mask))
+				goto done;
+
+			/*
+			 * This is a destructive operation we're going
+			 * to make: we substitute the former config with
+			 * the alternate one to keep tracking it afterwards.
+			 * Be careful and don't kill the custom bits
+			 * in the former config.
+			 */
+			hwc->config = (hwc->config & usr_mask) | alt_config;
+		}

 		p4_pmu_swap_config_ts(hwc, cpu);
 		if (assign)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26  7:38             ` Ingo Molnar
@ 2011-04-26 20:51               ` Vince Weaver
  2011-04-27  6:52                 ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Vince Weaver @ 2011-04-26 20:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Tue, 26 Apr 2011, Ingo Molnar wrote:

> The kernel development process is in essence an abstraction engine, and if you 
> expect something else you'll probably be facing a lot of frustrating episodes 
> in the future as well where others try to abstract out meaningful 
> generalizations.

yes, but you are taking abstraction to the extreme.

A filesystem abstracts out the access to raw disk... but under Linux we 
still allow raw access to /dev/sda

TCP/IP abstracts out the access to the network... but under Linux we still 
allow creating raw packets.

It is fine to have some sort of high-level abstraction of perf events for 
those who don't have PhDs in computer architecture.  Fine.  But don't get 
in the way of people who know what they are doing.

Vince

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26  9:25         ` Peter Zijlstra
@ 2011-04-26 20:33           ` Vince Weaver
  2011-04-26 21:19             ` Cyrill Gorcunov
  2011-04-27  6:43             ` Ingo Molnar
  0 siblings, 2 replies; 80+ messages in thread
From: Vince Weaver @ 2011-04-26 20:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Tue, 26 Apr 2011, Peter Zijlstra wrote:

> > That's why people use libpfm4.  Or PAPI.  And they do.
> 
> And how is typing in hex numbers different from typing in model specific
> event names? 

Really... quick, tell me what event 0x53cf28 corresponds to on a core2.

Now if I said L2_IFETCH:BOTH_CORES you know several things about what it 
is.

Plus, you can do a quick search in the Intel Arch manual and find more 
info.  With the hex value you have to do some shifting and masking by hand 
before looking it up.
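
For reference, that by-hand decode amounts to something like this -- a
minimal sketch, assuming the usual x86 PERFEVTSEL layout (event select
in bits 0-7, unit mask in bits 8-15, flag bits above that):

#include <stdio.h>

/* Sketch only: split a raw event code into the fields you would
 * look up in the vendor manual, assuming the usual x86 PERFEVTSEL
 * layout: bits 0-7 event select, bits 8-15 unit mask. */
int main(void)
{
	unsigned long long raw = 0x53cf28;

	printf("event select: 0x%02llx\n", raw & 0xff);        /* 0x28 */
	printf("unit mask:    0x%02llx\n", (raw >> 8) & 0xff); /* 0xcf */
	printf("flag bits:    0x%02llx\n", raw >> 16);         /* 0x53 */
	return 0;
}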

An even worse problem:

Quick... tell me what actual hardware event L1-dcache-loads corresponds
to on an L2.  Can you tell without digging through kernel source code?

Does that event include prefetches?  Does it include speculative events?
Does it count page walks?  Does it overcount by an amount equal to the 
number of hardware interrupts?   If I use the equivalent event on an 
AMD64, will all the same hold?

> PAPI actually has 'generalized' events, but I guess you're going to tell
> me nobody uses those since they're not useful.

Of course people use them.  But we don't _force_ people to use them.  We 
don't disable access to raw events.  Although, alarmingly, it seems like the 
kernel is going to start doing so, possibly meaning even our users can't use our
'generalized' events if, for example, they incorporate OFFCORE_RESPONSE.

Another issue:  if a problem is found with one of the PAPI events, users 
can update, recompile and run out of their own account at will.

If there's a problem with one of the kernel's generalized events, you have to 
install a new kernel.  Something many users can't do.  For example, your 
Nehalem cache fixes will be in 2.6.39.  How long until that appears in a 
stock distro?  How long until that appears in an RHEL release?

> > All the world is not perf.
> 
> I know, all the world is interested in investing tons of time learning
> about their one architecture and extract the last few percent of
> performance.

There are people out there who have been using perf counters on UNIX/Linux 
machines for decades.  They know what events they want to measure.  They 
are not interested in having the kernel tell them they can't do it.

> It looks like you're all so stuck in your HPC/lowlevel way of doing things
> that you're not even realizing there's much more to be gained by providing
> easy and useful tools to the general public, stuff that works similarly
> across architectures.

We're not saying people can't use perf.  Who knows, maybe PAPI will go 
away because perf is so good.  It's just silly to block out access to RAW 
events on the argument that "it's too hard".  Again, are we Microsoft 
here?

> Very constructive attitude, instead of helping you simply subvert and
> route around, thanks man! 

I spent a lot of time trying to fix P4 support back in the 2.6.35 days.
I only have so much time to spend on this stuff. 

When people complain about p4 support, I direct them to Cyrill et al.  I 
can't force them to become kernel developers.  Usually they want immediate 
results, which they can get with perfctr.

People want offcore response.  People want uncore access.  People want raw 
event access.  I can tell them "send a patch to the kernel, it'll
languish in obscurity for years and maybe in 2.6.4x you'll see it".  Or 
they can have support today with an outside patch.  Which do you think 
they choose?

> And why is that? Is that the lack of userspace rdpmc? That should be
> possible with perf; powerpc actually does that already. Various people
> mentioned wanting to make this work on x86 but I've yet to see a patch.

We at the PAPI project welcome any patches you'd care to contribute to our 
project too, to make things better.  It goes both ways, you know.

Vince

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
  2011-04-26  7:38             ` Ingo Molnar
@ 2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:49 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Mon, 2011-04-25 at 17:46 -0400, Vince Weaver wrote:
> > The policy is very simple and common-sense: if a given piece of PMU 
> > functionality is useful enough to be exposed via a raw interface, then
> > it must be useful enough to be generalized as well.
> 
> what does that even mean?  How do you "generalize" a functionality like 
> writing a value to an auxiliary MSR register? 

Come on, Vince, I know you're smarter than that!

The external register is simply an extension of the configuration space:
instead of the normal evsel MSR you get evsel:offcore pairs. After that
it's simply a matter of scheduling them right.

It simply adds more events to the PMU (in a rather sad way; it would
have been so much nicer if Intel had simply extended the evsel MSR for
every PMC -- they could have also used that for the load-latency thing
etc.)

Now, these extra events offered are L3 and NUMA events; the 'common'
interesting set is mostly covered by Andi's LLC mods and my NODE
extension, and after that there are mostly details left in offcore.

So the writing of an extra MSR is totally irrelevant; it's the extra
events that are.
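
For the curious, a minimal sketch of what such an evsel:offcore pair
looks like through the syscall ABI, assuming (as the patch in the
subject line proposes) that config carries the raw event and config1
carries the offcore response mask -- the mask value here is purely
illustrative:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Sketch only: program OFFCORE_RESPONSE_0 (raw event code 0x01b7)
 * with a response mask passed in config1 -- the extended
 * configuration space described above. */
static int open_offcore_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size    = sizeof(attr);
	attr.type    = PERF_TYPE_RAW;
	attr.config  = 0x01b7;	/* event 0xb7, umask 0x01 */
	attr.config1 = 0x20ff;	/* offcore response mask, illustrative */

	/* no glibc wrapper; count for the current task on any CPU */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}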

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:12       ` Vince Weaver
  2011-04-25 17:54         ` Ingo Molnar
@ 2011-04-26  9:25         ` Peter Zijlstra
  2011-04-26 20:33           ` Vince Weaver
  1 sibling, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:25 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Mon, 2011-04-25 at 13:12 -0400, Vince Weaver wrote:
> On Fri, 22 Apr 2011, Ingo Molnar wrote:
> 
> > But this kind of usability is absolutely unacceptable - users should not
> > be expected to type in magic, CPU and model specific incantations to get
> > access to useful hardware functionality.
> 
> That's why people use libpfm4.  Or PAPI.  And they do.

And how is typing in hex numbers different from typing in model specific
event names? It's all the same to me: you still need to understand your
microarchitecture very thoroughly and read the SDMs.

PAPI actually has 'generalized' events, but I guess you're going to tell
me nobody uses those since they're not useful.

> Current PAPI snapshots support offcore response on recent git kernels.
> With full names, no hex values, thanks to libpfm4.
> 
> All the world is not perf.

I know, all the world is interested in investing tons of time learning
about their one architecture and extracting the last few percent of
performance.

And that is fine for those few people who can afford it, but generally
optimizing for a single specific platform isn't cost effective.

It looks like you're all so stuck in your HPC/lowlevel way of doing things
that you're not even realizing there's much more to be gained by providing
easy and useful tools to the general public, stuff that works similarly
across architectures.

> > The proper solution is to expose useful offcore functionality via
> > generalized events - that way users do not have to care which specific
> > CPU model they are using, they can use the conceptual event and not some
> > model specific quirky hexa number.
> 
> No no no no.
> 
> Blocking access to raw events is the wrong idea.  If anything, the whole 
> "generic events" thing in the kernel should be ditched.  Wrong events are 
> used at times (see AMD branch events a few releases back, now Nehalem 
> cache events).  This all belongs in userspace, as was pointed out at the 
> start.  The kernel has no business telling users which perf events are 
> interesting, or limiting them!  What is this, windows?

The kernel has no place scheduling PMCs either, I expect, or scheduling
tasks for that matter.

We all know you don't believe in upgrading kernels or in kernels very
much at all.

> If you do block access to any raw events, we're going to have to start 
> recommending people ditch perf_events and start patching the kernel with 
> perfctr again.  We already do for P4/netburst users, as Pentium 4 support 
> is currently hosed due to NMI event conflicts.

Very constructive attitude, instead of helping you simply subvert and
route around, thanks man! 

You could of course a) simply disable the NMI watchdog, or b) improve
the space-heater (aka. P4) PMU implementation to use alternative
encodings -- from what I understood, the problem with P4 is that there are
multiple ways to encode the same event, and currently if you take one it
doesn't try the others.

> Also with perfctr it's much easier to get low-latency access to the 
> counters.  See:
>   http://web.eecs.utk.edu/~vweaver1/projects/papi-cost/

And why is that? Is that the lack of userspace rdpmc? That should be
possible with perf; powerpc actually does that already. Various people
mentioned wanting to make this work on x86 but I've yet to see a patch.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
@ 2011-04-26  7:38             ` Ingo Molnar
  2011-04-26 20:51               ` Vince Weaver
  2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-26  7:38 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Mon, 25 Apr 2011, Ingo Molnar wrote:
> 
> > 
> > * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> > 
> > > [...] The kernel has no business telling users which perf events are 
> > > interesting, or limiting them! [...]
> > 
> > The policy is very simple and common-sense: if a given piece of PMU 
> > functionality is useful enough to be exposed via a raw interface, then
> > it must be useful enough to be generalized as well.
> 
> what does that even mean?  How do you "generalize" a functionality like 
> writing a value to an auxiliary MSR register?

Here are a few examples:

 - the pure act of task switching sometimes involves writing to MSRs. How is it
   generalized? The concept of 'processes/threads' is offered to user-space and
   thus this functionality is generalized - the raw MSRs are not just passed
   through to user-space.

 - a wide range of VMX (virtualization) functionality on Intel CPUs operates via 
   writing special values to specific MSR registers. How is it 'generalized'? A 
   meaningful, structured ABI is provided to user-space in form of the KVM 
   device and associated semantics. The raw MSRs are not just passed through to
   user-space.

 - the ability of CPUs to change frequency is offered via writing special
   values to special MSRs. How is this generalized? The cpufreq subsystem 
   offers a frequency/cpu API and associated abstractions - the raw MSRs are 
   not just passed through to user-space.

 - in the context of perf events we generalize the concept of an 'event' and
   we abstract out common, CPU model neutral CPU hardware concepts like 
   'cycles', 'instructions', 'branches' and a simplified cache hierarchy - and 
   offer those events as generic events to user-space. We do not just pass the 
   raw MSRs through to user-space.

 - [ etc. - a lot of useful CPU functionality is MSR driven, the PMU is nothing
     special there. ]

The kernel development process is in essence an abstraction engine, and if you 
expect something else you'll probably be facing a lot of frustrating episodes 
in the future as well where others try to abstract out meaningful 
generalizations.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 22:12             ` Andi Kleen
@ 2011-04-26  7:23               ` Ingo Molnar
  0 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-26  7:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Vince Weaver, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	torvalds


* Andi Kleen <ak@linux.intel.com> wrote:

> > Now it has been disabled for unclear reasons.
> 
> Also unfortunately only partial. Previously you could at least write the MSR 
> from user space through /dev/cpu/*/msr, but now the kernel randomly rewrites 
> it if anyone else uses cache events.

Ugh, that's an unbelievable hack - if you poke an active PMU by writing to it 
via /dev/cpu/*/msr and it breaks, you really get to keep the pieces. There's a 
reason why those devices are root only - it's as if you wrote to a filesystem 
that is already mounted!

If your user-space twiddling scripts go bad, who knows what state the CPU gets 
into and you might be reporting bogus bugs. I think writing to those msrs 
directly should probably taint the kernel: i'll prepare a patch for that.

> It's very sad we have to go through this.

Not really, it took Peter 10 minutes to come up with an RFC patch to extend the 
cache events in a meaningful way - and that was actually more useful to users 
than all prior offcore patches combined. So the kernel already won from this 
episode.

We are not at all interested in hiding PMU functionality and keeping it 
unstructured, and just passing through some opaque raw ABI to user-space.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
@ 2011-04-25 22:12             ` Andi Kleen
  2011-04-26  7:23               ` Ingo Molnar
  2011-04-26  7:38             ` Ingo Molnar
  2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-25 22:12 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	torvalds

> The PAPI tool was using the perf_events interface in the 2.6.39-git 
> kernels to collect offcore response results by properly setting the 
> config1 register on Nehalem and Westmere machines.

I already had some users for this functionality too. Offcore
events are quite useful for various analyses: basically every time
you have a memory performance problem -- especially a NUMA
problem -- they can help you a lot in tracking it down.

They answer questions like "who accesses memory on another node?"

As far as I'm concerned b52c55c6a25e4515b5e075a989ff346fc251ed09 
is a bad feature regression.

> 
> Now it has been disabled for unclear reasons.

Also unfortunately only partial. Previously you could at least
write the MSR from user space through /dev/cpu/*/msr, but now the kernel 
randomly rewrites it if anyone else uses cache events.

Right now I have some frontend scripts which are doing this,
but it's really quite nasty.
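
For reference, the poke those scripts do boils down to something like
this -- a sketch only, assuming MSR_OFFCORE_RSP_0 at its Nehalem
address 0x1a6 and an illustrative mask value:

#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

/* Sketch only: write an offcore response mask straight into
 * MSR_OFFCORE_RSP_0 (0x1a6 on Nehalem) through the msr driver.
 * Needs root, and races with the kernel's own PMU management --
 * exactly the nastiness described above. */
int main(void)
{
	uint64_t mask = 0x20ff;	/* illustrative response mask */
	int fd = open("/dev/cpu/0/msr", O_WRONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");
		return 1;
	}
	if (pwrite(fd, &mask, sizeof(mask), 0x1a6) != sizeof(mask)) {
		perror("pwrite");
		return 1;
	}
	close(fd);
	return 0;
}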

It's very sad we have to go through this.

-Andi


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:54         ` Ingo Molnar
@ 2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
                               ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Vince Weaver @ 2011-04-25 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Mon, 25 Apr 2011, Ingo Molnar wrote:

> 
> * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> 
> > [...] The kernel has no business telling users which perf events are 
> > interesting, or limiting them! [...]
> 
> The policy is very simple and common-sense: if a given piece of PMU 
> functionality is useful enough to be exposed via a raw interface, then
> it must be useful enough to be generalized as well.

what does that even mean?  How do you "generalize" a functionality like 
writing a value to an auxiliary MSR register?

The PAPI tool was using the perf_events interface in the 2.6.39-git 
kernels to collect offcore response results by properly setting the 
config1 register on Nehalem and Westmere machines.

Now it has been disabled for unclear reasons.

Could you at least have some sort of relevant errno value set in this 
case?  It's a real pain in userspace code to try to sort out the 
perf_event return values to find out if a feature is supported,
unsupported (lack of hardware), unsupported (not implemented yet),
unsupported (disabled due to whim of kernel developer), unsupported
(because you have some sort of configuration conflict).
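
A sketch of the kind of triage user space is left doing, assuming only
the errno values perf_event_open() is commonly seen returning
(attribute setup omitted, and the mapping is approximate -- which is
exactly the complaint):

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Sketch only: guess why an event failed to open from errno alone. */
static void explain_open_failure(struct perf_event_attr *attr)
{
	int fd = syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);

	if (fd >= 0) {
		close(fd);
		return;
	}

	switch (errno) {
	case ENOENT:
		fprintf(stderr, "event unknown to this kernel?\n");
		break;
	case EOPNOTSUPP:
		fprintf(stderr, "hardware lacks the feature?\n");
		break;
	case EINVAL:
		fprintf(stderr, "bad attr -- or feature disabled?\n");
		break;
	default:
		perror("perf_event_open");
	}
}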

> > [...]  What is this, windows?
> 
> FYI, this is how the Linux kernel has operated from day 1: we take 
> hardware features and abstract useful high-level functionality out of them.
> I would not expect this to change anytime soon.

I started using Linux because it actually let me use my hardware without 
interfering with what I was trying to do.  Not because it disabled access 
to the hardware due to some perceived lack of generalization in an extra 
unnecessary software translation layer.

Vince
vweaver1@eecs.utk.edu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:12       ` Vince Weaver
@ 2011-04-25 17:54         ` Ingo Molnar
  2011-04-25 21:46           ` Vince Weaver
  2011-04-26  9:25         ` Peter Zijlstra
  1 sibling, 1 reply; 80+ messages in thread
From: Ingo Molnar @ 2011-04-25 17:54 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> [...] The kernel has no business telling users which perf events are 
> interesting, or limiting them! [...]

The policy is very simple and common-sense: if a given piece of PMU 
functionality is useful enough to be exposed via a raw interface, then
it must be useful enough to be generalized as well.

> [...]  What is this, windows?

FYI, this is how the Linux kernel has operated from day 1: we take 
hardware features and abstract useful high-level functionality out of them.
I would not expect this to change anytime soon.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 21:37       ` Peter Zijlstra
@ 2011-04-25 17:12       ` Vince Weaver
  2011-04-25 17:54         ` Ingo Molnar
  2011-04-26  9:25         ` Peter Zijlstra
  1 sibling, 2 replies; 80+ messages in thread
From: Vince Weaver @ 2011-04-25 17:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


Sorry for the late reply on this thread; it happened inconveniently over 
the long weekend.


On Fri, 22 Apr 2011, Ingo Molnar wrote:

> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.

That's why people use libpfm4.  Or PAPI.  And they do.

Current PAPI snapshots support offcore response on recent git kernels.
With full names, no hex values, thanks to libpfm4.

All the world is not perf.

> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.

No no no no.

Blocking access to raw events is the wrong idea.  If anything, the whole 
"generic events" thing in the kernel should be ditched.  Wrong events are 
used at times (see AMD branch events a few releases back, now Nehalem 
cache events).  This all belongs in userspace, as was pointed out at the 
start.  The kernel has no business telling users which perf events are 
interesting, or limiting them!  What is this, windows?

If you do block access to any raw events, we're going to have to start 
recommending people ditch perf_events and start patching the kernel with 
perfctr again.  We already do for P4/netburst users, as Pentium 4 support 
is currently hosed due to NMI event conflicts.

Also with perfctr it's much easier to get low-latency access to the 
counters.  See:
  http://web.eecs.utk.edu/~vweaver1/projects/papi-cost/

Vince

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:37       ` Peter Zijlstra
  2011-04-22 21:54         ` Peter Zijlstra
@ 2011-04-23  8:13         ` Ingo Molnar
  1 sibling, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-23  8:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-04-22 at 10:06 +0200, Ingo Molnar wrote:
> > 
> > I'm about to push out the patch attached below - it lays out the arguments in 
> > detail. I don't think we have time to fix this properly for .39 - but memory 
> > profiling could be a nice feature for v2.6.40. 
> 
> Does something like the below provide enough generic infrastructure to
> allow the raw offcore bits again?

Yeah, this looks like a pretty good start - this is roughly the approach i 
outlined to Stephane and Andi, generic cache events extended with one more 
'node' level.

Andi, Stephane, if you'd like to see the Intel offcore bits supported in 2.6.40 
(or 2.6.41) please help out Peter with review, testing, tools/perf/ 
integration, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  0:00             ` Andi Kleen
@ 2011-04-23  7:50               ` Peter Zijlstra
  0 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-23  7:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 17:00 -0700, Andi Kleen wrote:
> On Sat, Apr 23, 2011 at 12:57:42AM +0200, Peter Zijlstra wrote:
> > On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> > > On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > > > The below needs filling out for !x86 (which I filled out with
> > > > unsupported events) and x86 needs the offcore bits fixed to auto select
> > > > between the two offcore events.
> > > 
> > > Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.
> > 
> > Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and
> 
> What is ACCESS if not a HIT?

An ACCESS is all requests for data that come in; each one either
HITs or MISSes, and on a MISS you have to ask someone else down the line.

> > counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
> > L3 hit.
> 
> When the other core owns the cache line it has to be fetched from there.
> That's not an LLC hit.

Then _why_ are they described in 30.6.1.3, table 30-15, as:

OTHER_CORE_HIT_SNP	 9	(R/W). L3 Hit: ....
OTHER_CORE_HITM		10	(R/W). L3 Hit: ...



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 23:54             ` Andi Kleen
@ 2011-04-23  7:49               ` Peter Zijlstra
  0 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-23  7:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 16:54 -0700, Andi Kleen wrote:
> > 
> > #define SNB_PF_LLC_DATA_RD	(1 << 7)
> > #define SNB_PF_LLC_RFO		(1 << 8)
> > #define SNB_PF_LLC_IFETCH	(1 << 9)
> > #define SNB_BUS_LOCKS		(1 << 10)
> > #define SNB_STRM_ST		(1 << 11)
> >         			/* hole */
> > #define SNB_OFFCORE_OTHER	(1 << 15)
> > #define SNB_COMMON		(1 << 16)
> > #define SNB_NO_SUPP		(1 << 17)
> > #define SNB_LLC_HITM		(1 << 18)
> > #define SNB_LLC_HITE		(1 << 19)
> > #define SNB_LLC_HITS		(1 << 20)
> > #define SNB_LLC_HITF		(1 << 21)
> > 				/* hole */
> > #define SNB_SNP_NONE		(1ULL << 31)
> > #define SNB_SNP_NOT_NEEDED	(1ULL << 32)
> > #define SNB_SNP_MISS		(1ULL << 33)
> > #define SNB_SNP_NO_FWD		(1ULL << 34)
> > #define SNB_SNP_FWD		(1ULL << 35)
> > #define SNB_HITM		(1ULL << 36)
> > #define SNB_NON_DRAM		(1ULL << 37)
> > 
> > #define SNB_DMND_READ		(SNB_DMND_DATA_RD)
> > #define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
> > #define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
> > 
> > Is what I came up with, but I'm stumped on how to construct:
> > 
> > #define SNB_L3_HIT
> 
> All the LLC hits together.

Bits 18-21 ?
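
Taking that literally, a sketch in the style of the defines above (an
assumption drawn from the answer, not a tested definition):

/* Sketch only, per the answer above: "all the LLC hits together",
 * i.e. response-type bits 18-21. */
#define SNB_L3_HIT	(SNB_LLC_HITM|SNB_LLC_HITE|SNB_LLC_HITS|SNB_LLC_HITF)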

> Or it can be done with the PEBS memory latency event (like Lin-Ming's patch) or 
> with mem_load_uops_retired (but then only for loads)
> 
> > #define SNB_L3_MISS
> 
> Don't set any of the LLC bits

So a 0 for the response type field? That's not valid. You have to set
some bit between 16-37.

> 
> > 
> > #define SNB_ALL_DRAM
> 
> Just don't set NON_DRAM

So bits 17-21|31-36 for the response type field?

That seems wrong as that would include what we previously defined to be
L3_HIT, which never makes it to DRAM.

> > #define SNB_REMOTE_DRAM
> 
> The current client SNBs, which those tables are for, don't have remote
> DRAM.

So what you're telling us is that simply because Intel hasn't shipped a
multi-socket SNB system yet they either:

  1) omitted a few bits from that table,
  2) have a completely different offcore response msr just for kicks?

Feh!

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 22:57           ` Peter Zijlstra
@ 2011-04-23  0:00             ` Andi Kleen
  2011-04-23  7:50               ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-23  0:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Sat, Apr 23, 2011 at 12:57:42AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> > On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > > The below needs filling out for !x86 (which I filled out with
> > > unsupported events) and x86 needs the offcore bits fixed to auto select
> > > between the two offcore events.
> > 
> > Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.
> 
> Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and

What is ACCESS if not a HIT?

> counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
> L3 hit.

When the other core owns the cache line it has to be fetched from there.
That's not an LLC hit.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 22:19           ` Peter Zijlstra
@ 2011-04-22 23:54             ` Andi Kleen
  2011-04-23  7:49               ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-22 23:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

> 
> #define SNB_PF_LLC_DATA_RD	(1 << 7)
> #define SNB_PF_LLC_RFO		(1 << 8)
> #define SNB_PF_LLC_IFETCH	(1 << 9)
> #define SNB_BUS_LOCKS		(1 << 10)
> #define SNB_STRM_ST		(1 << 11)
>         			/* hole */
> #define SNB_OFFCORE_OTHER	(1 << 15)
> #define SNB_COMMON		(1 << 16)
> #define SNB_NO_SUPP		(1 << 17)
> #define SNB_LLC_HITM		(1 << 18)
> #define SNB_LLC_HITE		(1 << 19)
> #define SNB_LLC_HITS		(1 << 20)
> #define SNB_LLC_HITF		(1 << 21)
> 				/* hole */
> #define SNB_SNP_NONE		(1ULL << 31)
> #define SNB_SNP_NOT_NEEDED	(1ULL << 32)
> #define SNB_SNP_MISS		(1ULL << 33)
> #define SNB_SNP_NO_FWD		(1ULL << 34)
> #define SNB_SNP_FWD		(1ULL << 35)
> #define SNB_HITM		(1ULL << 36)
> #define SNB_NON_DRAM		(1ULL << 37)
> 
> #define SNB_DMND_READ		(SNB_DMND_DATA_RD)
> #define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
> #define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
> 
> Is what I came up with, but I'm stumped on how to construct:
> 
> #define SNB_L3_HIT

All the LLC hits together.

Or it can be done with the PEBS memory latency event (like Lin-Ming's patch) or 
with mem_load_uops_retired (but then only for loads)

> #define SNB_L3_MISS

Don't set any of the LLC bits


> 
> #define SNB_ALL_DRAM

Just don't set NON_DRAM


> #define SNB_REMOTE_DRAM

The current client SNBs, which those tables are for, don't have remote
DRAM.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:54         ` Peter Zijlstra
  2011-04-22 22:19           ` Peter Zijlstra
@ 2011-04-22 22:57           ` Peter Zijlstra
  2011-04-23  0:00             ` Andi Kleen
  1 sibling, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-22 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > The below needs filling out for !x86 (which I filled out with
> > unsupported events) and x86 needs the offcore bits fixed to auto select
> > between the two offcore events.
> 
> Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.

Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and
counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
L3 hit.

Current scribblings below..

---
 arch/arm/kernel/perf_event_v6.c        |   28 ++++
 arch/arm/kernel/perf_event_v7.c        |   28 ++++
 arch/arm/kernel/perf_event_xscale.c    |   14 ++
 arch/mips/kernel/perf_event_mipsxx.c   |   28 ++++
 arch/powerpc/kernel/e500-pmu.c         |    5 +
 arch/powerpc/kernel/mpc7450-pmu.c      |    5 +
 arch/powerpc/kernel/power4-pmu.c       |    5 +
 arch/powerpc/kernel/power5+-pmu.c      |    5 +
 arch/powerpc/kernel/power5-pmu.c       |    5 +
 arch/powerpc/kernel/power6-pmu.c       |    5 +
 arch/powerpc/kernel/power7-pmu.c       |    5 +
 arch/powerpc/kernel/ppc970-pmu.c       |    5 +
 arch/sh/kernel/cpu/sh4/perf_event.c    |   15 ++
 arch/sh/kernel/cpu/sh4a/perf_event.c   |   15 ++
 arch/sparc/kernel/perf_event.c         |   42 ++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |   14 ++
 arch/x86/kernel/cpu/perf_event_intel.c |  253 +++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 ++
 include/linux/perf_event.h             |    3 +-
 19 files changed, 458 insertions(+), 36 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f1e8dd9..02178da 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -173,6 +173,20 @@ static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 enum armv6mpcore_perf_types {
@@ -310,6 +324,20 @@ static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 static inline unsigned long
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4960686..79ffc83 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -255,6 +255,20 @@ static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
@@ -371,6 +385,20 @@ static const unsigned armv7_a9_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 39affbe..7ed1a55 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -144,6 +144,20 @@ static const unsigned xscale_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 #define	XSCALE_PMU_ENABLE	0x001
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 75266ff..e5ad09a 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -377,6 +377,20 @@ static const struct mips_perf_event mipsxxcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 /* 74K core has completely different cache event map. */
@@ -480,6 +494,20 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
diff --git a/arch/powerpc/kernel/e500-pmu.c b/arch/powerpc/kernel/e500-pmu.c
index b150b51..cb2e294 100644
--- a/arch/powerpc/kernel/e500-pmu.c
+++ b/arch/powerpc/kernel/e500-pmu.c
@@ -75,6 +75,11 @@ static int e500_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1 	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static int num_events = 128;
diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
index 2cc5e03..845a584 100644
--- a/arch/powerpc/kernel/mpc7450-pmu.c
+++ b/arch/powerpc/kernel/mpc7450-pmu.c
@@ -388,6 +388,11 @@ static int mpc7450_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 struct power_pmu mpc7450_pmu = {
diff --git a/arch/powerpc/kernel/power4-pmu.c b/arch/powerpc/kernel/power4-pmu.c
index ead8b3c..e9dbc2d 100644
--- a/arch/powerpc/kernel/power4-pmu.c
+++ b/arch/powerpc/kernel/power4-pmu.c
@@ -587,6 +587,11 @@ static int power4_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power4_pmu = {
diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index eca0ac5..f58a2bd 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -653,6 +653,11 @@ static int power5p_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5p_pmu = {
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d5ff0f6..b1acab6 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -595,6 +595,11 @@ static int power5_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5_pmu = {
diff --git a/arch/powerpc/kernel/power6-pmu.c b/arch/powerpc/kernel/power6-pmu.c
index 3160392..b24a3a2 100644
--- a/arch/powerpc/kernel/power6-pmu.c
+++ b/arch/powerpc/kernel/power6-pmu.c
@@ -516,6 +516,11 @@ static int power6_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power6_pmu = {
diff --git a/arch/powerpc/kernel/power7-pmu.c b/arch/powerpc/kernel/power7-pmu.c
index 593740f..6d9dccb 100644
--- a/arch/powerpc/kernel/power7-pmu.c
+++ b/arch/powerpc/kernel/power7-pmu.c
@@ -342,6 +342,11 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power7_pmu = {
diff --git a/arch/powerpc/kernel/ppc970-pmu.c b/arch/powerpc/kernel/ppc970-pmu.c
index 9a6e093..b121de9 100644
--- a/arch/powerpc/kernel/ppc970-pmu.c
+++ b/arch/powerpc/kernel/ppc970-pmu.c
@@ -467,6 +467,11 @@ static int ppc970_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu ppc970_pmu = {
diff --git a/arch/sh/kernel/cpu/sh4/perf_event.c b/arch/sh/kernel/cpu/sh4/perf_event.c
index 748955d..fa4f724 100644
--- a/arch/sh/kernel/cpu/sh4/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4/perf_event.c
@@ -180,6 +180,21 @@ static const int sh7750_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh7750_event_map(int event)
diff --git a/arch/sh/kernel/cpu/sh4a/perf_event.c b/arch/sh/kernel/cpu/sh4a/perf_event.c
index 17e6beb..84a2c39 100644
--- a/arch/sh/kernel/cpu/sh4a/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4a/perf_event.c
@@ -205,6 +205,21 @@ static const int sh4a_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh4a_event_map(int event)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index ee8426e..d890e0f 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -245,6 +245,20 @@ static const cache_map_t ultra3_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu ultra3_pmu = {
@@ -360,6 +374,20 @@ static const cache_map_t niagara1_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara1_pmu = {
@@ -472,6 +500,20 @@ static const cache_map_t niagara2_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara2_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cf4e369..01c7dd3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -89,6 +89,20 @@ static __initconst const u64 amd_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0xb8e9, /* CPU Request to Memory, l+r */
+		[ C(RESULT_MISS)   ] = 0x98e9, /* CPU Request to Memory, r   */
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 43fa20b..fe4e8b1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -150,6 +150,86 @@ static u64 intel_pmu_event_map(int hw_event)
 	return intel_perfmon_event_map[hw_event];
 }
 
+/*
+ * Sandy Bridge MSR_OFFCORE_RESPONSE bits;
+ * See IA32 SDM Vol 3B 30.8.5
+ */
+
+#define SNB_DMND_DATA_RD	(1 << 0)
+#define SNB_DMND_RFO		(1 << 1)
+#define SNB_DMND_IFETCH		(1 << 2)
+#define SNB_DMND_WB		(1 << 3)
+#define SNB_PF_DATA_RD		(1 << 4)
+#define SNB_PF_DATA_RFO		(1 << 5)
+#define SNB_PF_IFETCH		(1 << 6)
+#define SNB_PF_LLC_DATA_RD	(1 << 7)
+#define SNB_PF_LLC_RFO		(1 << 8)
+#define SNB_PF_LLC_IFETCH	(1 << 9)
+#define SNB_BUS_LOCKS		(1 << 10)
+#define SNB_STRM_ST		(1 << 11)
+        			/* hole */
+#define SNB_OFFCORE_OTHER	(1 << 15)
+#define SNB_COMMON		(1 << 16)
+#define SNB_NO_SUPP		(1 << 17)
+#define SNB_LLC_HITM		(1 << 18)
+#define SNB_LLC_HITE		(1 << 19)
+#define SNB_LLC_HITS		(1 << 20)
+#define SNB_LLC_HITF		(1 << 21)
+				/* hole */
+#define SNB_SNP_NONE		(1ULL << 31)
+#define SNB_SNP_NOT_NEEDED	(1ULL << 32)
+#define SNB_SNP_MISS		(1ULL << 33)
+#define SNB_SNP_NO_FWD		(1ULL << 34)
+#define SNB_SNP_FWD		(1ULL << 35)
+#define SNB_HITM		(1ULL << 36)
+#define SNB_NON_DRAM		(1ULL << 37)
+
+#define SNB_DMND_READ		(SNB_DMND_DATA_RD)
+#define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
+#define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
+
+#define SNB_L3_HIT		()
+#define SNB_L3_MISS		()
+#define SNB_L3_ACCESS		(SNB_L3_HIT|SNB_L3_MISS)
+
+#define SNB_ALL_DRAM		()
+#define SNB_REMOTE_DRAM		()
+
+static __initconst const u64 snb_hw_cache_extra_regs
+				[PERF_COUNT_HW_CACHE_MAX]
+				[PERF_COUNT_HW_CACHE_OP_MAX]
+				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
+{
+ [ C(LL  ) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_READ|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_READ|SNB_L3_MISS,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_WRITE|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_WRITE|SNB_L3_MISS,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_PREFETCH|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_PREFETCH|SNB_L3_MISS,
+	},
+ },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_READ|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_READ|SNB_REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_WRITE|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_WRITE|SNB_REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_PREFETCH|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_PREFETCH|SNB_REMOTE_DRAM,
+	},
+ },
+};
+
 static __initconst const u64 snb_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -184,26 +264,23 @@ static __initconst const u64 snb_hw_cache_event_ids
 	},
  },
  [ C(LL  ) ] = {
-	/*
-	 * TBD: Need Off-core Response Performance Monitoring support
-	 */
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -248,6 +325,20 @@ static __initconst const u64 snb_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 westmere_hw_cache_event_ids
@@ -285,26 +376,26 @@ static __initconst const u64 westmere_hw_cache_event_ids
  },
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	/*
 	 * Use RFO, not WRITEBACK, because a write miss would typically occur
 	 * on RFO.
 	 */
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
-		[ C(RESULT_ACCESS) ] = 0x01bb,
-		/* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
 		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -349,19 +440,53 @@ static __initconst const u64 westmere_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 /*
- * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
+ * Nehalem/Westmere MSR_OFFCORE_RESPONSE bits;
+ * See IA32 SDM Vol 3B 30.6.1.3
  */
 
-#define DMND_DATA_RD     (1 << 0)
-#define DMND_RFO         (1 << 1)
-#define DMND_WB          (1 << 3)
-#define PF_DATA_RD       (1 << 4)
-#define PF_DATA_RFO      (1 << 5)
-#define RESP_UNCORE_HIT  (1 << 8)
-#define RESP_MISS        (0xf600) /* non uncore hit */
+#define NHM_DMND_DATA_RD	(1 << 0)
+#define NHM_DMND_RFO		(1 << 1)
+#define NHM_DMND_IFETCH		(1 << 2)
+#define NHM_DMND_WB		(1 << 3)
+#define NHM_PF_DATA_RD		(1 << 4)
+#define NHM_PF_DATA_RFO		(1 << 5)
+#define NHM_PF_IFETCH		(1 << 6)
+#define NHM_OFFCORE_OTHER	(1 << 7)
+#define NHM_UNCORE_HIT		(1 << 8)
+#define NHM_OTHER_CORE_HIT_SNP	(1 << 9)
+#define NHM_OTHER_CORE_HITM	(1 << 10)
+        			/* reserved */
+#define NHM_REMOTE_CACHE_FWD	(1 << 12)
+#define NHM_REMOTE_DRAM		(1 << 13)
+#define NHM_LOCAL_DRAM		(1 << 14)
+#define NHM_NON_DRAM		(1 << 15)
+
+#define NHM_ALL_DRAM		(NHM_REMOTE_DRAM|NHM_LOCAL_DRAM)
+
+#define NHM_DMND_READ		(NHM_DMND_DATA_RD)
+#define NHM_DMND_WRITE		(NHM_DMND_RFO|NHM_DMND_WB)
+#define NHM_DMND_PREFETCH	(NHM_PF_DATA_RD|NHM_PF_DATA_RFO)
+
+#define NHM_L3_HIT	(NHM_UNCORE_HIT|NHM_OTHER_CORE_HIT_SNP|NHM_OTHER_CORE_HITM)
+#define NHM_L3_MISS	(NHM_NON_DRAM|NHM_ALL_DRAM|NHM_REMOTE_CACHE_FWD)
+#define NHM_L3_ACCESS	(NHM_L3_HIT|NHM_L3_MISS)
 
 static __initconst const u64 nehalem_hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_MAX]
@@ -370,18 +495,32 @@ static __initconst const u64 nehalem_hw_cache_extra_regs
 {
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_DATA_RD|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_READ|NHM_L3_MISS,
 	},
 	[ C(OP_WRITE) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_RFO|DMND_WB|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_WRITE|NHM_L3_MISS,
 	},
 	[ C(OP_PREFETCH) ] = {
-		[ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_L3_MISS,
 	},
- }
+ },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_READ|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_WRITE|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_REMOTE_DRAM,
+	},
+ },
 };
 
 static __initconst const u64 nehalem_hw_cache_event_ids
@@ -483,6 +622,20 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 core2_hw_cache_event_ids
@@ -574,6 +727,20 @@ static __initconst const u64 core2_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static __initconst const u64 atom_hw_cache_event_ids
@@ -665,6 +832,20 @@ static __initconst const u64 atom_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static void intel_pmu_disable_all(void)
@@ -1444,6 +1625,8 @@ static __init int intel_pmu_init(void)
 	case 42: /* SandyBridge */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
+		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
+		       sizeof(hw_cache_extra_regs));
 
 		intel_pmu_lbr_init_nhm();
 
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 74507c1..e802c7e 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -554,6 +554,20 @@ static __initconst const u64 p4_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static u64 p4_general_events[PERF_COUNT_HW_MAX] = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee9f1e7..df4a841 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,7 +59,7 @@ enum perf_hw_id {
 /*
  * Generalized hardware cache events:
  *
- *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
+ *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
  *       { read, write, prefetch } x
  *       { accesses, misses }
  */
@@ -70,6 +70,7 @@ enum perf_hw_cache_id {
 	PERF_COUNT_HW_CACHE_DTLB		= 3,
 	PERF_COUNT_HW_CACHE_ITLB		= 4,
 	PERF_COUNT_HW_CACHE_BPU			= 5,
+	PERF_COUNT_HW_CACHE_NODE		= 6,
 
 	PERF_COUNT_HW_CACHE_MAX,		/* non-ABI */
 };
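
As a sketch of what this buys user space - a minimal, standalone program that
counts NODE read misses via the generalized event. It assumes the
PERF_COUNT_HW_CACHE_NODE value from the hunk above, so it only does something
useful on a kernel carrying these changes; the config encoding is the usual
id | (op << 8) | (result << 16) one:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* value from the hunk above; installed headers may not have it yet */
#define PERF_COUNT_HW_CACHE_NODE	6

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HW_CACHE;
	/* generalized cache events: config = id | (op << 8) | (result << 16) */
	attr.config = PERF_COUNT_HW_CACHE_NODE |
		      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
		      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
	attr.disabled = 1;

	/* measure this task, on any CPU */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... the workload of interest goes here ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	read(fd, &count, sizeof(count));
	printf("node read misses: %lld\n", count);
	close(fd);
	return 0;
}

The perf tool could then expose the same thing as named events (node-loads,
node-load-misses and friends) instead of per-model hex codes.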



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:54         ` Peter Zijlstra
@ 2011-04-22 22:19           ` Peter Zijlstra
  2011-04-22 23:54             ` Andi Kleen
  2011-04-22 22:57           ` Peter Zijlstra
  1 sibling, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-22 22:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > The below needs filling out for !x86 (which I filled out with
> > unsupported events) and x86 needs the offcore bits fixed to auto select
> > between the two offcore events.
> 
> Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.

/*
 * Sandy Bridge MSR_OFFCORE_RESPONSE bits;
 * See IA32 SDM Vol 3B 30.8.5
 */

#define SNB_DMND_DATA_RD	(1 << 0)
#define SNB_DMND_RFO		(1 << 1)
#define SNB_DMND_IFETCH		(1 << 2)
#define SNB_DMND_WB		(1 << 3)
#define SNB_PF_DATA_RD		(1 << 4)
#define SNB_PF_DATA_RFO		(1 << 5)
#define SNB_PF_IFETCH		(1 << 6)
#define SNB_PF_LLC_DATA_RD	(1 << 7)
#define SNB_PF_LLC_RFO		(1 << 8)
#define SNB_PF_LLC_IFETCH	(1 << 9)
#define SNB_BUS_LOCKS		(1 << 10)
#define SNB_STRM_ST		(1 << 11)
        			/* hole */
#define SNB_OFFCORE_OTHER	(1 << 15)
#define SNB_COMMON		(1 << 16)
#define SNB_NO_SUPP		(1 << 17)
#define SNB_LLC_HITM		(1 << 18)
#define SNB_LLC_HITE		(1 << 19)
#define SNB_LLC_HITS		(1 << 20)
#define SNB_LLC_HITF		(1 << 21)
				/* hole */
#define SNB_SNP_NONE		(1ULL << 31)
#define SNB_SNP_NOT_NEEDED	(1ULL << 32)
#define SNB_SNP_MISS		(1ULL << 33)
#define SNB_SNP_NO_FWD		(1ULL << 34)
#define SNB_SNP_FWD		(1ULL << 35)
#define SNB_HITM		(1ULL << 36)
#define SNB_NON_DRAM		(1ULL << 37)

#define SNB_DMND_READ		(SNB_DMND_DATA_RD)
#define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
#define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)

That is what I came up with, but I'm stumped on how to construct:

#define SNB_L3_HIT
#define SNB_L3_MISS

#define SNB_ALL_DRAM
#define SNB_REMOTE_DRAM

Anybody got a clue?



^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:37       ` Peter Zijlstra
@ 2011-04-22 21:54         ` Peter Zijlstra
  2011-04-22 22:19           ` Peter Zijlstra
  2011-04-22 22:57           ` Peter Zijlstra
  2011-04-23  8:13         ` Ingo Molnar
  1 sibling, 2 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-22 21:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> The below needs filling out for !x86 (which I filled out with
> unsupported events) and x86 needs the offcore bits fixed to auto select
> between the two offcore events.

Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:06     ` Ingo Molnar
@ 2011-04-22 21:37       ` Peter Zijlstra
  2011-04-22 21:54         ` Peter Zijlstra
  2011-04-23  8:13         ` Ingo Molnar
  2011-04-25 17:12       ` Vince Weaver
  1 sibling, 2 replies; 80+ messages in thread
From: Peter Zijlstra @ 2011-04-22 21:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 10:06 +0200, Ingo Molnar wrote:
> 
> I'm about to push out the patch attached below - it lays out the arguments in 
> detail. I don't think we have time to fix this properly for .39 - but memory 
> profiling could be a nice feature for v2.6.40. 

Does something like the below provide enough generic infrastructure to
allow the raw offcore bits again?

The below needs filling out for !x86 (which I filled out with
unsupported events) and x86 needs the offcore bits fixed to auto select
between the two offcore events.


Not-signed-off-by: /me
---
 arch/arm/kernel/perf_event_v6.c        |   28 ++++++
 arch/arm/kernel/perf_event_v7.c        |   28 ++++++
 arch/arm/kernel/perf_event_xscale.c    |   14 +++
 arch/mips/kernel/perf_event_mipsxx.c   |   28 ++++++
 arch/powerpc/kernel/e500-pmu.c         |    5 +
 arch/powerpc/kernel/mpc7450-pmu.c      |    5 +
 arch/powerpc/kernel/power4-pmu.c       |    5 +
 arch/powerpc/kernel/power5+-pmu.c      |    5 +
 arch/powerpc/kernel/power5-pmu.c       |    5 +
 arch/powerpc/kernel/power6-pmu.c       |    5 +
 arch/powerpc/kernel/power7-pmu.c       |    5 +
 arch/powerpc/kernel/ppc970-pmu.c       |    5 +
 arch/sh/kernel/cpu/sh4/perf_event.c    |   15 +++
 arch/sh/kernel/cpu/sh4a/perf_event.c   |   15 +++
 arch/sparc/kernel/perf_event.c         |   42 ++++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |   14 +++
 arch/x86/kernel/cpu/perf_event_intel.c |  167 +++++++++++++++++++++++++-------
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 +++
 include/linux/perf_event.h             |    3 +-
 19 files changed, 373 insertions(+), 35 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f1e8dd9..02178da 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -173,6 +173,20 @@ static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 enum armv6mpcore_perf_types {
@@ -310,6 +324,20 @@ static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 static inline unsigned long
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4960686..79ffc83 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -255,6 +255,20 @@ static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
@@ -371,6 +385,20 @@ static const unsigned armv7_a9_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 39affbe..7ed1a55 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -144,6 +144,20 @@ static const unsigned xscale_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 #define	XSCALE_PMU_ENABLE	0x001
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 75266ff..e5ad09a 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -377,6 +377,20 @@ static const struct mips_perf_event mipsxxcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 /* 74K core has completely different cache event map. */
@@ -480,6 +494,20 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
diff --git a/arch/powerpc/kernel/e500-pmu.c b/arch/powerpc/kernel/e500-pmu.c
index b150b51..cb2e294 100644
--- a/arch/powerpc/kernel/e500-pmu.c
+++ b/arch/powerpc/kernel/e500-pmu.c
@@ -75,6 +75,11 @@ static int e500_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1 	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static int num_events = 128;
diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
index 2cc5e03..845a584 100644
--- a/arch/powerpc/kernel/mpc7450-pmu.c
+++ b/arch/powerpc/kernel/mpc7450-pmu.c
@@ -388,6 +388,11 @@ static int mpc7450_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 struct power_pmu mpc7450_pmu = {
diff --git a/arch/powerpc/kernel/power4-pmu.c b/arch/powerpc/kernel/power4-pmu.c
index ead8b3c..e9dbc2d 100644
--- a/arch/powerpc/kernel/power4-pmu.c
+++ b/arch/powerpc/kernel/power4-pmu.c
@@ -587,6 +587,11 @@ static int power4_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power4_pmu = {
diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index eca0ac5..f58a2bd 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -653,6 +653,11 @@ static int power5p_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5p_pmu = {
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d5ff0f6..b1acab6 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -595,6 +595,11 @@ static int power5_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5_pmu = {
diff --git a/arch/powerpc/kernel/power6-pmu.c b/arch/powerpc/kernel/power6-pmu.c
index 3160392..b24a3a2 100644
--- a/arch/powerpc/kernel/power6-pmu.c
+++ b/arch/powerpc/kernel/power6-pmu.c
@@ -516,6 +516,11 @@ static int power6_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power6_pmu = {
diff --git a/arch/powerpc/kernel/power7-pmu.c b/arch/powerpc/kernel/power7-pmu.c
index 593740f..6d9dccb 100644
--- a/arch/powerpc/kernel/power7-pmu.c
+++ b/arch/powerpc/kernel/power7-pmu.c
@@ -342,6 +342,11 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power7_pmu = {
diff --git a/arch/powerpc/kernel/ppc970-pmu.c b/arch/powerpc/kernel/ppc970-pmu.c
index 9a6e093..b121de9 100644
--- a/arch/powerpc/kernel/ppc970-pmu.c
+++ b/arch/powerpc/kernel/ppc970-pmu.c
@@ -467,6 +467,11 @@ static int ppc970_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu ppc970_pmu = {
diff --git a/arch/sh/kernel/cpu/sh4/perf_event.c b/arch/sh/kernel/cpu/sh4/perf_event.c
index 748955d..fa4f724 100644
--- a/arch/sh/kernel/cpu/sh4/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4/perf_event.c
@@ -180,6 +180,21 @@ static const int sh7750_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh7750_event_map(int event)
diff --git a/arch/sh/kernel/cpu/sh4a/perf_event.c b/arch/sh/kernel/cpu/sh4a/perf_event.c
index 17e6beb..84a2c39 100644
--- a/arch/sh/kernel/cpu/sh4a/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4a/perf_event.c
@@ -205,6 +205,21 @@ static const int sh4a_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh4a_event_map(int event)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index ee8426e..d890e0f 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -245,6 +245,20 @@ static const cache_map_t ultra3_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu ultra3_pmu = {
@@ -360,6 +374,20 @@ static const cache_map_t niagara1_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara1_pmu = {
@@ -472,6 +500,20 @@ static const cache_map_t niagara2_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara2_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cf4e369..01c7dd3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -89,6 +89,20 @@ static __initconst const u64 amd_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0xb8e9, /* CPU Request to Memory, l+r */
+		[ C(RESULT_MISS)   ] = 0x98e9, /* CPU Request to Memory, r   */
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 43fa20b..225efa0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -184,26 +184,23 @@ static __initconst const u64 snb_hw_cache_event_ids
 	},
  },
  [ C(LL  ) ] = {
-	/*
-	 * TBD: Need Off-core Response Performance Monitoring support
-	 */
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -248,6 +245,20 @@ static __initconst const u64 snb_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 westmere_hw_cache_event_ids
@@ -285,26 +296,26 @@ static __initconst const u64 westmere_hw_cache_event_ids
  },
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	/*
 	 * Use RFO, not WRITEBACK, because a write miss would typically occur
 	 * on RFO.
 	 */
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
-		[ C(RESULT_ACCESS) ] = 0x01bb,
-		/* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
 		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -349,19 +360,51 @@ static __initconst const u64 westmere_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 /*
  * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
  */
 
-#define DMND_DATA_RD     (1 << 0)
-#define DMND_RFO         (1 << 1)
-#define DMND_WB          (1 << 3)
-#define PF_DATA_RD       (1 << 4)
-#define PF_DATA_RFO      (1 << 5)
-#define RESP_UNCORE_HIT  (1 << 8)
-#define RESP_MISS        (0xf600) /* non uncore hit */
+#define DMND_DATA_RD		(1 << 0)
+#define DMND_RFO		(1 << 1)
+#define DMND_IFETCH		(1 << 2)
+#define DMND_WB			(1 << 3)
+#define PF_DATA_RD		(1 << 4)
+#define PF_DATA_RFO		(1 << 5)
+#define PF_IFETCH		(1 << 6)
+#define OFFCORE_OTHER		(1 << 7)
+#define UNCORE_HIT		(1 << 8)
+#define OTHER_CORE_HIT_SNP	(1 << 9)
+#define OTHER_CORE_HITM		(1 << 10)
+				/* reserved */
+#define REMOTE_CACHE_FWD	(1 << 12)
+#define REMOTE_DRAM		(1 << 13)
+#define LOCAL_DRAM		(1 << 14)
+#define NON_DRAM		(1 << 15)
+
+#define ALL_DRAM	(REMOTE_DRAM|LOCAL_DRAM)
+
+#define DMND_READ	(DMND_DATA_RD)
+#define DMND_WRITE	(DMND_RFO|DMND_WB)
+#define DMND_PREFETCH	(PF_DATA_RD|PF_DATA_RFO)
+
+#define L3_HIT	(UNCORE_HIT|OTHER_CORE_HIT_SNP|OTHER_CORE_HITM)
+#define L3_MISS	(NON_DRAM|ALL_DRAM|REMOTE_CACHE_FWD)
 
 static __initconst const u64 nehalem_hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_MAX]
@@ -370,18 +413,32 @@ static __initconst const u64 nehalem_hw_cache_extra_regs
 {
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_DATA_RD|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_READ|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_READ|L3_MISS,
 	},
 	[ C(OP_WRITE) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_RFO|DMND_WB|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_WRITE|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_WRITE|L3_MISS,
 	},
 	[ C(OP_PREFETCH) ] = {
-		[ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_PREFETCH|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_PREFETCH|L3_MISS,
 	},
- }
+ },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_READ|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_READ|REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_WRITE|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_WRITE|REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_PREFETCH|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_PREFETCH|REMOTE_DRAM,
+	},
+ },
 };
 
 static __initconst const u64 nehalem_hw_cache_event_ids
@@ -483,6 +540,20 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 core2_hw_cache_event_ids
@@ -574,6 +645,20 @@ static __initconst const u64 core2_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static __initconst const u64 atom_hw_cache_event_ids
@@ -665,6 +750,20 @@ static __initconst const u64 atom_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static void intel_pmu_disable_all(void)
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 74507c1..e802c7e 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -554,6 +554,20 @@ static __initconst const u64 p4_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static u64 p4_general_events[PERF_COUNT_HW_MAX] = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee9f1e7..df4a841 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,7 +59,7 @@ enum perf_hw_id {
 /*
  * Generalized hardware cache events:
  *
- *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
+ *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
  *       { read, write, prefetch } x
  *       { accesses, misses }
  */
@@ -70,6 +70,7 @@ enum perf_hw_cache_id {
 	PERF_COUNT_HW_CACHE_DTLB		= 3,
 	PERF_COUNT_HW_CACHE_ITLB		= 4,
 	PERF_COUNT_HW_CACHE_BPU			= 5,
+	PERF_COUNT_HW_CACHE_NODE		= 6,
 
 	PERF_COUNT_HW_CACHE_MAX,		/* non-ABI */
 };



^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:22     ` Andi Kleen
@ 2011-04-22 19:54       ` Ingo Molnar
  0 siblings, 0 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22 19:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo


* Andi Kleen <ak@linux.intel.com> wrote:

> On Fri, Apr 22, 2011 at 08:34:29AM +0200, Ingo Molnar wrote:
> > This needs to be a *lot* more user friendly. Users do not want to type in 
> > stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> > era really.
> 
> I agree that the raw events are quite user-unfriendly.
> 
> Unfortunately they are currently a way of life in perf -- unlike oprofile --
> if you want any CPU-specific events like this.

Not sure where you get that blanket statement from, but no, raw events are not 
really a 'way of life' - judging by the user feedback we get, they come up 
pretty rarely.

The thing is, most people just use the default 'perf record' and that's it - 
they do not even care about a *single* event - they just want to profile their 
code somehow.
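
The complete workflow in that case is just:

  perf record ./myapp
  perf report

with no '-e' option anywhere - the tool defaults to the cycles event
(./myapp standing in for whatever is being profiled).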

Then the second most popular event category are the generalized events, the 
ones you can see in perf list output:

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]

  cpu-clock                                  [Software event]
  task-clock                                 [Software event]
  page-faults OR faults                      [Software event]
  minor-faults                               [Software event]
  major-faults                               [Software event]
  context-switches OR cs                     [Software event]
  cpu-migrations OR migrations               [Software event]
  alignment-faults                           [Software event]
  emulation-faults                           [Software event]

  L1-dcache-loads                            [Hardware cache event]
  L1-dcache-load-misses                      [Hardware cache event]
  L1-dcache-stores                           [Hardware cache event]
  L1-dcache-store-misses                     [Hardware cache event]
  L1-dcache-prefetches                       [Hardware cache event]
  L1-dcache-prefetch-misses                  [Hardware cache event]
  L1-icache-loads                            [Hardware cache event]
  L1-icache-load-misses                      [Hardware cache event]
  L1-icache-prefetches                       [Hardware cache event]
  L1-icache-prefetch-misses                  [Hardware cache event]
  LLC-loads                                  [Hardware cache event]
  LLC-load-misses                            [Hardware cache event]
  LLC-stores                                 [Hardware cache event]
  LLC-store-misses                           [Hardware cache event]
  LLC-prefetches                             [Hardware cache event]
  LLC-prefetch-misses                        [Hardware cache event]
  dTLB-loads                                 [Hardware cache event]
  dTLB-load-misses                           [Hardware cache event]
  dTLB-stores                                [Hardware cache event]
  dTLB-store-misses                          [Hardware cache event]
  dTLB-prefetches                            [Hardware cache event]
  dTLB-prefetch-misses                       [Hardware cache event]
  iTLB-loads                                 [Hardware cache event]
  iTLB-load-misses                           [Hardware cache event]
  branch-loads                               [Hardware cache event]
  branch-load-misses                         [Hardware cache event]

These are useful but are used less frequently.

Then come tracepoint-based events - and, as a distant last, raw events. 
Yes, raw events are useful occasionally, just like modifying applications 
via a hex editor is useful occasionally. If it's done often, we'd better 
abstract it out.
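
To make the contrast concrete - the same kind of LLC miss measurement, first 
as a generalized event and then as a raw hex code (the raw encoding shown is 
Intel's architectural LONGEST_LAT_CACHE.MISS, event 0x2e with umask 0x41, and 
is model-specific):

  perf stat -e LLC-load-misses ./myworkload
  perf stat -e r412e ./myworkload

The first line works on any supported CPU; the second one has to be looked up 
anew for every model.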

> Really, to make sense out of all this you need full per-CPU event lists.

To make sense out of what? You are making very sweeping yet vague statements.

> I have my own wrapper to make it more user-friendly, but its functionality 
> should arguably migrate into perf.

Uhm, no - your patch seems to reintroduce oprofile's horrible event files. We 
really learned from that mistake and do not want to step back ...

Please see the detailed mails i wrote in this thread: what we want is to extend 
and improve the existing generalizations of events. The useful bits of the 
offcore PMU fit nicely into that scheme.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  6:34   ` Ingo Molnar
  2011-04-22  8:06     ` Ingo Molnar
@ 2011-04-22 16:22     ` Andi Kleen
  2011-04-22 19:54       ` Ingo Molnar
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2011-04-22 16:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo

On Fri, Apr 22, 2011 at 08:34:29AM +0200, Ingo Molnar wrote:
> This needs to be a *lot* more user friendly. Users do not want to type in 
> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> era really.

I agree that the raw events are quite user-unfriendly.

Unfortunately they are currently a way of life in perf -- unlike oprofile --
if you want any CPU-specific events like this.

Really, to make sense out of all this you need full per-CPU event lists.

I have my own wrapper to make it more user-friendly, but its functionality
should arguably migrate into perf.

I did a patch to add a mapping file some time ago, but it likely
needs some improvements before it can be merged (aka not .39), like
auto-selecting a suitable mapping file and back-translating raw
mappings on output.

BTW, the new perf lat code needs the raw event config1 specification
internally, so this is needed in some form anyway.
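
For reference, this is roughly what the extended raw syntax fills in at the
perf_event_open() level - a minimal sketch; the attr fields are the ones the
merged kernel side reads, and the code/mask values are just the Nehalem
example from this thread:

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* What "perf stat -e r1b7:20ff" boils down to at the syscall level. */
static int open_offcore_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size    = sizeof(attr);
	attr.type    = PERF_TYPE_RAW;
	attr.config  = 0x1b7;		/* OFFCORE_RESPONSE_0 event code */
	attr.config1 = 0x20ff;		/* extra mask for the offcore MSR */

	/* count for the calling thread, on any CPU */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}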

Short of that, the extended raw events are the best we can get short term, I 
think. So I would prefer to have it in .39 to make this feature
usable at all.

I attached the old mapping file patch for your reference. 
I also put up a few mapping files for Intel CPUs at 
ftp://ftp.kernel.org/pub/linux/kernel/people/ak/pmu/*

E.g., to use it with Nehalem offcore events and this patch, today you 
would use:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/pmu/nhm-ep.map
perf stat --map-events nhm-ep.map -e offcore_response_0.any_data.local_cache_dram -a sleep 1
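
The map file format is what the parser in the attached patch expects: one
"<alias> <expansion>" pair per line, whitespace separated, with '#' starting
a comment. The expansion is ordinary perf event syntax; the pairing shown
here is made up purely for illustration:

  # alias                                        expansion
  offcore_response_0.any_data.local_cache_dram   r1b7:4711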

-Andi

commit 37323c19ceb57101cc2160059c567ee14055b7c8
Author: Andi Kleen <ak@linux.intel.com>
Date:   Mon Nov 8 04:52:18 2010 +0100

    mapping file support

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index a91f9f9..63bdbbb 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -120,6 +120,9 @@ Do not update the builid cache. This saves some overhead in situations
 where the information in the perf.data file (which includes buildids)
 is sufficient.
 
+--map-events=file
+Use file as event mapping file.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4b3a2d4..4f20af3 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -53,6 +53,9 @@ comma-sperated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2
 In per-thread mode, this option is ignored. The -a option is still necessary
 to activate system-wide monitoring. Default is to count on all CPUs.
 
+--map-events=file
+Use file as event mapping file.
+
 EXAMPLES
 --------
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 93bd2ff..6fdf892 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -794,6 +794,9 @@ const struct option record_options[] = {
 	OPT_CALLBACK('e', "event", NULL, "event",
 		     "event selector. use 'perf list' to list available events",
 		     parse_events),
+	OPT_CALLBACK(0, "map-events", NULL, "map-events",
+		     "specify mapping file for events",
+		     map_events),
 	OPT_CALLBACK(0, "filter", NULL, "filter",
 		     "event filter", parse_filter),
 	OPT_INTEGER('p', "pid", &target_pid,
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a6b4d44..f21f307 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -525,6 +525,9 @@ static const struct option options[] = {
 	OPT_CALLBACK('e', "event", NULL, "event",
 		     "event selector. use 'perf list' to list available events",
 		     parse_events),
+	OPT_CALLBACK(0, "map-events", NULL, "map-events",
+		     "specify mapping file for events",
+		     map_events),
 	OPT_BOOLEAN('i', "no-inherit", &no_inherit,
 		    "child tasks do not inherit counters"),
 	OPT_INTEGER('p', "pid", &target_pid,
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..2cc7b3d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -83,6 +83,14 @@ static const char *sw_event_names[] = {
 	"emulation-faults",
 };
 
+struct mapping {
+	const char *str;
+	const char *res;
+};
+
+static int 		mapping_max;
+static struct mapping  *mappings;
+
 #define MAX_ALIASES 8
 
 static const char *hw_cache[][MAX_ALIASES] = {
@@ -731,12 +739,28 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
 	return 0;
 }
 
+static int cmp_mapping(const void *a, const void *b)
+{
+	const struct mapping *am = a;
+	const struct mapping *bm = b;
+	return strcmp(am->str, bm->str);
+}
+
+static const char *
+get_event_mapping(const char *str)
+{
+	struct mapping key = { .str = str };
+	struct mapping *r = bsearch(&key, mappings, mapping_max,
+				    sizeof(struct mapping), cmp_mapping);
+	return r ? r->res : NULL;
+}
+
 /*
  * Each event can have multiple symbolic names.
  * Symbolic names are (almost) exactly matched.
  */
 static enum event_result
-parse_event_symbols(const char **str, struct perf_event_attr *attr)
+do_parse_event_symbols(const char **str, struct perf_event_attr *attr)
 {
 	enum event_result ret;
 
@@ -774,6 +798,15 @@ modifier:
 	return ret;
 }
 
+static enum event_result
+parse_event_symbols(const char **str, struct perf_event_attr *attr)
+{
+	const char *map = get_event_mapping(*str);
+	if (map)
+		*str = map;
+	return do_parse_event_symbols(str, attr);
+}
+
 static int store_event_type(const char *orgname)
 {
 	char filename[PATH_MAX], *c;
@@ -963,3 +996,56 @@ void print_events(void)
 
 	exit(129);
 }
+
+int map_events(const struct option *opt __used, const char *str,
+		int unset __used)
+{
+	FILE *f;
+	char *line = NULL;
+	size_t linelen = 0;
+	char *p;
+	int lineno = 0;
+	static int mapping_size;
+	struct mapping *map;
+
+	f = fopen(str, "r");
+	if (!f) {
+		pr_err("Cannot open event map file %s\n", str);
+		return -1;
+	}
+	while (getline(&line, &linelen, f) > 0) {
+		lineno++;
+		p = strpbrk(line, "\n#");
+		if (p)
+			*p = 0;
+		p = line + strspn(line, " \t");
+		if (*p == 0)
+			continue;
+		if (mapping_max >= mapping_size) {
+			if (!mapping_size)
+				mapping_size = 2048;
+			mapping_size *= 2;
+			mappings = realloc(mappings,
+				      mapping_size * sizeof(struct mapping));
+			if (!mappings) {
+				pr_err("Out of memory\n");
+				exit(ENOMEM);
+			}
+		}
+		map = &mappings[mapping_max++];
+		map->str = strsep(&p, " \t");
+		map->res = strsep(&p, " \t");
+		if (!map->str || !map->res) {
+			fprintf(stderr, "%s:%d: Invalid line in map file\n",
+				str, lineno);
+			mapping_max--;
+			continue;
+		}
+		line = NULL;
+		linelen = 0;
+	}
+	fclose(f);
+	qsort(mappings, mapping_max, sizeof(struct mapping),
+	      cmp_mapping);
+	return 0;
+}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..1d6df9c 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -33,5 +33,6 @@ extern void print_events(void);
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
 
+extern int map_events(const struct option *opt, const char *str, int unset);
 
 #endif /* __PERF_PARSE_EVENTS_H */

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  6:34   ` Ingo Molnar
@ 2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 21:37       ` Peter Zijlstra
  2011-04-25 17:12       ` Vince Weaver
  2011-04-22 16:22     ` Andi Kleen
  1 sibling, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22  8:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Stephane Eranian,
	Lin Ming, Arnaldo Carvalho de Melo, Thomas Gleixner,
	Peter Zijlstra


* Ingo Molnar <mingo@elte.hu> wrote:

> This needs to be a *lot* more user friendly. Users do not want to type in 
> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> era really.
> 
> Unless there's proper generalized and human usable support i'm leaning 
> towards turning off the offcore user-space accessible raw bits for now, and 
> use them only kernel-internally, for the cache events.

I'm about to push out the patch attached below - it lays out the arguments in 
detail. I don't think we have time to fix this properly for .39 - but memory 
profiling could be a nice feature for v2.6.40.

Thanks,

	Ingo

--------------------->
From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 22 Apr 2011 08:44:38 +0200
Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers

Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, and memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: the actual implementation could have more
  sophistication and more parameters - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space until better generalization and usability are implemented
for these hardware event features.

( Note: after generalization has been implemented, raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category, so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..632e5dc 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
 			return -EOPNOTSUPP;
 	}
 
+	/*
+	 * Do not allow config1 (extended registers) to propagate,
+	 * there's no sane user-space generalization yet:
+	 */
 	if (attr->type == PERF_TYPE_RAW)
-		return x86_pmu_extra_regs(event->attr.config, event);
+		return 0;
 
 	if (attr->type == PERF_TYPE_HW_CACHE)
 		return set_ext_hw_attr(hwc, event);

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-21 17:41 ` [PATCH 1/1] perf tools: Add missing user space " Arnaldo Carvalho de Melo
@ 2011-04-22  6:34   ` Ingo Molnar
  2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 16:22     ` Andi Kleen
  0 siblings, 2 replies; 80+ messages in thread
From: Ingo Molnar @ 2011-04-22  6:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Stephane Eranian,
	Lin Ming, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
> user space bits were not. This made it impossible to set the extra mask
> and actually do the OFFCORE profiling
> 
> This patch fixes this. It adds a new syntax ':' to raw events to specify
> additional event masks. I also added support for setting config2, even
> though that is not needed currently.
> 
> [Note: the original version back in time used , -- but that actually
> conflicted with event lists, so now it's :]
> 
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephane Eranian <eranian@gmail.com>
> Cc: Lin Ming <ming.m.lin@intel.com>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/Documentation/perf-list.txt |   11 +++++++++++
>  tools/perf/util/parse-events.c         |   19 ++++++++++++++++++-
>  2 files changed, 29 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> index 7a527f7..f19f1e5 100644
> --- a/tools/perf/Documentation/perf-list.txt
> +++ b/tools/perf/Documentation/perf-list.txt
> @@ -61,6 +61,17 @@ raw encoding of 0x1A8 can be used:
>  You should refer to the processor specific documentation for getting these
>  details. Some of them are referenced in the SEE ALSO section below.
>  
> +Some raw events -- like the Intel OFFCORE events -- support additional
> +parameters. These can be appended after a ':'.
> +
> +For example on a multi socket Intel Nehalem:
> +
> + perf stat -e r1b7:20ff -a sleep 1
> +
> +Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
> +that measures any access to DRAM on another socket.  Up to two parameters can
> +be specified with an additional ':'.

This needs to be a *lot* more user friendly. Users do not want to type in 
stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
era really.

Unless there's proper generalized and human usable support i'm leaning towards 
turning off the offcore user-space accessible raw bits for now, and use them 
only kernel-internally, for the cache events.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-21 17:41 [GIT PULL 0/1] perf/urgent Fix missing support for config1/config2 Arnaldo Carvalho de Melo
@ 2011-04-21 17:41 ` Arnaldo Carvalho de Melo
  2011-04-22  6:34   ` Ingo Molnar
  0 siblings, 1 reply; 80+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-04-21 17:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andi Kleen, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo

From: Andi Kleen <ak@linux.intel.com>

The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
user space bits were not. This made it impossible to set the extra mask
and actually do the OFFCORE profiling

This patch fixes this. It adds a new syntax ':' to raw events to specify
additional event masks. I also added support for setting config2, even
though that is not needed currently.

[Note: the original version back in time used , -- but that actually
conflicted with event lists, so now it's :]

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-list.txt |   11 +++++++++++
 tools/perf/util/parse-events.c         |   19 ++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 7a527f7..f19f1e5 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -61,6 +61,17 @@ raw encoding of 0x1A8 can be used:
 You should refer to the processor specific documentation for getting these
 details. Some of them are referenced in the SEE ALSO section below.
 
+Some raw events -- like the Intel OFFCORE events -- support additional
+parameters. These can be appended after a ':'.
+
+For example on a multi socket Intel Nehalem:
+
+ perf stat -e r1b7:20ff -a sleep 1
+
+Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
+that measures any access to DRAM on another socket.  Up to two parameters can
+be specified with an additional ':'.
+
 OPTIONS
 -------
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 952b4ae..fe9d079 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -688,9 +688,26 @@ parse_raw_event(const char **strp, struct perf_event_attr *attr)
 		return EVT_FAILED;
 	n = hex2u64(str + 1, &config);
 	if (n > 0) {
-		*strp = str + n + 1;
+		str += n + 1;
 		attr->type = PERF_TYPE_RAW;
 		attr->config = config;
+		if (*str == ':') {
+			str++;
+			n = hex2u64(str, &config);
+			if (n == 0)
+				return EVT_FAILED;
+			attr->config1 = config;
+			str += n;
+			if (*str == ':') {
+				str++;
+				n = hex2u64(str, &config);
+				if (n == 0)
+					return EVT_FAILED;
+				attr->config2 = config;
+				str += n;
+			}
+		}
+		*strp = str;
 		return EVT_HANDLED;
 	}
 	return EVT_FAILED;
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2011-04-29 19:32 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-22  8:47 [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Stephane Eranian
2011-04-22  9:23 ` Ingo Molnar
2011-04-22  9:41   ` Stephane Eranian
2011-04-22 10:52     ` [generalized cache events] " Ingo Molnar
2011-04-22 12:04       ` Stephane Eranian
2011-04-22 13:18         ` Ingo Molnar
2011-04-22 20:31           ` Stephane Eranian
2011-04-22 20:47             ` Ingo Molnar
2011-04-23 12:13               ` Stephane Eranian
2011-04-23 12:49                 ` Ingo Molnar
2011-04-22 21:03             ` Ingo Molnar
2011-04-23 12:27               ` Stephane Eranian
2011-04-22 16:51         ` Andi Kleen
2011-04-22 19:57           ` Ingo Molnar
2011-04-26  9:25           ` Peter Zijlstra
2011-04-22 16:50       ` arun
2011-04-22 17:00         ` Andi Kleen
2011-04-22 20:30         ` Ingo Molnar
2011-04-22 20:32           ` Ingo Molnar
2011-04-23  0:03             ` Andi Kleen
2011-04-23  7:50               ` Peter Zijlstra
2011-04-23 12:06                 ` Stephane Eranian
2011-04-23 12:36                   ` Ingo Molnar
2011-04-23 13:16                   ` Peter Zijlstra
2011-04-25 18:48                     ` Stephane Eranian
2011-04-25 19:40                     ` Andi Kleen
2011-04-25 19:55                       ` Ingo Molnar
2011-04-24  2:15                   ` Andi Kleen
2011-04-24  2:19                 ` Andi Kleen
2011-04-25 17:41                   ` Ingo Molnar
2011-04-25 18:00                     ` Dehao Chen
     [not found]                     ` <BANLkTiks31-pMJe4zCKrppsrA1d6KanJFA@mail.gmail.com>
2011-04-25 18:05                       ` Ingo Molnar
2011-04-25 18:39                         ` Stephane Eranian
2011-04-25 19:45                           ` Ingo Molnar
2011-04-23  8:02               ` Ingo Molnar
2011-04-23 20:14           ` [PATCH] perf events: Add stalled cycles generic event - PERF_COUNT_HW_STALLED_CYCLES Ingo Molnar
2011-04-24  6:16             ` Arun Sharma
2011-04-25 17:37               ` Ingo Molnar
2011-04-26  9:25               ` Peter Zijlstra
2011-04-26 14:00               ` Ingo Molnar
2011-04-27 11:11               ` Ingo Molnar
2011-04-27 14:47                 ` Arun Sharma
2011-04-27 15:48                   ` Ingo Molnar
2011-04-27 16:27                     ` Ingo Molnar
2011-04-27 19:05                       ` Arun Sharma
2011-04-27 19:03                     ` Arun Sharma
  -- strict thread matches above, loose matches on Subject: below --
2011-04-21 17:41 [GIT PULL 0/1] perf/urgent Fix missing support for config1/config2 Arnaldo Carvalho de Melo
2011-04-21 17:41 ` [PATCH 1/1] perf tools: Add missing user space " Arnaldo Carvalho de Melo
2011-04-22  6:34   ` Ingo Molnar
2011-04-22  8:06     ` Ingo Molnar
2011-04-22 21:37       ` Peter Zijlstra
2011-04-22 21:54         ` Peter Zijlstra
2011-04-22 22:19           ` Peter Zijlstra
2011-04-22 23:54             ` Andi Kleen
2011-04-23  7:49               ` Peter Zijlstra
2011-04-22 22:57           ` Peter Zijlstra
2011-04-23  0:00             ` Andi Kleen
2011-04-23  7:50               ` Peter Zijlstra
2011-04-23  8:13         ` Ingo Molnar
2011-04-25 17:12       ` Vince Weaver
2011-04-25 17:54         ` Ingo Molnar
2011-04-25 21:46           ` Vince Weaver
2011-04-25 22:12             ` Andi Kleen
2011-04-26  7:23               ` Ingo Molnar
2011-04-26  7:38             ` Ingo Molnar
2011-04-26 20:51               ` Vince Weaver
2011-04-27  6:52                 ` Ingo Molnar
2011-04-28 22:16                   ` Vince Weaver
2011-04-28 23:30                     ` Thomas Gleixner
2011-04-29  2:28                     ` Andi Kleen
2011-04-29 19:32                     ` Ingo Molnar
2011-04-26  9:49             ` Peter Zijlstra
2011-04-26  9:25         ` Peter Zijlstra
2011-04-26 20:33           ` Vince Weaver
2011-04-26 21:19             ` Cyrill Gorcunov
2011-04-26 21:25               ` Don Zickus
2011-04-26 21:33                 ` Cyrill Gorcunov
2011-04-27  6:43             ` Ingo Molnar
2011-04-28 22:10               ` Vince Weaver
2011-04-22 16:22     ` Andi Kleen
2011-04-22 19:54       ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).