linux-kernel.vger.kernel.org archive mirror
* [GIT PULL 0/1] perf/urgent Fix missing support for config1/config2
@ 2011-04-21 17:41 Arnaldo Carvalho de Melo
  2011-04-21 17:41 ` [PATCH 1/1] perf tools: Add missing user space " Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 39+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-04-21 17:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Andi Kleen, Lin Ming,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

Hi Ingo,

        Please consider pulling from:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 perf/urgent

Regards,

- Arnaldo

Andi Kleen (1):
  perf tools: Add missing user space support for config1/config2

 tools/perf/Documentation/perf-list.txt |   11 +++++++++++
 tools/perf/util/parse-events.c         |   18 +++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletions(-)



* [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-21 17:41 [GIT PULL 0/1] perf/urgent Fix missing support for config1/config2 Arnaldo Carvalho de Melo
@ 2011-04-21 17:41 ` Arnaldo Carvalho de Melo
  2011-04-22  6:34   ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Arnaldo Carvalho de Melo @ 2011-04-21 17:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Andi Kleen, Ingo Molnar, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo

From: Andi Kleen <ak@linux.intel.com>

The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
user space bits were not. This made it impossible to set the extra mask
and actually do the OFFCORE profiling.

This patch fixes this. It adds a new syntax ':' to raw events to specify
additional event masks. I also added support for setting config2, even
though that is not needed currently.

[Note: an earlier version used ',' -- but that actually conflicted with
event lists, so now it's ':']

Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-list.txt |   11 +++++++++++
 tools/perf/util/parse-events.c         |   18 +++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 7a527f7..f19f1e5 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -61,6 +61,17 @@ raw encoding of 0x1A8 can be used:
 You should refer to the processor specific documentation for getting these
 details. Some of them are referenced in the SEE ALSO section below.
 
+Some raw events -- like the Intel OFFCORE events -- support additional
+parameters. These can be appended after a ':'.
+
+For example on a multi socket Intel Nehalem:
+
+ perf stat -e r1b7:20ff -a sleep 1
+
+Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
+that measures any access to DRAM on another socket.  Up to two parameters can
+be specified with an additional ':'.
+
 OPTIONS
 -------
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 952b4ae..fe9d079 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -688,9 +688,25 @@ parse_raw_event(const char **strp, struct perf_event_attr *attr)
 		return EVT_FAILED;
 	n = hex2u64(str + 1, &config);
 	if (n > 0) {
-		*strp = str + n + 1;
+		str += n + 1;
 		attr->type = PERF_TYPE_RAW;
 		attr->config = config;
+		if (*str == ':') {
+			str++;
+			n = hex2u64(str, &config);
+			if (n == 0)
+				return EVT_FAILED;
+			attr->config1 = config;
+			str += n;
+			if (*str == ':') {
+				str++;
+				n = hex2u64(str, &config);
+				if (n == 0)
+					return EVT_FAILED;
+				attr->config2 = config;
+			}
+		}
+		*strp = str;
 		return EVT_HANDLED;
 	}
 	return EVT_FAILED;
-- 
1.6.2.5
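
The extended syntax this adds is r<config>[:<config1>[:<config2>]], with each
field in hex. Below is a minimal stand-alone sketch of that parsing behaviour,
purely for illustration: strtoull() stands in for perf's hex2u64(), cfg[0],
cfg[1] and cfg[2] correspond to attr->config, attr->config1 and attr->config2,
and none of this is the actual perf code.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Parse "r<config>[:<config1>[:<config2>]]", all fields hexadecimal. */
static int parse_raw_sketch(const char *str, uint64_t cfg[3])
{
	char *end;
	int i;

	if (*str++ != 'r')
		return -1;
	for (i = 0; i < 3; i++) {
		cfg[i] = strtoull(str, &end, 16);
		if (end == str)
			return -1;	/* no hex digits where expected */
		if (*end != ':')
			break;		/* no further fields */
		str = end + 1;
	}
	return 0;
}

int main(void)
{
	uint64_t cfg[3] = { 0, 0, 0 };

	/* The example from the documentation hunk above. */
	if (!parse_raw_sketch("r1b7:20ff", cfg))
		printf("config=%#llx config1=%#llx config2=%#llx\n",
		       (unsigned long long)cfg[0],
		       (unsigned long long)cfg[1],
		       (unsigned long long)cfg[2]);
	return 0;
}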



* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-21 17:41 ` [PATCH 1/1] perf tools: Add missing user space " Arnaldo Carvalho de Melo
@ 2011-04-22  6:34   ` Ingo Molnar
  2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 16:22     ` Andi Kleen
  0 siblings, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-22  6:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Stephane Eranian,
	Lin Ming, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@infradead.org> wrote:

> From: Andi Kleen <ak@linux.intel.com>
> 
> The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
> user space bits were not. This made it impossible to set the extra mask
> and actually do the OFFCORE profiling
> 
> This patch fixes this. It adds a new syntax ':' to raw events to specify
> additional event masks. I also added support for setting config2, even
> though that is not needed currently.
> 
> [Note: the original version back in time used , -- but that actually
> conflicted with event lists, so now it's :]
> 
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephane Eranian <eranian@gmail.com>
> Cc: Lin Ming <ming.m.lin@intel.com>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/Documentation/perf-list.txt |   11 +++++++++++
>  tools/perf/util/parse-events.c         |   18 +++++++++++++++++-
>  2 files changed, 28 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> index 7a527f7..f19f1e5 100644
> --- a/tools/perf/Documentation/perf-list.txt
> +++ b/tools/perf/Documentation/perf-list.txt
> @@ -61,6 +61,17 @@ raw encoding of 0x1A8 can be used:
>  You should refer to the processor specific documentation for getting these
>  details. Some of them are referenced in the SEE ALSO section below.
>  
> +Some raw events -- like the Intel OFFCORE events -- support additional
> +parameters. These can be appended after a ':'.
> +
> +For example on a multi socket Intel Nehalem:
> +
> + perf stat -e r1b7:20ff -a sleep 1
> +
> +Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
> +that measures any access to DRAM on another socket.  Upto two parameters can
> +be specified with additional ':'

This needs to be a *lot* more user friendly. Users do not want to type in 
stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
era really.

Unless there's proper generalized and human usable support i'm leaning towards 
turning off the offcore user-space accessible raw bits for now, and use them 
only kernel-internally, for the cache events.

Thanks,

	Ingo


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  6:34   ` Ingo Molnar
@ 2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 21:37       ` Peter Zijlstra
  2011-04-25 17:12       ` [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Vince Weaver
  2011-04-22 16:22     ` Andi Kleen
  1 sibling, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-22  8:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Stephane Eranian,
	Lin Ming, Arnaldo Carvalho de Melo, Thomas Gleixner,
	Peter Zijlstra


* Ingo Molnar <mingo@elte.hu> wrote:

> This needs to be a *lot* more user friendly. Users do not want to type in 
> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> era really.
> 
> Unless there's proper generalized and human usable support i'm leaning 
> towards turning off the offcore user-space accessible raw bits for now, and 
> use them only kernel-internally, for the cache events.

I'm about to push out the patch attached below - it lays out the arguments in 
detail. I don't think we have time to fix this properly for .39 - but memory 
profiling could be a nice feature for v2.6.40.

Thanks,

	Ingo

--------------------->
From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@elte.hu>
Date: Fri, 22 Apr 2011 08:44:38 +0200
Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers

Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support for the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using: they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patterns along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious; memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: the actual implementation could have more
  sophistication and more parameters - as long as they center around
  similarly simple use cases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a0b commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.

Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/perf_event.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index eed3673a..632e5dc 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
 			return -EOPNOTSUPP;
 	}
 
+	/*
+	 * Do not allow config1 (extended registers) to propagate,
+	 * there's no sane user-space generalization yet:
+	 */
 	if (attr->type == PERF_TYPE_RAW)
-		return x86_pmu_extra_regs(event->attr.config, event);
+		return 0;
 
 	if (attr->type == PERF_TYPE_HW_CACHE)
 		return set_ext_hw_attr(hwc, event);


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  6:34   ` Ingo Molnar
  2011-04-22  8:06     ` Ingo Molnar
@ 2011-04-22 16:22     ` Andi Kleen
  2011-04-22 19:54       ` Ingo Molnar
  1 sibling, 1 reply; 39+ messages in thread
From: Andi Kleen @ 2011-04-22 16:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo

On Fri, Apr 22, 2011 at 08:34:29AM +0200, Ingo Molnar wrote:
> This needs to be a *lot* more user friendly. Users do not want to type in 
> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> era really.

I agree that the raw events are quite user unfriendly.

Unfortunately they are the way of life in perf -- unlike oprofile -- currently
if you want any CPU specific events like this.

Really to make sense out of all this you need per CPU full event lists.

I have my own wrapper to make it more user friendly, but its functionality
should arguably migrate into perf.

I did a patch to add a mapping file some time ago, but it likely
needs some improvements before it can be merged (aka not .39), like
auto selecting a suitable mapping file and backtranslating raw
mappings on output.

BTW the new perf lat code needs the raw events config1 specification
internally, so this is needed in some form anyway.

Short of that, the extended raw events are the best we can get short term, I
think. So I would prefer to have it in .39 to make this feature
usable at all.

I attached the old mapping file patch for your reference. 
I also put up a few mapping files for Intel CPUs at 
ftp://ftp.kernel.org/pub/linux/kernel/people/ak/pmu/*

E.g. to use it today with Nehalem offcore events and this patch, you would
run:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/pmu/nhm-ep.map
perf --map-file nhm-ep.map top -e offcore_response_0.any_data.local_cache_dram

-Andi

commit 37323c19ceb57101cc2160059c567ee14055b7c8
Author: Andi Kleen <ak@linux.intel.com>
Date:   Mon Nov 8 04:52:18 2010 +0100

    mapping file support

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index a91f9f9..63bdbbb 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -120,6 +120,9 @@ Do not update the builid cache. This saves some overhead in situations
 where the information in the perf.data file (which includes buildids)
 is sufficient.
 
+--map-events=file
+Use file as event mapping file.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4b3a2d4..4f20af3 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -53,6 +53,9 @@ comma-sperated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2
 In per-thread mode, this option is ignored. The -a option is still necessary
 to activate system-wide monitoring. Default is to count on all CPUs.
 
+--map-events=file
+Use file as event mapping file.
+
 EXAMPLES
 --------
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 93bd2ff..6fdf892 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -794,6 +794,9 @@ const struct option record_options[] = {
 	OPT_CALLBACK('e', "event", NULL, "event",
 		     "event selector. use 'perf list' to list available events",
 		     parse_events),
+	OPT_CALLBACK(0, "map-events", NULL, "map-events",
+		     "specify mapping file for events",
+		     map_events),
 	OPT_CALLBACK(0, "filter", NULL, "filter",
 		     "event filter", parse_filter),
 	OPT_INTEGER('p', "pid", &target_pid,
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a6b4d44..f21f307 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -525,6 +525,9 @@ static const struct option options[] = {
 	OPT_CALLBACK('e', "event", NULL, "event",
 		     "event selector. use 'perf list' to list available events",
 		     parse_events),
+	OPT_CALLBACK(0, "map-events", NULL, "map-events",
+		     "specify mapping file for events",
+		     map_events),
 	OPT_BOOLEAN('i', "no-inherit", &no_inherit,
 		    "child tasks do not inherit counters"),
 	OPT_INTEGER('p', "pid", &target_pid,
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..2cc7b3d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -83,6 +83,14 @@ static const char *sw_event_names[] = {
 	"emulation-faults",
 };
 
+struct mapping {
+	const char *str;
+	const char *res;
+};
+
+static int 		mapping_max;
+static struct mapping  *mappings;
+
 #define MAX_ALIASES 8
 
 static const char *hw_cache[][MAX_ALIASES] = {
@@ -731,12 +739,28 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
 	return 0;
 }
 
+static int cmp_mapping(const void *a, const void *b)
+{
+	const struct mapping *am = a;
+	const struct mapping *bm = b;
+	return strcmp(am->str, bm->str);
+}
+
+static const char *
+get_event_mapping(const char *str)
+{
+	struct mapping key = { .str = str };
+	struct mapping *r = bsearch(&key, mappings, mapping_max,
+				    sizeof(struct mapping), cmp_mapping);
+	return r ? r->res : NULL;
+}
+
 /*
  * Each event can have multiple symbolic names.
  * Symbolic names are (almost) exactly matched.
  */
 static enum event_result
-parse_event_symbols(const char **str, struct perf_event_attr *attr)
+do_parse_event_symbols(const char **str, struct perf_event_attr *attr)
 {
 	enum event_result ret;
 
@@ -774,6 +798,15 @@ modifier:
 	return ret;
 }
 
+static enum event_result
+parse_event_symbols(const char **str, struct perf_event_attr *attr)
+{
+	const char *map = get_event_mapping(*str);
+	if (map)
+		*str = map;
+	return do_parse_event_symbols(str, attr);
+}
+
 static int store_event_type(const char *orgname)
 {
 	char filename[PATH_MAX], *c;
@@ -963,3 +996,54 @@ void print_events(void)
 
 	exit(129);
 }
+
+int map_events(const struct option *opt __used, const char *str,
+		int unset __used)
+{
+	FILE *f;
+	char *line = NULL;
+	size_t linelen = 0;
+	char *p;
+	int lineno = 0;
+	static int mapping_size;
+	struct mapping *map;
+
+	f = fopen(str, "r");
+	if (!f) {
+		pr_err("Cannot open event map file");
+		return -1;
+	}
+	while (getline(&line, &linelen, f) > 0) {
+		lineno++;
+		p = strpbrk(line, "\n#");
+		if (p)
+			*p = 0;
+		p = line + strspn(line, " \t");
+		if (*p == 0)
+			continue;
+		if (mapping_max >= mapping_size) {
+			if (!mapping_size)
+				mapping_size = 2048;
+			mapping_size *= 2;
+			mappings = realloc(mappings,
+				      mapping_size * sizeof(struct mapping));
+			if (!mappings) {
+				pr_err("Out of memory\n");
+				exit(ENOMEM);
+			}
+		}
+		map = &mappings[mapping_max++];
+		map->str = strsep(&p, " \t");
+		map->res = strsep(&p, " \t");
+		if (!map->str || !map->res) {
+			fprintf(stderr, "%s:%d: Invalid line in map file\n",
+				str, lineno);
+		}
+		line = NULL;
+		linelen = 0;
+	}
+	fclose(f);
+	qsort(mappings, mapping_max, sizeof(struct mapping),
+	      cmp_mapping);
+	return 0;
+}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..1d6df9c 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -33,5 +33,6 @@ extern void print_events(void);
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
 
+extern int map_events(const struct option *opt, const char *str, int unset);
 
 #endif /* __PERF_PARSE_EVENTS_H */
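
For reference, a map file line in the two-column format the map_events()
parser above expects (a symbolic event name, whitespace, then the event
string it expands to) would presumably look like this; the name and mask
below are illustrative and not copied from the real nhm-ep.map:

 offcore_response_0.any_request.remote_dram_0	r1b7:20ff

Blank lines and anything following a '#' are skipped by the parser.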


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 16:22     ` Andi Kleen
@ 2011-04-22 19:54       ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-22 19:54 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo


* Andi Kleen <ak@linux.intel.com> wrote:

> On Fri, Apr 22, 2011 at 08:34:29AM +0200, Ingo Molnar wrote:
> > This needs to be a *lot* more user friendly. Users do not want to type in 
> > stupid hexa magic numbers to get profiling. We have moved beyond the oprofile 
> > era really.
> 
> I agree that the raw events are quite user unfriendly.
> 
> Unfortunately they are the way of life in perf -- unlike oprofile -- 
> currently if you want any CPU specific events like this.

Not sure where you take that blanket statement from, but no, raw events are not 
really the 'way of life' - judging by the various user feedback we get they 
come up pretty rarely.

The thing is, most people just use the default 'perf record' and that's it - 
they do not even care about a *single* event - they just want to profile their 
code somehow.

Then the second most popular event category are the generalized events, the 
ones you can see in perf list output:

  cpu-cycles OR cycles                       [Hardware event]
  instructions                               [Hardware event]
  cache-references                           [Hardware event]
  cache-misses                               [Hardware event]
  branch-instructions OR branches            [Hardware event]
  branch-misses                              [Hardware event]
  bus-cycles                                 [Hardware event]

  cpu-clock                                  [Software event]
  task-clock                                 [Software event]
  page-faults OR faults                      [Software event]
  minor-faults                               [Software event]
  major-faults                               [Software event]
  context-switches OR cs                     [Software event]
  cpu-migrations OR migrations               [Software event]
  alignment-faults                           [Software event]
  emulation-faults                           [Software event]

  L1-dcache-loads                            [Hardware cache event]
  L1-dcache-load-misses                      [Hardware cache event]
  L1-dcache-stores                           [Hardware cache event]
  L1-dcache-store-misses                     [Hardware cache event]
  L1-dcache-prefetches                       [Hardware cache event]
  L1-dcache-prefetch-misses                  [Hardware cache event]
  L1-icache-loads                            [Hardware cache event]
  L1-icache-load-misses                      [Hardware cache event]
  L1-icache-prefetches                       [Hardware cache event]
  L1-icache-prefetch-misses                  [Hardware cache event]
  LLC-loads                                  [Hardware cache event]
  LLC-load-misses                            [Hardware cache event]
  LLC-stores                                 [Hardware cache event]
  LLC-store-misses                           [Hardware cache event]
  LLC-prefetches                             [Hardware cache event]
  LLC-prefetch-misses                        [Hardware cache event]
  dTLB-loads                                 [Hardware cache event]
  dTLB-load-misses                           [Hardware cache event]
  dTLB-stores                                [Hardware cache event]
  dTLB-store-misses                          [Hardware cache event]
  dTLB-prefetches                            [Hardware cache event]
  dTLB-prefetch-misses                       [Hardware cache event]
  iTLB-loads                                 [Hardware cache event]
  iTLB-load-misses                           [Hardware cache event]
  branch-loads                               [Hardware cache event]
  branch-load-misses                         [Hardware cache event]

These are useful but are used less frequently.

Then come tracepoint based events - and as a distant last, come raw events. 
Yes, the raw events are useful occasionally, just like modifying applications
via a hexa editor is useful occasionally. If done often we'd better abstract it
out.

> Really to make sense out of all this you need per CPU full event lists.

To make sense out of what? You are making very sweeping yet vague statements.

> I have my own wrapper to make it more user friendly, but its functionality 
> should arguably migrate into perf.

Uhm, no - your patch seems to reintroduce oprofile's horrible events files. We 
really learned from that mistake and do not want to step back ...

Please see the detailed mails i wrote in this thread: what we want is to extend 
and improve existing generalizations of events. The useful bits of the offcore 
PMU fit nicely into that scheme.

Thanks,

	Ingo


* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:06     ` Ingo Molnar
@ 2011-04-22 21:37       ` Peter Zijlstra
  2011-04-22 21:54         ` Peter Zijlstra
                           ` (2 more replies)
  2011-04-25 17:12       ` [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Vince Weaver
  1 sibling, 3 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-22 21:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 10:06 +0200, Ingo Molnar wrote:
> 
> I'm about to push out the patch attached below - it lays out the arguments in 
> detail. I don't think we have time to fix this properly for .39 - but memory 
> profiling could be a nice feature for v2.6.40. 

Does something like the below provide enough generic infrastructure to
allow the raw offcore bits again?

The below needs filling out for !x86 (which I filled out with
unsupported events) and x86 needs the offcore bits fixed to auto select
between the two offcore events.


Not-signed-off-by: /me
---
 arch/arm/kernel/perf_event_v6.c        |   28 ++++++
 arch/arm/kernel/perf_event_v7.c        |   28 ++++++
 arch/arm/kernel/perf_event_xscale.c    |   14 +++
 arch/mips/kernel/perf_event_mipsxx.c   |   28 ++++++
 arch/powerpc/kernel/e500-pmu.c         |    5 +
 arch/powerpc/kernel/mpc7450-pmu.c      |    5 +
 arch/powerpc/kernel/power4-pmu.c       |    5 +
 arch/powerpc/kernel/power5+-pmu.c      |    5 +
 arch/powerpc/kernel/power5-pmu.c       |    5 +
 arch/powerpc/kernel/power6-pmu.c       |    5 +
 arch/powerpc/kernel/power7-pmu.c       |    5 +
 arch/powerpc/kernel/ppc970-pmu.c       |    5 +
 arch/sh/kernel/cpu/sh4/perf_event.c    |   15 +++
 arch/sh/kernel/cpu/sh4a/perf_event.c   |   15 +++
 arch/sparc/kernel/perf_event.c         |   42 ++++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |   14 +++
 arch/x86/kernel/cpu/perf_event_intel.c |  167 +++++++++++++++++++++++++-------
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 +++
 include/linux/perf_event.h             |    3 +-
 19 files changed, 373 insertions(+), 35 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f1e8dd9..02178da 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -173,6 +173,20 @@ static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 enum armv6mpcore_perf_types {
@@ -310,6 +324,20 @@ static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 static inline unsigned long
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4960686..79ffc83 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -255,6 +255,20 @@ static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
@@ -371,6 +385,20 @@ static const unsigned armv7_a9_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 39affbe..7ed1a55 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -144,6 +144,20 @@ static const unsigned xscale_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 #define	XSCALE_PMU_ENABLE	0x001
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 75266ff..e5ad09a 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -377,6 +377,20 @@ static const struct mips_perf_event mipsxxcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 /* 74K core has completely different cache event map. */
@@ -480,6 +494,20 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
diff --git a/arch/powerpc/kernel/e500-pmu.c b/arch/powerpc/kernel/e500-pmu.c
index b150b51..cb2e294 100644
--- a/arch/powerpc/kernel/e500-pmu.c
+++ b/arch/powerpc/kernel/e500-pmu.c
@@ -75,6 +75,11 @@ static int e500_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1 	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static int num_events = 128;
diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
index 2cc5e03..845a584 100644
--- a/arch/powerpc/kernel/mpc7450-pmu.c
+++ b/arch/powerpc/kernel/mpc7450-pmu.c
@@ -388,6 +388,11 @@ static int mpc7450_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 struct power_pmu mpc7450_pmu = {
diff --git a/arch/powerpc/kernel/power4-pmu.c b/arch/powerpc/kernel/power4-pmu.c
index ead8b3c..e9dbc2d 100644
--- a/arch/powerpc/kernel/power4-pmu.c
+++ b/arch/powerpc/kernel/power4-pmu.c
@@ -587,6 +587,11 @@ static int power4_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power4_pmu = {
diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index eca0ac5..f58a2bd 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -653,6 +653,11 @@ static int power5p_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5p_pmu = {
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d5ff0f6..b1acab6 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -595,6 +595,11 @@ static int power5_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5_pmu = {
diff --git a/arch/powerpc/kernel/power6-pmu.c b/arch/powerpc/kernel/power6-pmu.c
index 3160392..b24a3a2 100644
--- a/arch/powerpc/kernel/power6-pmu.c
+++ b/arch/powerpc/kernel/power6-pmu.c
@@ -516,6 +516,11 @@ static int power6_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power6_pmu = {
diff --git a/arch/powerpc/kernel/power7-pmu.c b/arch/powerpc/kernel/power7-pmu.c
index 593740f..6d9dccb 100644
--- a/arch/powerpc/kernel/power7-pmu.c
+++ b/arch/powerpc/kernel/power7-pmu.c
@@ -342,6 +342,11 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power7_pmu = {
diff --git a/arch/powerpc/kernel/ppc970-pmu.c b/arch/powerpc/kernel/ppc970-pmu.c
index 9a6e093..b121de9 100644
--- a/arch/powerpc/kernel/ppc970-pmu.c
+++ b/arch/powerpc/kernel/ppc970-pmu.c
@@ -467,6 +467,11 @@ static int ppc970_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu ppc970_pmu = {
diff --git a/arch/sh/kernel/cpu/sh4/perf_event.c b/arch/sh/kernel/cpu/sh4/perf_event.c
index 748955d..fa4f724 100644
--- a/arch/sh/kernel/cpu/sh4/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4/perf_event.c
@@ -180,6 +180,21 @@ static const int sh7750_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh7750_event_map(int event)
diff --git a/arch/sh/kernel/cpu/sh4a/perf_event.c b/arch/sh/kernel/cpu/sh4a/perf_event.c
index 17e6beb..84a2c39 100644
--- a/arch/sh/kernel/cpu/sh4a/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4a/perf_event.c
@@ -205,6 +205,21 @@ static const int sh4a_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh4a_event_map(int event)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index ee8426e..d890e0f 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -245,6 +245,20 @@ static const cache_map_t ultra3_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu ultra3_pmu = {
@@ -360,6 +374,20 @@ static const cache_map_t niagara1_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara1_pmu = {
@@ -472,6 +500,20 @@ static const cache_map_t niagara2_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara2_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cf4e369..01c7dd3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -89,6 +89,20 @@ static __initconst const u64 amd_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0xb8e9, /* CPU Request to Memory, l+r */
+		[ C(RESULT_MISS)   ] = 0x98e9, /* CPU Request to Memory, r   */
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 43fa20b..225efa0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -184,26 +184,23 @@ static __initconst const u64 snb_hw_cache_event_ids
 	},
  },
  [ C(LL  ) ] = {
-	/*
-	 * TBD: Need Off-core Response Performance Monitoring support
-	 */
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -248,6 +245,20 @@ static __initconst const u64 snb_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 westmere_hw_cache_event_ids
@@ -285,26 +296,26 @@ static __initconst const u64 westmere_hw_cache_event_ids
  },
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	/*
 	 * Use RFO, not WRITEBACK, because a write miss would typically occur
 	 * on RFO.
 	 */
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
-		[ C(RESULT_ACCESS) ] = 0x01bb,
-		/* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
 		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -349,19 +360,51 @@ static __initconst const u64 westmere_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 /*
  * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
  */
 
-#define DMND_DATA_RD     (1 << 0)
-#define DMND_RFO         (1 << 1)
-#define DMND_WB          (1 << 3)
-#define PF_DATA_RD       (1 << 4)
-#define PF_DATA_RFO      (1 << 5)
-#define RESP_UNCORE_HIT  (1 << 8)
-#define RESP_MISS        (0xf600) /* non uncore hit */
+#define DMND_DATA_RD		(1 << 0)
+#define DMND_RFO		(1 << 1)
+#define DMND_IFETCH		(1 << 2)
+#define DMND_WB			(1 << 3)
+#define PF_DATA_RD		(1 << 4)
+#define PF_DATA_RFO		(1 << 5)
+#define PF_IFETCH		(1 << 6)
+#define OFFCORE_OTHER		(1 << 7)
+#define UNCORE_HIT		(1 << 8)
+#define OTHER_CORE_HIT_SNP	(1 << 9)
+#define OTHER_CORE_HITM		(1 << 10)
+				/* reserved */
+#define REMOTE_CACHE_FWD	(1 << 12)
+#define REMOTE_DRAM		(1 << 13)
+#define LOCAL_DRAM		(1 << 14)
+#define NON_DRAM		(1 << 15)
+
+#define ALL_DRAM	(REMOTE_DRAM|LOCAL_DRAM)
+
+#define DMND_READ	(DMND_DATA_RD)
+#define DMND_WRITE	(DMND_RFO|DMND_WB)
+#define DMND_PREFETCH	(PF_DATA_RD|PF_DATA_RFO)
+
+#define L3_HIT	(UNCORE_HIT|OTHER_CORE_HIT_SNP|OTHER_CORE_HITM)
+#define L3_MISS	(NON_DRAM|ALL_DRAM|REMOTE_CACHE_FWD)
 
 static __initconst const u64 nehalem_hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_MAX]
@@ -370,18 +413,32 @@ static __initconst const u64 nehalem_hw_cache_extra_regs
 {
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_DATA_RD|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_READ|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_READ|L3_MISS,
 	},
 	[ C(OP_WRITE) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_RFO|DMND_WB|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_WRITE|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_WRITE|L3_MISS,
 	},
 	[ C(OP_PREFETCH) ] = {
-		[ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = DMND_PREFETCH|L3_HIT,
+		[ C(RESULT_MISS)   ] = DMND_PREFETCH|L3_MISS,
 	},
  }
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_READ|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_READ|REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_WRITE|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_WRITE|REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = DMND_PREFETCH|ALL_DRAM,
+		[ C(RESULT_MISS)   ] = DMND_PREFETCH|REMOTE_DRAM,
+	},
+ },
 };
 
 static __initconst const u64 nehalem_hw_cache_event_ids
@@ -483,6 +540,20 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 core2_hw_cache_event_ids
@@ -574,6 +645,20 @@ static __initconst const u64 core2_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static __initconst const u64 atom_hw_cache_event_ids
@@ -665,6 +750,20 @@ static __initconst const u64 atom_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static void intel_pmu_disable_all(void)
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 74507c1..e802c7e 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -554,6 +554,20 @@ static __initconst const u64 p4_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static u64 p4_general_events[PERF_COUNT_HW_MAX] = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee9f1e7..df4a841 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,7 +59,7 @@ enum perf_hw_id {
 /*
  * Generalized hardware cache events:
  *
- *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
+ *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
  *       { read, write, prefetch } x
  *       { accesses, misses }
  */
@@ -70,6 +70,7 @@ enum perf_hw_cache_id {
 	PERF_COUNT_HW_CACHE_DTLB		= 3,
 	PERF_COUNT_HW_CACHE_ITLB		= 4,
 	PERF_COUNT_HW_CACHE_BPU			= 5,
+	PERF_COUNT_HW_CACHE_NODE		= 6,
 
 	PERF_COUNT_HW_CACHE_MAX,		/* non-ABI */
 };
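
With a NODE generalization like the above in place, user space would request
these events through the regular generalized cache-event encoding instead of
raw offcore masks. The sketch below shows what that could look like; it
assumes PERF_COUNT_HW_CACHE_NODE from the hunk above is present in the
installed linux/perf_event.h, and the read/miss combination corresponds to
the DMND_READ|REMOTE_DRAM entry in the Nehalem extra-regs table.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HW_CACHE;
	/* Generalized cache events encode: id | (op << 8) | (result << 16). */
	attr.config = PERF_COUNT_HW_CACHE_NODE |
		      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
		      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);

	/* Count for the calling task on any CPU. */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	close(fd);
	return 0;
}

perf itself would then hide this behind readable names in the generalized
event list once the per-CPU mapping tables above are filled in.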




* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:37       ` Peter Zijlstra
@ 2011-04-22 21:54         ` Peter Zijlstra
  2011-04-22 22:19           ` Peter Zijlstra
  2011-04-22 22:57           ` Peter Zijlstra
  2011-04-23  8:13         ` Ingo Molnar
  2011-07-01 15:23         ` [tip:perf/core] perf, arch: Add generic NODE cache events tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-22 21:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> The below needs filling out for !x86 (which I filled out with
> unsupported events) and x86 needs the offcore bits fixed to auto select
> between the two offcore events.

Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.



* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:54         ` Peter Zijlstra
@ 2011-04-22 22:19           ` Peter Zijlstra
  2011-04-22 23:54             ` Andi Kleen
  2011-04-22 22:57           ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-22 22:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > The below needs filling out for !x86 (which I filled out with
> > unsupported events) and x86 needs the offcore bits fixed to auto select
> > between the two offcore events.
> 
> Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.

/*
 * Sandy Bridge MSR_OFFCORE_RESPONSE bits;
 * See IA32 SDM Vol 3B 30.8.5
 */

#define SNB_DMND_DATA_RD	(1 << 0)
#define SNB_DMND_RFO		(1 << 1)
#define SNB_DMND_IFETCH		(1 << 2)
#define SNB_DMND_WB		(1 << 3)
#define SNB_PF_DATA_RD		(1 << 4)
#define SNB_PF_DATA_RFO		(1 << 5)
#define SNB_PF_IFETCH		(1 << 6)
#define SNB_PF_LLC_DATA_RD	(1 << 7)
#define SNB_PF_LLC_RFO		(1 << 8)
#define SNB_PF_LLC_IFETCH	(1 << 9)
#define SNB_BUS_LOCKS		(1 << 10)
#define SNB_STRM_ST		(1 << 11)
        			/* hole */
#define SNB_OFFCORE_OTHER	(1 << 15)
#define SNB_COMMON		(1 << 16)
#define SNB_NO_SUPP		(1 << 17)
#define SNB_LLC_HITM		(1 << 18)
#define SNB_LLC_HITE		(1 << 19)
#define SNB_LLC_HITS		(1 << 20)
#define SNB_LLC_HITF		(1 << 21)
				/* hole */
#define SNB_SNP_NONE		(1ULL << 31)
#define SNB_SNP_NOT_NEEDED	(1ULL << 32)
#define SNB_SNP_MISS		(1ULL << 33)
#define SNB_SNP_NO_FWD		(1ULL << 34)
#define SNB_SNP_FWD		(1ULL << 35)
#define SNB_HITM		(1ULL << 36)
#define SNB_NON_DRAM		(1ULL << 37)

#define SNB_DMND_READ		(SNB_DMND_DATA_RD)
#define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
#define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)

That is what I came up with, but I'm stumped on how to construct:

#define SNB_L3_HIT
#define SNB_L3_MISS

#define SNB_ALL_DRAM
#define SNB_REMOTE_DRAM

Anybody got clue?




* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:54         ` Peter Zijlstra
  2011-04-22 22:19           ` Peter Zijlstra
@ 2011-04-22 22:57           ` Peter Zijlstra
  2011-04-23  0:00             ` Andi Kleen
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-22 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > The below needs filling out for !x86 (which I filled out with
> > unsupported events) and x86 needs the offcore bits fixed to auto select
> > between the two offcore events.
> 
> Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.

Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and
counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
L3 hit.

Current scribblings below..

---
 arch/arm/kernel/perf_event_v6.c        |   28 ++++
 arch/arm/kernel/perf_event_v7.c        |   28 ++++
 arch/arm/kernel/perf_event_xscale.c    |   14 ++
 arch/mips/kernel/perf_event_mipsxx.c   |   28 ++++
 arch/powerpc/kernel/e500-pmu.c         |    5 +
 arch/powerpc/kernel/mpc7450-pmu.c      |    5 +
 arch/powerpc/kernel/power4-pmu.c       |    5 +
 arch/powerpc/kernel/power5+-pmu.c      |    5 +
 arch/powerpc/kernel/power5-pmu.c       |    5 +
 arch/powerpc/kernel/power6-pmu.c       |    5 +
 arch/powerpc/kernel/power7-pmu.c       |    5 +
 arch/powerpc/kernel/ppc970-pmu.c       |    5 +
 arch/sh/kernel/cpu/sh4/perf_event.c    |   15 ++
 arch/sh/kernel/cpu/sh4a/perf_event.c   |   15 ++
 arch/sparc/kernel/perf_event.c         |   42 ++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |   14 ++
 arch/x86/kernel/cpu/perf_event_intel.c |  253 +++++++++++++++++++++++++++-----
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 ++
 include/linux/perf_event.h             |    3 +-
 19 files changed, 458 insertions(+), 36 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index f1e8dd9..02178da 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -173,6 +173,20 @@ static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 enum armv6mpcore_perf_types {
@@ -310,6 +324,20 @@ static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 static inline unsigned long
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4960686..79ffc83 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -255,6 +255,20 @@ static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
@@ -371,6 +385,20 @@ static const unsigned armv7_a9_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 39affbe..7ed1a55 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -144,6 +144,20 @@ static const unsigned xscale_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 #define	XSCALE_PMU_ENABLE	0x001
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 75266ff..e5ad09a 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -377,6 +377,20 @@ static const struct mips_perf_event mipsxxcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 /* 74K core has completely different cache event map. */
@@ -480,6 +494,20 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
diff --git a/arch/powerpc/kernel/e500-pmu.c b/arch/powerpc/kernel/e500-pmu.c
index b150b51..cb2e294 100644
--- a/arch/powerpc/kernel/e500-pmu.c
+++ b/arch/powerpc/kernel/e500-pmu.c
@@ -75,6 +75,11 @@ static int e500_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1 	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static int num_events = 128;
diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
index 2cc5e03..845a584 100644
--- a/arch/powerpc/kernel/mpc7450-pmu.c
+++ b/arch/powerpc/kernel/mpc7450-pmu.c
@@ -388,6 +388,11 @@ static int mpc7450_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 struct power_pmu mpc7450_pmu = {
diff --git a/arch/powerpc/kernel/power4-pmu.c b/arch/powerpc/kernel/power4-pmu.c
index ead8b3c..e9dbc2d 100644
--- a/arch/powerpc/kernel/power4-pmu.c
+++ b/arch/powerpc/kernel/power4-pmu.c
@@ -587,6 +587,11 @@ static int power4_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power4_pmu = {
diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index eca0ac5..f58a2bd 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -653,6 +653,11 @@ static int power5p_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5p_pmu = {
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d5ff0f6..b1acab6 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -595,6 +595,11 @@ static int power5_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5_pmu = {
diff --git a/arch/powerpc/kernel/power6-pmu.c b/arch/powerpc/kernel/power6-pmu.c
index 3160392..b24a3a2 100644
--- a/arch/powerpc/kernel/power6-pmu.c
+++ b/arch/powerpc/kernel/power6-pmu.c
@@ -516,6 +516,11 @@ static int power6_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power6_pmu = {
diff --git a/arch/powerpc/kernel/power7-pmu.c b/arch/powerpc/kernel/power7-pmu.c
index 593740f..6d9dccb 100644
--- a/arch/powerpc/kernel/power7-pmu.c
+++ b/arch/powerpc/kernel/power7-pmu.c
@@ -342,6 +342,11 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power7_pmu = {
diff --git a/arch/powerpc/kernel/ppc970-pmu.c b/arch/powerpc/kernel/ppc970-pmu.c
index 9a6e093..b121de9 100644
--- a/arch/powerpc/kernel/ppc970-pmu.c
+++ b/arch/powerpc/kernel/ppc970-pmu.c
@@ -467,6 +467,11 @@ static int ppc970_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu ppc970_pmu = {
diff --git a/arch/sh/kernel/cpu/sh4/perf_event.c b/arch/sh/kernel/cpu/sh4/perf_event.c
index 748955d..fa4f724 100644
--- a/arch/sh/kernel/cpu/sh4/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4/perf_event.c
@@ -180,6 +180,21 @@ static const int sh7750_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh7750_event_map(int event)
diff --git a/arch/sh/kernel/cpu/sh4a/perf_event.c b/arch/sh/kernel/cpu/sh4a/perf_event.c
index 17e6beb..84a2c39 100644
--- a/arch/sh/kernel/cpu/sh4a/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4a/perf_event.c
@@ -205,6 +205,21 @@ static const int sh4a_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh4a_event_map(int event)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index ee8426e..d890e0f 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -245,6 +245,20 @@ static const cache_map_t ultra3_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu ultra3_pmu = {
@@ -360,6 +374,20 @@ static const cache_map_t niagara1_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara1_pmu = {
@@ -472,6 +500,20 @@ static const cache_map_t niagara2_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara2_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index cf4e369..01c7dd3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -89,6 +89,20 @@ static __initconst const u64 amd_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0xb8e9, /* CPU Request to Memory, l+r */
+		[ C(RESULT_MISS)   ] = 0x98e9, /* CPU Request to Memory, r   */
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 43fa20b..fe4e8b1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -150,6 +150,86 @@ static u64 intel_pmu_event_map(int hw_event)
 	return intel_perfmon_event_map[hw_event];
 }
 
+/*
+ * Sandy Bridge MSR_OFFCORE_RESPONSE bits;
+ * See IA32 SDM Vol 3B 30.8.5
+ */
+
+#define SNB_DMND_DATA_RD	(1 << 0)
+#define SNB_DMND_RFO		(1 << 1)
+#define SNB_DMND_IFETCH		(1 << 2)
+#define SNB_DMND_WB		(1 << 3)
+#define SNB_PF_DATA_RD		(1 << 4)
+#define SNB_PF_DATA_RFO		(1 << 5)
+#define SNB_PF_IFETCH		(1 << 6)
+#define SNB_PF_LLC_DATA_RD	(1 << 7)
+#define SNB_PF_LLC_RFO		(1 << 8)
+#define SNB_PF_LLC_IFETCH	(1 << 9)
+#define SNB_BUS_LOCKS		(1 << 10)
+#define SNB_STRM_ST		(1 << 11)
+        			/* hole */
+#define SNB_OFFCORE_OTHER	(1 << 15)
+#define SNB_COMMON		(1 << 16)
+#define SNB_NO_SUPP		(1 << 17)
+#define SNB_LLC_HITM		(1 << 18)
+#define SNB_LLC_HITE		(1 << 19)
+#define SNB_LLC_HITS		(1 << 20)
+#define SNB_LLC_HITF		(1 << 21)
+				/* hole */
+#define SNB_SNP_NONE		(1 << 31)
+#define SNB_SNP_NOT_NEEDED	(1 << 32)
+#define SNB_SNP_MISS		(1 << 33)
+#define SNB_SNP_NO_FWD		(1 << 34)
+#define SNB_SNP_FWD		(1 << 35)
+#define SNB_HITM		(1 << 36)
+#define SNB_NON_DRAM		(1 << 37)
+
+#define SNB_DMND_READ		(SNB_DMND_DATA_RD)
+#define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
+#define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
+
+#define SNB_L3_HIT		()
+#define SNB_L3_MISS		()
+#define SNB_L3_ACCESS		(SNB_L3_HIT|SNB_L3_MISS)
+
+#define SNB_ALL_DRAM		()
+#define SNB_REMOTE_DRAM		()
+
+static __initconst const u64 snb_hw_cache_extra_regs
+				[PERF_COUNT_HW_CACHE_MAX]
+				[PERF_COUNT_HW_CACHE_OP_MAX]
+				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
+{
+ [ C(LL  ) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_READ|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_READ|SNB_L3_MISS,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_WRITE|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_WRITE|SNB_L3_MISS,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_PREFETCH|SNB_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = SNB_DMND_PREFETCH|SNB_L3_MISS,
+	},
+ }
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_READ|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_READ|SNB_REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_WRITE|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_WRITE|SNB_REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = SNB_DMND_PREFETCH|SNB_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = SNB_DMND_PREFETCH|SNB_REMOTE_DRAM,
+	},
+ },
+};
+
 static __initconst const u64 snb_hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -184,26 +264,23 @@ static __initconst const u64 snb_hw_cache_event_ids
 	},
  },
  [ C(LL  ) ] = {
-	/*
-	 * TBD: Need Off-core Response Performance Monitoring support
-	 */
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_RFO.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_RFO.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -248,6 +325,20 @@ static __initconst const u64 snb_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 westmere_hw_cache_event_ids
@@ -285,26 +376,26 @@ static __initconst const u64 westmere_hw_cache_event_ids
  },
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		/* OFFCORE_RESPONSE_0.ANY_DATA.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.ANY_DATA.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.ANY_DATA.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.ANY_DATA.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	/*
 	 * Use RFO, not WRITEBACK, because a write miss would typically occur
 	 * on RFO.
 	 */
 	[ C(OP_WRITE) ] = {
-		/* OFFCORE_RESPONSE_1.ANY_RFO.LOCAL_CACHE */
-		[ C(RESULT_ACCESS) ] = 0x01bb,
-		/* OFFCORE_RESPONSE_0.ANY_RFO.ANY_LLC_MISS */
+		/* OFFCORE_RESPONSE.ANY_RFO.LOCAL_CACHE */
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		/* OFFCORE_RESPONSE.ANY_RFO.ANY_LLC_MISS */
 		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
 	[ C(OP_PREFETCH) ] = {
-		/* OFFCORE_RESPONSE_0.PREFETCH.LOCAL_CACHE */
+		/* OFFCORE_RESPONSE.PREFETCH.LOCAL_CACHE */
 		[ C(RESULT_ACCESS) ] = 0x01b7,
-		/* OFFCORE_RESPONSE_1.PREFETCH.ANY_LLC_MISS */
-		[ C(RESULT_MISS)   ] = 0x01bb,
+		/* OFFCORE_RESPONSE.PREFETCH.ANY_LLC_MISS */
+		[ C(RESULT_MISS)   ] = 0x01b7,
 	},
  },
  [ C(DTLB) ] = {
@@ -349,19 +440,53 @@ static __initconst const u64 westmere_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 /*
- * OFFCORE_RESPONSE MSR bits (subset), See IA32 SDM Vol 3 30.6.1.3
+ * Nehalem/Westmere MSR_OFFCORE_RESPONSE bits;
+ * See IA32 SDM Vol 3B 30.6.1.3
  */
 
-#define DMND_DATA_RD     (1 << 0)
-#define DMND_RFO         (1 << 1)
-#define DMND_WB          (1 << 3)
-#define PF_DATA_RD       (1 << 4)
-#define PF_DATA_RFO      (1 << 5)
-#define RESP_UNCORE_HIT  (1 << 8)
-#define RESP_MISS        (0xf600) /* non uncore hit */
+#define NHM_DMND_DATA_RD	(1 << 0)
+#define NHM_DMND_RFO		(1 << 1)
+#define NHM_DMND_IFETCH		(1 << 2)
+#define NHM_DMND_WB		(1 << 3)
+#define NHM_PF_DATA_RD		(1 << 4)
+#define NHM_PF_DATA_RFO		(1 << 5)
+#define NHM_PF_IFETCH		(1 << 6)
+#define NHM_OFFCORE_OTHER	(1 << 7)
+#define NHM_UNCORE_HIT		(1 << 8)
+#define NHM_OTHER_CORE_HIT_SNP	(1 << 9)
+#define NHM_OTHER_CORE_HITM	(1 << 10)
+        			/* reserved */
+#define NHM_REMOTE_CACHE_FWD	(1 << 12)
+#define NHM_REMOTE_DRAM		(1 << 13)
+#define NHM_LOCAL_DRAM		(1 << 14)
+#define NHM_NON_DRAM		(1 << 15)
+
+#define NHM_ALL_DRAM		(NHM_REMOTE_DRAM|NHM_LOCAL_DRAM)
+
+#define NHM_DMND_READ		(NHM_DMND_DATA_RD)
+#define NHM_DMND_WRITE		(NHM_DMND_RFO|NHM_DMND_WB)
+#define NHM_DMND_PREFETCH	(NHM_PF_DATA_RD|NHM_PF_DATA_RFO)
+
+#define NHM_L3_HIT	(NHM_UNCORE_HIT|NHM_OTHER_CORE_HIT_SNP|NHM_OTHER_CORE_HITM)
+#define NHM_L3_MISS	(NHM_NON_DRAM|NHM_ALL_DRAM|NHM_REMOTE_CACHE_FWD)
+#define NHM_L3_ACCESS	(NHM_L3_HIT|NHM_L3_MISS)
 
 static __initconst const u64 nehalem_hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_MAX]
@@ -370,18 +495,32 @@ static __initconst const u64 nehalem_hw_cache_extra_regs
 {
  [ C(LL  ) ] = {
 	[ C(OP_READ) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_DATA_RD|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_DATA_RD|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_READ|NHM_L3_MISS,
 	},
 	[ C(OP_WRITE) ] = {
-		[ C(RESULT_ACCESS) ] = DMND_RFO|DMND_WB|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = DMND_RFO|DMND_WB|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_WRITE|NHM_L3_MISS,
 	},
 	[ C(OP_PREFETCH) ] = {
-		[ C(RESULT_ACCESS) ] = PF_DATA_RD|PF_DATA_RFO|RESP_UNCORE_HIT,
-		[ C(RESULT_MISS)   ] = PF_DATA_RD|PF_DATA_RFO|RESP_MISS,
+		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_L3_ACCESS,
+		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_L3_MISS,
 	},
  }
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_READ|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_WRITE|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_REMOTE_DRAM,
+	},
+ },
 };
 
 static __initconst const u64 nehalem_hw_cache_event_ids
@@ -483,6 +622,20 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7, /* OFFCORE_RESP */
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 core2_hw_cache_event_ids
@@ -574,6 +727,20 @@ static __initconst const u64 core2_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static __initconst const u64 atom_hw_cache_event_ids
@@ -665,6 +832,20 @@ static __initconst const u64 atom_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static void intel_pmu_disable_all(void)
@@ -1444,6 +1625,8 @@ static __init int intel_pmu_init(void)
 	case 42: /* SandyBridge */
 		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
+		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs,
+		       sizeof(hw_cache_extra_regs));
 
 		intel_pmu_lbr_init_nhm();
 
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 74507c1..e802c7e 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -554,6 +554,20 @@ static __initconst const u64 p4_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static u64 p4_general_events[PERF_COUNT_HW_MAX] = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee9f1e7..df4a841 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,7 +59,7 @@ enum perf_hw_id {
 /*
  * Generalized hardware cache events:
  *
- *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
+ *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
  *       { read, write, prefetch } x
  *       { accesses, misses }
  */
@@ -70,6 +70,7 @@ enum perf_hw_cache_id {
 	PERF_COUNT_HW_CACHE_DTLB		= 3,
 	PERF_COUNT_HW_CACHE_ITLB		= 4,
 	PERF_COUNT_HW_CACHE_BPU			= 5,
+	PERF_COUNT_HW_CACHE_NODE		= 6,
 
 	PERF_COUNT_HW_CACHE_MAX,		/* non-ABI */
 };
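
For completeness, a minimal user-space sketch of how the new generalized
NODE event would be requested once something like the above is applied
(the config encoding, cache id | op << 8 | result << 16, is the existing
generalized cache event one; PERF_COUNT_HW_CACHE_NODE is the value added
above, so this only builds against a patched header):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/*
 * Sketch: open a NODE read-miss counter, i.e. demand reads that were
 * satisfied by remote DRAM with the tables above.  Error handling and
 * the read() of the count are omitted.
 */
static int open_node_read_miss(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HW_CACHE;
        attr.config = PERF_COUNT_HW_CACHE_NODE |
                      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);

        /* measure self, any CPU, no group, no flags */
        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}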



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 22:19           ` Peter Zijlstra
@ 2011-04-22 23:54             ` Andi Kleen
  2011-04-23  7:49               ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Andi Kleen @ 2011-04-22 23:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

> 
> #define SNB_PF_LLC_DATA_RD	(1 << 7)
> #define SNB_PF_LLC_RFO		(1 << 8)
> #define SNB_PF_LLC_IFETCH	(1 << 9)
> #define SNB_BUS_LOCKS		(1 << 10)
> #define SNB_STRM_ST		(1 << 11)
>         			/* hole */
> #define SNB_OFFCORE_OTHER	(1 << 15)
> #define SNB_COMMON		(1 << 16)
> #define SNB_NO_SUPP		(1 << 17)
> #define SNB_LLC_HITM		(1 << 18)
> #define SNB_LLC_HITE		(1 << 19)
> #define SNB_LLC_HITS		(1 << 20)
> #define SNB_LLC_HITF		(1 << 21)
> 				/* hole */
> #define SNB_SNP_NONE		(1 << 31)
> #define SNB_SNP_NOT_NEEDED	(1 << 32)
> #define SNB_SNP_MISS		(1 << 33)
> #define SNB_SNP_NO_FWD		(1 << 34)
> #define SNB_SNP_FWD		(1 << 35)
> #define SNB_HITM		(1 << 36)
> #define SNB_NON_DRAM		(1 << 37)
> 
> #define SNB_DMND_READ		(SNB_DMND_DATA_RD)
> #define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
> #define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
> 
> Is what I came up with, but I'm stumped on how to construct:
> 
> #define SNB_L3_HIT

All the LLC hits together.

Or it can be done with the PEBS memory latency event (like Lin-Ming's patch) or 
with mem_load_uops_retired (but then only for loads)

> #define SNB_L3_MISS

Don't set any of the LLC bits


> 
> #define SNB_ALL_DRAM

Just don't set NON_DRAM


> #define SNB_REMOTE_DRAM

The current client SNBs (which is what those tables are for) don't have
remote DRAM.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 22:57           ` Peter Zijlstra
@ 2011-04-23  0:00             ` Andi Kleen
  2011-04-23  7:50               ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Andi Kleen @ 2011-04-23  0:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Sat, Apr 23, 2011 at 12:57:42AM +0200, Peter Zijlstra wrote:
> On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> > On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > > The below needs filling out for !x86 (which I filled out with
> > > unsupported events) and x86 needs the offcore bits fixed to auto select
> > > between the two offcore events.
> > 
> > Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.
> 
> Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and

What is ACCESS if not a HIT?

> counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
> L3 hit.

When the other core owns the cache line it has to be fetched from there.
That's not an LLC hit.

-Andi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 23:54             ` Andi Kleen
@ 2011-04-23  7:49               ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-23  7:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 16:54 -0700, Andi Kleen wrote:
> > 
> > #define SNB_PF_LLC_DATA_RD	(1 << 7)
> > #define SNB_PF_LLC_RFO		(1 << 8)
> > #define SNB_PF_LLC_IFETCH	(1 << 9)
> > #define SNB_BUS_LOCKS		(1 << 10)
> > #define SNB_STRM_ST		(1 << 11)
> >         			/* hole */
> > #define SNB_OFFCORE_OTHER	(1 << 15)
> > #define SNB_COMMON		(1 << 16)
> > #define SNB_NO_SUPP		(1 << 17)
> > #define SNB_LLC_HITM		(1 << 18)
> > #define SNB_LLC_HITE		(1 << 19)
> > #define SNB_LLC_HITS		(1 << 20)
> > #define SNB_LLC_HITF		(1 << 21)
> > 				/* hole */
> > #define SNB_SNP_NONE		(1 << 31)
> > #define SNB_SNP_NOT_NEEDED	(1 << 32)
> > #define SNB_SNP_MISS		(1 << 33)
> > #define SNB_SNP_NO_FWD		(1 << 34)
> > #define SNB_SNP_FWD		(1 << 35)
> > #define SNB_HITM		(1 << 36)
> > #define SNB_NON_DRAM		(1 << 37)
> > 
> > #define SNB_DMND_READ		(SNB_DMND_DATA_RD)
> > #define SNB_DMND_WRITE		(SNB_DMND_RFO|SNB_DMND_WB|SNB_STRM_ST)
> > #define SNB_DMND_PREFETCH	(SNB_PF_DATA_RD|SNB_PF_DATA_RFO)
> > 
> > Is what I came up with, but I'm stumped on how to construct:
> > 
> > #define SNB_L3_HIT
> 
> All the LLC hits together.

Bits 18-21?

> Or it can be done with the PEBS memory latency event (like Lin-Ming's patch) or 
> with mem_load_uops_retired (but then only for loads)
> 
> > #define SNB_L3_MISS
> 
> Don't set any of the LLC bits

So a 0 for the response type field? That's not valid. You have to set
some bit between 16 and 37.

> 
> > 
> > #define SNB_ALL_DRAM
> 
> Just don't set NON_DRAM

So bits 17-21|31-36 for the response type field?

That seems wrong as that would include what we previously defined to be
L3_HIT, which never makes it to DRAM.

> > #define SNB_REMOTE_DRAM
> 
> The current client SNBs (which is what those tables are for) don't have
> remote DRAM.

So what you're telling us is that simply because Intel hasn't shipped a
multi-socket SNB system yet they either:

  1) omitted a few bits from that table,
  2) have a completely different offcore response msr just for kicks?

Feh!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-23  0:00             ` Andi Kleen
@ 2011-04-23  7:50               ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-23  7:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Fri, 2011-04-22 at 17:00 -0700, Andi Kleen wrote:
> On Sat, Apr 23, 2011 at 12:57:42AM +0200, Peter Zijlstra wrote:
> > On Fri, 2011-04-22 at 23:54 +0200, Peter Zijlstra wrote:
> > > On Fri, 2011-04-22 at 23:37 +0200, Peter Zijlstra wrote:
> > > > The below needs filling out for !x86 (which I filled out with
> > > > unsupported events) and x86 needs the offcore bits fixed to auto select
> > > > between the two offcore events.
> > > 
> > > Urgh, so SNB has different MSR_OFFCORE_RESPONSE bits and needs another table.
> > 
> > Also, NHM offcore bits were wrong... it implemented _ACCESS as _HIT and
> 
> What is ACCESS if not a HIT?

An ACCESS is all requests for data that come in, after which you either
HIT or MISS, in which case you have to ask someone else down the line.

> > counted OTHER_CORE_HIT* as MISS even though it's clearly documented as an
> > L3 hit.
> 
> When the other core owns the cache line it has to be fetched from there.
> That's not an LLC hit.

Then _why_ are they described in 30.6.1.3, table 30-15, as:

OTHER_CORE_HIT_SNP	 9	(R/W). L3 Hit: ....
OTHER_CORE_HITM		10	(R/W). L3 Hit: ...



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22 21:37       ` Peter Zijlstra
  2011-04-22 21:54         ` Peter Zijlstra
@ 2011-04-23  8:13         ` Ingo Molnar
  2011-07-01 15:23         ` [tip:perf/core] perf, arch: Add generic NODE cache events tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-23  8:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2011-04-22 at 10:06 +0200, Ingo Molnar wrote:
> > 
> > I'm about to push out the patch attached below - it lays out the arguments in 
> > detail. I don't think we have time to fix this properly for .39 - but memory 
> > profiling could be a nice feature for v2.6.40. 
> 
> Does something like the below provide enough generic infrastructure to
> allow the raw offcore bits again?

Yeah, this looks like a pretty good start - this is roughly the approach I
outlined to Stephane and Andi: generic cache events extended with one more
'node' level.

Andi, Stephane, if you'd like to see the Intel offcore bits supported in 2.6.40 
(or 2.6.41) please help out Peter with review, testing, tools/perf/ 
integration, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:06     ` Ingo Molnar
  2011-04-22 21:37       ` Peter Zijlstra
@ 2011-04-25 17:12       ` Vince Weaver
  2011-04-25 17:54         ` Ingo Molnar
  2011-04-26  9:25         ` Peter Zijlstra
  1 sibling, 2 replies; 39+ messages in thread
From: Vince Weaver @ 2011-04-25 17:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


Sorry for the late reply on this thread; it happened inconveniently over
the long weekend.


On Fri, 22 Apr 2011, Ingo Molnar wrote:

> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.

That's why people use libpfm4.  or PAPI.  And they do.

Current PAPI snapshots support offcore response on recent git kernels.
With full names, no hex values, thanks to libpfm4.

All the world is not perf.
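
For instance, the PAPI side looks roughly like this (a sketch: the event
string is illustrative rather than the exact libpfm4 name, and error
checking is omitted):

#include <stdio.h>
#include <papi.h>

int main(void)
{
        /* Illustrative libpfm4-style name; the real one comes from the
         * per-CPU event tables. */
        char name[] = "OFFCORE_RESPONSE_0:DMND_DATA_RD:LOCAL_DRAM";
        int es = PAPI_NULL, code;
        long long count;

        PAPI_library_init(PAPI_VER_CURRENT);
        PAPI_create_eventset(&es);
        if (PAPI_event_name_to_code(name, &code) != PAPI_OK)
                return 1;
        PAPI_add_event(es, code);

        PAPI_start(es);
        /* ... workload under test ... */
        PAPI_stop(es, &count);
        printf("%s: %lld\n", name, count);
        return 0;
}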

> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.

No no no no.

Blocking access to raw events is the wrong idea.  If anything, the whole 
"generic events" thing in the kernel should be ditched.  Wrong events are 
used at times (see AMD branch events a few releases back, now Nehalem 
cache events).  This all belongs in userspace, as was pointed out at the 
start.  The kernel has no business telling users which perf events are 
interesting, or limiting them!  What is this, windows?

If you do block access to any raw events, we're going to have to start 
recommending people ditch perf_events and start patching the kernel with 
perfctr again.  We already do for P4/netburst users, as Pentium 4 support 
is currently hosed due to NMI event conflicts.

Also with perfctr it's much easier to get low-latency access to the 
counters.  See:
  http://web.eecs.utk.edu/~vweaver1/projects/papi-cost/

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:12       ` [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Vince Weaver
@ 2011-04-25 17:54         ` Ingo Molnar
  2011-04-25 21:46           ` Vince Weaver
  2011-04-26  9:25         ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2011-04-25 17:54 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> [...] The kernel has no business telling users which perf events are 
> interesting, or limiting them! [...]

The policy is very simple and common-sense: if a given piece of PMU 
functionality is useful enough to be exposed via a raw interface, then
it must be useful enough to be generalized as well.

> [...]  What is this, windows?

FYI, this is how the Linux kernel has operated from day 1 on: we support 
hardware features to abstract useful highlevel functionality out of it.
I would not expect this to change anytime soon.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:54         ` Ingo Molnar
@ 2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
                               ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Vince Weaver @ 2011-04-25 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Mon, 25 Apr 2011, Ingo Molnar wrote:

> 
> * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> 
> > [...] The kernel has no business telling users which perf events are 
> > interesting, or limiting them! [...]
> 
> The policy is very simple and common-sense: if a given piece of PMU 
> functionality is useful enough to be exposed via a raw interface, then
> it must be useful enough to be generalized as well.

what does that even mean?  How do you "generalize" a functionality like 
writing a value to an auxiliary MSR register?

The PAPI tool was using the perf_events interface in the 2.6.39-git 
kernels to collect offcore response results by properly setting the 
config1 register on Nehalem and Westmere machines.

Now it has been disabled for unclear reasons.

Could you at least have some sort of relevant errno value set in this 
case?  It's a real pain in userspace code to try to sort out the 
perf_event return values to find out if a feature is supported,
unsupported (lack of hardware), unsupported (not implemented yet),
unsupported (disabled due to the whim of a kernel developer), or unsupported
(because you have some sort of configuration conflict).
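
For illustration, about the best a tool can do today is a probe like the
one below (a sketch: the errno-to-cause mapping is guesswork, which is
exactly the problem):

#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Sketch: try to open an event and guess why it failed.  The same few
 * errno values cover very different failure modes, so the strings on
 * the right are guesses. */
static const char *probe_event(struct perf_event_attr *attr)
{
        int fd = syscall(__NR_perf_event_open, attr, 0, -1, -1, 0);

        if (fd >= 0) {
                close(fd);
                return "supported";
        }
        switch (errno) {
        case ENOENT:     return "not supported (no hardware? not implemented?)";
        case EOPNOTSUPP: return "hardware feature missing";
        case EINVAL:     return "invalid attr, or a field this kernel rejects";
        case EACCES:
        case EPERM:      return "permission problem";
        default:         return strerror(errno);
        }
}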

> > [...]  What is this, windows?
> 
> FYI, this is how the Linux kernel has operated from day 1 on: we support 
> hardware features to abstract useful highlevel functionality out of it.
> I would not expect this to change anytime soon.

I started using Linux because it actually let me use my hardware without 
interfering with what I was trying to do.  Not because it disabled access 
to the hardware due to some perceived lack of generalization in an extra 
unnecessary software translation layer.

Vince
vweaver1@eecs.utk.edu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
@ 2011-04-25 22:12             ` Andi Kleen
  2011-04-26  7:23               ` Ingo Molnar
  2011-04-26  7:38             ` Ingo Molnar
  2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 1 reply; 39+ messages in thread
From: Andi Kleen @ 2011-04-25 22:12 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	torvalds

> The PAPI tool was using the perf_events interface in the 2.6.39-git 
> kernels to collect offcore response results by properly setting the 
> config1 register on Nehalem and Westmere machines.

I already had some users for this functionality too. Offcore
events are quite useful for various analyses: basically every time
you have a memory performance problem -- especially a NUMA
problem -- they can help you a lot in tracking it down.

They answer questions like "who accesses memory on another node"

As far as I'm concerned b52c55c6a25e4515b5e075a989ff346fc251ed09 
is a bad feature regression.

> 
> Now it has been disabled for unclear reasons.

Also unfortunately only partial. Previously you could at least
write the MSR from user space through /dev/cpu/*/msr, but now the kernel 
randomly rewrites it if anyone else uses cache events.

Right now I have some frontend scripts which are doing this,
but it's really quite nasty.

It's very sad we have to go through this.

-Andi


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 22:12             ` Andi Kleen
@ 2011-04-26  7:23               ` Ingo Molnar
  0 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-26  7:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Vince Weaver, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra,
	torvalds


* Andi Kleen <ak@linux.intel.com> wrote:

> > Now it has been disabled for unclear reasons.
> 
> Also unfortunately only partial. Previously you could at least write the MSR 
> from user space through /dev/cpu/*/msr, but now the kernel randomly rewrites 
> it if anyone else uses cache events.

Ugh, that's an unbelievable hack - if you hack an active PMU by writing to it
via /dev/cpu/*/msr and it breaks, you really get to keep the pieces. There's a
reason why those devices are root only - it's as if you wrote to a filesystem
that is already mounted!

If your user-space twiddling scripts go bad, who knows what state the CPU gets
into and you might be reporting bogus bugs. I think writing to those MSRs
directly should probably taint the kernel: I'll prepare a patch for that.

> It's very sad we have to go through this.

Not really, it took Peter 10 minutes to come up with an RFC patch to extend the 
cache events in a meaningful way - and that was actually more useful to users 
than all prior offcore patches combined. So the kernel already won from this 
episode.

We are not at all interested in hiding PMU functionality and keeping it 
unstructured, and just passing through some opaque raw ABI to user-space.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
@ 2011-04-26  7:38             ` Ingo Molnar
  2011-04-26 20:51               ` Vince Weaver
  2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2011-04-26  7:38 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Mon, 25 Apr 2011, Ingo Molnar wrote:
> 
> > 
> > * Vince Weaver <vweaver1@eecs.utk.edu> wrote:
> > 
> > > [...] The kernel has no business telling users which perf events are 
> > > interesting, or limiting them! [...]
> > 
> > The policy is very simple and common-sense: if a given piece of PMU 
> > functionality is useful enough to be exposed via a raw interface, then
> > it must be useful enough to be generalized as well.
> 
> what does that even mean?  How do you "generalize" a functionality like 
> writing a value to an auxiliary MSR register?

Here are a few examples:

 - the pure act of task switching sometimes involves writing to MSRs. How is it
   generalized? The concept of 'processes/threads' is offered to user-space and
   thus this functionality is generalized - the raw MSRs are not just passed
   through to user-space.

 - a wide range of VMX (virtualization) functionality on Intel CPUs operates via 
   writing special values to specific MSR registers. How is it 'generalized'? A 
   meaningful, structured ABI is provided to user-space in form of the KVM 
   device and associated semantics. The raw MSRs are not just passed through to
   user-space.

 - the ability of CPUs to change frequency is offered via writing special
   values to special MSRs. How is this generalized? The cpufreq subsystem 
   offers a frequency/cpu API and associated abstractions - the raw MSRs are 
   not just passed through to user-space.

 - in the context of perf events we generalize the concept of an 'event' and
   we abstract out common, CPU model neutral CPU hardware concepts like 
   'cycles', 'instructions', 'branches' and a simplified cache hierarchy - and 
   offer those events as generic events to user-space. We do not just pass the 
   raw MSRs through to user-space.

 - [ etc. - a lot of useful CPU functionality is MSR driven, the PMU is nothing
     special there. ]

The kernel development process is in essence an abstraction engine, and if you 
expect something else you'll probably be facing a lot of frustrating episodes 
in the future as well where others try to abstract out meaningful 
generalizations.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 17:12       ` [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Vince Weaver
  2011-04-25 17:54         ` Ingo Molnar
@ 2011-04-26  9:25         ` Peter Zijlstra
  2011-04-26 20:33           ` Vince Weaver
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:25 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Mon, 2011-04-25 at 13:12 -0400, Vince Weaver wrote:
> On Fri, 22 Apr 2011, Ingo Molnar wrote:
> 
> > But this kind of usability is absolutely unacceptable - users should not
> > be expected to type in magic, CPU and model specific incantations to get
> > access to useful hardware functionality.
> 
> That's why people use libpfm4.  or PAPI.  And they do.

And how is typing in hex numbers different from typing in model specific
event names? It's all the same to me: you still need to understand your
microarchitecture very thoroughly and read the SDMs.

PAPI actually has 'generalized' events, but I guess you're going to tell
me nobody uses those since they're not useful.

> Current PAPI snapshots support offcore response on recent git kernels.
> With full names, no hex values, thanks to libpfm4.
> 
> All the world is not perf.

I know, all the world is interested in investing tons of time learning
about their one architecture and extracting the last few percent of
performance.

And that is fine for those few people who can afford it, but generally
optimizing for a single specific platform isn't cost effective.

It looks like you're all so stuck in your HPC/lowlevel way of things
you're not even realizing there's much more to be gained by providing
easy and useful tools to the general public, stuff that works similarly
across architectures.

> > The proper solution is to expose useful offcore functionality via
> > generalized events - that way users do not have to care which specific
> > CPU model they are using, they can use the conceptual event and not some
> > model specific quirky hexa number.
> 
> No no no no.
> 
> Blocking access to raw events is the wrong idea.  If anything, the whole 
> "generic events" thing in the kernel should be ditched.  Wrong events are 
> used at times (see AMD branch events a few releases back, now Nehalem 
> cache events).  This all belongs in userspace, as was pointed out at the 
> start.  The kernel has no business telling users which perf events are 
> interesting, or limiting them!  What is this, windows?

The kernel has no place scheduling PMCs either, I expect, or scheduling
tasks for that matter.

We all know you don't believe in upgrading kernels or in kernels very
much at all.

> If you do block access to any raw events, we're going to have to start 
> recommending people ditch perf_events and start patching the kernel with 
> perfctr again.  We already do for P4/netburst users, as Pentium 4 support 
> is currently hosed due to NMI event conflicts.

Very constructive attitude, instead of helping you simply subvert and
route around, thanks man! 

You could of course a) simply disable the NMI watchdog, or b) improve
the space-heater (aka P4) PMU implementation to use alternative
encodings -- from what I understood, the problem with P4 is that there are
multiple ways to encode the same event and currently, if you take one, it
doesn't try the others.

> Also with perfctr it's much easier to get low-latency access to the 
> counters.  See:
>   http://web.eecs.utk.edu/~vweaver1/projects/papi-cost/

And why is that? Is that the lack of userspace rdpmc? That should be
possible with perf; powerpc actually does that already. Various people
mentioned wanting to make this work on x86 but I've yet to see a patch.
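
For reference, the low-latency read itself is tiny (a sketch; in user
space it only works if the kernel sets CR4.PCE and tells you which
hardware counter your event landed on, which is exactly the plumbing
that is missing on x86 today):

/* Sketch: raw user-space counter read on x86. */
static inline unsigned long long rdpmc(unsigned int counter)
{
        unsigned int lo, hi;

        asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (counter));
        return (unsigned long long)hi << 32 | lo;
}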

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-25 21:46           ` Vince Weaver
  2011-04-25 22:12             ` Andi Kleen
  2011-04-26  7:38             ` Ingo Molnar
@ 2011-04-26  9:49             ` Peter Zijlstra
  2 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2011-04-26  9:49 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Mon, 2011-04-25 at 17:46 -0400, Vince Weaver wrote:
> > The policy is very simple and common-sense: if a given piece of PMU 
> > functionality is useful enough to be exposed via a raw interface, then
> > it must be useful enough to be generalized as well.
> 
> what does that even mean?  How do you "generalize" a functionality like 
> writing a value to an auxiliary MSR register? 

Come on, Vince, I know you're smarter than that!

The external register is simply an extension of the configuration space:
instead of the normal evsel MSR you get evsel:offcore pairs. After that
it's simply a matter of scheduling them right.

It simply adds more events to the PMU (in a rather sad way; it would
have been so much nicer if Intel had simply extended the evsel MSR for
every PMC, and they could have also used that for the load-latency thing
etc.)

Now, the extra events offered are L3 and NUMA events; the 'common'
interesting set is mostly covered by Andi's LLC mods and my NODE
extension, and after that there are mostly details left in offcore.

So the writing of an extra MSR is totally irrelevant, it's the extra
events that are.
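
In user-space terms such a pair is nothing more than attr.config plus
attr.config1 (a sketch, using the NHM bits from earlier in this thread;
it assumes a kernel that still passes raw config1 through):

#include <string.h>
#include <linux/perf_event.h>

/* Sketch: one evsel:offcore pair.  config selects OFFCORE_RESPONSE_0
 * (event 0xb7, umask 0x01); config1 is the MSR_OFFCORE_RESPONSE mask,
 * here the NHM "demand read that missed L3" composition from the
 * earlier table. */
static void offcore_demand_read_l3_miss(struct perf_event_attr *attr)
{
        memset(attr, 0, sizeof(*attr));
        attr->size    = sizeof(*attr);
        attr->type    = PERF_TYPE_RAW;
        attr->config  = 0x01b7;
        attr->config1 = (1 << 0)        /* NHM_DMND_DATA_RD     */
                      | (1 << 12)       /* NHM_REMOTE_CACHE_FWD */
                      | (1 << 13)       /* NHM_REMOTE_DRAM      */
                      | (1 << 14)       /* NHM_LOCAL_DRAM       */
                      | (1 << 15);      /* NHM_NON_DRAM         */
}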

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26  9:25         ` Peter Zijlstra
@ 2011-04-26 20:33           ` Vince Weaver
  2011-04-26 21:19             ` Cyrill Gorcunov
  2011-04-27  6:43             ` Ingo Molnar
  0 siblings, 2 replies; 39+ messages in thread
From: Vince Weaver @ 2011-04-26 20:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Tue, 26 Apr 2011, Peter Zijlstra wrote:

> > That's why people use libpfm4.  or PAPI.  And they do.
> 
> And how is typing in hex numbers different from typing in model specific
> event names? 

Really... quick, tell me what event 0x53cf28 corresponds to on a core2.

Now if I said L2_IFETCH:BOTH_CORES, you know several things about what it
is.

Plus, you can do a quick search in the Intel Arch manual and find more
info.  With the hex value you have to do some shifting and masking by hand
before looking it up.
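
(The by-hand decode is something like this; a sketch using the standard
Intel evtsel layout, event select in bits 0-7 and unit mask in bits
8-15, with USR/OS/enable style flags above that.)

#include <stdio.h>

int main(void)
{
        unsigned int raw   = 0x53cf28;
        unsigned int event = raw & 0xff;        /* 0x28, the L2_IFETCH event   */
        unsigned int umask = (raw >> 8) & 0xff; /* 0xcf, the BOTH_CORES flavor */
        unsigned int flags = raw >> 16;         /* USR/OS/enable style bits    */

        printf("event=%#x umask=%#x flags=%#x\n", event, umask, flags);
        return 0;
}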

An even worse problem:

Quick... tell me what actual hardware event L1-dcache-loads corresponds
to on an L2.  Can you tell without digging through kernel source code?

Does that event include prefetches?  Does it include speculative events?
Does it count page walks?  Does it overcount by an amount equal to the 
number of hardware interrupts?   If I use the equivalent event on an 
AMD64, will all the same hold?

> PAPI actually has 'generalized' events, but I guess you're going to tell
> me nobody uses those since they're not useful.

Of course people use them.  But we don't _force_ people to use them.  We 
don't disable access to raw events.  Although alarmingly it seems like the 
kernel is going to start to, possibly meaning even our users can't use our
'generalized' events if for example they incorporate OFFCORE_RESPONSE.

Another issue:  if a problem is found with one of the PAPI events, they 
can update and recompile and run out of their own account at will.

If there's a problem with a kernel generalized event, you have to 
reinstall a kernel.  Something many users can't do.  For example, your 
Nehalem cache fixes will be in 2.6.39.  How long until that appears in a 
stock distro?  How long until that appears in an RHEL release?

> > All the world is not perf.
> 
> I know, all the world is interested in investing tons of time learning
> about their one architecture and extracting the last few percent of
> performance.

There are people out there who have been using perf counters on UNIX/Linux 
machines for decades.  They know what events they want to measure.  They 
are not interested in having the kernel tell them they can't do it.

> It looks like you're all so stuck in your HPC/lowlevel way of things
> you're not even realizing there's much more to be gained by providing
> easy and useful tools to the general public, stuff that works similarly
> across architectures.

We're not saying people can't use perf.  Who knows, maybe PAPI will go 
away because perf is so good.  It's just silly to block out access to RAW 
events on the argument that "it's too hard".  Again, are we Microsoft 
here?

> Very constructive attitude, instead of helping you simply subvert and
> route around, thanks man! 

I spent a lot of time trying to fix P4 support back in the 2.6.35 days.
I only have so much time to spend on this stuff. 

When people complain about p4 support, I direct them to Cyrill et al.  I 
can't force them to become kernel developers.  Usually they want immediate 
results, which they can get with perfctr.

People want offcore response.  People want uncore access.  People want raw 
event access.  I can tell them "send a patch to the kernel, it'll
languish in obscurity for years and maybe in 2.6.4x you'll see it".  Or 
they can have support today with an outside patch.  Which do you think 
they choose?

> And why is that? Is that the lack of userspace rdpmc? That should be
> possible with perf; powerpc actually does that already. Various people
> mentioned wanting to make this work on x86 but I've yet to see a patch.

We at the PAPI project welcome any patches you'd care to contribute to our 
project too, to make things better.  It goes both ways you know.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26  7:38             ` Ingo Molnar
@ 2011-04-26 20:51               ` Vince Weaver
  2011-04-27  6:52                 ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Vince Weaver @ 2011-04-26 20:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Tue, 26 Apr 2011, Ingo Molnar wrote:

> The kernel development process is in essence an abstraction engine, and if you 
> expect something else you'll probably be facing a lot of frustrating episodes 
> in the future as well where others try to abstract out meaningful 
> generalizations.

Yes, but you are taking abstraction to the extreme.

A filesystem abstracts out the access to the raw disk... but under Linux we 
still allow raw access to /dev/sda.

TCP/IP abstracts out the access to the network... but under Linux we still 
allow creating raw packets.

It is fine to have some sort of high-level abstraction of perf events for 
those who don't have PhDs in computer architecture.  Fine.  But don't get 
in the way of people who know what they are doing.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:33           ` Vince Weaver
@ 2011-04-26 21:19             ` Cyrill Gorcunov
  2011-04-26 21:25               ` Don Zickus
  2011-04-27  6:43             ` Ingo Molnar
  1 sibling, 1 reply; 39+ messages in thread
From: Cyrill Gorcunov @ 2011-04-26 21:19 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Andi Kleen, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Don Zickus

On 04/27/2011 12:33 AM, Vince Weaver wrote:
...
> 
> I spent a lot of time trying to fix P4 support back in the 2.6.35 days.
> I only have so much time to spend on this stuff. 
> 
> When people complain about p4 support, I direct them to Cyrill et al.  I 
> can't force them to become kernel developers.  Usually they want immediate 
> results, which they can get with perfctr.
> 

  Vince I've not read the whole thread so no idea what is all about, but if you
have some p4 machines and have some will to help -- mind to test the patch below,
it should fix nmi-watchdog and cycles conflict. It's utter raw RFC (and i know there
is a nit i should update) but still might be interesting to see the results.
Untested.
-- 
perf, x86: P4 PMU -- Introduce alternate events v3

Alternate events are used to increase perf subsystem counter usage.
In general the idea is to find an "alternate" event (if there is
one) which counts the same quantity as the former event but uses a
different counter, allowing it to run simultaneously with the original
event.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
 arch/x86/include/asm/perf_event_p4.h |    6 ++
 arch/x86/kernel/cpu/perf_event_p4.c  |   74 ++++++++++++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 2 deletions(-)

Index: linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
=====================================================================
--- linux-2.6.git.orig/arch/x86/include/asm/perf_event_p4.h
+++ linux-2.6.git/arch/x86/include/asm/perf_event_p4.h
@@ -36,6 +36,10 @@
 #define P4_ESCR_T1_OS		0x00000002U
 #define P4_ESCR_T1_USR		0x00000001U

+#define P4_ESCR_USR_MASK			\
+	(P4_ESCR_T0_OS | P4_ESCR_T0_USR |	\
+	P4_ESCR_T1_OS | P4_ESCR_T1_USR)
+
 #define P4_ESCR_EVENT(v)	((v) << P4_ESCR_EVENT_SHIFT)
 #define P4_ESCR_EMASK(v)	((v) << P4_ESCR_EVENTMASK_SHIFT)
 #define P4_ESCR_TAG(v)		((v) << P4_ESCR_TAG_SHIFT)
@@ -839,5 +843,7 @@ enum P4_PEBS_METRIC {
  *       31:                    reserved (HT thread)
  */

+#define P4_INVALID_CONFIG	(u64)~0
+
 #endif /* PERF_EVENT_P4_H */

Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -609,6 +609,31 @@ static u64 p4_general_events[PERF_COUNT_
 	p4_config_pack_cccr(P4_CCCR_EDGE | P4_CCCR_COMPARE),
 };

+/*
+ * Alternate events allow us to find a substitute for an event if
+ * it's already borrowed, so they may be considered event aliases.
+ */
+struct p4_alt_event {
+	unsigned int	event;
+	u64		config;
+} p4_alternate_events[]= {
+	{
+		.event		= P4_EVENT_GLOBAL_POWER_EVENTS,
+		.config		=
+			p4_config_pack_escr(P4_ESCR_EVENT(P4_EVENT_EXECUTION_EVENT)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS0)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS1)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS2)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS3)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS0)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS1)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS2)		|
+				P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS3))		|
+			p4_config_pack_cccr(P4_CCCR_THRESHOLD(15) | P4_CCCR_COMPLEMENT		|
+				P4_CCCR_COMPARE),
+	},
+};
+
 static struct p4_event_bind *p4_config_get_bind(u64 config)
 {
 	unsigned int evnt = p4_config_unpack_event(config);
@@ -620,6 +645,18 @@ static struct p4_event_bind *p4_config_g
 	return bind;
 }

+static u64 p4_find_alternate_config(unsigned int evnt)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(p4_alternate_events); i++) {
+		if (evnt == p4_alternate_events[i].event)
+			return p4_alternate_events[i].config;
+	}
+
+	return P4_INVALID_CONFIG;
+}
+
 static u64 p4_pmu_event_map(int hw_event)
 {
 	struct p4_event_bind *bind;
@@ -1133,8 +1170,41 @@ static int p4_pmu_schedule_events(struct
 		}

 		cntr_idx = p4_next_cntr(thread, used_mask, bind);
-		if (cntr_idx == -1 || test_bit(escr_idx, escr_mask))
-			goto done;
+		if (cntr_idx == -1 || test_bit(escr_idx, escr_mask)) {
+
+			/*
+			 * So the former event has already been accepted to
+			 * run and the only way to succeed here is to use
+			 * an alternate event.
+			 */
+			const u64 usr_mask = p4_config_pack_escr(P4_ESCR_USR_MASK);
+			u64 alt_config;
+			unsigned int event;
+
+			event		= p4_config_unpack_event(hwc->config);
+			alt_config	= p4_find_alternate_config(event);
+
+			if (alt_config == P4_INVALID_CONFIG)
+				goto done;
+
+			bind = p4_config_get_bind(alt_config);
+			escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
+			if (unlikely(escr_idx == -1))
+				goto done;
+
+			cntr_idx = p4_next_cntr(thread, used_mask, bind);
+			if (cntr_idx == -1 || test_bit(escr_idx, escr_mask))
+				goto done;
+
+			/*
+			 * This is a destructive operation we're about
+			 * to make: we substitute the former config with
+			 * the alternate one so it keeps being tracked.
+			 * Be careful not to kill the custom bits
+			 * in the former config.
+			 */
+			hwc->config = (hwc->config & usr_mask) | alt_config;
+		}

 		p4_pmu_swap_config_ts(hwc, cpu);
 		if (assign)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 21:19             ` Cyrill Gorcunov
@ 2011-04-26 21:25               ` Don Zickus
  2011-04-26 21:33                 ` Cyrill Gorcunov
  0 siblings, 1 reply; 39+ messages in thread
From: Don Zickus @ 2011-04-26 21:25 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Wed, Apr 27, 2011 at 01:19:07AM +0400, Cyrill Gorcunov wrote:
>   Vince I've not read the whole thread so no idea what is all about, but if you
> have some p4 machines and have some will to help -- mind to test the patch below,
> it should fix nmi-watchdog and cycles conflict. It's utter raw RFC (and i know there
> is a nit i should update) but still might be interesting to see the results.
> Untested.
> -- 
> perf, x86: P4 PMU -- Introduce alternate events v3

Unfortunately it just panic'd for me when I ran

perf record grep -r don /

Thoughts?

Cheers,
Don

redfish.lab.bos.redhat.com login: BUG: unable to handle kernel NULL
pointer dereference at 0000000000000008
IP: [<ffffffff8101ff60>] p4_pmu_schedule_events+0xb0/0x4c0
PGD 2c603067 PUD 2d617067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/online
CPU 2 
Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log
uinput ppdev e1000 parport_pc parport sg dcdbas pcspkr snd_intel8x0
snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sn]

Pid: 1734, comm: grep Not tainted 2.6.39-rc3usb3-latest+ #339 Dell Inc.
Precision WorkStation 470    /0P7996
RIP: 0010:[<ffffffff8101ff60>]  [<ffffffff8101ff60>]
p4_pmu_schedule_events+0xb0/0x4c0
RSP: 0018:ffff88003fb03b18  EFLAGS: 00010016
RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88003c30de00 RSI: 0000000000000004 RDI: 000000000000000f
RBP: ffff88003fb03bb8 R08: 0000000000000001 R09: 0000000000000001
R10: 000000000000006d R11: ffff88003acb4ae8 R12: ffff88002d490c00
R13: ffff88003fb03b78 R14: 0000000000000001 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88003fb00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000002d728000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process grep (pid: 1734, threadinfo ffff88002d648000, task
ffff88003acb4240)
Stack:
 ffff880000000014 ffff88003acb4b10 b00002030803c000 0000000000000003
 0000000200000001 ffff88003fb03bc8 0000000100000002 ffff88003fb03bcc
 0000000181a24ee0 ffff88003fb0cd48 0000000000000008 0000000000000000
Call Trace:
 <IRQ> 
 [<ffffffff8101b9e1>] ? x86_pmu_add+0xb1/0x170
 [<ffffffff8101b8bf>] x86_pmu_commit_txn+0x5f/0xb0
 [<ffffffff810ff0c4>] ? perf_event_update_userpage+0xa4/0xe0
 [<ffffffff810ff020>] ? perf_output_end+0x60/0x60
 [<ffffffff81100dca>] group_sched_in+0x8a/0x160
 [<ffffffff8110100b>] ctx_sched_in+0x16b/0x1d0
 [<ffffffff811017ce>] perf_event_task_tick+0x1de/0x260
 [<ffffffff8104fc1e>] scheduler_tick+0xde/0x2b0
 [<ffffffff81096e20>] ? tick_nohz_handler+0x100/0x100


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 21:25               ` Don Zickus
@ 2011-04-26 21:33                 ` Cyrill Gorcunov
  0 siblings, 0 replies; 39+ messages in thread
From: Cyrill Gorcunov @ 2011-04-26 21:33 UTC (permalink / raw)
  To: Don Zickus
  Cc: Vince Weaver, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On 04/27/2011 01:25 AM, Don Zickus wrote:
> On Wed, Apr 27, 2011 at 01:19:07AM +0400, Cyrill Gorcunov wrote:
>>   Vince I've not read the whole thread so no idea what is all about, but if you
>> have some p4 machines and have some will to help -- mind to test the patch below,
>> it should fix nmi-watchdog and cycles conflict. It's utter raw RFC (and i know there
>> is a nit i should update) but still might be interesting to see the results.
>> Untested.
>> -- 
>> perf, x86: P4 PMU -- Introduce alternate events v3
> 
> Unfortunately it just panic'd for me when I ran
> 
> perf record grep -r don /
> 
> Thoughts?
> 
> Cheers,
> Don
> 
> redfish.lab.bos.redhat.com login: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000008
> IP: [<ffffffff8101ff60>] p4_pmu_schedule_events+0xb0/0x4c0
> PGD 2c603067 PUD 2d617067 PMD 0 
> Oops: 0000 [#1] SMP 
> last sysfs file: /sys/devices/system/cpu/online
> CPU 2 
> Modules linked in: autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log
> uinput ppdev e1000 parport_pc parport sg dcdbas pcspkr snd_intel8x0
> snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm sn]
> 
> Pid: 1734, comm: grep Not tainted 2.6.39-rc3usb3-latest+ #339 Dell Inc.
> Precision WorkStation 470    /0P7996
> RIP: 0010:[<ffffffff8101ff60>]  [<ffffffff8101ff60>]
> p4_pmu_schedule_events+0xb0/0x4c0
> RSP: 0018:ffff88003fb03b18  EFLAGS: 00010016
> RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ffff88003c30de00 RSI: 0000000000000004 RDI: 000000000000000f
> RBP: ffff88003fb03bb8 R08: 0000000000000001 R09: 0000000000000001
> R10: 000000000000006d R11: ffff88003acb4ae8 R12: ffff88002d490c00
> R13: ffff88003fb03b78 R14: 0000000000000001 R15: 0000000000000001
> FS:  0000000000000000(0000) GS:ffff88003fb00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 000000002d728000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process grep (pid: 1734, threadinfo ffff88002d648000, task
> ffff88003acb4240)
> Stack:
>  ffff880000000014 ffff88003acb4b10 b00002030803c000 0000000000000003
>  0000000200000001 ffff88003fb03bc8 0000000100000002 ffff88003fb03bcc
>  0000000181a24ee0 ffff88003fb0cd48 0000000000000008 0000000000000000
> Call Trace:
>  <IRQ> 
>  [<ffffffff8101b9e1>] ? x86_pmu_add+0xb1/0x170
>  [<ffffffff8101b8bf>] x86_pmu_commit_txn+0x5f/0xb0
>  [<ffffffff810ff0c4>] ? perf_event_update_userpage+0xa4/0xe0
>  [<ffffffff810ff020>] ? perf_output_end+0x60/0x60
>  [<ffffffff81100dca>] group_sched_in+0x8a/0x160
>  [<ffffffff8110100b>] ctx_sched_in+0x16b/0x1d0
>  [<ffffffff811017ce>] perf_event_task_tick+0x1de/0x260
>  [<ffffffff8104fc1e>] scheduler_tick+0xde/0x2b0
>  [<ffffffff81096e20>] ? tick_nohz_handler+0x100/0x100
> 

Ouch, I bet p4_config_get_bind() returned NULL and here we are. Weird,
it seems I've missed something. Don, I'll continue tomorrow, OK? (I'm
kinda sleepy already.)
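
Purely as a sketch of what I mean -- hypothetical and untested, not an
actual follow-up patch -- the alternate-config path would need something
along the lines of:

	/* guard against an alternate config with no bind entry */
	bind = p4_config_get_bind(alt_config);
	if (unlikely(!bind))
		goto done;

	escr_idx = p4_get_escr_idx(bind->escr_msr[thread]);
	if (unlikely(escr_idx == -1))
		goto done;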

-- 
    Cyrill

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:33           ` Vince Weaver
  2011-04-26 21:19             ` Cyrill Gorcunov
@ 2011-04-27  6:43             ` Ingo Molnar
  2011-04-28 22:10               ` Vince Weaver
  1 sibling, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2011-04-27  6:43 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Tue, 26 Apr 2011, Peter Zijlstra wrote:
> 
> > > That's why people use libpfm4.  or PAPI.  And they do.
> > 
> > And how is typing in hex numbers different from typing in model specific
> > event names? 
> 
> Reall... quick, tell me what event 0x53cf28 corresponds to on a core2.
> 
> Now if I said L2_IFETCH:BOTH_CORES you know several things about what it is.

Erm, that assumes you already know that magic incantation. Most of the users 
who want to do measurements and profiling do not know that. So there's little 
difference between:

 - someone shows them the 0x53cf28 magic code
 - someone shows them the L2_IFETCH:BOTH_CORES magic symbol

So while hexa values have like 10% utility, the stupid, vendor-specific 
event names you are pushing here have like 15% utility.

In perf we are aiming for 100% utility, where if someone knows something about 
CPUs and can type 'cycles', 'instructions' or 'branches', they will get the 
obvious result.
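
( Illustrative only -- r53cf28 is the core2 code quoted above, ./myapp
  stands for whatever is being profiled: )

  perf stat -e r53cf28 ./myapp                   # model-specific magic number
  perf stat -e cycles -e instructions ./myapp    # generalized events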

This is not a difficult usability concept really.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-26 20:51               ` Vince Weaver
@ 2011-04-27  6:52                 ` Ingo Molnar
  2011-04-28 22:16                   ` Vince Weaver
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2011-04-27  6:52 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Tue, 26 Apr 2011, Ingo Molnar wrote:
> 
> > The kernel development process is in essence an abstraction engine, and if 
> > you expect something else you'll probably be facing a lot of frustrating 
> > episodes in the future as well where others try to abstract out meaningful 
> > generalizations.
> 
> yes, but you are taking abstraction to the extreme.

Firstly, that claim is a far cry from your original claim:

   ' How do you "generalize" a functionality like writing a value to an auxiliary
     MSR register? '

... so i guess you conceded the point at least partially, without actually 
openly and honestly conceding the point?

Secondly, you are still quite wrong even with your revised opinion. Being able 
to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
instructions counts/events, and the kernel helping that kind of approach is not 
'abstraction to the extreme', it's called 'common sense'.

The fact that perfmon and oprofile works via magic vendor-specific event string 
incantations is one of the many design failures of those projects - not a 
virtue.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-27  6:43             ` Ingo Molnar
@ 2011-04-28 22:10               ` Vince Weaver
  0 siblings, 0 replies; 39+ messages in thread
From: Vince Weaver @ 2011-04-28 22:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel,
	Andi Kleen, Stephane Eranian, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner

On Wed, 27 Apr 2011, Ingo Molnar wrote:

> 
> Erm, that assumes you already know that magic incantation. Most of the users 
> who want to do measurements and profiling do not know that. So there's little 
> difference between:
> 
>  - someone shows them the 0x53cf28 magic code
>  - someone shows them the L2_IFETCH:BOTH_CORES magic symbol
> 
> So while hexa values have like 10% utility, the stupid, vendor-specific 
> event names you are pushing here have like 15% utility.
> 
> In perf we are aiming for 100% utility, where if someone knows something about 
> CPUs and can type 'cycles', 'instructions' or 'branches', they will get the 
> obvious result.
> 
> This is not a difficult usability concept really.

yes, and this functionality belongs in the perf tool itself (or some other 
user tool, like libpfm4, or PAPI).  Not in the kernel.

How much larger are you willing to make the kernel to hold your 
generalized events?  PAPI has at least 128 that people have found useful 
enough to add over the years.  There are probably more.

I notice the kernel doesn't have any FP or SSE/Vector counts yet.  Or 
uops.  Or hw-interrupt counts.  Fused multiply-add?  
How about GPU counters (PAPI is starting to support these)?  Network 
counters?  Infiniband?

You're being lazy and pushing "perf" functionality into the kernel.  It 
belongs in userspace.  

It's not the kernel's job to make things easy for users.  Its job is to 
make things possible, and get out of the way.

It's already bad enough that your generalized events can change from 
kernel version to kernel version without warning.  By being in the kernel, 
aren't they a stable ABI that can't be changed?

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-27  6:52                 ` Ingo Molnar
@ 2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
                                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Vince Weaver @ 2011-04-28 22:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

On Wed, 27 Apr 2011, Ingo Molnar wrote:

> Secondly, you are still quite wrong even with your revised opinion. Being able 
> to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> instructions counts/events, and the kernel helping that kind of approach is not 
> 'abstraction to the extreme', it's called 'common sense'.

by your logic I should be able to delete a file by saying
  echo "delete /tmp/tempfile" > /dev/sdc1
because using unlink() is too low of an abstraction and confusing to the 
user.

> The fact that perfmon and oprofile works via magic vendor-specific event string 
> incantations is one of the many design failures of those projects - not a 
> virtue.

Well we disagree.  I think one of perf_events biggest failings (among 
many) is that these generalized event definitions are shoved into the 
kernel.  At least it bloats the kernel in an option commonly turned on by 
vendors.  At worst it gives users a false sense of security in thinking 
these counters are A) portable across architectures and B) actually 
measuring what they say they do.

I know it is fun to reinvent the wheel, but you ignored decades of 
experience in dealing with perf-counters when you ran off and invented 
perf_events.  It will bite you eventually.

Vince

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
@ 2011-04-28 23:30                     ` Thomas Gleixner
  2011-04-29  2:28                     ` Andi Kleen
  2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2011-04-28 23:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Peter Zijlstra

Vince,

On Thu, 28 Apr 2011, Vince Weaver wrote:

> On Wed, 27 Apr 2011, Ingo Molnar wrote:
> 
> > Secondly, you are still quite wrong even with your revised opinion. Being able 
> > to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> > instructions counts/events, and the kernel helping that kind of approach is not 
> > 'abstraction to the extreme', it's called 'common sense'.
> 
> by your logic I should be able to delete a file by saying
>   echo "delete /tmp/tempfile" > /dev/sdc1
> because using unlink() is too low of an abstraction and confusing to the 
> user.

Your definition of 'common sense' seems to be rather backwards.

> > The fact that perfmon and oprofile works via magic vendor-specific event string 
> > incantations is one of the many design failures of those projects - not a 
> > virtue.
> 
> Well we disagree.  I think one of perf_events biggest failings (among 
> many) is that these generalized event definitions are shoved into the 

Put the failings on the table if you think there are any real
ones.

The generalized event definitions are debatable, but Ingo's argument
that they pass the common sense test is definitely a strong enough
one to keep them.

The problem at hand which ignited this flame war is definitely
borderline and I don't agree with Ingo that it should not be made
available right now in the raw form. That's a hardware enablement
feature which can be useful even if tools/perf has no support for it
and we have no generalized event for it. That's two different
stories. perf has always allowed the use of raw events and I don't see a
reason why we should not do that in this case if it enables a subset
of the perf userbase to make use of it.

> kernel.  At least it bloats the kernel in an option commonly turned on by 

Well, compared to the perfmon kernel bloat that was proposed back then,
that's really nothing you should whine about.

> vendors.  At worst it gives users a false sense of security in thinking 
> these counters are A) portable across architectures and B) actually 
> measuring what they say they do.

Again, in the common sense approach they actually do what they
say. 

For real experts like you there are still the raw events to get the
real thing which is meaningful for those who understand what 'cycles'
and 'instructions' really mean. Cough, cough....

> I know it is fun to reinvent the wheel, but you ignored decades of 
> experience in dealing with perf-counters when you ran off and invented 
> perf_events.  It will bite you eventually.

Stop this whining already. I thoroughly reviewed the outcome of
"decades of experience" and I still shudder when I get reminded of
that exercise.

Yes, we invented perf_events because the proposed perfmon kernel
patches were an outright horror of a cobbled-together experience
dump along with a nice bunch of unfixable security holes, locking
issues and permission problems plus a completely nonobvious userspace
interface. In short, a complete design failure.

So perf_events was not a reinvention of the wheel. It was a sane
design decision to make performance counters available _AND_ useful
for a broad audience and a broad range of use cases.

If the only substantial complaint about perf you can bring up is the
detail of generalized events, then we can agree that we disagree and
stop wasting electrons right now.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
@ 2011-04-29  2:28                     ` Andi Kleen
  2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 39+ messages in thread
From: Andi Kleen @ 2011-04-29  2:28 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra

> I know it is fun to reinvent the wheel, but you ignored decades of 
> experience in dealing with perf-counters when you ran off and invented 
> perf_events.  It will bite you eventually.

s/eventually//

A good example of that is that perf events counted completely bogus LLC 
generalized cache events for several releases (before the offcore patches 
went in).

And BTW they now changed completely again with Peter's changes, counting
something quite different.

-Andi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-28 22:16                   ` Vince Weaver
  2011-04-28 23:30                     ` Thomas Gleixner
  2011-04-29  2:28                     ` Andi Kleen
@ 2011-04-29 19:32                     ` Ingo Molnar
  2 siblings, 0 replies; 39+ messages in thread
From: Ingo Molnar @ 2011-04-29 19:32 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Lin Ming,
	Arnaldo Carvalho de Melo, Thomas Gleixner, Peter Zijlstra


* Vince Weaver <vweaver1@eecs.utk.edu> wrote:

> On Wed, 27 Apr 2011, Ingo Molnar wrote:
> 
> > Secondly, you are still quite wrong even with your revised opinion. Being able 
> > to type '-e cycles' and '-e instructions' in perf and get ... cycles and 
> > instructions counts/events, and the kernel helping that kind of approach is not 
> > 'abstraction to the extreme', it's called 'common sense'.
> 
> by your logic I should be able to delete a file by saying
>
>   echo "delete /tmp/tempfile" > /dev/sdc1

> because using unlink() is too low of an abstraction and confusing to the 
> user.

Erm, unlink() does not pass magic hexa constants to the disk controller.

unlink() is a high level interface that works across a vast range of disk 
controllers, disks, network mounted filesystems, in-RAM filesystems, in-ROM 
filesystems, clustered filesystems and other mediums.

Just like that we can tell perf to count 'cycles', 'branches' or 
'branch-misses' - all of these are relatively high level concepts (in the scope 
of CPUs) that work across a vast range of CPU types and models.

Similarly, for offcore we want to introduce the concept of 'node local' versus 
'remote' memory - perhaps with some events for inter-CPU traffic as well - 
because that probably covers most of the NUMA related memory profiling needs.

Raw events are to perf what ioctls are to the VFS: small details nobody felt 
worth generalizing. My point in this discussion is that we do not offer new 
filesystems that support *only* ioctl calls ... Is this simple concept so hard 
to understand?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [tip:perf/core] perf, arch: Add generic NODE cache events
  2011-04-22 21:37       ` Peter Zijlstra
  2011-04-22 21:54         ` Peter Zijlstra
  2011-04-23  8:13         ` Ingo Molnar
@ 2011-07-01 15:23         ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-07-01 15:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, anton, hpa, mingo, dengcheng.zhu,
	will.deacon, a.p.zijlstra, peterz, lethal, davem, robert.richter,
	ddaney, tglx, mingo

Commit-ID:  89d6c0b5bdbb1927775584dcf532d98b3efe1477
Gitweb:     http://git.kernel.org/tip/89d6c0b5bdbb1927775584dcf532d98b3efe1477
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 22 Apr 2011 23:37:06 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 1 Jul 2011 11:06:38 +0200

perf, arch: Add generic NODE cache events

Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS; if there is no way to distinguish between reads
and writes, do reads only, etc.

The below needs filling out for !x86 (which I filled out with
unsupported events).

I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.

Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/arm/kernel/perf_event_v6.c        |   28 +++++++++++++++
 arch/arm/kernel/perf_event_v7.c        |   28 +++++++++++++++
 arch/arm/kernel/perf_event_xscale.c    |   14 +++++++
 arch/mips/kernel/perf_event_mipsxx.c   |   28 +++++++++++++++
 arch/powerpc/kernel/e500-pmu.c         |    5 +++
 arch/powerpc/kernel/mpc7450-pmu.c      |    5 +++
 arch/powerpc/kernel/power4-pmu.c       |    5 +++
 arch/powerpc/kernel/power5+-pmu.c      |    5 +++
 arch/powerpc/kernel/power5-pmu.c       |    5 +++
 arch/powerpc/kernel/power6-pmu.c       |    5 +++
 arch/powerpc/kernel/power7-pmu.c       |    5 +++
 arch/powerpc/kernel/ppc970-pmu.c       |    5 +++
 arch/sh/kernel/cpu/sh4/perf_event.c    |   15 ++++++++
 arch/sh/kernel/cpu/sh4a/perf_event.c   |   15 ++++++++
 arch/sparc/kernel/perf_event.c         |   42 ++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_amd.c   |   14 +++++++
 arch/x86/kernel/cpu/perf_event_intel.c |   59 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/perf_event_p4.c    |   14 +++++++
 include/linux/perf_event.h             |    3 +-
 19 files changed, 298 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index 38dc4da..dd7f3b9 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -173,6 +173,20 @@ static const unsigned armv6_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 enum armv6mpcore_perf_types {
@@ -310,6 +324,20 @@ static const unsigned armv6mpcore_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]  = CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]    = CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 static inline unsigned long
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 6e5f875..e20ca9c 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -255,6 +255,20 @@ static const unsigned armv7_a8_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
@@ -371,6 +385,20 @@ static const unsigned armv7_a9_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 /*
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 99b6b85..3c43974 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -144,6 +144,20 @@ static const unsigned xscale_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
 		},
 	},
+	[C(NODE)] = {
+		[C(OP_READ)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_WRITE)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+		[C(OP_PREFETCH)] = {
+			[C(RESULT_ACCESS)]	= CACHE_OP_UNSUPPORTED,
+			[C(RESULT_MISS)]	= CACHE_OP_UNSUPPORTED,
+		},
+	},
 };
 
 #define	XSCALE_PMU_ENABLE	0x001
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 75266ff..e5ad09a 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -377,6 +377,20 @@ static const struct mips_perf_event mipsxxcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 /* 74K core has completely different cache event map. */
@@ -480,6 +494,20 @@ static const struct mips_perf_event mipsxx74Kcore_cache_map
 		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_WRITE)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+	[C(OP_PREFETCH)] = {
+		[C(RESULT_ACCESS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+		[C(RESULT_MISS)]	= { UNSUPPORTED_PERF_EVENT_ID },
+	},
+},
 };
 
 #ifdef CONFIG_MIPS_MT_SMP
diff --git a/arch/powerpc/kernel/e500-pmu.c b/arch/powerpc/kernel/e500-pmu.c
index b150b51..cb2e294 100644
--- a/arch/powerpc/kernel/e500-pmu.c
+++ b/arch/powerpc/kernel/e500-pmu.c
@@ -75,6 +75,11 @@ static int e500_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1 	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static int num_events = 128;
diff --git a/arch/powerpc/kernel/mpc7450-pmu.c b/arch/powerpc/kernel/mpc7450-pmu.c
index 2cc5e03..845a584 100644
--- a/arch/powerpc/kernel/mpc7450-pmu.c
+++ b/arch/powerpc/kernel/mpc7450-pmu.c
@@ -388,6 +388,11 @@ static int mpc7450_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 struct power_pmu mpc7450_pmu = {
diff --git a/arch/powerpc/kernel/power4-pmu.c b/arch/powerpc/kernel/power4-pmu.c
index ead8b3c..e9dbc2d 100644
--- a/arch/powerpc/kernel/power4-pmu.c
+++ b/arch/powerpc/kernel/power4-pmu.c
@@ -587,6 +587,11 @@ static int power4_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power4_pmu = {
diff --git a/arch/powerpc/kernel/power5+-pmu.c b/arch/powerpc/kernel/power5+-pmu.c
index eca0ac5..f58a2bd 100644
--- a/arch/powerpc/kernel/power5+-pmu.c
+++ b/arch/powerpc/kernel/power5+-pmu.c
@@ -653,6 +653,11 @@ static int power5p_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5p_pmu = {
diff --git a/arch/powerpc/kernel/power5-pmu.c b/arch/powerpc/kernel/power5-pmu.c
index d5ff0f6..b1acab6 100644
--- a/arch/powerpc/kernel/power5-pmu.c
+++ b/arch/powerpc/kernel/power5-pmu.c
@@ -595,6 +595,11 @@ static int power5_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power5_pmu = {
diff --git a/arch/powerpc/kernel/power6-pmu.c b/arch/powerpc/kernel/power6-pmu.c
index 3160392..b24a3a2 100644
--- a/arch/powerpc/kernel/power6-pmu.c
+++ b/arch/powerpc/kernel/power6-pmu.c
@@ -516,6 +516,11 @@ static int power6_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1		},
 		[C(OP_PREFETCH)] = {	-1,		-1		},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1		},
+		[C(OP_WRITE)] = {	-1,		-1		},
+		[C(OP_PREFETCH)] = {	-1,		-1		},
+	},
 };
 
 static struct power_pmu power6_pmu = {
diff --git a/arch/powerpc/kernel/power7-pmu.c b/arch/powerpc/kernel/power7-pmu.c
index 593740f..6d9dccb 100644
--- a/arch/powerpc/kernel/power7-pmu.c
+++ b/arch/powerpc/kernel/power7-pmu.c
@@ -342,6 +342,11 @@ static int power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu power7_pmu = {
diff --git a/arch/powerpc/kernel/ppc970-pmu.c b/arch/powerpc/kernel/ppc970-pmu.c
index 9a6e093..b121de9 100644
--- a/arch/powerpc/kernel/ppc970-pmu.c
+++ b/arch/powerpc/kernel/ppc970-pmu.c
@@ -467,6 +467,11 @@ static int ppc970_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 		[C(OP_WRITE)] = {	-1,		-1	},
 		[C(OP_PREFETCH)] = {	-1,		-1	},
 	},
+	[C(NODE)] = {		/* 	RESULT_ACCESS	RESULT_MISS */
+		[C(OP_READ)] = {	-1,		-1	},
+		[C(OP_WRITE)] = {	-1,		-1	},
+		[C(OP_PREFETCH)] = {	-1,		-1	},
+	},
 };
 
 static struct power_pmu ppc970_pmu = {
diff --git a/arch/sh/kernel/cpu/sh4/perf_event.c b/arch/sh/kernel/cpu/sh4/perf_event.c
index 748955d..fa4f724 100644
--- a/arch/sh/kernel/cpu/sh4/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4/perf_event.c
@@ -180,6 +180,21 @@ static const int sh7750_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh7750_event_map(int event)
diff --git a/arch/sh/kernel/cpu/sh4a/perf_event.c b/arch/sh/kernel/cpu/sh4a/perf_event.c
index 17e6beb..84a2c39 100644
--- a/arch/sh/kernel/cpu/sh4a/perf_event.c
+++ b/arch/sh/kernel/cpu/sh4a/perf_event.c
@@ -205,6 +205,21 @@ static const int sh4a_cache_events
 			[ C(RESULT_MISS)   ] = -1,
 		},
 	},
+
+	[ C(NODE) ] = {
+		[ C(OP_READ) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_WRITE) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+		[ C(OP_PREFETCH) ] = {
+			[ C(RESULT_ACCESS) ] = -1,
+			[ C(RESULT_MISS)   ] = -1,
+		},
+	},
 };
 
 static int sh4a_event_map(int event)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 0b32f2e..62a0343 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -246,6 +246,20 @@ static const cache_map_t ultra3_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu ultra3_pmu = {
@@ -361,6 +375,20 @@ static const cache_map_t niagara1_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara1_pmu = {
@@ -473,6 +501,20 @@ static const cache_map_t niagara2_cache_map = {
 		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
 	},
 },
+[C(NODE)] = {
+	[C(OP_READ)] = {
+		[C(RESULT_ACCESS)] = { CACHE_OP_UNSUPPORTED },
+		[C(RESULT_MISS)  ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = { CACHE_OP_UNSUPPORTED },
+		[ C(RESULT_MISS)   ] = { CACHE_OP_UNSUPPORTED },
+	},
+},
 };
 
 static const struct sparc_pmu niagara2_pmu = {
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index fe29c1d..941caa2 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -89,6 +89,20 @@ static __initconst const u64 amd_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0xb8e9, /* CPU Request to Memory, l+r */
+		[ C(RESULT_MISS)   ] = 0x98e9, /* CPU Request to Memory, r   */
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 5c44862..bf6f92f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -226,6 +226,21 @@ static __initconst const u64 snb_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
+
 };
 
 static __initconst const u64 westmere_hw_cache_event_ids
@@ -327,6 +342,20 @@ static __initconst const u64 westmere_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 /*
@@ -379,7 +408,21 @@ static __initconst const u64 nehalem_hw_cache_extra_regs
 		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_L3_ACCESS,
 		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_L3_MISS,
 	},
- }
+ },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_READ|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_READ|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_WRITE|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_WRITE|NHM_REMOTE_DRAM,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = NHM_DMND_PREFETCH|NHM_ALL_DRAM,
+		[ C(RESULT_MISS)   ] = NHM_DMND_PREFETCH|NHM_REMOTE_DRAM,
+	},
+ },
 };
 
 static __initconst const u64 nehalem_hw_cache_event_ids
@@ -481,6 +524,20 @@ static __initconst const u64 nehalem_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = 0x01b7,
+		[ C(RESULT_MISS)   ] = 0x01b7,
+	},
+ },
 };
 
 static __initconst const u64 core2_hw_cache_event_ids
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index d6e6a67..fb901c5 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -554,6 +554,20 @@ static __initconst const u64 p4_hw_cache_event_ids
 		[ C(RESULT_MISS)   ] = -1,
 	},
  },
+ [ C(NODE) ] = {
+	[ C(OP_READ) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_WRITE) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+	[ C(OP_PREFETCH) ] = {
+		[ C(RESULT_ACCESS) ] = -1,
+		[ C(RESULT_MISS)   ] = -1,
+	},
+ },
 };
 
 static u64 p4_general_events[PERF_COUNT_HW_MAX] = {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 069315e..a5f54b9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -61,7 +61,7 @@ enum perf_hw_id {
 /*
  * Generalized hardware cache events:
  *
- *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU } x
+ *       { L1-D, L1-I, LLC, ITLB, DTLB, BPU, NODE } x
  *       { read, write, prefetch } x
  *       { accesses, misses }
  */
@@ -72,6 +72,7 @@ enum perf_hw_cache_id {
 	PERF_COUNT_HW_CACHE_DTLB		= 3,
 	PERF_COUNT_HW_CACHE_ITLB		= 4,
 	PERF_COUNT_HW_CACHE_BPU			= 5,
+	PERF_COUNT_HW_CACHE_NODE		= 6,
 
 	PERF_COUNT_HW_CACHE_MAX,		/* non-ABI */
 };
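
[ For illustration only -- not part of the commit above.  With the NODE id
  in place, the new entries are reachable through the existing
  PERF_TYPE_HW_CACHE encoding, attr.config = id | (op << 8) | (result << 16).
  A minimal user-space sketch, assuming the header update from this patch: ]

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <string.h>
	#include <unistd.h>

	/* hypothetical helper: count NODE read misses -- on the Nehalem map
	 * above that means demand reads served from remote DRAM -- for the
	 * calling thread, on any CPU */
	static int open_node_read_misses(void)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size   = sizeof(attr);
		attr.type   = PERF_TYPE_HW_CACHE;
		attr.config = PERF_COUNT_HW_CACHE_NODE |
			      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
			      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);

		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}

[ On the tooling side the same events would presumably surface as symbolic
  names (something like 'node-loads' / 'node-load-misses') once tools/perf
  grows them. ]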

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  9:23 ` Ingo Molnar
@ 2011-04-22  9:41   ` Stephane Eranian
  0 siblings, 0 replies; 39+ messages in thread
From: Stephane Eranian @ 2011-04-22  9:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma

On Fri, Apr 22, 2011 at 11:23 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Stephane Eranian <eranian@google.com> wrote:
>
>> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> >> This needs to be a *lot* more user friendly. Users do not want to type in
>> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> >> era really.
>> >>
>> >> Unless there's proper generalized and human usable support i'm leaning
>> >> towards turning off the offcore user-space accessible raw bits for now, and
>> >> use them only kernel-internally, for the cache events.
>>
>> Generic cache events are a myth. They are not usable. I keep getting
>> questions from users because nobody knows what they are actually counting,
>> thus nobody knows how to interpret the counts. You cannot really hide the
>> micro-architecture if you want to make any sensible measurements.
>
> Well:
>
>  aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
>  Time: 0.125
>  Time: 0.136
>  Time: 0.180
>  Time: 0.103
>  Time: 0.097
>  Time: 0.125
>  Time: 0.104
>  Time: 0.125
>  Time: 0.114
>  Time: 0.158
>
>  Performance counter stats for './hackbench 10' (10 runs):
>
>     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
>       843,957,634 L1-dcache-loads            ( +-   1.295% )
>       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
>         6,328,938 LLC-misses                 ( +-   3.969% )
>
>        0.146160287  seconds time elapsed   ( +-   5.851% )
>
> It's certainly useful if you want to get ballpark figures about cache behavior
> of an app and want to do comparisons.
>
What can you conclude from the above counts?
Are they good or bad? If they are bad, how do you go about fixing the app?

> There are inconsistencies in our generic cache events - but that's not really a
> reason to obscure their usage behind nonsensical microarchitecture-specific 
> details.
>
The actual events are a reflection of the micro-architecture. They indirectly
describe how it works. It is not clear to me that you can really improve your
app without some exposure to the micro-architecture.

So if you want to have generic events, I am fine with this, but you should not
block access to actual events pretending they are useless. Some people are
certainly interested in using them and learning about the micro-architecture
of their processor.


> But i'm definitely in favor of making these generalized events more consistent
> across different CPU types. Can you list examples of inconsistencies that we
> should resolve? (and which you possibly consider impossible to resolve, right?)
>
To make generic events more uniform across processors, one would have to have
precise definitions as to what they are supposed to count. Once you
have that, then
we may have a better chance at finding consistent mappings for each processor.
I have not yet seen such definitions.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
  2011-04-22  8:47 Stephane Eranian
@ 2011-04-22  9:23 ` Ingo Molnar
  2011-04-22  9:41   ` Stephane Eranian
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2011-04-22  9:23 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma


* Stephane Eranian <eranian@google.com> wrote:

> On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * Ingo Molnar <mingo@elte.hu> wrote:
> >
> >> This needs to be a *lot* more user friendly. Users do not want to type in
> >> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
> >> era really.
> >>
> >> Unless there's proper generalized and human usable support i'm leaning
> >> towards turning off the offcore user-space accessible raw bits for now, and
> >> use them only kernel-internally, for the cache events.
>
> Generic cache events are a myth. They are not usable. I keep getting 
> questions from users because nobody knows what they are actually counting, 
> thus nobody knows how to interpret the counts. You cannot really hide the 
> micro-architecture if you want to make any sensible measurements.

Well:

 aldebaran:~> perf stat --repeat 10 -e instructions -e L1-dcache-loads -e L1-dcache-load-misses -e LLC-misses ./hackbench 10
 Time: 0.125
 Time: 0.136
 Time: 0.180
 Time: 0.103
 Time: 0.097
 Time: 0.125
 Time: 0.104
 Time: 0.125
 Time: 0.114
 Time: 0.158

 Performance counter stats for './hackbench 10' (10 runs):

     2,102,556,398 instructions             #      0.000 IPC     ( +-   1.179% )
       843,957,634 L1-dcache-loads            ( +-   1.295% )
       130,007,361 L1-dcache-load-misses      ( +-   3.281% )
         6,328,938 LLC-misses                 ( +-   3.969% )

        0.146160287  seconds time elapsed   ( +-   5.851% )

It's certainly useful if you want to get ballpark figures about cache behavior 
of an app and want to do comparisons.

There are inconsistencies in our generic cache events - but that's not really a 
reason to obscure their usage behind nonsensical microarchitecture-specific 
details.

But i'm definitely in favor of making these generalized events more consistent 
across different CPU types. Can you list examples of inconsistencies that we 
should resolve? (and which you possibly consider impossible to resolve, right?)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/1] perf tools: Add missing user space support for config1/config2
@ 2011-04-22  8:47 Stephane Eranian
  2011-04-22  9:23 ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: Stephane Eranian @ 2011-04-22  8:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Andi Kleen,
	Peter Zijlstra, Lin Ming, Arnaldo Carvalho de Melo,
	Thomas Gleixner, Peter Zijlstra, eranian, Arun Sharma

On Fri, Apr 22, 2011 at 10:06 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
>> This needs to be a *lot* more user friendly. Users do not want to type in
>> stupid hexa magic numbers to get profiling. We have moved beyond the oprofile
>> era really.
>>
>> Unless there's proper generalized and human usable support i'm leaning
>> towards turning off the offcore user-space accessible raw bits for now, and
>> use them only kernel-internally, for the cache events.
>
Generic cache events are a myth. They are not usable. I keep getting questions
from users because nobody knows what they are actually counting, thus nobody
knows how to interpret the counts. You cannot really hide the micro-architecture
if you want to make any sensible measurements.

I agree about the poor usability of perf when you have to pass hex
values for events.
But that's why I have a user-level library to map event strings to
event codes for perf.
Arun Sharma posted a patch a while ago to connect this library with perf; so far
it's been ignored, it seems:
    perf stat -e offcore_response_0:dmd_data_rd foo


> I'm about to push out the patch attached below - it lays out the arguments in
> detail. I don't think we have time to fix this properly for .39 - but memory
> profiling could be a nice feature for v2.6.40.
>
You will not be able to do any reasonable memory profiling using
offcore response events. Don't expect a profile to point to the
missing loads. If you're lucky it would point to the use instruction.


> --------------------->
> From b52c55c6a25e4515b5e075a989ff346fc251ed09 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@elte.hu>
> Date: Fri, 22 Apr 2011 08:44:38 +0200
> Subject: [PATCH] x86, perf event: Turn off unstructured raw event access to offcore registers
>
> Andi Kleen pointed out that the Intel offcore support patches were merged
> without user-space tool support to the functionality:
>
>  |
>  | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
>  | user space bits were not. This made it impossible to set the extra mask
>  | and actually do the OFFCORE profiling
>  |
>
> Andi submitted a preliminary patch for user-space support, as an
> extension to perf's raw event syntax:
>
>  |
>  | Some raw events -- like the Intel OFFCORE events -- support additional
>  | parameters. These can be appended after a ':'.
>  |
>  | For example on a multi socket Intel Nehalem:
>  |
>  |    perf stat -e r1b7:20ff -a sleep 1
>  |
>  | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
>  | that measures any access to DRAM on another socket.
>  |
>
> But this kind of usability is absolutely unacceptable - users should not
> be expected to type in magic, CPU and model specific incantations to get
> access to useful hardware functionality.
>
> The proper solution is to expose useful offcore functionality via
> generalized events - that way users do not have to care which specific
> CPU model they are using, they can use the conceptual event and not some
> model specific quirky hexa number.
>
> We already have such generalization in place for CPU cache events,
> and it's all very extensible.
>
> "Offcore" events measure general DRAM access patterns along various
> parameters. They are particularly useful in NUMA systems.
>
> We want to support them via generalized DRAM events: either as the
> fourth level of cache (after the last-level cache), or as a separate
> generalization category.
>
> That way user-space support would be very obvious, memory access
> profiling could be done via self-explanatory commands like:
>
>  perf record -e dram ./myapp
>  perf record -e dram-remote ./myapp
>
> ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
> accesses.
>
> These generalized events would work on all CPUs and architectures that
> have comparable PMU features.
>
> ( Note, these are just examples: the actual implementation could have more
>  sophistication and more parameters - as long as they center around
>  similarly simple use cases. )
>
> Now we do not want to revert *all* of the current offcore bits, as they
> are still somewhat useful for generic last-level-cache events, implemented
> in this commit:
>
>  e994d7d23a0b: perf: Fix LLC-* events on Intel Nehalem/Westmere
>
> But we definitely do not yet want to expose the unstructured raw events
> to user-space, until better generalization and usability is implemented
> for these hardware event features.
>
> ( Note: after generalization has been implemented raw offcore events can be
>  supported as well: there can always be an odd event that is marginally
>  useful but not useful enough to generalize. DRAM profiling is definitely
>  *not* such a category so generalization must be done first. )
>
> Furthermore, PERF_TYPE_RAW access to these registers was not intended
> to go upstream without proper support - it was a side-effect of the above
> e994d7d23a0b commit, not mentioned in the changelog.
>
> As v2.6.39 is nearing release we go for the simplest approach: disable
> the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
> kernel and becomes an ABI.
>
> Once proper structure is implemented for these hardware events and users
> are offered usable solutions we can revisit this issue.
>
> Reported-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/cpu/perf_event.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index eed3673a..632e5dc 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -586,8 +586,12 @@ static int x86_setup_perfctr(struct perf_event *event)
>                        return -EOPNOTSUPP;
>        }
>
> +       /*
> +        * Do not allow config1 (extended registers) to propagate,
> +        * there's no sane user-space generalization yet:
> +        */
>        if (attr->type == PERF_TYPE_RAW)
> -               return x86_pmu_extra_regs(event->attr.config, event);
> +               return 0;
>
>        if (attr->type == PERF_TYPE_HW_CACHE)
>                return set_ext_hw_attr(hwc, event);
>
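
For reference, a minimal sketch of the user-space side this hunk affects
(the raw code and mask are the r1b7:20ff values from earlier in the thread;
this assumes a 2.6.39-era perf_event.h that already carries the config1
field):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size    = sizeof(attr);
	attr.type    = PERF_TYPE_RAW;
	attr.config  = 0x01b7;	/* OFFCORE_RESPONSE_0 raw event code */
	attr.config1 = 0x20ff;	/* extra response mask; with the hunk above
				 * applied it is no longer propagated */

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	return fd < 0;
}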


end of thread, other threads:[~2011-07-01 15:24 UTC | newest]

Thread overview: 39+ messages
2011-04-21 17:41 [GIT PULL 0/1] perf/urgent Fix missing support for config1/config2 Arnaldo Carvalho de Melo
2011-04-21 17:41 ` [PATCH 1/1] perf tools: Add missing user space " Arnaldo Carvalho de Melo
2011-04-22  6:34   ` Ingo Molnar
2011-04-22  8:06     ` Ingo Molnar
2011-04-22 21:37       ` Peter Zijlstra
2011-04-22 21:54         ` Peter Zijlstra
2011-04-22 22:19           ` Peter Zijlstra
2011-04-22 23:54             ` Andi Kleen
2011-04-23  7:49               ` Peter Zijlstra
2011-04-22 22:57           ` Peter Zijlstra
2011-04-23  0:00             ` Andi Kleen
2011-04-23  7:50               ` Peter Zijlstra
2011-04-23  8:13         ` Ingo Molnar
2011-07-01 15:23         ` [tip:perf/core] perf, arch: Add generic NODE cache events tip-bot for Peter Zijlstra
2011-04-25 17:12       ` [PATCH 1/1] perf tools: Add missing user space support for config1/config2 Vince Weaver
2011-04-25 17:54         ` Ingo Molnar
2011-04-25 21:46           ` Vince Weaver
2011-04-25 22:12             ` Andi Kleen
2011-04-26  7:23               ` Ingo Molnar
2011-04-26  7:38             ` Ingo Molnar
2011-04-26 20:51               ` Vince Weaver
2011-04-27  6:52                 ` Ingo Molnar
2011-04-28 22:16                   ` Vince Weaver
2011-04-28 23:30                     ` Thomas Gleixner
2011-04-29  2:28                     ` Andi Kleen
2011-04-29 19:32                     ` Ingo Molnar
2011-04-26  9:49             ` Peter Zijlstra
2011-04-26  9:25         ` Peter Zijlstra
2011-04-26 20:33           ` Vince Weaver
2011-04-26 21:19             ` Cyrill Gorcunov
2011-04-26 21:25               ` Don Zickus
2011-04-26 21:33                 ` Cyrill Gorcunov
2011-04-27  6:43             ` Ingo Molnar
2011-04-28 22:10               ` Vince Weaver
2011-04-22 16:22     ` Andi Kleen
2011-04-22 19:54       ` Ingo Molnar
2011-04-22  8:47 Stephane Eranian
2011-04-22  9:23 ` Ingo Molnar
2011-04-22  9:41   ` Stephane Eranian
