* [PATCH v2 00/16] perf: add memory access sampling support
@ 2012-11-05 13:50 Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 01/16] perf/x86: improve sysfs event mapping with event string Stephane Eranian
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch series adds a new feature to the kernel perf_events
interface and the corresponding user-level tool, perf.

With this patch, it is possible to sample (not trace) memory
accesses (load, store). For loads, the instruction and data
addresses are captured along with the latency and data source.
For stores, the instruction and data addresses are captured
along with limited cache and TLB information.

For the load data source, the memory hierarchy level and the
TLB, snoop, and lock information are captured.

Although the perf_event interface is extended in a generic manner,
sampling memory accesses requires HW support. The current patches
implement the feature on Intel processors starting with Nehalem.
The patches leverage the PEBS Load Latency and Precise Store
mechanisms. Precise Store is present only on Sandy Bridge and
Ivy Bridge based processors.

The perf tool is extended to make capturing and analyzing the
data easier with a new command: perf mem.

$ perf mem -t load rec triad
$ perf mem -t load rep --stdio
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead      Samples     Cost  Memory access Symbol      Shared Obj  Data Symbol             Data Object Snoop  TLB access   Locked
# ........  ...........  .......  ............. ..........  ........... ......................  ........... ...... ............ ......
#
     0.10%            1      986  LFB hit       [.] triad   triad       [.] 0x00007f67dffe8038  [unknown]    None  L1 or L2 hit  No    
     0.09%            1      890  LFB hit       [.] triad   triad       [.] 0x00007f67df91a750  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      826  LFB hit       [.] triad   triad       [.] 0x00007f67e288fba8  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      825  LFB hit       [.] triad   triad       [.] 0x00007f67dea28c80  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      787  LFB hit       [.] triad   triad       [.] 0x00007f67df055a60  [unknown]    None  L1 or L2 hit  No    

The perf mem command is a wrapper around perf record/report. It passes the
right options to the report and record commands. Note that the TUI mode is
supported.

One powerful feature of perf is that users can toy with the sort order to
display the information in different formats or from a different angle. This
is particularly useful with memory sampling:

$ perf mem -t load rep --sort=mem
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : mem
#
# Overhead      Samples             Memory access
# ........  ...........  ........................
#
    85.26%        10633  LFB hit                 
     7.35%         8151  L1 hit                  
     3.13%          383  L3 hit                  
     3.09%          195  Local RAM hit           
     1.16%          259  L2 hit                  
     0.00%            4  Uncached hit            

Or if one is interested in the data view:
$ perf mem -t load rep --sort=symbol_daddr,cost
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : symbol_daddr,cost
#
# Overhead      Samples             Data Symbol     Cost
# ........  ...........  ......................  .......
#
     0.10%            1  [.] 0x00007f67dffe8038      986
     0.09%            1  [.] 0x00007f67df91a750      890
     0.08%            1  [.] 0x00007f67e288fba8      826


One note on the cost displayed: on Intel processors with PEBS Load Latency,
as described in the SDM, the cost is the number of cycles from dispatch to
Globally Observable (GO) state. That means it includes out-of-order execution
time, so it is not unusual to see L1D hits with a cost of more than 100
cycles. Always look at the memory level for an approximation of the access
penalty, then interpret the cost value accordingly.

There is no cost associated with stores.

CAVEAT: Note that the data addresses are currently not resolved correctly,
due to a problem in the perf data symbol resolution code which I have not
been able to uncover so far.

In v2, we leverage some of Andi Kleen's Haswell patches, namely the weighted
samples and the perf tool event parser fixes. We also introduce
PERF_RECORD_MISC_MMAP_DATA to tag mmaps as data vs. code. This helps the perf
tool distinguish data vs. code mmaps (and therefore symbols). We have also
integrated the feedback from v1. Note that in v2 data symbol resolution is
not yet fully operational, but there is a slight improvement.


Enjoy,

Signed-off-by: Stephane Eranian <eranian@google.com>


Andi Kleen (2):
  perf, core: Add a concept of a weightened sample
  perf, tools: Add arbitary aliases and support names with -

Stephane Eranian (14):
  perf/x86: improve sysfs event mapping with event string
  perf/x86: add flags to event constraints
  perf: add minimal support for PERF_SAMPLE_WEIGHT
  perf: add support for PERF_SAMPLE_ADDR in dump_sampple()
  perf: add generic memory sampling interface
  perf/x86: add memory profiling via PEBS Load Latency
  perf/x86: export PEBS load latency threshold register to sysfs
  perf/x86: add support for PEBS Precise Store
  perf tools: add mem access sampling core support
  perf report: add support for mem access profiling
  perf record: add support for mem access profiling
  perf tools: add new mem command for memory access profiling
  perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP
  perf tools: detect data vs. text mappings

 arch/x86/include/asm/msr-index.h              |    1 +
 arch/x86/kernel/cpu/perf_event.c              |   33 +--
 arch/x86/kernel/cpu/perf_event.h              |   61 ++++-
 arch/x86/kernel/cpu/perf_event_intel.c        |   66 ++++-
 arch/x86/kernel/cpu/perf_event_intel_ds.c     |  182 +++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 include/linux/perf_event.h                    |    5 +
 include/uapi/linux/perf_event.h               |   74 +++++-
 kernel/events/core.c                          |   15 ++
 tools/perf/Documentation/perf-mem.txt         |   48 ++++
 tools/perf/Makefile                           |    1 +
 tools/perf/builtin-mem.c                      |  238 ++++++++++++++++++
 tools/perf/builtin-record.c                   |    2 +
 tools/perf/builtin-report.c                   |  131 +++++++++-
 tools/perf/builtin.h                          |    1 +
 tools/perf/command-list.txt                   |    1 +
 tools/perf/perf.c                             |    1 +
 tools/perf/perf.h                             |    1 +
 tools/perf/util/event.h                       |    2 +
 tools/perf/util/evsel.c                       |   15 ++
 tools/perf/util/hist.c                        |   66 ++++-
 tools/perf/util/hist.h                        |   13 +
 tools/perf/util/machine.c                     |   10 +-
 tools/perf/util/parse-events.l                |    2 +
 tools/perf/util/session.c                     |   44 ++++
 tools/perf/util/session.h                     |    4 +
 tools/perf/util/sort.c                        |  324 ++++++++++++++++++++++++-
 tools/perf/util/sort.h                        |   10 +-
 tools/perf/util/symbol.h                      |    7 +
 29 files changed, 1312 insertions(+), 48 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-mem.txt
 create mode 100644 tools/perf/builtin-mem.c

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 01/16] perf/x86: improve sysfs event mapping with event string
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 02/16] perf/x86: add flags to event constraints Stephane Eranian
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch extends Jiri's changes that make generic
event mappings visible via sysfs. It extends the
mechanism to non-generic events by allowing the
mappings to be hardcoded as strings.

This mechanism will be used by the PEBS-LL patch
later on.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c |   28 +++++++++++++---------------
 arch/x86/kernel/cpu/perf_event.h |   26 ++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..aeca46d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1316,20 +1316,22 @@ static struct attribute_group x86_pmu_format_group = {
 	.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-	struct device_attribute attr;
-	u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
  */
 static void __init filter_events(struct attribute **attrs)
 {
+	struct device_attribute *d;
+	struct perf_pmu_events_attr *pmu_attr;
 	int i, j;
 
 	for (i = 0; attrs[i]; i++) {
+		d = (struct device_attribute *)attrs[i];
+		pmu_attr = container_of(d, struct perf_pmu_events_attr, attr);
+		/* str trumps id */
+		if (pmu_attr->event_str)
+			continue;
 		if (x86_pmu.event_map(i))
 			continue;
 
@@ -1341,24 +1343,20 @@ static void __init filter_events(struct attribute **attrs)
 	}
 }
 
-static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 			  char *page)
 {
 	struct perf_pmu_events_attr *pmu_attr = \
 		container_of(attr, struct perf_pmu_events_attr, attr);
 
 	u64 config = x86_pmu.event_map(pmu_attr->id);
-	return x86_pmu.events_sysfs_show(page, config);
-}
 
-#define EVENT_VAR(_id)  event_attr_##_id
-#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+	/* string trumps id */
+	if (pmu_attr->event_str)
+		return sprintf(page, "%s", pmu_attr->event_str);
 
-#define EVENT_ATTR(_name, _id)					\
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {		\
-	.attr = __ATTR(_name, 0444, events_sysfs_show, NULL),	\
-	.id   =  PERF_COUNT_HW_##_id,				\
-};
+	return x86_pmu.events_sysfs_show(page, config);
+}
 
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 115c1ea..41479da 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -419,6 +419,29 @@ do {									\
 #define ERF_NO_HT_SHARING	1
 #define ERF_HAS_RSP_1		2
 
+#define EVENT_VAR(_id)  event_attr_##_id
+#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+
+#define EVENT_ATTR(_name, _id)					\
+static struct perf_pmu_events_attr EVENT_VAR(_id) = {		\
+	.attr = __ATTR(_name, 0444, events_sysfs_show, NULL),	\
+	.id   =  PERF_COUNT_HW_##_id,				\
+	.event_str = NULL,					\
+};
+
+#define EVENT_ATTR_STR(_name, v, str)				  \
+static struct perf_pmu_events_attr event_attr_##v = {		  \
+	.attr      = __ATTR(_name, 0444, events_sysfs_show, NULL),\
+	.id        =  0,					  \
+	.event_str =  str,					  \
+};
+
+struct perf_pmu_events_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str;
+};
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -633,6 +656,9 @@ int p6_pmu_init(void);
 
 int knc_pmu_init(void);
 
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
-- 
1.7.9.5



* [PATCH v2 02/16] perf/x86: add flags to event constraints
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 01/16] perf/x86: improve sysfs event mapping with event string Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 03/16] perf, core: Add a concept of a weightened sample Stephane Eranian
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds a flags field to each event constraint.
It can be used to store event-specific features which can
then be used by the scheduling code or low-level x86 code.

The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.

This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c              |    2 +-
 arch/x86/kernel/cpu/perf_event.h              |    8 +++++---
 arch/x86/kernel/cpu/perf_event_intel.c        |    5 ++++-
 arch/x86/kernel/cpu/perf_event_intel_ds.c     |    4 +++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 include/linux/perf_event.h                    |    1 +
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index aeca46d..80dd390 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1470,7 +1470,7 @@ static int __init init_hw_perf_events(void)
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
-				   0, x86_pmu.num_counters, 0);
+				   0, x86_pmu.num_counters, 0, 0);
 
 	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
 	x86_pmu_format_group.attrs = x86_pmu.format_attrs;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 41479da..f78d6e8 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -59,6 +59,7 @@ struct event_constraint {
 	u64	cmask;
 	int	weight;
 	int	overlap;
+	int	flags;
 };
 
 struct amd_nb {
@@ -170,16 +171,17 @@ struct cpu_hw_events {
 	void				*kfree_on_online;
 };
 
-#define __EVENT_CONSTRAINT(c, n, m, w, o) {\
+#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
 	{ .idxmsk64 = (n) },		\
 	.code = (c),			\
 	.cmask = (m),			\
 	.weight = (w),			\
 	.overlap = (o),			\
+	.flags = f,			\
 }
 
 #define EVENT_CONSTRAINT(c, n, m)	\
-	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0)
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
 
 /*
  * The overlap flag marks event constraints with overlapping counter
@@ -203,7 +205,7 @@ struct cpu_hw_events {
  * and its counter masks must be kept at a minimum.
  */
 #define EVENT_CONSTRAINT_OVERLAP(c, n, m)	\
-	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1)
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1, 0)
 
 /*
  * Constraint on the Event code.
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 93b9e11..57d6527 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1367,8 +1367,10 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 
 	if (x86_pmu.event_constraints) {
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
-			if ((event->hw.config & c->cmask) == c->code)
+			if ((event->hw.config & c->cmask) == c->code) {
+				event->hw.flags |= c->flags;
 				return c;
+			}
 		}
 	}
 
@@ -1413,6 +1415,7 @@ intel_put_shared_regs_event_constraints(struct cpu_hw_events *cpuc,
 static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
 					struct perf_event *event)
 {
+	event->hw.flags = 0;
 	intel_put_shared_regs_event_constraints(cpuc, event);
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..f30d85b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -430,8 +430,10 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 
 	if (x86_pmu.pebs_constraints) {
 		for_each_event_constraint(c, x86_pmu.pebs_constraints) {
-			if ((event->hw.config & c->cmask) == c->code)
+			if ((event->hw.config & c->cmask) == c->code) {
+				event->hw.flags |= c->flags;
 				return c;
+			}
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 3cf3d97..cc39e64 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2438,7 +2438,7 @@ static int __init uncore_type_init(struct intel_uncore_type *type)
 
 	type->unconstrainted = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-				0, type->num_counters, 0);
+				0, type->num_counters, 0, 0);
 
 	for (i = 0; i < type->num_boxes; i++) {
 		pmus[i].func_id = -1;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..484cfbc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -128,6 +128,7 @@ struct hw_perf_event {
 			int		event_base_rdpmc;
 			int		idx;
 			int		last_cpu;
+			int		flags;
 
 			struct hw_perf_event_extra extra_reg;
 			struct hw_perf_event_extra branch_reg;
-- 
1.7.9.5



* [PATCH v2 03/16] perf, core: Add a concept of a weightened sample
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 01/16] perf/x86: improve sysfs event mapping with event string Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 02/16] perf/x86: add flags to event constraints Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 20:01   ` Arnaldo Carvalho de Melo
  2012-11-05 13:50 ` [PATCH v2 04/16] perf: add minimal support for PERF_SAMPLE_WEIGHT Stephane Eranian
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

From: Andi Kleen <ak@linux.intel.com>

For some events it's useful to weight samples with a
hardware-provided number. This expresses how expensive the
action the sample represents was. It allows the profiler to
scale the samples to be more informative to the programmer.

There is already the period which is used similarly, but it means
something different, so I chose to not overload it. Instead
a new sample type for WEIGHT is added.

This can be used for multiple things. Initially it is used
for TSX abort costs and for profiling by memory latency (to
make expensive loads appear higher up in the histograms).
The concept is quite generic and can be extended to many
other kinds of events or architectures, as long as the
hardware provides suitable auxiliary values. In principle it
could also be used for software tracepoints.

This adds the generic glue: a new optional sample format
for a 64-bit weight value.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/perf_event.h      |    2 ++
 include/uapi/linux/perf_event.h |    8 ++++++--
 kernel/events/core.c            |    6 ++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 484cfbc..bb2429d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -584,6 +584,7 @@ struct perf_sample_data {
 	struct perf_branch_stack	*br_stack;
 	struct perf_regs_user		regs_user;
 	u64				stack_user_size;
+	u64				weight;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -597,6 +598,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 	data->regs_user.regs = NULL;
 	data->stack_user_size = 0;
+	data->weight = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..c52caab 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_BRANCH_STACK		= 1U << 11,
 	PERF_SAMPLE_REGS_USER			= 1U << 12,
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
+	PERF_SAMPLE_WEIGHT			= 1U << 14,
+
+	PERF_SAMPLE_MAX = 1U << 15,		/* non-ABI */
 
-	PERF_SAMPLE_MAX = 1U << 14,		/* non-ABI */
 };
 
 /*
@@ -198,8 +200,9 @@ enum perf_event_read_format {
 	PERF_FORMAT_TOTAL_TIME_RUNNING		= 1U << 1,
 	PERF_FORMAT_ID				= 1U << 2,
 	PERF_FORMAT_GROUP			= 1U << 3,
+	PERF_FORMAT_WEIGHT			= 1U << 4,
 
-	PERF_FORMAT_MAX = 1U << 4,		/* non-ABI */
+	PERF_FORMAT_MAX = 1U << 5,		/* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
@@ -559,6 +562,7 @@ enum perf_event_type {
 	 *	{ u64			stream_id;} && PERF_SAMPLE_STREAM_ID
 	 *	{ u32			cpu, res; } && PERF_SAMPLE_CPU
 	 *	{ u64			period;   } && PERF_SAMPLE_PERIOD
+	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
 	 *
 	 *	{ struct read_format	values;	  } && PERF_SAMPLE_READ
 	 *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..d633581 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_PERIOD)
 		size += sizeof(data->period);
 
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		size += sizeof(data->weight);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		size += event->read_size;
 
@@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PERIOD)
 		perf_output_put(handle, data->period);
 
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		perf_output_put(handle, data->weight);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		perf_output_read(handle, event);
 
-- 
1.7.9.5



* [PATCH v2 04/16] perf: add minimal support for PERF_SAMPLE_WEIGHT
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (2 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 03/16] perf, core: Add a concept of a weightened sample Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 05/16] perf, tools: Add arbitary aliases and support names with - Stephane Eranian
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

Ensure we grab the weight from the raw sample struct and
that we can dump it via perf report -D.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/event.h   |    1 +
 tools/perf/util/evsel.c   |    5 +++++
 tools/perf/util/session.c |    3 +++
 3 files changed, 9 insertions(+)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 0d573ff..cf52977 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -90,6 +90,7 @@ struct perf_sample {
 	u64 period;
 	u32 cpu;
 	u32 raw_size;
+	u64 weight;
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 618d411..b6f3697 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1020,6 +1020,11 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		}
 	}
 
+	if (type & PERF_SAMPLE_WEIGHT) {
+		data->weight= *array;
+		array++;
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15abe40..962c04f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1006,6 +1006,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_STACK_USER)
 		stack_user__printf(&sample->user_stack);
+
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		printf(" ... weight: %"PRIu64"\n", sample->weight);
 }
 
 static struct machine *
-- 
1.7.9.5



* [PATCH v2 05/16] perf, tools: Add arbitary aliases and support names with -
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (3 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 04/16] perf: add minimal support for PERF_SAMPLE_WEIGHT Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-14  7:34   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2012-11-05 13:50 ` [PATCH v2 06/16] perf: add support for PERF_SAMPLE_ADDR in dump_sampple() Stephane Eranian
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

From: Andi Kleen <ak@linux.intel.com>

- Add the missing scanner symbol for arbitrary aliases inside the config
region.
- '-' looks nicer than '_', so allow '-' in event names. Used for various
arch perfmon and Haswell events.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.l |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..66959fa 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,6 +81,7 @@ num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
 name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
+name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?]*
 modifier_event	[ukhpGH]{1,8}
 modifier_bp	[rwx]{1,3}
 
@@ -168,6 +169,7 @@ period			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
+{name_minus}		{ return str(yyscanner, PE_NAME); }
 }
 
 mem:			{ BEGIN(mem); return PE_PREFIX_MEM; }
-- 
1.7.9.5



* [PATCH v2 06/16] perf: add support for PERF_SAMPLE_ADDR in dump_sampple()
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (4 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 05/16] perf, tools: Add arbitary aliases and support names with - Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 07/16] perf: add generic memory sampling interface Stephane Eranian
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

Support for PERF_SAMPLE_ADDR was missing from dump_sample()
even though the sample type has been present for a long time.
It is needed for PEBS-LL mode.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/session.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 962c04f..b6d7cad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1009,6 +1009,10 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_WEIGHT)
 		printf(" ... weight: %"PRIu64"\n", sample->weight);
+
+        if (sample_type & PERF_SAMPLE_ADDR)
+                printf(" ..... data: 0x%"PRIx64"\n", sample->addr);
+
 }
 
 static struct machine *
-- 
1.7.9.5



* [PATCH v2 07/16] perf: add generic memory sampling interface
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (5 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 06/16] perf: add support for PERF_SAMPLE_ADDR in dump_sampple() Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds PERF_SAMPLE_DSRC. It collects the data
source, i.e., where the data associated with the sampled
instruction came from. The information is stored in a
perf_mem_dsrc structure. It contains opcode, memory level,
TLB, snoop, and lock information, subject to availability
in hardware.

The cost of the access is collected separately via
PERF_SAMPLE_WEIGHT: for a memory access it is the load
latency, otherwise it defaults to the sampling period.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h      |    2 ++
 include/uapi/linux/perf_event.h |   69 +++++++++++++++++++++++++++++++++++++--
 kernel/events/core.c            |    6 ++++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bb2429d..8fe4610 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -579,6 +579,7 @@ struct perf_sample_data {
 		u32	reserved;
 	}				cpu_entry;
 	u64				period;
+	union  perf_mem_dsrc		dsrc;
 	struct perf_callchain_entry	*callchain;
 	struct perf_raw_record		*raw;
 	struct perf_branch_stack	*br_stack;
@@ -599,6 +600,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->regs_user.regs = NULL;
 	data->stack_user_size = 0;
 	data->weight = 0;
+	data->dsrc.val = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c52caab..b2a78bb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -133,9 +133,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_USER			= 1U << 12,
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
 	PERF_SAMPLE_WEIGHT			= 1U << 14,
+	PERF_SAMPLE_DSRC			= 1U << 15,
 
-	PERF_SAMPLE_MAX = 1U << 15,		/* non-ABI */
-
+	PERF_SAMPLE_MAX = 1U << 16,		/* non-ABI */
 };
 
 /*
@@ -591,6 +591,8 @@ enum perf_event_type {
 	 * 	{ u64			size;
 	 * 	  char			data[size];
 	 * 	  u64			dyn_size; } && PERF_SAMPLE_STACK_USER
+	 *
+	 *	{ u64			dsrc;  } && PERF_SAMPLE_DSRC
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
@@ -616,4 +618,67 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+union perf_mem_dsrc {
+	__u64 val;
+	struct {
+		__u64   mem_op:5,	/* type of opcode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_snoop:5,	/* snoop mode */
+			mem_lock:2,	/* lock instr */
+			mem_dtlb:7,	/* tlb access */
+			mem_rsvd:31;
+	};
+};
+
+/* type of opcode (load/store/prefetch,code) */
+#define PERF_MEM_OP_NA		0x01 /* not available */
+#define PERF_MEM_OP_LOAD	0x02 /* load instruction */
+#define PERF_MEM_OP_STORE	0x04 /* store instruction */
+#define PERF_MEM_OP_PFETCH	0x08 /* prefetch */
+#define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
+#define PERF_MEM_OP_SHIFT	0
+
+/* memory hierarchy (memory level, hit or miss) */
+#define PERF_MEM_LVL_NA		0x01  /* not available */
+#define PERF_MEM_LVL_HIT	0x02  /* hit level */
+#define PERF_MEM_LVL_MISS	0x04  /* miss level  */
+#define PERF_MEM_LVL_L1		0x08  /* L1 */
+#define PERF_MEM_LVL_LFB	0x10  /* Line Fill Buffer */
+#define PERF_MEM_LVL_L2		0x20  /* L2 hit */
+#define PERF_MEM_LVL_L3		0x40  /* L3 hit */
+#define PERF_MEM_LVL_LOC_RAM	0x80  /* Local DRAM */
+#define PERF_MEM_LVL_REM_RAM1	0x100 /* Remote DRAM (1 hop) */
+#define PERF_MEM_LVL_REM_RAM2	0x200 /* Remote DRAM (2 hops) */
+#define PERF_MEM_LVL_REM_CCE1	0x400 /* Remote Cache (1 hop) */
+#define PERF_MEM_LVL_REM_CCE2	0x800 /* Remote Cache (2 hops) */
+#define PERF_MEM_LVL_IO		0x1000 /* I/O memory */
+#define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
+#define PERF_MEM_LVL_SHIFT	5
+
+/* snoop mode */
+#define PERF_MEM_SNOOP_NA	0x01 /* not available */
+#define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
+#define PERF_MEM_SNOOP_HIT	0x04 /* snoop hit */
+#define PERF_MEM_SNOOP_MISS	0x08 /* snoop miss */
+#define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
+#define PERF_MEM_SNOOP_SHIFT	19
+
+/* locked instruction */
+#define PERF_MEM_LOCK_NA	0x01 /* not available */
+#define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */
+#define PERF_MEM_LOCK_SHIFT	24
+
+/* TLB access */
+#define PERF_MEM_TLB_NA		0x01 /* not available */
+#define PERF_MEM_TLB_HIT	0x02 /* hit level */
+#define PERF_MEM_TLB_MISS	0x04 /* miss level */
+#define PERF_MEM_TLB_L1		0x08 /* L1 */
+#define PERF_MEM_TLB_L2		0x10 /* L2 */
+#define PERF_MEM_TLB_WK		0x20 /* Hardware Walker */
+#define PERF_MEM_TLB_OS		0x40 /* OS fault handler */
+#define PERF_MEM_TLB_SHIFT	26
+
+#define PERF_MEM_S(a, s) \
+	(((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
+
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d633581..94b2c70 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -958,6 +958,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_READ)
 		size += event->read_size;
 
+	if (sample_type & PERF_SAMPLE_DSRC)
+		size += sizeof(data->dsrc.val);
+
 	event->header_size = size;
 }
 
@@ -4175,6 +4178,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 		perf_output_sample_ustack(handle,
 					  data->stack_user_size,
 					  data->regs_user.regs);
+
+	if (sample_type & PERF_SAMPLE_DSRC)
+		perf_output_put(handle, data->dsrc.val);
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (6 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 07/16] perf: add generic memory sampling interface Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-06 13:31   ` Andi Kleen
  2012-11-05 13:50 ` [PATCH v2 09/16] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by hardware, and the instruction
address, data address, load latency, data source, TLB and
lock information can be saved in the sampling buffer when
the PERF_SAMPLE_WEIGHT (for latency), PERF_SAMPLE_ADDR and
PERF_SAMPLE_DSRC sample types are requested.

To enable PEBS Load Latency, users have to use the
model-specific event:
- on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
- on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It exports the correct event
encoding for the host CPU and can be used directly by
the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/include/asm/msr-index.h          |    1 +
 arch/x86/kernel/cpu/perf_event.c          |    3 +
 arch/x86/kernel/cpu/perf_event.h          |   22 ++++-
 arch/x86/kernel/cpu/perf_event_intel.c    |   56 ++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++++++++++++++++++++++++++--
 5 files changed, 206 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..5edc9c0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -66,6 +66,7 @@
 #define MSR_IA32_PEBS_ENABLE		0x000003f1
 #define MSR_IA32_DS_AREA		0x00000600
 #define MSR_IA32_PERF_CAPABILITIES	0x00000345
+#define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
 
 #define MSR_MTRRfix64K_00000		0x00000250
 #define MSR_MTRRfix16K_80000		0x00000258
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 80dd390..01b4717 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1475,6 +1475,9 @@ static int __init init_hw_perf_events(void)
 	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
 	x86_pmu_format_group.attrs = x86_pmu.format_attrs;
 
+	if (x86_pmu.event_attrs)
+		x86_pmu_events_group.attrs = x86_pmu.event_attrs;
+
 	if (!x86_pmu.events_sysfs_show)
 		x86_pmu_events_group.attrs = &empty_attrs;
 	else
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index f78d6e8..3c5aa72 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -46,6 +46,7 @@ enum extra_reg_type {
 	EXTRA_REG_RSP_0 = 0,	/* offcore_response_0 */
 	EXTRA_REG_RSP_1 = 1,	/* offcore_response_1 */
 	EXTRA_REG_LBR   = 2,	/* lbr_select */
+	EXTRA_REG_LDLAT = 3,	/* ld_lat_threshold */
 
 	EXTRA_REG_MAX		/* number of entries needed */
 };
@@ -61,6 +62,10 @@ struct event_constraint {
 	int	overlap;
 	int	flags;
 };
+/*
+ * struct event_constraint flags
+ */
+#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -233,6 +238,10 @@ struct cpu_hw_events {
 #define INTEL_UEVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
 
+#define INTEL_PLD_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
+
 #define EVENT_CONSTRAINT_END		\
 	EVENT_CONSTRAINT(0, 0, 0)
 
@@ -262,12 +271,22 @@ struct extra_reg {
 	.msr = (ms),		\
 	.config_mask = (m),	\
 	.valid_mask = (vm),	\
-	.idx = EXTRA_REG_##i	\
+	.idx = EXTRA_REG_##i,	\
 	}
 
 #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx)	\
 	EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT, vm, idx)
 
+#define INTEL_UEVENT_EXTRA_REG(event, msr, vm, idx) \
+	EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT | \
+			ARCH_PERFMON_EVENTSEL_UMASK, vm, idx)
+
+#define INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(c) \
+	INTEL_UEVENT_EXTRA_REG(c, \
+			       MSR_PEBS_LD_LAT_THRESHOLD, \
+			       0xffff, \
+			       LDLAT)
+
 #define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0, RSP_0)
 
 union perf_capabilities {
@@ -355,6 +374,7 @@ struct x86_pmu {
 	 */
 	int		attr_rdpmc;
 	struct attribute **format_attrs;
+	struct attribute **event_attrs;
 
 	ssize_t		(*events_sysfs_show)(char *page, u64 config);
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 57d6527..a279d01 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -81,6 +81,7 @@ static struct event_constraint intel_nehalem_event_constraints[] __read_mostly =
 static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
 {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
 	EVENT_EXTRA_END
 };
 
@@ -111,6 +112,7 @@ static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
 {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
 	INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff, RSP_1),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
 	EVENT_EXTRA_END
 };
 
@@ -130,9 +132,55 @@ static struct event_constraint intel_gen_event_constraints[] __read_mostly =
 static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0),
 	INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
 	EVENT_EXTRA_END
 };
 
+
+EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
+EVENT_ATTR(instructions,		INSTRUCTIONS		);
+EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
+EVENT_ATTR(cache-misses,		CACHE_MISSES		);
+EVENT_ATTR(branch-instructions,		BRANCH_INSTRUCTIONS	);
+EVENT_ATTR(branch-misses,		BRANCH_MISSES		);
+EVENT_ATTR(bus-cycles,			BUS_CYCLES		);
+EVENT_ATTR(stalled-cycles-frontend,	STALLED_CYCLES_FRONTEND	);
+EVENT_ATTR(stalled-cycles-backend,	STALLED_CYCLES_BACKEND	);
+EVENT_ATTR(ref-cycles,			REF_CPU_CYCLES		);
+
+EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
+EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+
+struct attribute *nhm_events_attrs[] = {
+	EVENT_PTR(CPU_CYCLES),
+	EVENT_PTR(INSTRUCTIONS),
+	EVENT_PTR(CACHE_REFERENCES),
+	EVENT_PTR(CACHE_MISSES),
+	EVENT_PTR(BRANCH_INSTRUCTIONS),
+	EVENT_PTR(BRANCH_MISSES),
+	EVENT_PTR(BUS_CYCLES),
+	EVENT_PTR(STALLED_CYCLES_FRONTEND),
+	EVENT_PTR(STALLED_CYCLES_BACKEND),
+	EVENT_PTR(REF_CPU_CYCLES),
+	EVENT_PTR(mem_ld_nhm),
+	NULL,
+};
+
+struct attribute *snb_events_attrs[] = {
+	EVENT_PTR(CPU_CYCLES),
+	EVENT_PTR(INSTRUCTIONS),
+	EVENT_PTR(CACHE_REFERENCES),
+	EVENT_PTR(CACHE_MISSES),
+	EVENT_PTR(BRANCH_INSTRUCTIONS),
+	EVENT_PTR(BRANCH_MISSES),
+	EVENT_PTR(BUS_CYCLES),
+	EVENT_PTR(STALLED_CYCLES_FRONTEND),
+	EVENT_PTR(STALLED_CYCLES_BACKEND),
+	EVENT_PTR(REF_CPU_CYCLES),
+	EVENT_PTR(mem_ld_snb),
+	NULL,
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
 	return intel_perfmon_event_map[hw_event];
@@ -2009,6 +2057,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.extra_regs = intel_nehalem_extra_regs;
 
+		x86_pmu.event_attrs = nhm_events_attrs;
+
 		/* UOPS_ISSUED.STALLED_CYCLES */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2049,6 +2099,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.extra_regs = intel_westmere_extra_regs;
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 
+		x86_pmu.event_attrs = nhm_events_attrs;
+
 		/* UOPS_ISSUED.STALLED_CYCLES */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2077,6 +2129,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+		x86_pmu.event_attrs = snb_events_attrs;
+
 		/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2102,6 +2156,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+		x86_pmu.event_attrs = snb_events_attrs;
+
 		/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f30d85b..759b0c0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -24,6 +24,92 @@ struct pebs_record_32 {
 
  */
 
+union intel_x86_pebs_dse {
+	u64 val;
+	struct {
+		unsigned int ld_dse:4;
+		unsigned int ld_stlb_miss:1;
+		unsigned int ld_locked:1;
+		unsigned int ld_reserved:26;
+	};
+	struct {
+		unsigned int st_l1d_hit:1;
+		unsigned int st_reserved1:3;
+		unsigned int st_stlb_miss:1;
+		unsigned int st_locked:1;
+		unsigned int st_reserved2:26;
+	};
+};
+
+
+/*
+ * Map PEBS Load Latency Data Source encodings to generic
+ * memory data source information
+ */
+#define P(a, b) PERF_MEM_S(a, b)
+#define OP_LH (P(OP, LOAD) | P(LVL, HIT))
+#define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
+
+static const u64 pebs_data_source[] = {
+	P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA), /* 0x00: unknown L3 */
+	OP_LH | P(LVL, L1) | P(SNOOP, NONE),	/* 0x01: L1 local */
+	OP_LH | P(LVL, LFB)| P(SNOOP, NONE),	/* 0x02: LFB hit */
+	OP_LH | P(LVL, L2) | P(SNOOP, NONE),	/* 0x03: L2 hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, NONE),	/* 0x04: L3 hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, MISS),	/* 0x05: L3 hit, snoop miss */
+	OP_LH | P(LVL, L3) | P(SNOOP, HIT),	/* 0x06: L3 hit, snoop hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, HITM),	/* 0x07: L3 hit, snoop hitm */
+	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT),  /* 0x08: L3 miss snoop hit */
+	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
+	OP_LH | P(LVL, LOC_RAM)  | P(SNOOP, HIT),  /* 0x0a: L3 miss, shared */
+	OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT),  /* 0x0b: L3 miss, shared */
+	OP_LH | P(LVL, LOC_RAM)  | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
+	OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
+	OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
+	OP_LH | P(LVL, UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
+};
+
+static u64 load_latency_data(u64 status)
+{
+	union intel_x86_pebs_dse dse;
+	u64 val;
+	int model = boot_cpu_data.x86_model;
+	int fam = boot_cpu_data.x86;
+
+	dse.val = status;
+
+	/*
+	 * use the mapping table for bit 0-3
+	 */
+	val = pebs_data_source[dse.ld_dse];
+
+	/*
+	 * Nehalem models do not provide TLB or lock information
+	 */
+	if (fam == 0x6 && (model == 26 || model == 30
+	    || model == 31 || model == 46)) {
+		val |= P(TLB, NA) | P(LOCK, NA);
+		return val;
+	}
+	/*
+	 * bit 4: TLB access
+	 * 0 = did not miss 2nd level TLB
+	 * 1 = missed 2nd level TLB
+	 */
+	if (dse.ld_stlb_miss)
+		val |= P(TLB, MISS) | P(TLB, L2);
+	else
+		val |= P(TLB, HIT) | P(TLB, L1) | P(TLB, L2);
+
+	/*
+	 * bit 5: locked prefix
+	 */
+	if (dse.ld_locked)
+		val |= P(LOCK, LOCKED);
+
+	return val;
+}
+
 struct pebs_record_core {
 	u64 flags, ip;
 	u64 ax, bx, cx, dx;
@@ -364,7 +450,7 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_nehalem_pebs_event_constraints[] = {
-	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),    /* MEM_INST_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x100b, 0xf),      /* MEM_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),    /* MEM_UNCORE_RETIRED.* */
 	INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
 	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),    /* INST_RETIRED.ANY */
@@ -379,7 +465,7 @@ struct event_constraint intel_nehalem_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_westmere_pebs_event_constraints[] = {
-	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),    /* MEM_INST_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x100b, 0xf),      /* MEM_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),    /* MEM_UNCORE_RETIRED.* */
 	INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
 	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),    /* INSTR_RETIRED.* */
@@ -399,7 +485,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
 	INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
 	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
 	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -413,7 +499,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
         INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
-        INTEL_EVENT_CONSTRAINT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
+        INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
         INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -448,6 +534,9 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 	hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
 	cpuc->pebs_enabled |= 1ULL << hwc->idx;
+
+	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -560,20 +649,48 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs, void *__pebs)
 {
 	/*
-	 * We cast to pebs_record_core since that is a subset of
-	 * both formats and we don't use the other fields in this
-	 * routine.
+	 * We cast to pebs_record_nhm to get the load latency data
+	 * if the MSR_PEBS_LD_LAT_THRESHOLD extra_reg is used
 	 */
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct pebs_record_core *pebs = __pebs;
+	struct pebs_record_nhm *pebs = __pebs;
 	struct perf_sample_data data;
 	struct pt_regs regs;
+	u64 sample_type;
+	int fll;
 
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
+	fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
+	data.period = event->hw.last_period;
+	sample_type = event->attr.sample_type;
+
+	/*
+	 * if PEBS-LL or PreciseStore
+	 */
+	if (fll) {
+		if (sample_type & PERF_SAMPLE_ADDR)
+			data.addr = pebs->dla;
+
+		/*
+		 * Use latency for weight (only avail with PEBS-LL)
+		 */
+		if (fll && (sample_type & PERF_SAMPLE_WEIGHT))
+			data.weight = pebs->lat;
+
+		/*
+		 * data.dsrc encodes the data source
+		 */
+		if (sample_type & PERF_SAMPLE_DSRC) {
+			if (fll)
+				data.dsrc.val = load_latency_data(pebs->dse);
+		}
+	}
+
 	/*
 	 * We use the interrupt regs as a base because the PEBS record
 	 * does not contain a full regs set, specifically it seems to
-- 
1.7.9.5



* [PATCH v2 09/16] perf/x86: export PEBS load latency threshold register to sysfs
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (7 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 10/16] perf/x86: add support for PEBS Precise Store Stephane Eranian
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

Make the PEBS Load Latency threshold register layout
and encoding visible to user level tools.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index a279d01..cbfb252 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1787,6 +1787,8 @@ static void intel_pmu_flush_branch_stack(void)
 
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
+PMU_FORMAT_ATTR(ldlat, "config1:0-15");
+
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -1797,6 +1799,7 @@ static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_cmask.attr,
 
 	&format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
+	&format_attr_ldlat.attr, /* PEBS load latency */
 	NULL,
 };
 
-- 
1.7.9.5



* [PATCH v2 10/16] perf/x86: add support for PEBS Precise Store
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (8 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 09/16] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 11/16] perf tools: add mem access sampling core support Stephane Eranian
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds support for PEBS Precise Store
which is available on Intel Sandy Bridge and
Ivy Bridge processors.

To use Precise Store, the proper PEBS event
must be used: mem_trans_retired:precise_stores.
With the perf tool, the generic mem-stores event
exported via sysfs can be used directly.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.h          |    5 +++
 arch/x86/kernel/cpu/perf_event_intel.c    |    2 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   49 +++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3c5aa72..4e95c90 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -66,6 +66,7 @@ struct event_constraint {
  * struct event_constraint flags
  */
 #define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
+#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -242,6 +243,10 @@ struct cpu_hw_events {
 	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
 			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
 
+#define INTEL_PST_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST)
+
 #define EVENT_CONSTRAINT_END		\
 	EVENT_CONSTRAINT(0, 0, 0)
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index cbfb252..4176be3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -150,6 +150,7 @@ EVENT_ATTR(ref-cycles,			REF_CPU_CYCLES		);
 
 EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
 EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2");
 
 struct attribute *nhm_events_attrs[] = {
 	EVENT_PTR(CPU_CYCLES),
@@ -178,6 +179,7 @@ struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(STALLED_CYCLES_BACKEND),
 	EVENT_PTR(REF_CPU_CYCLES),
 	EVENT_PTR(mem_ld_snb),
+	EVENT_PTR(mem_st_snb),
 	NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 759b0c0..ed0ca3e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -69,6 +69,44 @@ static const u64 pebs_data_source[] = {
 	OP_LH | P(LVL,UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
 };
 
+static u64 precise_store_data(u64 status)
+{
+	union intel_x86_pebs_dse dse;
+	u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
+
+	dse.val = status;
+
+	/*
+	 * bit 4: TLB access
+	 * 1 = store missed 2nd level TLB
+	 *
+	 * on a miss, the access hit the hardware walker or
+	 * took an OS fault; otherwise it hit the 2nd level TLB
+	 */
+	if (dse.st_stlb_miss)
+		val |= P(TLB, MISS);
+	else
+		val |= P(TLB, HIT);
+
+	/*
+	 * bit 0: hit L1 data cache
+	 * if not set, then all we know is that
+	 * it missed L1D
+	 */
+	if (dse.st_l1d_hit)
+		val |= P(LVL, HIT);
+	else
+		val |= P(LVL, MISS);
+
+	/*
+	 * bit 5: Locked prefix
+	 */
+	if (dse.st_locked)
+		val |= P(LOCK, LOCKED);
+
+	return val;
+}
+
 static u64 load_latency_data(u64 status)
 {
 	union intel_x86_pebs_dse dse;
@@ -486,6 +524,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
 	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
 	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
 	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -500,6 +539,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
         INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
         INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -537,6 +577,8 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
 	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
 		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
+	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
+		cpuc->pebs_enabled |= 1ULL << 63;
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -657,12 +699,13 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	struct perf_sample_data data;
 	struct pt_regs regs;
 	u64 sample_type;
-	int fll;
+	int fll, fst;
 
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
 	fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+	fst = event->hw.flags & PERF_X86_EVENT_PEBS_ST;
 
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
@@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	/*
 	 * if PEBS-LL or PreciseStore
 	 */
-	if (fll) {
+	if (fll || fst) {
 		if (sample_type & PERF_SAMPLE_ADDR)
 			data.addr = pebs->dla;
 
@@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 		if (sample_type & PERF_SAMPLE_DSRC) {
 			if (fll)
 				data.dsrc.val = load_latency_data(pebs->dse);
+			else
+				data.dsrc.val = precise_store_data(pebs->dse);
 		}
 	}
 
-- 
1.7.9.5



* [PATCH v2 11/16] perf tools: add mem access sampling core support
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (9 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 10/16] perf/x86: add support for PEBS Precise Store Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:50 ` [PATCH v2 12/16] perf report: add support for mem access profiling Stephane Eranian
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds the sorting and histogram support
functions to enable profiling of memory accesses.

The following sorting orders are added:
 - symbol_daddr: data address symbol (or raw address)
 - dso_daddr: data address shared object
 - cost: access cost
 - locked: access uses locked transaction
 - tlb: TLB access
 - mem: memory level of the access (L1, L2, L3, RAM, ...)
 - snoop: access snoop mode

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/event.h   |    1 +
 tools/perf/util/evsel.c   |    4 +
 tools/perf/util/hist.c    |   66 ++++++++-
 tools/perf/util/hist.h    |   13 ++
 tools/perf/util/session.c |   37 ++++++
 tools/perf/util/session.h |    4 +
 tools/perf/util/sort.c    |  324 ++++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/sort.h    |   10 +-
 tools/perf/util/symbol.h  |    7 +
 9 files changed, 456 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index cf52977..ad66b44 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -91,6 +91,7 @@ struct perf_sample {
 	u32 cpu;
 	u32 raw_size;
 	u64 weight;
+	u64 dsrc;
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b6f3697..951d4d1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1025,6 +1025,10 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_DSRC) {
+		data->dsrc = *array;
+		array++;
+	}
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 277947a..52fe4e2 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -66,12 +66,16 @@ static void hists__set_unres_dso_col_len(struct hists *hists, int dso)
 void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 {
 	const unsigned int unresolved_col_width = BITS_PER_LONG / 4;
+	int symlen;
 	u16 len;
 
 	if (h->ms.sym)
 		hists__new_col_len(hists, HISTC_SYMBOL, h->ms.sym->namelen + 4);
-	else
+	else {
+		symlen = unresolved_col_width + 4 + 2;
+		hists__new_col_len(hists, HISTC_SYMBOL, symlen);
 		hists__set_unres_dso_col_len(hists, HISTC_DSO);
+	}
 
 	len = thread__comm_len(h->thread);
 	if (hists__new_col_len(hists, HISTC_COMM, len))
@@ -83,7 +87,6 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 	}
 
 	if (h->branch_info) {
-		int symlen;
 		/*
 		 * +4 accounts for '[x] ' priv level info
 		 * +2 account of 0x prefix on raw addresses
@@ -111,7 +114,32 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 			hists__new_col_len(hists, HISTC_SYMBOL_TO, symlen);
 			hists__set_unres_dso_col_len(hists, HISTC_DSO_TO);
 		}
+	} else if (h->mem_info) {
+		/*
+		 * +4 accounts for '[x] ' priv level info
+		 * +2 account of 0x prefix on raw addresses
+		 */
+		if (h->mem_info->daddr.sym) {
+			symlen = (int)h->mem_info->daddr.sym->namelen + 4 + unresolved_col_width + 2;
+			hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+		} else {
+			symlen = unresolved_col_width + 4 + 2;
+			hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+		}
+		if (h->mem_info->daddr.map) {
+			symlen = dso__name_len(h->mem_info->daddr.map->dso);
+			hists__new_col_len(hists, HISTC_MEM_DADDR_DSO, symlen);
+		} else {
+			symlen = unresolved_col_width + 4 + 2;
+			hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
+		}
+		hists__new_col_len(hists, HISTC_MEM_COST, 7);
+		hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
+		hists__new_col_len(hists, HISTC_MEM_TLB, 22);
+		hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
+		hists__new_col_len(hists, HISTC_MEM_LVL, 21+3);
 	}
+
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -235,7 +263,7 @@ void hists__decay_entries_threaded(struct hists *hists,
 static struct hist_entry *hist_entry__new(struct hist_entry *template)
 {
 	size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
-	struct hist_entry *he = malloc(sizeof(*he) + callchain_size);
+	struct hist_entry *he = calloc(1, sizeof(*he) + callchain_size);
 
 	if (he != NULL) {
 		*he = *template;
@@ -321,6 +349,35 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	return he;
 }
 
+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+					  struct addr_location *al,
+					  struct symbol *sym_parent,
+					  struct mem_info *mi,
+					  u64 weight)
+{
+	struct hist_entry entry = {
+		.thread	= al->thread,
+		.ms = {
+			.map	= al->map,
+			.sym	= al->sym,
+		},
+		.cpu	= al->cpu,
+		.ip	= al->addr,
+		.level	= al->level,
+		.stat = {
+			.period = weight,
+			.nr_events = 1,
+		},
+		.parent = sym_parent,
+		.filtered = symbol__parent_filter(sym_parent),
+		.hists = self,
+		.mem_info = mi,
+		.branch_info = NULL,
+	};
+
+	return add_hist_entry(self, &entry, al, weight);
+}
+
 struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
@@ -344,6 +401,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.filtered = symbol__parent_filter(sym_parent),
 		.branch_info = bi,
 		.hists	= self,
+		.mem_info = NULL,
 	};
 
 	return add_hist_entry(self, &entry, al, period);
@@ -369,6 +427,8 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.hists	= self,
+		.branch_info = NULL,
+		.mem_info = NULL,
 	};
 
 	return add_hist_entry(self, &entry, al, period);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index b874609..a77d102 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -48,6 +48,13 @@ enum hist_column {
 	HISTC_DSO_FROM,
 	HISTC_DSO_TO,
 	HISTC_SRCLINE,
+	HISTC_MEM_DADDR_SYMBOL,
+	HISTC_MEM_DADDR_DSO,
+	HISTC_MEM_COST,
+	HISTC_MEM_LOCKED,
+	HISTC_MEM_TLB,
+	HISTC_MEM_LVL,
+	HISTC_MEM_SNOOP,
 	HISTC_NR_COLS, /* Last entry */
 };
 
@@ -85,6 +92,12 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct branch_info *bi,
 					     u64 period);
 
+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+					  struct addr_location *al,
+					  struct symbol *sym_parent,
+					  struct mem_info *mi,
+					  u64 period);
+
 void hists__output_resort(struct hists *self);
 void hists__output_resort_threaded(struct hists *hists);
 void hists__collapse_resort(struct hists *self);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b6d7cad..58ca636 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -273,6 +273,41 @@ static void ip__resolve_ams(struct machine *self, struct thread *thread,
 	ams->map = al.map;
 }
 
+static void ip__resolve_data(struct machine *self, struct thread *thread,
+			     u8 m,
+			    struct addr_map_symbol *ams,
+			    u64 addr)
+{
+	struct addr_location al;
+
+	memset(&al, 0, sizeof(al));
+
+	thread__find_addr_location(thread, self, m, MAP__VARIABLE, addr, &al, NULL);
+	ams->addr = addr;
+	ams->al_addr = al.addr;
+	ams->sym = al.sym;
+	ams->map = al.map;
+}
+
+struct mem_info *machine__resolve_mem(struct machine *self,
+				      struct thread *thr,
+				      struct perf_sample *sample,
+				      u8 cpumode)
+{
+	struct mem_info *mi;
+
+	mi = calloc(1, sizeof(struct mem_info));
+	if (!mi)
+		return NULL;
+
+	ip__resolve_ams(self, thr, &mi->iaddr, sample->ip);
+	ip__resolve_data(self, thr, cpumode, &mi->daddr, sample->addr);
+	mi->cost = sample->weight;
+	mi->dsrc.val = sample->dsrc;
+
+	return mi;
+}
+
 struct branch_info *machine__resolve_bstack(struct machine *self,
 					    struct thread *thr,
 					    struct branch_stack *bs)
@@ -1013,6 +1048,8 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
         if (sample_type & PERF_SAMPLE_ADDR)
                 printf(" ..... data: 0x%"PRIx64"\n", sample->addr);
 
+	if (sample_type & PERF_SAMPLE_DSRC)
+		printf(" . data_src: 0x%"PRIx64"\n", sample->dsrc);
 }
 
 static struct machine *
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index dd64261..37ab693 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -72,6 +72,10 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 					    struct thread *thread,
 					    struct branch_stack *bs);
 
+struct mem_info *machine__resolve_mem(struct machine *self,
+				      struct thread *thread,
+				      struct perf_sample *sample, u8 cpumode);
+
 bool perf_session__has_traces(struct perf_session *self, const char *msg);
 
 void mem_bswap_64(void *src, int byte_size);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..e2e466d 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -182,11 +182,19 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
 	}
 
 	ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", level);
-	if (sym)
-		ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
-				       width - ret,
-				       sym->name);
-	else {
+	if (sym) {
+		if (map->type == MAP__VARIABLE) {
+			ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
+			ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
+					ip - sym->start);
+			ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+				       width - ret, "");
+		} else {
+			ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+					       width - ret,
+					       sym->name);
+		}
+	} else {
 		size_t len = BITS_PER_LONG / 4;
 		ret += repsep_snprintf(bf + ret, size - ret, "%-#.*llx",
 				       len, ip);
@@ -469,6 +477,238 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *self, char *bf,
 	return repsep_snprintf(bf, size, "%-*s", width, out);
 }
 
+/* --sort daddr_sym */
+static int64_t
+sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	struct addr_map_symbol *l = &left->mem_info->daddr;
+	struct addr_map_symbol *r = &right->mem_info->daddr;
+
+	return (int64_t)(r->addr - l->addr);
+}
+
+static int hist_entry__daddr_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	return _hist_entry__sym_snprintf(self->mem_info->daddr.map,
+					 self->mem_info->daddr.sym,
+					 self->mem_info->daddr.addr,
+					 self->level, bf, size, width);
+}
+
+static int64_t
+sort__dso_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return _sort__dso_cmp(left->mem_info->daddr.map, right->mem_info->daddr.map);
+}
+
+static int hist_entry__dso_daddr_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	return _hist_entry__dso_snprintf(self->mem_info->daddr.map, bf, size, width);
+}
+
+static int64_t
+sort__cost_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	u64 cost_l = left->mem_info->cost;
+	u64 cost_r = right->mem_info->cost;
+
+	return (int64_t)(cost_r - cost_l);
+}
+
+static int hist_entry__cost_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	if (self->mem_info->cost == 0)
+		return repsep_snprintf(bf, size, "%*s", width, "N/A");
+	return repsep_snprintf(bf, size, "%*"PRIu64, width, self->mem_info->cost);
+}
+
+static int64_t
+sort__locked_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_lock - dsrc_l.mem_lock);
+}
+
+static int hist_entry__locked_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	const char *out = "??";
+	u64 mask = self->mem_info->dsrc.mem_lock;
+
+	if (mask & PERF_MEM_LOCK_NA)
+		out = "N/A";
+	else if (mask & PERF_MEM_LOCK_LOCKED)
+		out = "Yes";
+	else
+		out = "No";
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__tlb_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_dtlb - dsrc_l.mem_dtlb);
+}
+
+static const char *tlb_access[] = {
+	"N/A",
+	"HIT",
+	"MISS",
+	"L1",
+	"L2",
+	"Walker",
+	"Fault",
+};
+#define NUM_TLB_ACCESS (sizeof(tlb_access)/sizeof(const char *))
+
+static int hist_entry__tlb_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t l = 0, i;
+	u64 m = self->mem_info->dsrc.mem_dtlb;
+	u64 hit, miss;
+
+	out[0] = '\0';
+
+	hit = m & PERF_MEM_TLB_HIT;
+	miss = m & PERF_MEM_TLB_MISS;
+
+	/* already taken care of */
+	m &= ~(PERF_MEM_TLB_HIT|PERF_MEM_TLB_MISS);
+
+	for (i = 0; m && i < NUM_TLB_ACCESS; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, tlb_access[i], sz - l);
+		l += strlen(tlb_access[i]);
+	}
+	if (hit)
+		strncat(out, " hit", sz - l);
+	if (miss)
+		strncat(out, " miss", sz - l);
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__lvl_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_lvl - dsrc_l.mem_lvl);
+}
+
+static const char *mem_lvl[] = {
+	"N/A",
+	"HIT",
+	"MISS",
+	"L1",
+	"LFB",
+	"L2",
+	"L3",
+	"Local RAM",
+	"Remote RAM (1 hop)",
+	"Remote RAM (2 hops)",
+	"Remote Cache (1 hop)",
+	"Remote Cache (2 hops)",
+	"I/O",
+	"Uncached",
+};
+#define NUM_MEM_LVL (sizeof(mem_lvl)/sizeof(const char *))
+
+static int hist_entry__lvl_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t i, l = 0;
+	u64 m = self->mem_info->dsrc.mem_lvl;
+	u64 hit, miss;
+
+	out[0] = '\0';
+
+	hit = m & PERF_MEM_LVL_HIT;
+	miss = m & PERF_MEM_LVL_MISS;
+
+	/* already taken care of */
+	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+
+	for (i = 0; m && i < NUM_MEM_LVL; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, mem_lvl[i], sz - l);
+		l += strlen(mem_lvl[i]);
+	}
+	if (hit)
+		strncat(out, " hit", sz - l);
+	if (miss)
+		strncat(out, " miss", sz - l);
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__snoop_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_snoop - dsrc_l.mem_snoop);
+}
+
+static const char *snoop_access[] = {
+	"N/A",
+	"None",
+	"Miss",
+	"Hit",
+	"HitM",
+};
+#define NUM_SNOOP_ACCESS (sizeof(snoop_access)/sizeof(const char *))
+
+static int hist_entry__snoop_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t i, l = 0;
+	u64 m = self->mem_info->dsrc.mem_snoop;
+
+	out[0] = '\0';
+
+	for (i = 0; m && i < NUM_SNOOP_ACCESS; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, snoop_access[i], sz - l);
+		l += strlen(snoop_access[i]);
+	}
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
 struct sort_entry sort_mispredict = {
 	.se_header	= "Branch Mispredicted",
 	.se_cmp		= sort__mispredict_cmp,
@@ -476,6 +716,56 @@ struct sort_entry sort_mispredict = {
 	.se_width_idx	= HISTC_MISPREDICT,
 };
 
+struct sort_entry sort_mem_daddr_sym = {
+	.se_header	= "Data Symbol",
+	.se_cmp		= sort__daddr_cmp,
+	.se_snprintf	= hist_entry__daddr_snprintf,
+	.se_width_idx	= HISTC_MEM_DADDR_SYMBOL,
+};
+
+struct sort_entry sort_mem_daddr_dso = {
+	.se_header	= "Data Object",
+	.se_cmp		= sort__dso_daddr_cmp,
+	.se_snprintf	= hist_entry__dso_daddr_snprintf,
+	.se_width_idx	= HISTC_MEM_DADDR_SYMBOL,
+};
+
+struct sort_entry sort_mem_cost = {
+	.se_header	= "Cost",
+	.se_cmp		= sort__cost_cmp,
+	.se_snprintf	= hist_entry__cost_snprintf,
+	.se_width_idx	= HISTC_MEM_COST,
+};
+
+struct sort_entry sort_mem_locked = {
+	.se_header	= "Locked",
+	.se_cmp		= sort__locked_cmp,
+	.se_snprintf	= hist_entry__locked_snprintf,
+	.se_width_idx	= HISTC_MEM_LOCKED,
+};
+
+struct sort_entry sort_mem_tlb = {
+	.se_header	= "TLB access",
+	.se_cmp		= sort__tlb_cmp,
+	.se_snprintf	= hist_entry__tlb_snprintf,
+	.se_width_idx	= HISTC_MEM_TLB,
+};
+
+struct sort_entry sort_mem_lvl = {
+	.se_header	= "Memory access",
+	.se_cmp		= sort__lvl_cmp,
+	.se_snprintf	= hist_entry__lvl_snprintf,
+	.se_width_idx	= HISTC_MEM_LVL,
+};
+
+struct sort_entry sort_mem_snoop = {
+	.se_header	= "Snoop",
+	.se_cmp		= sort__snoop_cmp,
+	.se_snprintf	= hist_entry__snoop_snprintf,
+	.se_width_idx	= HISTC_MEM_SNOOP,
+};
+
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -497,6 +787,13 @@ static struct sort_dimension sort_dimensions[] = {
 	DIM(SORT_CPU, "cpu", sort_cpu),
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_SRCLINE, "srcline", sort_srcline),
+	DIM(SORT_MEM_DADDR_SYMBOL, "symbol_daddr", sort_mem_daddr_sym),
+	DIM(SORT_MEM_DADDR_DSO, "dso_daddr", sort_mem_daddr_dso),
+	DIM(SORT_MEM_COST, "cost", sort_mem_cost),
+	DIM(SORT_MEM_LOCKED, "locked", sort_mem_locked),
+	DIM(SORT_MEM_TLB, "tlb", sort_mem_tlb),
+	DIM(SORT_MEM_LVL, "mem", sort_mem_lvl),
+	DIM(SORT_MEM_SNOOP, "snoop", sort_mem_snoop),
 };
 
 int sort_dimension__add(const char *tok)
@@ -520,7 +817,8 @@ int sort_dimension__add(const char *tok)
 			sort__has_parent = 1;
 		} else if (sd->entry == &sort_sym ||
 			   sd->entry == &sort_sym_from ||
-			   sd->entry == &sort_sym_to) {
+			   sd->entry == &sort_sym_to ||
+			   sd->entry == &sort_mem_daddr_sym) {
 			sort__has_sym = 1;
 		}
 
@@ -553,6 +851,20 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_DSO_TO;
 			else if (!strcmp(sd->name, "mispredict"))
 				sort__first_dimension = SORT_MISPREDICT;
+			else if (!strcmp(sd->name, "symbol_daddr"))
+				sort__first_dimension = SORT_MEM_DADDR_SYMBOL;
+			else if (!strcmp(sd->name, "dso_daddr"))
+				sort__first_dimension = SORT_MEM_DADDR_DSO;
+			else if (!strcmp(sd->name, "cost"))
+				sort__first_dimension = SORT_MEM_COST;
+			else if (!strcmp(sd->name, "locked"))
+				sort__first_dimension = SORT_MEM_LOCKED;
+			else if (!strcmp(sd->name, "tlb"))
+				sort__first_dimension = SORT_MEM_TLB;
+			else if (!strcmp(sd->name, "mem_lvl"))
+				sort__first_dimension = SORT_MEM_LVL;
+			else if (!strcmp(sd->name, "snoop"))
+				sort__first_dimension = SORT_MEM_SNOOP;
 		}
 
 		list_add_tail(&sd->entry->list, &hist_entry__sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 13761d8..175eb4c 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -102,7 +102,8 @@ struct hist_entry {
 	};
 	struct branch_info	*branch_info;
 	struct hists		*hists;
-	struct callchain_root	callchain[0];
+	struct mem_info		*mem_info;
+	struct callchain_root	callchain[0]; /* must be last member */
 };
 
 enum sort_type {
@@ -118,6 +119,13 @@ enum sort_type {
 	SORT_SYM_TO,
 	SORT_MISPREDICT,
 	SORT_SRCLINE,
+	SORT_MEM_DADDR_SYMBOL,
+	SORT_MEM_DADDR_DSO,
+	SORT_MEM_COST,
+	SORT_MEM_LOCKED,
+	SORT_MEM_TLB,
+	SORT_MEM_LVL,
+	SORT_MEM_SNOOP,
 };
 
 /*
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 863b05b..ee719eb 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -152,6 +152,13 @@ struct branch_info {
 	struct branch_flags flags;
 };
 
+struct mem_info {
+	struct addr_map_symbol iaddr;
+	struct addr_map_symbol daddr;
+	u64	cost;
+	union perf_mem_dsrc dsrc;
+};
+
 struct addr_location {
 	struct thread *thread;
 	struct map    *map;
-- 
1.7.9.5
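The hist_entry__{tlb,lvl,snoop}_snprintf helpers in the sort.c hunk above all open-code the same walk: shift the data-source bitmask right one bit at a time and join the names of the set bits with " or ". A standalone sketch of that pattern (decode_mask and tlb_access_names are hypothetical names, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same name table as the patch's tlb_access[] array. */
static const char *tlb_access_names[] = {
	"N/A", "HIT", "MISS", "L1", "L2", "Walker", "Fault",
};

/* Decode a bitmask into "name1 or name2 ..." using one name per bit.
 * Mirrors the loop the patch repeats in each snprintf helper. */
static void decode_mask(uint64_t m, const char **names, size_t n,
			char *out, size_t sz)
{
	size_t i, l = 0;

	out[0] = '\0';
	sz--;			/* leave room for the terminating NUL */

	for (i = 0; m && i < n; i++, m >>= 1) {
		if (!(m & 0x1))
			continue;
		if (l) {	/* not the first name: join with " or " */
			strncat(out, " or ", sz - l);
			l += 4;
		}
		strncat(out, names[i], sz - l);
		l += strlen(names[i]);
	}
}
```

Keeping one shared helper like this (instead of three open-coded loops) would be a possible follow-up cleanup; the sketch assumes the names are short enough to fit the buffer, as the patch does.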


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 12/16] perf report: add support for mem access profiling
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (10 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 11/16] perf tools: add mem access sampling core support Stephane Eranian
@ 2012-11-05 13:50 ` Stephane Eranian
  2012-11-05 13:51 ` [PATCH v2 13/16] perf record: " Stephane Eranian
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This patch adds the --mem-mode option to perf report.

This mode requires a perf.data file created with memory
access samples.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-report.c |  131 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f07eae7..14cb7c1 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -46,6 +46,7 @@ struct perf_report {
 	bool			show_full_info;
 	bool			show_threads;
 	bool			inverted_callchain;
+	bool			mem_mode;
 	struct perf_read_values	show_threads_values;
 	const char		*pretty_printing_style;
 	symbol_filter_t		annotate_init;
@@ -54,6 +55,96 @@ struct perf_report {
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 };
 
+static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
+					   struct addr_location *al,
+					   struct perf_sample *sample,
+					   struct perf_evsel *evsel,
+					   struct machine *machine,
+					   union perf_event *event)
+{
+	struct perf_report *rep = container_of(tool, struct perf_report, tool);
+	struct symbol *parent = NULL;
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	int err = 0;
+	struct hist_entry *he;
+	struct mem_info *mi, *mx;
+	uint64_t cost;
+
+	if ((sort__has_parent || symbol_conf.use_callchain)
+	    && sample->callchain) {
+		err = machine__resolve_callchain(machine, evsel, al->thread,
+						 sample, &parent);
+		if (err)
+			return err;
+	}
+
+	mi = machine__resolve_mem(machine, al->thread, sample, cpumode);
+	if (!mi)
+		return -ENOMEM;
+
+	if (rep->hide_unresolved && !al->sym)
+		return 0;
+
+	cost = mi->cost;
+	if (!cost)
+		cost = 1;
+
+	/*
+	 * The report shows the percentage of total cost captured,
+	 * not of events sampled; a zero cost falls back to a pseudo
+	 * period of 1 so the sample still counts.
+	 * Only the newt browser does integrated annotation, so don't
+	 * allocate the extra space the stdio code will not use.
+	 */
+	he = __hists__add_mem_entry(&evsel->hists, al, parent, mi,
+				    cost);
+	if (!he)
+		return -ENOMEM;
+
+	if (sort__has_sym && he->ms.sym && use_browser > 0) {
+		struct annotation *notes = symbol__annotation(he->ms.sym);
+
+		assert(evsel != NULL);
+
+		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+			goto out;
+
+		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+		if (err)
+			goto out;
+	}
+
+	if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {
+		struct annotation *notes;
+
+		mx = he->mem_info;
+
+		notes = symbol__annotation(mx->daddr.sym);
+		if (!notes->src
+		    && symbol__alloc_hist(mx->daddr.sym) < 0)
+			goto out;
+
+		err = symbol__inc_addr_samples(mx->daddr.sym,
+					       mx->daddr.map,
+					       evsel->idx,
+					       mx->daddr.al_addr);
+		if (err)
+			goto out;
+	}
+
+	evsel->hists.stats.total_period += sample->period;
+	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+	err = 0;
+
+	if (symbol_conf.use_callchain) {
+		err = callchain_append(he->callchain,
+				       &callchain_cursor,
+				       sample->period);
+	}
+out:
+	return err;
+}
+
 static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 					struct addr_location *al,
 					struct perf_sample *sample,
@@ -209,6 +300,12 @@ static int process_sample_event(struct perf_tool *tool,
 			pr_debug("problem adding lbr entry, skipping event\n");
 			return -1;
 		}
+	} else if (rep->mem_mode == 1) {
+		if (perf_report__add_mem_hist_entry(tool, &al, sample,
+						    evsel, machine, event)) {
+			pr_debug("problem adding mem entry, skipping event\n");
+			return -1;
+		}
 	} else {
 		if (al.map != NULL)
 			al.map->dso->hit = 1;
@@ -292,7 +389,8 @@ static void sig_handler(int sig __maybe_unused)
 	session_done = 1;
 }
 
-static size_t hists__fprintf_nr_sample_events(struct hists *self,
+static size_t hists__fprintf_nr_sample_events(struct perf_report *rep,
+					      struct hists *self,
 					      const char *evname, FILE *fp)
 {
 	size_t ret;
@@ -305,7 +403,11 @@ static size_t hists__fprintf_nr_sample_events(struct hists *self,
 	if (evname != NULL)
 		ret += fprintf(fp, " of event '%s'", evname);
 
-	ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
+	if (rep->mem_mode) {
+		ret += fprintf(fp, "\n# Total cost : %" PRIu64, nr_events);
+		ret += fprintf(fp, "\n# Sort order : %s", sort_order);
+	} else
+		ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
 	return ret + fprintf(fp, "\n#\n");
 }
 
@@ -319,7 +421,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		struct hists *hists = &pos->hists;
 		const char *evname = perf_evsel__name(pos);
 
-		hists__fprintf_nr_sample_events(hists, evname, stdout);
+		hists__fprintf_nr_sample_events(rep, hists, evname, stdout);
 		hists__fprintf(hists, true, 0, 0, stdout);
 		fprintf(stdout, "\n\n");
 	}
@@ -595,7 +697,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Use the stdio interface"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-		   " dso_from, symbol_to, symbol_from, mispredict"),
+		   " dso_from, symbol_to, symbol_from, mispredict, mem, cost, symbol_daddr, dso_daddr, tlb, snoop, locked"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
@@ -641,6 +743,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "use branch records for histogram filling", parse_branch_mode),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
+	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
 	OPT_END()
 	};
 
@@ -692,6 +795,18 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 				     "dso_to,symbol_to";
 
 	}
+	if (report.mem_mode) {
+		if (sort__branch_mode == 1) {
+			fprintf(stderr, "branch and mem mode incompatible\n");
+			goto error;
+		}
+		/*
+		 * if no sort_order is provided, use the
+		 * mem-mode specific default order
+		 */
+		if (sort_order == default_sort_order)
+			sort_order = "cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
+	}
 
 	if (strcmp(input_name, "-") != 0)
 		setup_browser(true);
@@ -763,6 +878,14 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		sort_entry__setup_elide(&sort_sym_from, symbol_conf.sym_from_list, "sym_from", stdout);
 		sort_entry__setup_elide(&sort_sym_to, symbol_conf.sym_to_list, "sym_to", stdout);
 	} else {
+		if (report.mem_mode) {
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "symbol_daddr", stdout);
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso_daddr", stdout);
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "mem", stdout);
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "cost", stdout);
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "tlb", stdout);
+			sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "snoop", stdout);
+		}
 		sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
 		sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
 	}
-- 
1.7.9.5
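perf_report__add_mem_hist_entry() above substitutes a pseudo cost of 1 when a sample carries no weight, so percentages reflect the share of total cost while zero-cost samples still contribute to the total. A minimal sketch of that accounting (function names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the "cost = mi->cost; if (!cost) cost = 1;" fallback in
 * perf_report__add_mem_hist_entry(). */
static uint64_t effective_cost(uint64_t sampled_weight)
{
	/* a sample with no recorded cost still counts once */
	return sampled_weight ? sampled_weight : 1;
}

/* Total cost over a set of samples, as used for the "# Total cost"
 * line and the Overhead percentage in --mem-mode reports. */
static uint64_t total_cost(const uint64_t *weights, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += effective_cost(weights[i]);
	return total;
}
```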



* [PATCH v2 13/16] perf record: add support for mem access profiling
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (11 preceding siblings ...)
  2012-11-05 13:50 ` [PATCH v2 12/16] perf report: add support for mem access profiling Stephane Eranian
@ 2012-11-05 13:51 ` Stephane Eranian
  2012-11-05 13:51 ` [PATCH v2 14/16] perf tools: add new mem command for memory " Stephane Eranian
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

Add the -l option to perf record to enable access cost
sampling.

Data address sampling is obtained via the -d option.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-record.c |    2 ++
 tools/perf/perf.h           |    1 +
 tools/perf/util/evsel.c     |    6 ++++++
 3 files changed, 9 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5783c32..8243405 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1041,6 +1041,8 @@ const struct option record_options[] = {
 	OPT_CALLBACK('j', "branch-filter", &record.opts.branch_stack,
 		     "branch filter mask", "branch stack filter modes",
 		     parse_branch_stack),
+	OPT_BOOLEAN('l', "cost", &record.opts.weight,
+		    "event cost"),
 	OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 469fbf2..8a39117 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -237,6 +237,7 @@ struct perf_record_opts {
 	bool	     sample_id_all_missing;
 	bool	     exclude_guest_missing;
 	bool	     period;
+	bool         weight;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int user_freq;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 951d4d1..65ca661 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -474,6 +474,12 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 	}
 
+	if (opts->weight)
+		attr->sample_type	|= PERF_SAMPLE_WEIGHT;
+
+	if (opts->sample_address)
+		attr->sample_type	|= PERF_SAMPLE_DSRC;
+
 	if (opts->no_delay) {
 		attr->watermark = 0;
 		attr->wakeup_events = 1;
-- 
1.7.9.5
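The evsel.c hunk above simply ORs extra PERF_SAMPLE_* bits into attr->sample_type when -l (weight) and -d (data address) are requested. A self-contained sketch of the same flag plumbing (the flag values below are illustrative stand-ins for the constants in the kernel's linux/perf_event.h; PERF_SAMPLE_DSRC is new in this series):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only, not the kernel's actual bit assignments. */
#define SAMPLE_WEIGHT (1ULL << 14)
#define SAMPLE_DSRC   (1ULL << 15)

struct record_opts {
	int weight;		/* set by perf record -l */
	int sample_address;	/* set by perf record -d */
};

/* Build the sample_type mask the way perf_evsel__config() does. */
static uint64_t config_sample_type(const struct record_opts *opts)
{
	uint64_t sample_type = 0;

	if (opts->weight)
		sample_type |= SAMPLE_WEIGHT;	/* capture access cost */
	if (opts->sample_address)
		sample_type |= SAMPLE_DSRC;	/* capture data source */
	return sample_type;
}
```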



* [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (12 preceding siblings ...)
  2012-11-05 13:51 ` [PATCH v2 13/16] perf record: " Stephane Eranian
@ 2012-11-05 13:51 ` Stephane Eranian
  2012-11-06 15:44   ` Arnaldo Carvalho de Melo
  2012-11-05 13:51 ` [PATCH v2 15/16] perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP Stephane Eranian
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

This new command is a wrapper on top of perf record and
perf report that makes memory access profiling easier
to configure.

To record loads:
$ perf mem -t load rec .....

To record stores:
$ perf mem -t store rec .....

To get the report:
$ perf mem -t load rep

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-mem.txt |   48 +++++++
 tools/perf/Makefile                   |    1 +
 tools/perf/builtin-mem.c              |  238 +++++++++++++++++++++++++++++++++
 tools/perf/builtin.h                  |    1 +
 tools/perf/command-list.txt           |    1 +
 tools/perf/perf.c                     |    1 +
 6 files changed, 290 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-mem.txt
 create mode 100644 tools/perf/builtin-mem.c

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
new file mode 100644
index 0000000..888d511
--- /dev/null
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -0,0 +1,48 @@
+perf-mem(1)
+===========
+
+NAME
+----
+perf-mem - Profile memory accesses
+
+SYNOPSIS
+--------
+[verse]
+'perf mem' [<options>] (record [<command>] | report)
+
+DESCRIPTION
+-----------
+"perf mem -t <TYPE> record" runs a command and gathers memory operation data
+from it into perf.data. Perf record options are accepted and passed through.
+
+"perf mem -t <TYPE> report" displays the result. It invokes perf report with the
+right set of options to display a memory access profile.
+
+OPTIONS
+-------
+<command>...::
+	Any command you can specify in a shell.
+
+-t::
+--type=::
+	Select the memory operation type: load or store (default: load)
+
+-D::
+--dump-raw-samples=::
+	Dump the raw decoded samples on the screen in a format that is easy to
+	parse, with one sample per line.
+
+-x::
+--field-separator::
+	Specify the field separator used when dumping raw samples (-D option). By
+	default, the separator is the space character.
+
+-C::
+--cpu-list::
+	Restrict the dump of raw samples to the CPUs provided via this option. Note
+	that the same option can be passed in record mode, where it is interpreted
+	the same way as by perf record.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1]
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7e25f59..c42b862 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -461,6 +461,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
 BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
 BUILTIN_OBJS += $(OUTPUT)builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
+BUILTIN_OBJS += $(OUTPUT)builtin-mem.o
 
 PERFLIBS = $(LIB_FILE) $(LIBTRACEEVENT)
 
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
new file mode 100644
index 0000000..c2980b8
--- /dev/null
+++ b/tools/perf/builtin-mem.c
@@ -0,0 +1,238 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/tool.h"
+#include "util/session.h"
+
+#define MEM_OPERATION_LOAD	"load"
+#define MEM_OPERATION_STORE	"store"
+
+static const char	*mem_operation		= MEM_OPERATION_LOAD;
+
+struct perf_mem {
+	struct perf_tool	tool;
+	char const		*input_name;
+	symbol_filter_t		annotate_init;
+	bool			hide_unresolved;
+	bool			dump_raw;
+	const char		*cpu_list;
+	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+};
+
+static const char * const mem_usage[] = {
+	"perf mem [<options>] {record <command> | report}",
+	NULL
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+	char event[64];
+	int ret;
+
+	rec_argc = argc + 4;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	if (!rec_argv)
+		return -1;
+
+	rec_argv[i++] = strdup("record");
+	if (!strcmp(mem_operation, MEM_OPERATION_LOAD))
+		rec_argv[i++] = strdup("-l");
+	rec_argv[i++] = strdup("-d");
+	rec_argv[i++] = strdup("-e");
+
+	if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+		sprintf(event, "cpu/mem-stores/pp");
+	else
+		sprintf(event, "cpu/mem-loads/pp");
+
+	rec_argv[i++] = strdup(event);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	ret = cmd_record(i, rec_argv, NULL);
+	free(rec_argv);
+	return ret;
+}
+
+static int
+dump_raw_samples(struct perf_tool *tool,
+		 union perf_event *event,
+		 struct perf_sample *sample,
+		 struct perf_evsel *evsel __maybe_unused,
+		 struct machine *machine)
+{
+	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
+	struct addr_location al;
+	const char *fmt;
+
+	if (perf_event__preprocess_sample(event, machine, &al, sample,
+				mem->annotate_init) < 0) {
+		fprintf(stderr, "problem processing %d event, skipping it.\n",
+				event->header.type);
+		return -1;
+	}
+
+	if (al.filtered || (mem->hide_unresolved && al.sym == NULL))
+		return 0;
+
+	if (al.map != NULL)
+		al.map->dso->hit = 1;
+
+	if (symbol_conf.field_sep) {
+		fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64
+		      "%s0x%"PRIx64"%s%s:%s\n";
+	} else {
+		fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x%016"PRIx64
+		      "%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
+		symbol_conf.field_sep = " ";
+	}
+
+	printf(fmt,
+		sample->pid,
+		symbol_conf.field_sep,
+		sample->tid,
+		symbol_conf.field_sep,
+		event->ip.ip,
+		symbol_conf.field_sep,
+		sample->addr,
+		symbol_conf.field_sep,
+		sample->weight,
+		symbol_conf.field_sep,
+		sample->dsrc,
+		symbol_conf.field_sep,
+		al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
+		al.sym ? al.sym->name : "???");
+
+	return 0;
+}
+
+static int process_sample_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+				struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	return dump_raw_samples(tool, event, sample, evsel, machine);
+}
+
+static int report_raw_events(struct perf_mem *mem)
+{
+	int err = -EINVAL;
+	int ret;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY,
+							 0, false, &mem->tool);
+
+	if (session == NULL)
+		return -ENOMEM;
+
+	if (mem->cpu_list) {
+		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
+					       mem->cpu_bitmap);
+		if (ret)
+			goto out_delete;
+	}
+
+	if (symbol__init() < 0) {
+		err = -1;
+		goto out_delete;
+	}
+
+	printf("# PID, TID, IP, ADDR, COST, DSRC, SYMBOL\n");
+
+	err = perf_session__process_events(session, &mem->tool);
+
+out_delete:
+	perf_session__delete(session);
+	return err;
+}
+
+static int report_events(int argc, const char **argv, struct perf_mem *mem)
+{
+	const char **rep_argv;
+	int ret, i = 0, j, rep_argc;
+
+	if (mem->dump_raw)
+		return report_raw_events(mem);
+
+	rep_argc = argc + 3;
+	rep_argv = calloc(rep_argc + 1, sizeof(char *));
+	if (!rep_argv)
+		return -1;
+
+	rep_argv[i++] = strdup("report");
+	rep_argv[i++] = strdup("--mem-mode");
+	rep_argv[i++] = strdup("-n"); /* display number of samples */
+
+	/*
+	 * there is no cost associated with stores, so don't print
+	 * the column
+	 */
+	if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+		rep_argv[i++] = strdup("--sort=mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked");
+
+	for (j = 1; j < argc; j++, i++)
+		rep_argv[i] = argv[j];
+
+	ret = cmd_report(i, rep_argv, NULL);
+	free(rep_argv);
+	return ret;
+}
+
+int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+	struct stat st;
+	struct perf_mem mem = {
+		.tool = {
+			.sample		= process_sample_event,
+			.mmap		= perf_event__process_mmap,
+			.comm		= perf_event__process_comm,
+			.lost		= perf_event__process_lost,
+			.fork		= perf_event__process_fork,
+			.build_id	= perf_event__process_build_id,
+			.ordered_samples= true,
+		},
+		.input_name		 = "perf.data",
+	};
+	const struct option mem_options[] = {
+		OPT_STRING('t', "type", &mem_operation,
+			   "type", "memory operations (load/store)"),
+		OPT_BOOLEAN('D', "dump-raw-samples", &mem.dump_raw,
+			    "dump raw samples in ASCII"),
+		OPT_BOOLEAN('U', "hide-unresolved", &mem.hide_unresolved,
+			    "Only display entries resolved to a symbol"),
+		OPT_STRING('i', "input", &input_name, "file", "input file name"),
+		OPT_STRING('C', "cpu", &mem.cpu_list, "cpu", "list of cpus to profile"),
+		OPT_STRING('x', "field-separator", &symbol_conf.field_sep, "separator",
+			   "separator for columns; no spaces will be added between "
+			   "columns. '.' is reserved."),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, mem_options, mem_usage,
+                        PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!argc || !(strncmp(argv[0], "rec", 3) || mem_operation))
+		usage_with_options(mem_usage, mem_options);
+
+	if (!mem.input_name || !strlen(mem.input_name)) {
+		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
+			mem.input_name = "-";
+		else
+			mem.input_name = "perf.data";
+	}
+
+	if (!strncmp(argv[0], "rec", 3))
+		return __cmd_record(argc, argv);
+	else if (!strncmp(argv[0], "rep", 3))
+		return report_events(argc, argv, &mem);
+	else
+		usage_with_options(mem_usage, mem_options);
+
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 08143bd..b210d62 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -36,6 +36,7 @@ extern int cmd_kvm(int argc, const char **argv, const char *prefix);
 extern int cmd_test(int argc, const char **argv, const char *prefix);
 extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_inject(int argc, const char **argv, const char *prefix);
+extern int cmd_mem(int argc, const char **argv, const char *prefix);
 
 extern int find_scripts(char **scripts_array, char **scripts_path_array);
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 3e86bbd..2c5b621 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -24,3 +24,4 @@ perf-kmem			mainporcelain common
 perf-lock			mainporcelain common
 perf-kvm			mainporcelain common
 perf-test			mainporcelain common
+perf-mem			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index e968373..ccdf32d 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -60,6 +60,7 @@ static struct cmd_struct commands[] = {
 	{ "trace",	cmd_trace,	0 },
 #endif
 	{ "inject",	cmd_inject,	0 },
+	{ "mem",	cmd_mem,	0 },
 };
 
 struct pager_config {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 15/16] perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (13 preceding siblings ...)
  2012-11-05 13:51 ` [PATCH v2 14/16] perf tools: add new mem command for memory " Stephane Eranian
@ 2012-11-05 13:51 ` Stephane Eranian
  2012-11-05 13:51 ` [PATCH v2 16/16] perf tools: detect data vs. text mappings Stephane Eranian
  2012-11-06 20:52 ` [PATCH v2 00/16] perf: add memory access sampling support Arnaldo Carvalho de Melo
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

The type of mapping was lost, which made it hard for a tool
to distinguish code vs. data mmaps, even though the kernel
has the information needed to tell the two apart.

Use a bit in the header->misc bitmask to keep track of
the mmap type. If PERF_RECORD_MISC_MMAP_DATA is set then
the mapping is not executable (!VM_EXEC). If not set, then
the mapping is executable.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/uapi/linux/perf_event.h |    1 +
 kernel/events/core.c            |    3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b2a78bb..0fa3feb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -446,6 +446,7 @@ struct perf_event_mmap_page {
 #define PERF_RECORD_MISC_GUEST_KERNEL		(4 << 0)
 #define PERF_RECORD_MISC_GUEST_USER		(5 << 0)
 
+#define PERF_RECORD_MISC_MMAP_DATA		(1 << 13)
 /*
  * Indicates that the content of PERF_SAMPLE_IP points to
  * the actual instruction that triggered the event. See also
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94b2c70..e7318a7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4764,6 +4764,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
 	mmap_event->file_name = name;
 	mmap_event->file_size = size;
 
+	if (!(vma->vm_flags & VM_EXEC))
+		mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+
 	mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
 
 	rcu_read_lock();
-- 
1.7.9.5



* [PATCH v2 16/16] perf tools: detect data vs. text mappings
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (14 preceding siblings ...)
  2012-11-05 13:51 ` [PATCH v2 15/16] perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP Stephane Eranian
@ 2012-11-05 13:51 ` Stephane Eranian
  2012-11-06 20:52 ` [PATCH v2 00/16] perf: add memory access sampling support Arnaldo Carvalho de Melo
  16 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-05 13:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, namhyung.kim

Leverages the PERF_RECORD_MISC_MMAP_DATA bit in
the RECORD_MMAP record header. When the bit is set,
the mapping type is set to MAP__VARIABLE.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/machine.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 502eec0..1a88b9c 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -192,6 +192,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
 	struct thread *thread;
 	struct map *map;
+	enum map_type type;
 	int ret = 0;
 
 	if (dump_trace)
@@ -208,10 +209,17 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
 	thread = machine__findnew_thread(machine, event->mmap.pid);
 	if (thread == NULL)
 		goto out_problem;
+
+	if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
+		type = MAP__VARIABLE;
+	else
+		type = MAP__FUNCTION;
+
 	map = map__new(&machine->user_dsos, event->mmap.start,
 			event->mmap.len, event->mmap.pgoff,
 			event->mmap.pid, event->mmap.filename,
-			MAP__FUNCTION);
+			type);
+
 	if (map == NULL)
 		goto out_problem;
 
-- 
1.7.9.5



* Re: [PATCH v2 03/16] perf, core: Add a concept of a weightened sample
  2012-11-05 13:50 ` [PATCH v2 03/16] perf, core: Add a concept of a weightened sample Stephane Eranian
@ 2012-11-05 20:01   ` Arnaldo Carvalho de Melo
  2012-11-05 20:07     ` Arnaldo Carvalho de Melo
  2012-11-05 22:51     ` Andi Kleen
  0 siblings, 2 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-05 20:01 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, jolsa, namhyung.kim

Em Mon, Nov 05, 2012 at 02:50:50PM +0100, Stephane Eranian escreveu:
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
>  #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
> @@ -559,6 +562,7 @@ enum perf_event_type {
>  	 *	{ u64			stream_id;} && PERF_SAMPLE_STREAM_ID
>  	 *	{ u32			cpu, res; } && PERF_SAMPLE_CPU
>  	 *	{ u64			period;   } && PERF_SAMPLE_PERIOD
> +	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
>  	 *
>  	 *	{ struct read_format	values;	  } && PERF_SAMPLE_READ

This should come after PERF_SAMPLE_STACK_USER so that perf_evsel__parse_sample
can get it right, shouldn't it? Or is the comment just wrong?

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dbccf83..d633581 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event *event)
>  	if (sample_type & PERF_SAMPLE_PERIOD)
>  		size += sizeof(data->period);
>  
> +	if (sample_type & PERF_SAMPLE_WEIGHT)
> +		size += sizeof(data->weight);

Here it doesn't matter, since we're just computing the sample size, but
it would be nice to add this just after the (sample_type &
PERF_SAMPLE_STACK_USER) branch.

> +
>  	if (sample_type & PERF_SAMPLE_READ)
>  		size += event->read_size;
>  
> @@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
>  	if (sample_type & PERF_SAMPLE_PERIOD)
>  		perf_output_put(handle, data->period);
>  
> +	if (sample_type & PERF_SAMPLE_WEIGHT)
> +		perf_output_put(handle, data->weight);
> +

Yeap, it should go after PERF_SAMPLE_STACK_USER

>  	if (sample_type & PERF_SAMPLE_READ)
>  		perf_output_read(handle, event);
>  
> -- 
> 1.7.9.5


* Re: [PATCH v2 03/16] perf, core: Add a concept of a weightened sample
  2012-11-05 20:01   ` Arnaldo Carvalho de Melo
@ 2012-11-05 20:07     ` Arnaldo Carvalho de Melo
  2012-11-05 22:51     ` Andi Kleen
  1 sibling, 0 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-05 20:07 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, jolsa, namhyung.kim

Em Mon, Nov 05, 2012 at 05:01:47PM -0300, Arnaldo Carvalho de Melo escreveu:
> > +
> >  	if (sample_type & PERF_SAMPLE_READ)
> >  		size += event->read_size;
> >  
> > @@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
> >  	if (sample_type & PERF_SAMPLE_PERIOD)
> >  		perf_output_put(handle, data->period);
> >  
> > +	if (sample_type & PERF_SAMPLE_WEIGHT)
> > +		perf_output_put(handle, data->weight);
> > +
> 
> Yeap, it should go after PERF_SAMPLE_STACK_USER
> 
> >  	if (sample_type & PERF_SAMPLE_READ)
> >  		perf_output_read(handle, event);
> >  

I mean this, on top of the above patch:

diff -u b/kernel/events/core.c b/kernel/events/core.c
--- b/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4083,9 +4083,6 @@
 	if (sample_type & PERF_SAMPLE_PERIOD)
 		perf_output_put(handle, data->period);
 
-	if (sample_type & PERF_SAMPLE_WEIGHT)
-		perf_output_put(handle, data->weight);
-
 	if (sample_type & PERF_SAMPLE_READ)
 		perf_output_read(handle, event);
 
@@ -4175,6 +4172,9 @@
 		perf_output_sample_ustack(handle,
 					  data->stack_user_size,
 					  data->regs_user.regs);
+
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		perf_output_put(handle, data->weight);
 }
 
 void perf_prepare_sample(struct perf_event_header *header,

------------

To match what this patch series will do on a later patch, i.e.
parse the weight at the end of perf_evsel__parse_sample, after
it parses PERF_SAMPLE_READ, PERF_SAMPLE_CALLCHAIN, PERF_SAMPLE_RAW,
PERF_SAMPLE_BRANCH_STACK, PERF_SAMPLE_REGS_USER and
PERF_SAMPLE_STACK_USER:

+++ b/tools/perf/util/evsel.c
@@ -1020,6 +1020,11 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
                }
        }

+       if (type & PERF_SAMPLE_WEIGHT) {
+       data->weight = *array;
+               array++;
+       }
+
        return 0;
 }

- Arnaldo


* Re: [PATCH v2 03/16] perf, core: Add a concept of a weightened sample
  2012-11-05 20:01   ` Arnaldo Carvalho de Melo
  2012-11-05 20:07     ` Arnaldo Carvalho de Melo
@ 2012-11-05 22:51     ` Andi Kleen
  1 sibling, 0 replies; 38+ messages in thread
From: Andi Kleen @ 2012-11-05 22:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, linux-kernel, peterz, mingo, jolsa, namhyung.kim

> > +
> >  	if (sample_type & PERF_SAMPLE_READ)
> >  		size += event->read_size;
> >  
> > @@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
> >  	if (sample_type & PERF_SAMPLE_PERIOD)
> >  		perf_output_put(handle, data->period);
> >  
> > +	if (sample_type & PERF_SAMPLE_WEIGHT)
> > +		perf_output_put(handle, data->weight);
> > +
> 
> Yeap, it should go after PERF_SAMPLE_STACK_USER

There was a bug I recently noticed but haven't investigated so far.

When I do -b -W --transaction the flags/weights get corrupted.

May be related to that.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-05 13:50 ` [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
@ 2012-11-06 13:31   ` Andi Kleen
  2012-11-06 14:29     ` Stephane Eranian
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2012-11-06 13:31 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, acme, jolsa, namhyung.kim

> +EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
> +EVENT_ATTR(instructions,		INSTRUCTIONS		);
> +EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
> +EVENT_ATTR(cache-misses,		CACHE_MISSES		);
> +EVENT_ATTR(branch-instructions,		BRANCH_INSTRUCTIONS	);
> +EVENT_ATTR(branch-misses,		BRANCH_MISSES		);
> +EVENT_ATTR(bus-cycles,			BUS_CYCLES		);
> +EVENT_ATTR(stalled-cycles-frontend,	STALLED_CYCLES_FRONTEND	);
> +EVENT_ATTR(stalled-cycles-backend,	STALLED_CYCLES_BACKEND	);
> +EVENT_ATTR(ref-cycles,			REF_CPU_CYCLES		);

The merge_events() approach from the Haswell patches should be far cleaner

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-06 13:31   ` Andi Kleen
@ 2012-11-06 14:29     ` Stephane Eranian
  2012-11-06 18:50       ` Andi Kleen
  0 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-06 14:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On Tue, Nov 6, 2012 at 2:31 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> +EVENT_ATTR(cpu-cycles,                       CPU_CYCLES              );
>> +EVENT_ATTR(instructions,             INSTRUCTIONS            );
>> +EVENT_ATTR(cache-references,         CACHE_REFERENCES        );
>> +EVENT_ATTR(cache-misses,             CACHE_MISSES            );
>> +EVENT_ATTR(branch-instructions,              BRANCH_INSTRUCTIONS     );
>> +EVENT_ATTR(branch-misses,            BRANCH_MISSES           );
>> +EVENT_ATTR(bus-cycles,                       BUS_CYCLES              );
>> +EVENT_ATTR(stalled-cycles-frontend,  STALLED_CYCLES_FRONTEND );
>> +EVENT_ATTR(stalled-cycles-backend,   STALLED_CYCLES_BACKEND  );
>> +EVENT_ATTR(ref-cycles,                       REF_CPU_CYCLES          );
>
> The merge_events() approach from the Haswell patches should be far cleaner
>
And which patch in your HSW series implements this?

> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-05 13:51 ` [PATCH v2 14/16] perf tools: add new mem command for memory " Stephane Eranian
@ 2012-11-06 15:44   ` Arnaldo Carvalho de Melo
  2012-11-06 15:49     ` Stephane Eranian
  2012-11-06 15:50     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 15:44 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, jolsa, namhyung.kim

Em Mon, Nov 05, 2012 at 02:51:01PM +0100, Stephane Eranian escreveu:
> +
> +	if (strcmp(mem_operation, MEM_OPERATION_LOAD))
> +		sprintf(event, "cpu/mem-stores/pp");
> +	else
> +		sprintf(event, "cpu/mem-loads/pp");
> +

Fails for me on a Sandy Bridge notebook:

[root@sandy ~]# perf mem -t store record -a
Error:
'precise' request may not be supported. Try removing 'p' modifier

So if I manually fall back to a less precise mode:

[root@sandy ~]# perf record -g -a -e cpu/mem-stores/p
Error:
'precise' request may not be supported. Try removing 'p' modifier
[root@sandy ~]#

Still can't, manually fall back a one more level:

[root@sandy ~]# perf record -g -a -e cpu/mem-stores/
^C[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 7.419 MB perf.data (~324160 samples) ]

Yay, got some numbers.

smpboot: CPU0: Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz (fam: 06, model: 2a, stepping: 07)
Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, Intel PMU driver.
perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode
... version:                3
... bit width:              48
... generic registers:      4
... value mask:             0000ffffffffffff
... max period:             000000007fffffff
... fixed-purpose events:   3
... event mask:             000000070000000f
Disabled fast string operations
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

Do you have any test program that you use to check whether the results make
sense, and that you could share?

I could then try to shape it into a new 'perf test' entry that would test
whether requesting this event and then doing synthetic testing produces
the expected numbers, or numbers in an acceptable range.

One suggestion for such new features: examples of use with the resulting
output look great in changeset logs :-)

Ah, I recall seeing some, but that was in the patchset cover letter,
which doesn't get inserted into the git changelog history; I will look at
those now.

- Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 15:44   ` Arnaldo Carvalho de Melo
@ 2012-11-06 15:49     ` Stephane Eranian
  2012-11-06 16:51       ` Arnaldo Carvalho de Melo
  2012-11-06 15:50     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-06 15:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Namhyung Kim

On Tue, Nov 6, 2012 at 4:44 PM, Arnaldo Carvalho de Melo
<acme@redhat.com> wrote:
> Em Mon, Nov 05, 2012 at 02:51:01PM +0100, Stephane Eranian escreveu:
>> +
>> +     if (strcmp(mem_operation, MEM_OPERATION_LOAD))
>> +             sprintf(event, "cpu/mem-stores/pp");
>> +     else
>> +             sprintf(event, "cpu/mem-loads/pp");
>> +
>
> Fails for me on a Sandy Bridge notebook:
>
That's because you need to update the microcode.

$ sudo modprobe microcode
$ cat /proc/cpuinfo

Line with microcode needs to say:
microcode : 0x28

Update the microcode package or install it directly from the
Intel download center.

> [root@sandy ~]# perf mem -t store record -a
> Error:
> 'precise' request may not be supported. Try removing 'p' modifier
>
> So if I manually fall back to a less precise mode:
>
> [root@sandy ~]# perf record -g -a -e cpu/mem-stores/p
> Error:
> 'precise' request may not be supported. Try removing 'p' modifier
> [root@sandy ~]#
>
> Still can't, manually fall back a one more level:
>
> [root@sandy ~]# perf record -g -a -e cpu/mem-stores/
> ^C[ perf record: Woken up 25 times to write data ]
> [ perf record: Captured and wrote 7.419 MB perf.data (~324160 samples) ]
>
> Yay, got some numbers.
>
> smpboot: CPU0: Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz (fam: 06, model: 2a, stepping: 07)
> Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, Intel PMU driver.
> perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode
> ... version:                3
> ... bit width:              48
> ... generic registers:      4
> ... value mask:             0000ffffffffffff
> ... max period:             000000007fffffff
> ... fixed-purpose events:   3
> ... event mask:             000000070000000f
> Disabled fast string operations
> NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
>
> Do you have any test program you use to test if the results make sense that you
> could share?
>
> I could then try to shape it into a new 'perf test' entry that would test
> if requesting this event and then doing synthetic testing would produce
> the experted numbers or in a acceptable range.
>
> One suggestion for such new features, examples of use with the resulting output
> looks great on changeset logs :-)
>
> Ah, I recall seeing some, but that was on the patchset cover letter,
> that doesn't get inserted in the git changelog history, will look at those
> now.
>
> - Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 15:44   ` Arnaldo Carvalho de Melo
  2012-11-06 15:49     ` Stephane Eranian
@ 2012-11-06 15:50     ` Arnaldo Carvalho de Melo
  2012-11-06 15:57       ` Stephane Eranian
  1 sibling, 1 reply; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 15:50 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, jolsa, namhyung.kim

Em Tue, Nov 06, 2012 at 12:44:46PM -0300, Arnaldo Carvalho de Melo escreveu:
> [root@sandy ~]# perf record -g -a -e cpu/mem-stores/
> ^C[ perf record: Woken up 25 times to write data ]
> [ perf record: Captured and wrote 7.419 MB perf.data (~324160 samples) ]
> 
> Yay, got some numbers.

But then the results out of:

$ perf mem -t load rep --stdio

Are bogus at least on the callchains:

# ========
# captured on: Tue Nov  6 12:46:21 2012
# hostname : sandy.ghostprotocols.net
# os release : 3.7.0-rc2+
# perf version : 3.7.rc4.gfaa41f
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz
# cpuid : GenuineIntel,6,42,7
# total memory : 16220228 kB
# cmdline : /home/acme/bin/perf record -g -a -e cpu/mem-stores/ 
# event : name = cpu/mem-stores/, type = 4, config = 0x2cd, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_ho
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, uncore_cbox_2 = 8, uncore_cbox_3 
# ========
#
# Samples: 98  of event 'cpu/mem-stores/'
# Total cost : 98
# Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead      Samples     Cost             Memory access                                          Symbol               Share
# ........  ...........  .......  ........................  ..............................................  ..................
#
    19.39%           19      N/A                            [k] csd_unlock                                  [kernel.kallsyms] 
            |
            --- csd_unlock
               |          
               |--6242.11%-- generic_smp_call_function_single_interrupt
               |          smp_call_function_single_interrupt
               |          call_function_single_interrupt
               |          cpuidle_enter
               |          cpuidle_enter_state
               |          cpuidle_idle_call
               |          cpu_idle
               |          |          
               |          |--85.08%-- start_secondary
               |          |          
               |           --14.92%-- rest_init
               |                     start_kernel
               |                     x86_64_start_reservations
               |                     x86_64_start_kernel
               |          
               |--100.00%-- smp_call_function_single_interrupt
               |          call_function_single_interrupt
               |          cpuidle_enter
               |          cpuidle_enter_state
               |          cpuidle_idle_call
               |          cpu_idle
               |          start_secondary
                --97088126703734472704.00%-- [...]

     5.10%            5      N/A                            [k] _raw_spin_lock_irqsave                      [kernel.kallsyms] 
            |



Ideas?

- Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 15:50     ` Arnaldo Carvalho de Melo
@ 2012-11-06 15:57       ` Stephane Eranian
  2012-11-06 17:07         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-06 15:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Namhyung Kim

On Tue, Nov 6, 2012 at 4:50 PM, Arnaldo Carvalho de Melo
<acme@ghostprotocols.net> wrote:
> Em Tue, Nov 06, 2012 at 12:44:46PM -0300, Arnaldo Carvalho de Melo escreveu:
>> [root@sandy ~]# perf record -g -a -e cpu/mem-stores/
>> ^C[ perf record: Woken up 25 times to write data ]
>> [ perf record: Captured and wrote 7.419 MB perf.data (~324160 samples) ]
>>
>> Yay, got some numbers.
>
> But then the results out of:
>
> $ perf mem -t load rep --stdio
>
> Are bogus at least on the callchains:
>
I think it's because we are not using the period in the hist_entry
but the cost, and there is no cost for stores.

I think you should start with loads and no callchain. I am interested
in your thoughts on how to fix the data symbol resolution problem in
perf. In V2, I modified the kernel to at least help perf distinguish
MAP__FUNCTION from MAP__VARIABLE. But there is still something wrong.


> # ========
> # captured on: Tue Nov  6 12:46:21 2012
> # hostname : sandy.ghostprotocols.net
> # os release : 3.7.0-rc2+
> # perf version : 3.7.rc4.gfaa41f
> # arch : x86_64
> # nrcpus online : 8
> # nrcpus avail : 8
> # cpudesc : Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz
> # cpuid : GenuineIntel,6,42,7
> # total memory : 16220228 kB
> # cmdline : /home/acme/bin/perf record -g -a -e cpu/mem-stores/
> # event : name = cpu/mem-stores/, type = 4, config = 0x2cd, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_ho
> # HEADER_CPU_TOPOLOGY info available, use -I to display
> # HEADER_NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, uncore_cbox_2 = 8, uncore_cbox_3
> # ========
> #
> # Samples: 98  of event 'cpu/mem-stores/'
> # Total cost : 98
> # Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
> #
> # Overhead      Samples     Cost             Memory access                                          Symbol               Share
> # ........  ...........  .......  ........................  ..............................................  ..................
> #
>     19.39%           19      N/A                            [k] csd_unlock                                  [kernel.kallsyms]
>             |
>             --- csd_unlock
>                |
>                |--6242.11%-- generic_smp_call_function_single_interrupt
>                |          smp_call_function_single_interrupt
>                |          call_function_single_interrupt
>                |          cpuidle_enter
>                |          cpuidle_enter_state
>                |          cpuidle_idle_call
>                |          cpu_idle
>                |          |
>                |          |--85.08%-- start_secondary
>                |          |
>                |           --14.92%-- rest_init
>                |                     start_kernel
>                |                     x86_64_start_reservations
>                |                     x86_64_start_kernel
>                |
>                |--100.00%-- smp_call_function_single_interrupt
>                |          call_function_single_interrupt
>                |          cpuidle_enter
>                |          cpuidle_enter_state
>                |          cpuidle_idle_call
>                |          cpu_idle
>                |          start_secondary
>                 --97088126703734472704.00%-- [...]
>
>      5.10%            5      N/A                            [k] _raw_spin_lock_irqsave                      [kernel.kallsyms]
>             |
>
>
>
> Ideas?
>
> - Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 15:49     ` Stephane Eranian
@ 2012-11-06 16:51       ` Arnaldo Carvalho de Melo
  2012-11-06 17:05         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 16:51 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Namhyung Kim

Em Tue, Nov 06, 2012 at 04:49:26PM +0100, Stephane Eranian escreveu:
> On Tue, Nov 6, 2012 at 4:44 PM, Arnaldo Carvalho de Melo
> <acme@redhat.com> wrote:
> > Em Mon, Nov 05, 2012 at 02:51:01PM +0100, Stephane Eranian escreveu:
> >> +
> >> +     if (strcmp(mem_operation, MEM_OPERATION_LOAD))
> >> +             sprintf(event, "cpu/mem-stores/pp");
> >> +     else
> >> +             sprintf(event, "cpu/mem-loads/pp");

> > Fails for me on a Sandy Bridge notebook:

> That's because you need to update the microcode.

> $ sudo modprobe microcode
> $ cat /proc/cpuinfo
> 
> Line with microcode needs to say:
> microcode : 0x28

Here it says:

microcode	: 0x25
 
> update the microcode package or install directly from
> intel download center.

Thanks for the info, will do. And I will add "microcode hints for
selected processors should give that info to the user" to my TODO list
:-)

- Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 16:51       ` Arnaldo Carvalho de Melo
@ 2012-11-06 17:05         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 17:05 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Namhyung Kim

Em Tue, Nov 06, 2012 at 01:51:19PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Nov 06, 2012 at 04:49:26PM +0100, Stephane Eranian escreveu:
> > On Tue, Nov 6, 2012 at 4:44 PM, Arnaldo Carvalho de Melo
> > <acme@redhat.com> wrote:
> > > Em Mon, Nov 05, 2012 at 02:51:01PM +0100, Stephane Eranian escreveu:
> > >> +             sprintf(event, "cpu/mem-loads/pp");

> > > Fails for me on a Sandy Bridge notebook:

> > That's because you need to update the microcode.
> 
> > $ sudo modprobe microcode
> > $ cat /proc/cpuinfo

> > Line with microcode needs to say:
> > microcode : 0x28

> Here it says:

> microcode	: 0x25

Updated it and now I have 0x28, better:

Samples: 8K of event 'cpu/mem-loads/pp', Event count (approx.): 89255
  0.75%  1  665  L3 hit         [.] arena_run_tree_remove
  0.67%  1  602  LFB hit        [.] nsSMILAnimationController::DoSample(bool)
  0.63%  1  563  Local RAM hit  [.] 0x00000000004620c4
  0.58%  1  515  Local RAM hit  [.] 0x00007f6b278aa7bc
  0.54%  1  479  L2 hit         [.] XPending
  0.33%  1  292  Local RAM hit  [.] nsSVGFEMorphologyElement::Filter(nsSVGFilterInstance*, nsTArray<

Will do more tests after lunch,

- Arnaldo


* Re: [PATCH v2 14/16] perf tools: add new mem command for memory access profiling
  2012-11-06 15:57       ` Stephane Eranian
@ 2012-11-06 17:07         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 17:07 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, Peter Zijlstra, mingo, ak, Jiri Olsa, Namhyung Kim

Em Tue, Nov 06, 2012 at 04:57:59PM +0100, Stephane Eranian escreveu:
> On Tue, Nov 6, 2012 at 4:50 PM, Arnaldo Carvalho de Melo
> <acme@ghostprotocols.net> wrote:
> > Em Tue, Nov 06, 2012 at 12:44:46PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> [root@sandy ~]# perf record -g -a -e cpu/mem-stores/
> >> ^C[ perf record: Woken up 25 times to write data ]
> >> [ perf record: Captured and wrote 7.419 MB perf.data (~324160 samples) ]
> >>
> >> Yay, got some numbers.
> >
> > But then the results out of:
> >
> > $ perf mem -t load rep --stdio
> >
> > Are bogus at least on the callchains:
> >
> I think it's because we are not using the period in the hist_entry but
> the cost, and there is no cost for stores.
> 
> I think you should start with loads and no callchain. I am interested
> in your thoughts on how to fix the data symbol resolution problem in
> perf. In V2, I modified the kernel to at least help perf distinguish
> MAP__FUNCTION from MAP__VARIABLE. But there is still something wrong.

Yeah, I saw that, now that I'm playing with this patchset I'll work on
that, after lunch.
 
> 
> > # ========
> > # captured on: Tue Nov  6 12:46:21 2012
> > # hostname : sandy.ghostprotocols.net
> > # os release : 3.7.0-rc2+
> > # perf version : 3.7.rc4.gfaa41f
> > # arch : x86_64
> > # nrcpus online : 8
> > # nrcpus avail : 8
> > # cpudesc : Intel(R) Core(TM) i7-2920XM CPU @ 2.50GHz
> > # cpuid : GenuineIntel,6,42,7
> > # total memory : 16220228 kB
> > # cmdline : /home/acme/bin/perf record -g -a -e cpu/mem-stores/
> > # event : name = cpu/mem-stores/, type = 4, config = 0x2cd, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_ho
> > # HEADER_CPU_TOPOLOGY info available, use -I to display
> > # HEADER_NUMA_TOPOLOGY info available, use -I to display
> > # pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7, uncore_cbox_2 = 8, uncore_cbox_3
> > # ========
> > #
> > # Samples: 98  of event 'cpu/mem-stores/'
> > # Total cost : 98
> > # Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
> > #
> > # Overhead      Samples     Cost             Memory access                                          Symbol               Share
> > # ........  ...........  .......  ........................  ..............................................  ..................
> > #
> >     19.39%           19      N/A                            [k] csd_unlock                                  [kernel.kallsyms]
> >             |
> >             --- csd_unlock
> >                |
> >                |--6242.11%-- generic_smp_call_function_single_interrupt
> >                |          smp_call_function_single_interrupt
> >                |          call_function_single_interrupt
> >                |          cpuidle_enter
> >                |          cpuidle_enter_state
> >                |          cpuidle_idle_call
> >                |          cpu_idle
> >                |          |
> >                |          |--85.08%-- start_secondary
> >                |          |
> >                |           --14.92%-- rest_init
> >                |                     start_kernel
> >                |                     x86_64_start_reservations
> >                |                     x86_64_start_kernel
> >                |
> >                |--100.00%-- smp_call_function_single_interrupt
> >                |          call_function_single_interrupt
> >                |          cpuidle_enter
> >                |          cpuidle_enter_state
> >                |          cpuidle_idle_call
> >                |          cpu_idle
> >                |          start_secondary
> >                 --97088126703734472704.00%-- [...]
> >
> >      5.10%            5      N/A                            [k] _raw_spin_lock_irqsave                      [kernel.kallsyms]
> >             |
> >
> >
> >
> > Ideas?
> >
> > - Arnaldo


* Re: [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-06 14:29     ` Stephane Eranian
@ 2012-11-06 18:50       ` Andi Kleen
  2012-11-06 19:37         ` Stephane Eranian
  0 siblings, 1 reply; 38+ messages in thread
From: Andi Kleen @ 2012-11-06 18:50 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On Tue, Nov 06, 2012 at 03:29:01PM +0100, Stephane Eranian wrote:
> On Tue, Nov 6, 2012 at 2:31 PM, Andi Kleen <ak@linux.intel.com> wrote:
> >> +EVENT_ATTR(cpu-cycles,                       CPU_CYCLES              );
> >> +EVENT_ATTR(instructions,             INSTRUCTIONS            );
> >> +EVENT_ATTR(cache-references,         CACHE_REFERENCES        );
> >> +EVENT_ATTR(cache-misses,             CACHE_MISSES            );
> >> +EVENT_ATTR(branch-instructions,              BRANCH_INSTRUCTIONS     );
> >> +EVENT_ATTR(branch-misses,            BRANCH_MISSES           );
> >> +EVENT_ATTR(bus-cycles,                       BUS_CYCLES              );
> >> +EVENT_ATTR(stalled-cycles-frontend,  STALLED_CYCLES_FRONTEND );
> >> +EVENT_ATTR(stalled-cycles-backend,   STALLED_CYCLES_BACKEND  );
> >> +EVENT_ATTR(ref-cycles,                       REF_CPU_CYCLES          );
> >
> > The merge_events() approach from the Haswell patches should be far cleaner
> >
> And which patch in your HSW series implements this?

[PATCH 27/32] perf, x86: Support CPU specific sysfs events 

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-06 18:50       ` Andi Kleen
@ 2012-11-06 19:37         ` Stephane Eranian
  2012-11-07 14:39           ` Stephane Eranian
  0 siblings, 1 reply; 38+ messages in thread
From: Stephane Eranian @ 2012-11-06 19:37 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On Tue, Nov 6, 2012 at 7:50 PM, Andi Kleen <ak@linux.intel.com> wrote:
> On Tue, Nov 06, 2012 at 03:29:01PM +0100, Stephane Eranian wrote:
>> On Tue, Nov 6, 2012 at 2:31 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> >> +EVENT_ATTR(cpu-cycles,                       CPU_CYCLES              );
>> >> +EVENT_ATTR(instructions,             INSTRUCTIONS            );
>> >> +EVENT_ATTR(cache-references,         CACHE_REFERENCES        );
>> >> +EVENT_ATTR(cache-misses,             CACHE_MISSES            );
>> >> +EVENT_ATTR(branch-instructions,              BRANCH_INSTRUCTIONS     );
>> >> +EVENT_ATTR(branch-misses,            BRANCH_MISSES           );
>> >> +EVENT_ATTR(bus-cycles,                       BUS_CYCLES              );
>> >> +EVENT_ATTR(stalled-cycles-frontend,  STALLED_CYCLES_FRONTEND );
>> >> +EVENT_ATTR(stalled-cycles-backend,   STALLED_CYCLES_BACKEND  );
>> >> +EVENT_ATTR(ref-cycles,                       REF_CPU_CYCLES          );
>> >
>> > The merge_events() approach from the Haswell patches should be far cleaner
>> >
>> And which patch in your HSW series implements this?
>
> [PATCH 27/32] perf, x86: Support CPU specific sysfs events
>
Will try using that one.
Thanks.


* Re: [PATCH v2 00/16] perf: add memory access sampling support
  2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
                   ` (15 preceding siblings ...)
  2012-11-05 13:51 ` [PATCH v2 16/16] perf tools: detect data vs. text mappings Stephane Eranian
@ 2012-11-06 20:52 ` Arnaldo Carvalho de Melo
  2012-11-07  7:38   ` Namhyung Kim
  2012-11-07 14:53   ` Masami Hiramatsu
  16 siblings, 2 replies; 38+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-11-06 20:52 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: linux-kernel, Andi Kleen, Masami Hiramatsu, Peter Zijlstra,
	Namhyung Kim, Jiri Olsa, Ingo Molnar

Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
> Or if one is interested in the data view:
> $ perf mem -t load rep --sort=symbol_daddr,cost
> # Samples: 19K of event 'cpu/mem-loads/pp'
> # Total cost : 1013994
> # Sort order : symbol_daddr,cost
> #
> # Overhead      Samples             Data Symbol     Cost
> # ........  ...........  ......................  .......
> #
>      0.10%            1  [.] 0x00007f67dffe8038      986
>      0.09%            1  [.] 0x00007f67df91a750      890
>      0.08%            1  [.] 0x00007f67e288fba8      826
> 

> CAVEAT: Note that the data addresses are not resolved correctly currently due to a
> problem in perf data symbol resolution code which I have not been able to
> uncover so far.

Stephane,

	Those data addresses are mostly on the stack; we need reverse
resolution using DWARF location expressions to figure out the name of
the variable at a particular address, etc.

	Masami, have you played with this already? I mean:

[root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
# Samples: 30  of event 'cpu/mem-loads/pp'
# Total cost : 640
# Sort order : symbol,symbol_daddr,cost
#
# Overhead      Samples                  Symbol             Data Symbol     Cost
# ........  ...........  ......................  ......................  .......
#
    55.00%            1  [k] lookup_fast         [k] 0xffff8803b7521bd4      352
     5.47%            1  [k] cache_alloc_refill  [k] 0xffff880407705024       35
     3.44%            1  [k] cache_alloc_refill  [k] 0xffff88041d8527d8       22
     3.28%            1  [k] run_timer_softirq   [k] 0xffff88041e2c3e90       21
     2.50%            1  [k] __list_add          [k] 0xffff8803b7521d68       16
     2.19%            1  [.] __strcoll_l         [.] 0x00007fffa8d44080       14
     1.88%            1  [.] __strcoll_l         [.] 0x00007fffa8d44104       12

If we go to the annotation browser to see where is that lookup_fast hitting we get:

100.00 │       mov    -0x34(%rbp),%eax

How to map 0xffff8803b7521bd4 to a stack variable, struct members and all?

Humm, for userspace we have PERF_SAMPLE_REGS_USER for the dwarf unwinder we
need for userspace, but what about reverse mapping of kernel variables? Jiri?

- Arnaldo


* Re: [PATCH v2 00/16] perf: add memory access sampling support
  2012-11-06 20:52 ` [PATCH v2 00/16] perf: add memory access sampling support Arnaldo Carvalho de Melo
@ 2012-11-07  7:38   ` Namhyung Kim
  2012-11-07 10:02     ` Stephane Eranian
  2012-11-07 14:53   ` Masami Hiramatsu
  1 sibling, 1 reply; 38+ messages in thread
From: Namhyung Kim @ 2012-11-07  7:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, linux-kernel, Andi Kleen, Masami Hiramatsu,
	Peter Zijlstra, Jiri Olsa, Ingo Molnar

Hi Arnaldo,

On Tue, 6 Nov 2012 17:52:21 -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
> [root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
> # Samples: 30  of event 'cpu/mem-loads/pp'
> # Total cost : 640
> # Sort order : symbol,symbol_daddr,cost
> #
> # Overhead      Samples                  Symbol             Data Symbol     Cost
> # ........  ...........  ......................  ......................  .......
> #
>     55.00%            1  [k] lookup_fast         [k] 0xffff8803b7521bd4      352
>      5.47%            1  [k] cache_alloc_refill  [k] 0xffff880407705024       35
>      3.44%            1  [k] cache_alloc_refill  [k] 0xffff88041d8527d8       22
>      3.28%            1  [k] run_timer_softirq   [k] 0xffff88041e2c3e90       21
>      2.50%            1  [k] __list_add          [k] 0xffff8803b7521d68       16
>      2.19%            1  [.] __strcoll_l         [.] 0x00007fffa8d44080       14
>      1.88%            1  [.] __strcoll_l         [.] 0x00007fffa8d44104       12
>
> If we go to the annotation browser to see where is that lookup_fast hitting we get:
>
> 100.00 │       mov    -0x34(%rbp),%eax
>
> How to map 0xffff8803b7521bd4 to a stack variable, struct members and all?
>
> Humm, for userspace we have PERF_SAMPLE_REGS_USER for the dwarf unwinder we
> need for userspace, but what about reverse mapping of kernel variables? Jiri?

I suspect there isn't much we can do for the stack.  One thing we can
do is add dso_daddr to the sort key and check whether it's a [stack]
(or [stack:tid]) mapping or not.  But I'm not sure that can be done for
kernel stacks too.

Thanks,
Namhyung


* Re: [PATCH v2 00/16] perf: add memory access sampling support
  2012-11-07  7:38   ` Namhyung Kim
@ 2012-11-07 10:02     ` Stephane Eranian
  0 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-07 10:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, LKML, Andi Kleen, Masami Hiramatsu,
	Peter Zijlstra, Jiri Olsa, Ingo Molnar

On Wed, Nov 7, 2012 at 8:38 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hi Arnaldo,
>
> On Tue, 6 Nov 2012 17:52:21 -0300, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
>> [root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
>> # Samples: 30  of event 'cpu/mem-loads/pp'
>> # Total cost : 640
>> # Sort order : symbol,symbol_daddr,cost
>> #
>> # Overhead      Samples                  Symbol             Data Symbol     Cost
>> # ........  ...........  ......................  ......................  .......
>> #
>>     55.00%            1  [k] lookup_fast         [k] 0xffff8803b7521bd4      352
>>      5.47%            1  [k] cache_alloc_refill  [k] 0xffff880407705024       35
>>      3.44%            1  [k] cache_alloc_refill  [k] 0xffff88041d8527d8       22
>>      3.28%            1  [k] run_timer_softirq   [k] 0xffff88041e2c3e90       21
>>      2.50%            1  [k] __list_add          [k] 0xffff8803b7521d68       16
>>      2.19%            1  [.] __strcoll_l         [.] 0x00007fffa8d44080       14
>>      1.88%            1  [.] __strcoll_l         [.] 0x00007fffa8d44104       12
>>
>> If we go to the annotation browser to see where is that lookup_fast hitting we get:
>>
>> 100.00 │       mov    -0x34(%rbp),%eax
>>
>> How to map 0xffff8803b7521bd4 to a stack variable, struct members and all?
>>

Yes, you need DWARF to achieve this. But I think we should first fix the
problem with global variables at the user and kernel levels; there, all
you need is the data symbol table (which we load because of the -d
option) and correct base+offset calculations. For now, I am getting
perf-xxxx.map for global variables.

For instance, I use the following simple test program with:
$ perf mem rec mcol 10000

I expect to see symbol aa+offset in the data symbol column and mcol in
the data object column. But instead, I get raw hex and perf-xxxx.map.

I think that's what we need to fix first.

/* mcol.c */
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

typedef struct {
        long real;
        long img;
} elem_t;

#define NB      100000ULL
static elem_t aa[NB * 2];

uint64_t
mat_column(elem_t *m, size_t nr, size_t nc, elem_t *n)
{
        uint64_t real = 0 , img = 0;
        elem_t *p;
        size_t i, j;

        /* scan by columns */
        for(i=0; i < nc; i++) {
                for(j=0; j < nr; j++) {
                        real += (m+i+nc*j)->real;
                        img += (m+i+nc*j)->img;
                }
        }
        return (real+img) > 0 ? 0 : 1;
}

int main(int argc, char **argv)
{
        size_t rows, cols;
        unsigned long long nloop, nl;
        elem_t number;

        number.real = 1;
        number.img = 5;

        nloop = nl = argc > 1 ? strtoul(argv[1], NULL, 0) : 10000;
        rows = NB;
        cols = 2;
        printf("mat[%p:%p] size=%zuMiB\n", aa, &aa[-1+NB*2], sizeof(aa) >> 20);
        while(nl--) {
                number.img++;
                number.real += mat_column(aa, rows, cols, &number);
        }
        return 0;
}
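
A minimal sketch of the base+offset calculation described above: given a
data-symbol table sorted by start address, a binary search maps a sampled
data address to symbol+offset, falling back to raw hex as in the current
broken output. The struct and function names here are hypothetical
illustrations, not perf's actual symbol-resolution code:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical digest of one data-symbol table entry (e.g. "aa" above),
 * sorted by start address. */
struct dsym {
	uint64_t start;		/* load address of the symbol */
	uint64_t size;		/* size in bytes */
	const char *name;
};

/* Resolve addr to "name+0xoff"; on a miss, fall back to raw hex the way
 * the current perf output does.  Returns 1 on a hit, 0 on a miss. */
static int dsym_resolve(const struct dsym *tab, int n, uint64_t addr,
			char *buf, size_t len)
{
	int lo = 0, hi = n - 1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;

		if (addr < tab[mid].start)
			hi = mid - 1;
		else if (addr >= tab[mid].start + tab[mid].size)
			lo = mid + 1;
		else {
			snprintf(buf, len, "%s+0x%" PRIx64,
				 tab[mid].name, addr - tab[mid].start);
			return 1;
		}
	}
	snprintf(buf, len, "0x%" PRIx64, addr);
	return 0;
}
```

The bug Stephane describes would then amount to looking up the data
address in the wrong table (perf-xxxx.map) instead of the DSO's data
symbol table with the correct base adjustment.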


* Re: [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency
  2012-11-06 19:37         ` Stephane Eranian
@ 2012-11-07 14:39           ` Stephane Eranian
  0 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-07 14:39 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Namhyung Kim

On Tue, Nov 6, 2012 at 8:37 PM, Stephane Eranian <eranian@google.com> wrote:
> On Tue, Nov 6, 2012 at 7:50 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> On Tue, Nov 06, 2012 at 03:29:01PM +0100, Stephane Eranian wrote:
>>> On Tue, Nov 6, 2012 at 2:31 PM, Andi Kleen <ak@linux.intel.com> wrote:
>>> >> +EVENT_ATTR(cpu-cycles,                       CPU_CYCLES              );
>>> >> +EVENT_ATTR(instructions,             INSTRUCTIONS            );
>>> >> +EVENT_ATTR(cache-references,         CACHE_REFERENCES        );
>>> >> +EVENT_ATTR(cache-misses,             CACHE_MISSES            );
>>> >> +EVENT_ATTR(branch-instructions,              BRANCH_INSTRUCTIONS     );
>>> >> +EVENT_ATTR(branch-misses,            BRANCH_MISSES           );
>>> >> +EVENT_ATTR(bus-cycles,                       BUS_CYCLES              );
>>> >> +EVENT_ATTR(stalled-cycles-frontend,  STALLED_CYCLES_FRONTEND );
>>> >> +EVENT_ATTR(stalled-cycles-backend,   STALLED_CYCLES_BACKEND  );
>>> >> +EVENT_ATTR(ref-cycles,                       REF_CPU_CYCLES          );
>>> >
>>> > The merge_events() approach from the Haswell patches should be far cleaner
>>> >

>>> And which patch in your HSW series implements this?
>>
>> [PATCH 27/32] perf, x86: Support CPU specific sysfs events
>>
Ok, integrated that patch in my series and changed my code to use it.
It works.


* Re: Re: [PATCH v2 00/16] perf: add memory access sampling support
  2012-11-06 20:52 ` [PATCH v2 00/16] perf: add memory access sampling support Arnaldo Carvalho de Melo
  2012-11-07  7:38   ` Namhyung Kim
@ 2012-11-07 14:53   ` Masami Hiramatsu
  2012-11-07 14:56     ` Stephane Eranian
  1 sibling, 1 reply; 38+ messages in thread
From: Masami Hiramatsu @ 2012-11-07 14:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Stephane Eranian, linux-kernel, Andi Kleen, Peter Zijlstra,
	Namhyung Kim, Jiri Olsa, Ingo Molnar, yrl.pp-manager.tt

(2012/11/07 5:52), Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
>> Or if one is interested in the data view:
>> $ perf mem -t load rep --sort=symbol_daddr,cost
>> # Samples: 19K of event 'cpu/mem-loads/pp'
>> # Total cost : 1013994
>> # Sort order : symbol_daddr,cost
>> #
>> # Overhead      Samples             Data Symbol     Cost
>> # ........  ...........  ......................  .......
>> #
>>      0.10%            1  [.] 0x00007f67dffe8038      986
>>      0.09%            1  [.] 0x00007f67df91a750      890
>>      0.08%            1  [.] 0x00007f67e288fba8      826
>>
> 
>> CAVEAT: Note that the data addresses are not resolved correctly currently due to a
>> problem in perf data symbol resolution code which I have not been able to
>> uncover so far.
> 
> Stephane,
> 
> 	Those data addresses mostly are on the stack, we need reverse
> resolution using DWARF location expressions to figure out what is the
> name of a variable that is on a particular address, etc.
> 
> 	Masami, have you played with this already? I mean:

No, but it looks interesting. I'll try :)

> 
> [root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
> # Samples: 30  of event 'cpu/mem-loads/pp'
> # Total cost : 640
> # Sort order : symbol,symbol_daddr,cost
> #
> # Overhead      Samples                  Symbol             Data Symbol     Cost
> # ........  ...........  ......................  ......................  .......
> #
>     55.00%            1  [k] lookup_fast         [k] 0xffff8803b7521bd4      352
>      5.47%            1  [k] cache_alloc_refill  [k] 0xffff880407705024       35
>      3.44%            1  [k] cache_alloc_refill  [k] 0xffff88041d8527d8       22
>      3.28%            1  [k] run_timer_softirq   [k] 0xffff88041e2c3e90       21
>      2.50%            1  [k] __list_add          [k] 0xffff8803b7521d68       16
>      2.19%            1  [.] __strcoll_l         [.] 0x00007fffa8d44080       14
>      1.88%            1  [.] __strcoll_l         [.] 0x00007fffa8d44104       12
> 
> If we go to the annotation browser to see where is that lookup_fast hitting we get:
> 
> 100.00 │       mov    -0x34(%rbp),%eax
> 
> How to map 0xffff8803b7521bd4 to a stack variable, struct members and all?

If perf stores the %rbp value, we can do a forward search over the local
variables of the enclosing scope block. In some cases the memory is
dereferenced through another pointer; then it is hard to do (perhaps we
would need to disassemble the code and execute it in reverse). But for
most memory addresses on the stack it should work, I think.
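
Such a forward search could be sketched like this: take the
frame-relative offset of the sampled data address and match it against
the DW_OP_fbreg offsets of the variables in scope. The structures below
are hypothetical digests of DWARF data (the variable names are invented,
and real frame bases may be CFA-relative rather than %rbp and would be
evaluated with libdw); the -0x34 offset mirrors the annotated
`mov -0x34(%rbp),%eax` above:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical digest of one local variable's DWARF description:
 * its location is DW_OP_fbreg <off>, i.e. frame base + off. */
struct local_var {
	const char *name;
	int64_t fbreg_off;	/* signed offset from the frame base */
	uint64_t size;		/* byte size of the variable's type */
};

/* Map a sampled data address to a local variable of the current scope,
 * assuming the frame base is the sampled %rbp.  Returns NULL when the
 * address falls outside every known local (e.g. reached via a pointer,
 * the hard case mentioned above). */
static const struct local_var *
locate_stack_var(const struct local_var *vars, int n,
		 uint64_t rbp, uint64_t daddr)
{
	int64_t off = (int64_t)(daddr - rbp);

	for (int i = 0; i < n; i++) {
		if (off >= vars[i].fbreg_off &&
		    off < vars[i].fbreg_off + (int64_t)vars[i].size)
			return &vars[i];
	}
	return NULL;
}
```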

Thank you,

> 
> Humm, for userspace we have PERF_SAMPLE_REGS_USER for the dwarf unwinder we
> need for userspace, but what about reverse mapping of kernel variables? Jiri?
> 
> - Arnaldo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com




* Re: Re: [PATCH v2 00/16] perf: add memory access sampling support
  2012-11-07 14:53   ` Masami Hiramatsu
@ 2012-11-07 14:56     ` Stephane Eranian
  0 siblings, 0 replies; 38+ messages in thread
From: Stephane Eranian @ 2012-11-07 14:56 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Arnaldo Carvalho de Melo, LKML, Andi Kleen, Peter Zijlstra,
	Namhyung Kim, Jiri Olsa, Ingo Molnar, yrl.pp-manager.tt

On Wed, Nov 7, 2012 at 3:53 PM, Masami Hiramatsu
<masami.hiramatsu.pt@hitachi.com> wrote:
> (2012/11/07 5:52), Arnaldo Carvalho de Melo wrote:
>> Em Mon, Nov 05, 2012 at 02:50:47PM +0100, Stephane Eranian escreveu:
>>> Or if one is interested in the data view:
>>> $ perf mem -t load rep --sort=symbol_daddr,cost
>>> # Samples: 19K of event 'cpu/mem-loads/pp'
>>> # Total cost : 1013994
>>> # Sort order : symbol_daddr,cost
>>> #
>>> # Overhead      Samples             Data Symbol     Cost
>>> # ........  ...........  ......................  .......
>>> #
>>>      0.10%            1  [.] 0x00007f67dffe8038      986
>>>      0.09%            1  [.] 0x00007f67df91a750      890
>>>      0.08%            1  [.] 0x00007f67e288fba8      826
>>>
>>
>>> CAVEAT: Note that the data addresses are not resolved correctly currently due to a
>>> problem in perf data symbol resolution code which I have not been able to
>>> uncover so far.
>>
>> Stephane,
>>
>>       Those data addresses mostly are on the stack, we need reverse
>> resolution using DWARF location expressions to figure out what is the
>> name of a variable that is on a particular address, etc.
>>
>>       Masami, have you played with this already? I mean:
>
> No, but it looks interesting. I'll try :)
>
>>
>> [root@sandy acme]# perf mem -t load rep --stdio --sort=symbol,symbol_daddr,cost
>> # Samples: 30  of event 'cpu/mem-loads/pp'
>> # Total cost : 640
>> # Sort order : symbol,symbol_daddr,cost
>> #
>> # Overhead      Samples                  Symbol             Data Symbol     Cost
>> # ........  ...........  ......................  ......................  .......
>> #
>>     55.00%            1  [k] lookup_fast         [k] 0xffff8803b7521bd4      352
>>      5.47%            1  [k] cache_alloc_refill  [k] 0xffff880407705024       35
>>      3.44%            1  [k] cache_alloc_refill  [k] 0xffff88041d8527d8       22
>>      3.28%            1  [k] run_timer_softirq   [k] 0xffff88041e2c3e90       21
>>      2.50%            1  [k] __list_add          [k] 0xffff8803b7521d68       16
>>      2.19%            1  [.] __strcoll_l         [.] 0x00007fffa8d44080       14
>>      1.88%            1  [.] __strcoll_l         [.] 0x00007fffa8d44104       12
>>
>> If we go to the annotation browser to see where is that lookup_fast hitting we get:
>>
>> 100.00 │       mov    -0x34(%rbp),%eax
>>
>> How to map 0xffff8803b7521bd4 to a stack variable, struct members and all?
>
> If perf stores %rbp value, we can do forward searching from local variables
> in this scope block. In some cases, the memory is dereferenced by another
> pointer. In that case, it is hard to do that (perhaps, we need to disassemble
> it and reverse execution). But in most case of the memory address on a stack,
> it will work, I think.
>
PEBS-LL does store rbp but it is not yet exposed. That will be in a follow-up
patch.

> Thank you,
>
>>
>> Humm, for userspace we have PERF_SAMPLE_REGS_USER for the dwarf unwinder we
>> need for userspace, but what about reverse mapping of kernel variables? Jiri?
>>
>> - Arnaldo
>>
>
>
> --
> Masami HIRAMATSU
> IT Management Research Dept. Linux Technology Center
> Hitachi, Ltd., Yokohama Research Laboratory
> E-mail: masami.hiramatsu.pt@hitachi.com
>
>


* [tip:perf/core] perf tools: Add arbitary aliases and support names with -
  2012-11-05 13:50 ` [PATCH v2 05/16] perf, tools: Add arbitary aliases and support names with - Stephane Eranian
@ 2012-11-14  7:34   ` tip-bot for Andi Kleen
  0 siblings, 0 replies; 38+ messages in thread
From: tip-bot for Andi Kleen @ 2012-11-14  7:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, peterz, namhyung.kim, jolsa, ak,
	tglx, mingo

Commit-ID:  eef9ba98b9ee96f52e544b09f581c397d8cc8265
Gitweb:     http://git.kernel.org/tip/eef9ba98b9ee96f52e544b09f581c397d8cc8265
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Mon, 5 Nov 2012 14:50:52 +0100
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 8 Nov 2012 16:12:03 -0300

perf tools: Add arbitary aliases and support names with -

- Add missing scanner symbol for arbitrary aliases inside the config
  region.

- looks nicer than _, so allow - in the event names. Used for various of
  the arch perfmon and Haswell events.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1352123463-7346-6-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.l |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..66959fa 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -81,6 +81,7 @@ num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
 name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
+name_minus	[a-zA-Z_*?][a-zA-Z0-9\-_*?]*
 modifier_event	[ukhpGH]{1,8}
 modifier_bp	[rwx]{1,3}
 
@@ -168,6 +169,7 @@ period			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
+{name_minus}		{ return str(yyscanner, PE_NAME); }
 }
 
 mem:			{ BEGIN(mem); return PE_PREFIX_MEM; }
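
The effect of the new {name_minus} rule can be illustrated with a plain
C predicate for the pattern [a-zA-Z_*?][a-zA-Z0-9\-_*?]* (a rough
equivalent only; the real tokenizer is generated by flex, and the rule
is active only inside the /.../ config region):

```c
#include <ctype.h>
#include <string.h>

/* Rough C equivalent of the name_minus pattern:
 * first char in [a-zA-Z_*?], remaining chars in [a-zA-Z0-9-_*?]. */
static int is_name_minus(const char *s)
{
	if (*s == '\0' ||
	    !(isalpha((unsigned char)*s) || strchr("_*?", *s)))
		return 0;
	for (s++; *s; s++)
		if (!(isalnum((unsigned char)*s) || strchr("-_*?", *s)))
			return 0;
	return 1;
}
```

With this rule, event names such as mem-loads or
stalled-cycles-frontend tokenize as a single PE_NAME instead of
breaking at the '-'.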


end of thread, other threads:[~2012-11-14  7:34 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-05 13:50 [PATCH v2 00/16] perf: add memory access sampling support Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 01/16] perf/x86: improve sysfs event mapping with event string Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 02/16] perf/x86: add flags to event constraints Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 03/16] perf, core: Add a concept of a weightened sample Stephane Eranian
2012-11-05 20:01   ` Arnaldo Carvalho de Melo
2012-11-05 20:07     ` Arnaldo Carvalho de Melo
2012-11-05 22:51     ` Andi Kleen
2012-11-05 13:50 ` [PATCH v2 04/16] perf: add minimal support for PERF_SAMPLE_WEIGHT Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 05/16] perf, tools: Add arbitary aliases and support names with - Stephane Eranian
2012-11-14  7:34   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2012-11-05 13:50 ` [PATCH v2 06/16] perf: add support for PERF_SAMPLE_ADDR in dump_sampple() Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 07/16] perf: add generic memory sampling interface Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 08/16] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
2012-11-06 13:31   ` Andi Kleen
2012-11-06 14:29     ` Stephane Eranian
2012-11-06 18:50       ` Andi Kleen
2012-11-06 19:37         ` Stephane Eranian
2012-11-07 14:39           ` Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 09/16] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 10/16] perf/x86: add support for PEBS Precise Store Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 11/16] perf tools: add mem access sampling core support Stephane Eranian
2012-11-05 13:50 ` [PATCH v2 12/16] perf report: add support for mem access profiling Stephane Eranian
2012-11-05 13:51 ` [PATCH v2 13/16] perf record: " Stephane Eranian
2012-11-05 13:51 ` [PATCH v2 14/16] perf tools: add new mem command for memory " Stephane Eranian
2012-11-06 15:44   ` Arnaldo Carvalho de Melo
2012-11-06 15:49     ` Stephane Eranian
2012-11-06 16:51       ` Arnaldo Carvalho de Melo
2012-11-06 17:05         ` Arnaldo Carvalho de Melo
2012-11-06 15:50     ` Arnaldo Carvalho de Melo
2012-11-06 15:57       ` Stephane Eranian
2012-11-06 17:07         ` Arnaldo Carvalho de Melo
2012-11-05 13:51 ` [PATCH v2 15/16] perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP Stephane Eranian
2012-11-05 13:51 ` [PATCH v2 16/16] perf tools: detect data vs. text mappings Stephane Eranian
2012-11-06 20:52 ` [PATCH v2 00/16] perf: add memory access sampling support Arnaldo Carvalho de Melo
2012-11-07  7:38   ` Namhyung Kim
2012-11-07 10:02     ` Stephane Eranian
2012-11-07 14:53   ` Masami Hiramatsu
2012-11-07 14:56     ` Stephane Eranian
