* [Patch v1 00/10] perf: add memory access sampling support
@ 2012-10-29 15:15 Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string Stephane Eranian
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch series adds a new feature to the kernel perf_events
interface and to the corresponding user level tool, perf.

With this series, it is possible to sample (not trace) memory
accesses (loads, stores). For loads, the instruction and data
addresses are captured along with the latency and data source.
For stores, the instruction and data addresses are captured
along with limited cache and TLB information.

For the load data source, the memory hierarchy level, TLB, snoop
and lock information are captured.

Although the perf_event interface is extended in a generic manner,
sampling memory accesses requires HW support. The current patches
implement the feature on Intel processors starting with Nehalem.
The patches leverage the PEBS Load Latency and Precise Store
mechanisms. Precise Store is present only on Sandy Bridge and
Ivy Bridge based processors.
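
For the curious, here is a minimal sketch of what requesting this
information through perf_event_open() could look like once the series
is applied. The raw encoding is the Sandy Bridge mem-loads alias from
patch 04; the period and other values are made up:

  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
          struct perf_event_attr attr = {
                  .size          = sizeof(attr),
                  .type          = PERF_TYPE_RAW,
                  .config        = 0x1cd,   /* event=0xcd,umask=0x1 on SNB */
                  .config1       = 3,       /* ldlat threshold in cycles (patch 05) */
                  .sample_period = 10000,   /* arbitrary */
                  .sample_type   = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
                                   PERF_SAMPLE_COST | PERF_SAMPLE_DSRC,
                  .precise_ip    = 2,       /* PEBS */
                  .disabled      = 1,
          };
          /* self-monitoring, any CPU; samples land in the mmap'ed buffer */
          int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

          return fd < 0;
  }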

The perf tool is extended to make capturing and analyzing the
data easier with a new command: perf mem.

$ perf mem -t load rec triad
$ perf mem -t load rep --stdio
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead      Samples     Cost  Memory access Symbol      Shared Obj  Data Symbol             Data Object Snoop  TLB access   Locked
# ........  ...........  .......  ............. ..........  ........... ......................  ........... ...... ............ ......
#
     0.10%            1      986  LFB hit       [.] triad   triad       [.] 0x00007f67dffe8038  [unknown]    None  L1 or L2 hit  No    
     0.09%            1      890  LFB hit       [.] triad   triad       [.] 0x00007f67df91a750  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      826  LFB hit       [.] triad   triad       [.] 0x00007f67e288fba8  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      825  LFB hit       [.] triad   triad       [.] 0x00007f67dea28c80  [unknown]    None  L1 or L2 hit  No    
     0.08%            1      787  LFB hit       [.] triad   triad       [.] 0x00007f67df055a60  [unknown]    None  L1 or L2 hit  No    

The perf mem command is a wrapper around perf record/report. It passes the
right options to the report and record commands. Note that the TUI mode is
supported.

One powerful feature of perf is that users can adjust the sort order to display
the information in different formats or from different angles. This is particularly
useful with memory sampling:

$ perf mem -t load rep --sort=mem
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : mem
#
# Overhead      Samples             Memory access
# ........  ...........  ........................
#
    85.26%        10633  LFB hit                 
     7.35%         8151  L1 hit                  
     3.13%          383  L3 hit                  
     3.09%          195  Local RAM hit           
     1.16%          259  L2 hit                  
     0.00%            4  Uncached hit            

Or if one is interested in the data view:
$ perf mem -t load rep --sort=symbol_daddr,cost
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : symbol_daddr,cost
#
# Overhead      Samples             Data Symbol     Cost
# ........  ...........  ......................  .......
#
     0.10%            1  [.] 0x00007f67dffe8038      986
     0.09%            1  [.] 0x00007f67df91a750      890
     0.08%            1  [.] 0x00007f67e288fba8      826


One note on the cost displayed: On Intel processors with PEBS Load Latency, as described
in the SDM, the cost encompasses the number of cycles from dispatch to Globally Observable
(GO) state, which means it includes out-of-order (OOO) execution effects. It is therefore
not unusual to see L1D hits with a cost of > 100 cycles. Always look at the memory level
for an approximation of the access penalty, then interpret the cost value accordingly.

There is no cost associated with stores.

CAVEAT: Note that the data addresses are currently not always resolved correctly,
due to a problem in the perf data symbol resolution code which I have not yet
been able to track down.

Enjoy,

Signed-off-by: Stephane Eranian <eranian@google.com>


Stephane Eranian (10):
  perf/x86: improve sysfs event mapping with event string
  perf/x86: add flags to event constraints
  perf: add generic memory sampling interface
  perf/x86: add memory profiling via PEBS Load Latency
  perf/x86: export PEBS load latency threshold register to sysfs
  perf/x86: add support for PEBS Precise Store
  perf tools: add mem access sampling core support
  perf report: add support for mem access profiling
  perf record: add support for mem access profiling
  perf tools: add new mem command for memory access profiling

 arch/x86/include/asm/msr-index.h              |    1 +
 arch/x86/kernel/cpu/perf_event.c              |   31 +--
 arch/x86/kernel/cpu/perf_event.h              |   61 ++++-
 arch/x86/kernel/cpu/perf_event_intel.c        |   66 ++++-
 arch/x86/kernel/cpu/perf_event_intel_ds.c     |  182 +++++++++++++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 include/linux/perf_event.h                    |    5 +
 include/uapi/linux/perf_event.h               |   70 +++++-
 kernel/events/core.c                          |   12 +
 tools/perf/Documentation/perf-mem.txt         |   51 ++++
 tools/perf/Makefile                           |    1 +
 tools/perf/builtin-mem.c                      |  237 ++++++++++++++++++
 tools/perf/builtin-record.c                   |    2 +
 tools/perf/builtin-report.c                   |  131 +++++++++-
 tools/perf/builtin.h                          |    1 +
 tools/perf/command-list.txt                   |    1 +
 tools/perf/perf.c                             |    1 +
 tools/perf/perf.h                             |    1 +
 tools/perf/util/event.h                       |    2 +
 tools/perf/util/evsel.c                       |   15 ++
 tools/perf/util/hist.c                        |   66 ++++-
 tools/perf/util/hist.h                        |   13 +
 tools/perf/util/session.c                     |   44 ++++
 tools/perf/util/session.h                     |    4 +
 tools/perf/util/sort.c                        |  324 ++++++++++++++++++++++++-
 tools/perf/util/sort.h                        |   10 +-
 tools/perf/util/symbol.h                      |    7 +
 27 files changed, 1296 insertions(+), 45 deletions(-)
 create mode 100644 tools/perf/Documentation/perf-mem.txt
 create mode 100644 tools/perf/builtin-mem.c

-- 
1.7.9.5



* [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 19:25   ` Andi Kleen
  2012-10-29 15:15 ` [Patch v1 02/10] perf/x86: add flags to event constraints Stephane Eranian
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch extends Jiri's changes which made the generic
event mappings visible via sysfs. It extends the mechanism
to non-generic events by allowing the mappings to be
hardcoded as strings.

This mechanism will be used by the PEBS-LL patch
later on.
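
For example, patch 04 in this series uses the string form to define
a model-specific alias:

  EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");

When the string is present it takes precedence over the id-based
event map lookup in events_sysfs_show().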

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c |   26 ++++++++++++--------------
 arch/x86/kernel/cpu/perf_event.h |   26 ++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 0a55ab2..7c30b4c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1316,20 +1316,22 @@ static struct attribute_group x86_pmu_format_group = {
 	.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-	struct device_attribute attr;
-	u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
  */
 static void __init filter_events(struct attribute **attrs)
 {
+	struct device_attribute *d;
+	struct perf_pmu_events_attr *pmu_attr;
 	int i, j;
 
 	for (i = 0; attrs[i]; i++) {
+		d = (struct device_attribute *)attrs[i];
+		pmu_attr = container_of(d, struct perf_pmu_events_attr, attr);
+		/* str trumps id */
+		if (pmu_attr->event_str)
+			continue;
 		if (x86_pmu.event_map(i))
 			continue;
 
@@ -1348,17 +1350,13 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 		container_of(attr, struct perf_pmu_events_attr, attr);
 
 	u64 config = x86_pmu.event_map(pmu_attr->id);
-	return x86_pmu.events_sysfs_show(page, config);
-}
 
-#define EVENT_VAR(_id)  event_attr_##_id
-#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+	/* string trumps id */
+	if (pmu_attr->event_str)
+		return sprintf(page, "%s", pmu_attr->event_str);
 
-#define EVENT_ATTR(_name, _id)					\
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {		\
-	.attr = __ATTR(_name, 0444, events_sysfs_show, NULL),	\
-	.id   =  PERF_COUNT_HW_##_id,				\
-};
+	return x86_pmu.events_sysfs_show(page, config);
+}
 
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 115c1ea..41479da 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -419,6 +419,29 @@ do {									\
 #define ERF_NO_HT_SHARING	1
 #define ERF_HAS_RSP_1		2
 
+#define EVENT_VAR(_id)  event_attr_##_id
+#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+
+#define EVENT_ATTR(_name, _id)					\
+static struct perf_pmu_events_attr EVENT_VAR(_id) = {		\
+	.attr = __ATTR(_name, 0444, events_sysfs_show, NULL),	\
+	.id   =  PERF_COUNT_HW_##_id,				\
+	.event_str = NULL,					\
+};
+
+#define EVENT_ATTR_STR(_name, v, str)				  \
+static struct perf_pmu_events_attr event_attr_##v = {		  \
+	.attr      = __ATTR(_name, 0444, events_sysfs_show, NULL),\
+	.id        =  0,					  \
+	.event_str =  str,					  \
+};
+
+struct perf_pmu_events_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str;
+};
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
@@ -633,6 +656,9 @@ int p6_pmu_init(void);
 
 int knc_pmu_init(void);
 
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
 #else /* CONFIG_CPU_SUP_INTEL */
 
 static inline void reserve_ds_buffers(void)
-- 
1.7.9.5



* [Patch v1 02/10] perf/x86: add flags to event constraints
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 03/10] perf: add generic memory sampling interface Stephane Eranian
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds a flags field to each event constraint.
It can be used to store event-specific features which can
later be used by scheduling code or low-level x86 code.

The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.

This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.
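
For instance, the PEBS enable path in patch 04 of this series tests
the propagated flag like this:

	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);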

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.c              |    2 +-
 arch/x86/kernel/cpu/perf_event.h              |    8 +++++---
 arch/x86/kernel/cpu/perf_event_intel.c        |    5 ++++-
 arch/x86/kernel/cpu/perf_event_intel_ds.c     |    4 +++-
 arch/x86/kernel/cpu/perf_event_intel_uncore.c |    2 +-
 include/linux/perf_event.h                    |    1 +
 6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 7c30b4c..9b56c20 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1470,7 +1470,7 @@ static int __init init_hw_perf_events(void)
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
-				   0, x86_pmu.num_counters, 0);
+				   0, x86_pmu.num_counters, 0, 0);
 
 	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
 	x86_pmu_format_group.attrs = x86_pmu.format_attrs;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 41479da..f78d6e8 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -59,6 +59,7 @@ struct event_constraint {
 	u64	cmask;
 	int	weight;
 	int	overlap;
+	int	flags;
 };
 
 struct amd_nb {
@@ -170,16 +171,17 @@ struct cpu_hw_events {
 	void				*kfree_on_online;
 };
 
-#define __EVENT_CONSTRAINT(c, n, m, w, o) {\
+#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
 	{ .idxmsk64 = (n) },		\
 	.code = (c),			\
 	.cmask = (m),			\
 	.weight = (w),			\
 	.overlap = (o),			\
+	.flags = f,			\
 }
 
 #define EVENT_CONSTRAINT(c, n, m)	\
-	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0)
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
 
 /*
  * The overlap flag marks event constraints with overlapping counter
@@ -203,7 +205,7 @@ struct cpu_hw_events {
  * and its counter masks must be kept at a minimum.
  */
 #define EVENT_CONSTRAINT_OVERLAP(c, n, m)	\
-	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1)
+	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1, 0)
 
 /*
  * Constraint on the Event code.
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 93b9e11..57d6527 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1367,8 +1367,10 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
 
 	if (x86_pmu.event_constraints) {
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
-			if ((event->hw.config & c->cmask) == c->code)
+			if ((event->hw.config & c->cmask) == c->code) {
+				event->hw.flags |= c->flags;
 				return c;
+			}
 		}
 	}
 
@@ -1413,6 +1415,7 @@ intel_put_shared_regs_event_constraints(struct cpu_hw_events *cpuc,
 static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
 					struct perf_event *event)
 {
+	event->hw.flags = 0;
 	intel_put_shared_regs_event_constraints(cpuc, event);
 }
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..f30d85b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -430,8 +430,10 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 
 	if (x86_pmu.pebs_constraints) {
 		for_each_event_constraint(c, x86_pmu.pebs_constraints) {
-			if ((event->hw.config & c->cmask) == c->code)
+			if ((event->hw.config & c->cmask) == c->code) {
+				event->hw.flags |= c->flags;
 				return c;
+			}
 		}
 	}
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 3cf3d97..cc39e64 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2438,7 +2438,7 @@ static int __init uncore_type_init(struct intel_uncore_type *type)
 
 	type->unconstrainted = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
-				0, type->num_counters, 0);
+				0, type->num_counters, 0, 0);
 
 	for (i = 0; i < type->num_boxes; i++) {
 		pmus[i].func_id = -1;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..484cfbc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -128,6 +128,7 @@ struct hw_perf_event {
 			int		event_base_rdpmc;
 			int		idx;
 			int		last_cpu;
+			int		flags;
 
 			struct hw_perf_event_extra extra_reg;
 			struct hw_perf_event_extra branch_reg;
-- 
1.7.9.5



* [Patch v1 03/10] perf: add generic memory sampling interface
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 02/10] perf/x86: add flags to event constraints Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds PERF_SAMPLE_COST and PERF_SAMPLE_DSRC.
The first collects a cost associated with the sampled
event. In the case of a memory access, the cost is
the latency of the load; otherwise it defaults to
the sampling period.

PERF_SAMPLE_DSRC collects the data source, i.e., where
the data associated with the sampled instruction came
from. The information is stored in a perf_mem_dsrc
structure. It contains opcode, memory level, TLB, snoop
and lock information, subject to availability in hardware.
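
As an illustration (not part of this patch), a consumer could decode
the dsrc word with the PERF_MEM_* masks and shifts added below; the
helper itself is hypothetical:

  /* hypothetical decoder built on the layout added below */
  static const char *mem_lvl_str(__u64 dsrc)
  {
          __u64 lvl = (dsrc >> PERF_MEM_LVL_SHIFT) & 0x3fff; /* mem_lvl:14 */

          if (lvl & PERF_MEM_LVL_NA)      return "N/A";
          if (lvl & PERF_MEM_LVL_L1)      return "L1";
          if (lvl & PERF_MEM_LVL_LFB)     return "LFB";
          if (lvl & PERF_MEM_LVL_L2)      return "L2";
          if (lvl & PERF_MEM_LVL_L3)      return "L3";
          if (lvl & PERF_MEM_LVL_LOC_RAM) return "Local RAM";
          return "other";
  }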

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 include/linux/perf_event.h      |    4 +++
 include/uapi/linux/perf_event.h |   70 ++++++++++++++++++++++++++++++++++++++-
 kernel/events/core.c            |   12 +++++++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 484cfbc..a323ee2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -579,6 +579,8 @@ struct perf_sample_data {
 		u32	reserved;
 	}				cpu_entry;
 	u64				period;
+	u64				cost;
+	union  perf_mem_dsrc		dsrc;
 	struct perf_callchain_entry	*callchain;
 	struct perf_raw_record		*raw;
 	struct perf_branch_stack	*br_stack;
@@ -597,6 +599,8 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 	data->regs_user.regs = NULL;
 	data->stack_user_size = 0;
+	data->cost = period; /* by default */
+	data->dsrc.val = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..2a3401b 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_BRANCH_STACK		= 1U << 11,
 	PERF_SAMPLE_REGS_USER			= 1U << 12,
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
+	PERF_SAMPLE_COST			= 1U << 14,
+	PERF_SAMPLE_DSRC			= 1U << 15,
 
-	PERF_SAMPLE_MAX = 1U << 14,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 16,		/* non-ABI */
 };
 
 /*
@@ -587,6 +589,9 @@ enum perf_event_type {
 	 * 	{ u64			size;
 	 * 	  char			data[size];
 	 * 	  u64			dyn_size; } && PERF_SAMPLE_STACK_USER
+	 *
+	 *	{ u64			cost;  } && PERF_SAMPLE_COST
+	 *	{ u64			dsrc;  } && PERF_SAMPLE_DSRC
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
@@ -612,4 +617,67 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+union perf_mem_dsrc {
+	__u64 val;
+	struct {
+		__u64   mem_op:5,	/* type of opcode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_snoop:5,	/* snoop mode */
+			mem_lock:2,	/* lock instr */
+			mem_dtlb:7,	/* tlb access */
+			mem_rsvd:31;
+	};
+};
+
+/* type of opcode (load/store/prefetch,code) */
+#define PERF_MEM_OP_NA		0x01 /* not available */
+#define PERF_MEM_OP_LOAD	0x02 /* load instruction */
+#define PERF_MEM_OP_STORE	0x04 /* store instruction */
+#define PERF_MEM_OP_PFETCH	0x08 /* prefetch */
+#define PERF_MEM_OP_EXEC	0x10 /* code (execution) */
+#define PERF_MEM_OP_SHIFT	0
+
+/* memory hierarchy (memory level, hit or miss) */
+#define PERF_MEM_LVL_NA		0x01  /* not available */
+#define PERF_MEM_LVL_HIT	0x02  /* hit level */
+#define PERF_MEM_LVL_MISS	0x04  /* miss level  */
+#define PERF_MEM_LVL_L1		0x08  /* L1 */
+#define PERF_MEM_LVL_LFB	0x10  /* Line Fill Buffer */
+#define PERF_MEM_LVL_L2		0x20  /* L2 hit */
+#define PERF_MEM_LVL_L3		0x40  /* L3 hit */
+#define PERF_MEM_LVL_LOC_RAM	0x80  /* Local DRAM */
+#define PERF_MEM_LVL_REM_RAM1	0x100 /* Remote DRAM (1 hop) */
+#define PERF_MEM_LVL_REM_RAM2	0x200 /* Remote DRAM (2 hops) */
+#define PERF_MEM_LVL_REM_CCE1	0x400 /* Remote Cache (1 hop) */
+#define PERF_MEM_LVL_REM_CCE2	0x800 /* Remote Cache (2 hops) */
+#define PERF_MEM_LVL_IO		0x1000 /* I/O memory */
+#define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
+#define PERF_MEM_LVL_SHIFT	5
+
+/* snoop mode */
+#define PERF_MEM_SNOOP_NA	0x01 /* not available */
+#define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
+#define PERF_MEM_SNOOP_HIT	0x04 /* snoop hit */
+#define PERF_MEM_SNOOP_MISS	0x08 /* snoop miss */
+#define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
+#define PERF_MEM_SNOOP_SHIFT	19
+
+/* locked instruction */
+#define PERF_MEM_LOCK_NA	0x01 /* not available */
+#define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */
+#define PERF_MEM_LOCK_SHIFT	24
+
+/* TLB access */
+#define PERF_MEM_TLB_NA		0x01 /* not available */
+#define PERF_MEM_TLB_HIT	0x02 /* hit level */
+#define PERF_MEM_TLB_MISS	0x04 /* miss level */
+#define PERF_MEM_TLB_L1		0x08 /* L1 */
+#define PERF_MEM_TLB_L2		0x10 /* L2 */
+#define PERF_MEM_TLB_WK		0x20 /* Hardware Walker */
+#define PERF_MEM_TLB_OS		0x40 /* OS fault handler */
+#define PERF_MEM_TLB_SHIFT	26
+
+#define PERF_MEM_S(a, s) \
+	(((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
+
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..a1cf8f2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -955,6 +955,12 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_READ)
 		size += event->read_size;
 
+	if (sample_type & PERF_SAMPLE_COST)
+		size += sizeof(data->cost);
+
+	if (sample_type & PERF_SAMPLE_DSRC)
+		size += sizeof(data->dsrc.val);
+
 	event->header_size = size;
 }
 
@@ -4169,6 +4175,12 @@ void perf_output_sample(struct perf_output_handle *handle,
 		perf_output_sample_ustack(handle,
 					  data->stack_user_size,
 					  data->regs_user.regs);
+
+	if (sample_type & PERF_SAMPLE_COST)
+		perf_output_put(handle, data->cost);
+
+	if (sample_type & PERF_SAMPLE_DSRC)
+		perf_output_put(handle, data->dsrc.val);
 }
 
 void perf_prepare_sample(struct perf_event_header *header,
-- 
1.7.9.5



* [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (2 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 03/10] perf: add generic memory sampling interface Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:23   ` Peter Zijlstra
                     ` (4 more replies)
  2012-10-29 15:15 ` [Patch v1 05/10] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
                   ` (5 subsequent siblings)
  9 siblings, 5 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by HW, and the instruction
address, data address, load latency, data source, TLB
and lock information can be saved in the sampling buffer
by using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR and PERF_SAMPLE_DSRC sample types.

To enable PEBS Load Latency, users have to use the
model specific event:
- on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
- on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It exports the right event
encoding based on the host CPU and can be used directly
by the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.
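
With the alias in place, usage should look roughly like this on a
Sandy Bridge host (exact sysfs path may vary):

  $ cat /sys/devices/cpu/events/mem-loads
  event=0xcd,umask=0x1,ldlat=3
  $ perf record -e cpu/mem-loads/pp <workload>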

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/include/asm/msr-index.h          |    1 +
 arch/x86/kernel/cpu/perf_event.c          |    3 +
 arch/x86/kernel/cpu/perf_event.h          |   22 ++++-
 arch/x86/kernel/cpu/perf_event_intel.c    |   56 ++++++++++++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  133 +++++++++++++++++++++++++++--
 5 files changed, 206 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..5edc9c0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -66,6 +66,7 @@
 #define MSR_IA32_PEBS_ENABLE		0x000003f1
 #define MSR_IA32_DS_AREA		0x00000600
 #define MSR_IA32_PERF_CAPABILITIES	0x00000345
+#define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
 
 #define MSR_MTRRfix64K_00000		0x00000250
 #define MSR_MTRRfix16K_80000		0x00000258
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9b56c20..2fc34d6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1475,6 +1475,9 @@ static int __init init_hw_perf_events(void)
 	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
 	x86_pmu_format_group.attrs = x86_pmu.format_attrs;
 
+	if (x86_pmu.event_attrs)
+		x86_pmu_events_group.attrs = x86_pmu.event_attrs;
+
 	if (!x86_pmu.events_sysfs_show)
 		x86_pmu_events_group.attrs = &empty_attrs;
 	else
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index f78d6e8..3c5aa72 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -46,6 +46,7 @@ enum extra_reg_type {
 	EXTRA_REG_RSP_0 = 0,	/* offcore_response_0 */
 	EXTRA_REG_RSP_1 = 1,	/* offcore_response_1 */
 	EXTRA_REG_LBR   = 2,	/* lbr_select */
+	EXTRA_REG_LDLAT = 3,	/* ld_lat_threshold */
 
 	EXTRA_REG_MAX		/* number of entries needed */
 };
@@ -61,6 +62,10 @@ struct event_constraint {
 	int	overlap;
 	int	flags;
 };
+/*
+ * struct event_constraint flags
+ */
+#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -233,6 +238,10 @@ struct cpu_hw_events {
 #define INTEL_UEVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)
 
+#define INTEL_PLD_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
+
 #define EVENT_CONSTRAINT_END		\
 	EVENT_CONSTRAINT(0, 0, 0)
 
@@ -262,12 +271,22 @@ struct extra_reg {
 	.msr = (ms),		\
 	.config_mask = (m),	\
 	.valid_mask = (vm),	\
-	.idx = EXTRA_REG_##i	\
+	.idx = EXTRA_REG_##i,	\
 	}
 
 #define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx)	\
 	EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT, vm, idx)
 
+#define INTEL_UEVENT_EXTRA_REG(event, msr, vm, idx) \
+	EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT | \
+			ARCH_PERFMON_EVENTSEL_UMASK, vm, idx)
+
+#define INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(c) \
+	INTEL_UEVENT_EXTRA_REG(c, \
+			       MSR_PEBS_LD_LAT_THRESHOLD, \
+			       0xffff, \
+			       LDLAT)
+
 #define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0, RSP_0)
 
 union perf_capabilities {
@@ -355,6 +374,7 @@ struct x86_pmu {
 	 */
 	int		attr_rdpmc;
 	struct attribute **format_attrs;
+	struct attribute **event_attrs;
 
 	ssize_t		(*events_sysfs_show)(char *page, u64 config);
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 57d6527..059fc8c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -81,6 +81,7 @@ static struct event_constraint intel_nehalem_event_constraints[] __read_mostly =
 static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
 {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
 	EVENT_EXTRA_END
 };
 
@@ -111,6 +112,7 @@ static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
 {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
 	INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff, RSP_1),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
 	EVENT_EXTRA_END
 };
 
@@ -130,9 +132,55 @@ static struct event_constraint intel_gen_event_constraints[] __read_mostly =
 static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
 	INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0),
 	INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
 	EVENT_EXTRA_END
 };
 
+
+EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
+EVENT_ATTR(instructions,		INSTRUCTIONS		);
+EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
+EVENT_ATTR(cache-misses,		CACHE_MISSES		);
+EVENT_ATTR(branch-instructions,		BRANCH_INSTRUCTIONS	);
+EVENT_ATTR(branch-misses,		BRANCH_MISSES		);
+EVENT_ATTR(bus-cycles,			BUS_CYCLES		);
+EVENT_ATTR(stalled-cycles-frontend,	STALLED_CYCLES_FRONTEND	);
+EVENT_ATTR(stalled-cycles-backend,	STALLED_CYCLES_BACKEND	);
+EVENT_ATTR(ref-cycles,			REF_CPU_CYCLES		);
+
+EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+
+struct attribute *nhm_events_attrs[] = {
+	EVENT_PTR(CPU_CYCLES),
+	EVENT_PTR(INSTRUCTIONS),
+	EVENT_PTR(CACHE_REFERENCES),
+	EVENT_PTR(CACHE_MISSES),
+	EVENT_PTR(BRANCH_INSTRUCTIONS),
+	EVENT_PTR(BRANCH_MISSES),
+	EVENT_PTR(BUS_CYCLES),
+	EVENT_PTR(STALLED_CYCLES_FRONTEND),
+	EVENT_PTR(STALLED_CYCLES_BACKEND),
+	EVENT_PTR(REF_CPU_CYCLES),
+	EVENT_PTR(mem_ld_nhm),
+	NULL,
+};
+
+struct attribute *snb_events_attrs[] = {
+	EVENT_PTR(CPU_CYCLES),
+	EVENT_PTR(INSTRUCTIONS),
+	EVENT_PTR(CACHE_REFERENCES),
+	EVENT_PTR(CACHE_MISSES),
+	EVENT_PTR(BRANCH_INSTRUCTIONS),
+	EVENT_PTR(BRANCH_MISSES),
+	EVENT_PTR(BUS_CYCLES),
+	EVENT_PTR(STALLED_CYCLES_FRONTEND),
+	EVENT_PTR(STALLED_CYCLES_BACKEND),
+	EVENT_PTR(REF_CPU_CYCLES),
+	EVENT_PTR(mem_ld_snb),
+	NULL,
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
 	return intel_perfmon_event_map[hw_event];
@@ -2009,6 +2057,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.enable_all = intel_pmu_nhm_enable_all;
 		x86_pmu.extra_regs = intel_nehalem_extra_regs;
 
+		x86_pmu.event_attrs = nhm_events_attrs;
+
 		/* UOPS_ISSUED.STALLED_CYCLES */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2049,6 +2099,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.extra_regs = intel_westmere_extra_regs;
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 
+		x86_pmu.event_attrs = nhm_events_attrs;
+
 		/* UOPS_ISSUED.STALLED_CYCLES */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2077,6 +2129,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+		x86_pmu.event_attrs = snb_events_attrs;
+
 		/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2102,6 +2156,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+		x86_pmu.event_attrs = snb_events_attrs;
+
 		/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
 		intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
 			X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f30d85b..3ee6e83 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -24,6 +24,92 @@ struct pebs_record_32 {
 
  */
 
+union intel_x86_pebs_dse {
+	u64 val;
+	struct {
+		unsigned int ld_dse:4;
+		unsigned int ld_stlb_miss:1;
+		unsigned int ld_locked:1;
+		unsigned int ld_reserved:26;
+	};
+	struct {
+		unsigned int st_l1d_hit:1;
+		unsigned int st_reserved1:3;
+		unsigned int st_stlb_miss:1;
+		unsigned int st_locked:1;
+		unsigned int st_reserved2:26;
+	};
+};
+
+
+/*
+ * Map PEBS Load Latency Data Source encodings to generic
+ * memory data source information
+ */
+#define P(a, b) PERF_MEM_S(a, b)
+#define OP_LH (P(OP, LOAD) | P(LVL, HIT))
+#define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
+
+static const u64 pebs_data_source[] = {
+	P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
+	OP_LH | P(LVL, L1) | P(SNOOP, NONE),	/* 0x01: L1 local */
+	OP_LH | P(LVL, LFB)| P(SNOOP, NONE),	/* 0x02: LFB hit */
+	OP_LH | P(LVL, L2) | P(SNOOP, NONE),	/* 0x03: L2 hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, NONE),	/* 0x04: L3 hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, MISS),	/* 0x05: L3 hit, snoop miss */
+	OP_LH | P(LVL, L3) | P(SNOOP, HIT),	/* 0x06: L3 hit, snoop hit */
+	OP_LH | P(LVL, L3) | P(SNOOP, HITM),	/* 0x07: L3 hit, snoop hitm */
+	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT),  /* 0x08: L3 miss snoop hit */
+	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
+	OP_LH | P(LVL, LOC_RAM)  | P(SNOOP, HIT),  /* 0x0a: L3 miss, shared */
+	OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT),  /* 0x0b: L3 miss, shared */
+	OP_LH | P(LVL, LOC_RAM)  | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
+	OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
+	OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
+	OP_LH | P(LVL,UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
+};
+
+static u64 load_latency_data(u64 status)
+{
+	union intel_x86_pebs_dse dse;
+	u64 val;
+	int model = boot_cpu_data.x86_model;
+	int fam = boot_cpu_data.x86;
+
+	dse.val = status;
+
+	/*
+	 * use the mapping table for bit 0-15
+	 */
+	val = pebs_data_source[dse.ld_dse];
+
+	/*
+	 * Nehalem models do not support TLB, Lock infos
+	 */
+	if (fam == 0x6 && (model == 26 || model == 30
+	    || model == 31 || model == 46)) {
+		val |= P(TLB, NA) | P(LOCK, NA);
+		return val;
+	}
+	/*
+	 * bit 4: TLB access
+	 * 0 = did not miss 2nd level TLB
+	 * 1 = missed 2nd level TLB
+	 */
+	if (dse.ld_stlb_miss)
+		val |= P(TLB, MISS) | P(TLB, L2);
+	else
+		val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
+
+	/*
+	 * bit 5: locked prefix
+	 */
+	if (dse.ld_locked)
+		val |= P(LOCK, LOCKED);
+
+	return val;
+}
+
 struct pebs_record_core {
 	u64 flags, ip;
 	u64 ax, bx, cx, dx;
@@ -364,7 +450,7 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_nehalem_pebs_event_constraints[] = {
-	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),    /* MEM_INST_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x100b, 0xf),      /* MEM_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),    /* MEM_UNCORE_RETIRED.* */
 	INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
 	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),    /* INST_RETIRED.ANY */
@@ -379,7 +465,7 @@ struct event_constraint intel_nehalem_pebs_event_constraints[] = {
 };
 
 struct event_constraint intel_westmere_pebs_event_constraints[] = {
-	INTEL_EVENT_CONSTRAINT(0x0b, 0xf),    /* MEM_INST_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x100b, 0xf),      /* MEM_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0x0f, 0xf),    /* MEM_UNCORE_RETIRED.* */
 	INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
 	INTEL_EVENT_CONSTRAINT(0xc0, 0xf),    /* INSTR_RETIRED.* */
@@ -399,7 +485,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
 	INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
 	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
-	INTEL_EVENT_CONSTRAINT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
+	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
 	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -413,7 +499,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
         INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
-        INTEL_EVENT_CONSTRAINT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
+        INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
         INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -448,6 +534,9 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 	hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
 	cpuc->pebs_enabled |= 1ULL << hwc->idx;
+
+	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -560,20 +649,48 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs, void *__pebs)
 {
 	/*
-	 * We cast to pebs_record_core since that is a subset of
-	 * both formats and we don't use the other fields in this
-	 * routine.
+	 * We cast to pebs_record_nhm to get the load latency data
+	 * if extra_reg MSR_PEBS_LD_LAT_THRESHOLD used
 	 */
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
-	struct pebs_record_core *pebs = __pebs;
+	struct pebs_record_nhm *pebs = __pebs;
 	struct perf_sample_data data;
 	struct pt_regs regs;
+	u64 sample_type;
+	int fll;
 
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
+	fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
+	data.period = event->hw.last_period;
+	sample_type = event->attr.sample_type;
+
+	/*
+	 * if PEBS-LL or PreciseStore
+	 */
+	if (fll) {
+		if (sample_type & PERF_SAMPLE_ADDR)
+			data.addr = pebs->dla;
+
+		/*
+		 * Use latency for cost (only avail with PEBS-LL)
+		 */
+		if (fll && (sample_type & PERF_SAMPLE_COST))
+			data.cost = pebs->lat;
+
+		/*
+		 * data.dsrc encodes the data source
+		 */
+		if (sample_type & PERF_SAMPLE_DSRC) {
+			if (fll)
+				data.dsrc.val = load_latency_data(pebs->dse);
+		}
+	}
+
 	/*
 	 * We use the interrupt regs as a base because the PEBS record
 	 * does not contain a full regs set, specifically it seems to
-- 
1.7.9.5



* [Patch v1 05/10] perf/x86: export PEBS load latency threshold register to sysfs
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (3 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 06/10] perf/x86: add support for PEBS Precise Store Stephane Eranian
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

Make the PEBS Load Latency threshold register layout
and encoding visible to user level tools.
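
With the format attribute exposed, the latency threshold can be set
directly in the event string syntax, e.g. (threshold value made up):

  $ perf record -e cpu/event=0xcd,umask=0x1,ldlat=50/pp <workload>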

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 059fc8c..c37b7f8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1787,6 +1787,8 @@ static void intel_pmu_flush_branch_stack(void)
 
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
+PMU_FORMAT_ATTR(ldlat, "config1:0-15");
+
 static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_event.attr,
 	&format_attr_umask.attr,
@@ -1797,6 +1799,7 @@ static struct attribute *intel_arch3_formats_attr[] = {
 	&format_attr_cmask.attr,
 
 	&format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
+	&format_attr_ldlat.attr, /* PEBS load latency */
 	NULL,
 };
 
-- 
1.7.9.5



* [Patch v1 06/10] perf/x86: add support for PEBS Precise Store
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (4 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 05/10] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:40   ` Peter Zijlstra
  2012-10-31  5:21   ` Namhyung Kim
  2012-10-29 15:15 ` [Patch v1 07/10] perf tools: add mem access sampling core support Stephane Eranian
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds support for PEBS Precise Store
which is available on Intel Sandy Bridge and
Ivy Bridge processors.

To use Precise Store, the proper PEBS event
must be used: mem_trans_retired:precise_stores.
For the perf tool, the generic mem-stores event
exported via sysfs can be used directly.
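
For example, on a Sandy Bridge host, with the perf tool patches later
in this series applied, store profiling should boil down to:

  $ perf mem -t store rec <workload>
  $ perf mem -t store rep --stdio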

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/kernel/cpu/perf_event.h          |    5 +++
 arch/x86/kernel/cpu/perf_event_intel.c    |    2 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   49 +++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3c5aa72..4e95c90 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -66,6 +66,7 @@ struct event_constraint {
  * struct event_constraint flags
  */
 #define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
+#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -242,6 +243,10 @@ struct cpu_hw_events {
 	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
 			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
 
+#define INTEL_PST_CONSTRAINT(c, n)	\
+	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST)
+
 #define EVENT_CONSTRAINT_END		\
 	EVENT_CONSTRAINT(0, 0, 0)
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c37b7f8..bf25b7b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -150,6 +150,7 @@ EVENT_ATTR(ref-cycles,			REF_CPU_CYCLES		);
 
 EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2");
 
 struct attribute *nhm_events_attrs[] = {
 	EVENT_PTR(CPU_CYCLES),
@@ -178,6 +179,7 @@ struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(STALLED_CYCLES_BACKEND),
 	EVENT_PTR(REF_CPU_CYCLES),
 	EVENT_PTR(mem_ld_snb),
+	EVENT_PTR(mem_st_snb),
 	NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 3ee6e83..4c5f639 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -69,6 +69,44 @@ static const u64 pebs_data_source[] = {
 	OP_LH | P(LVL,UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
 };
 
+static u64 precise_store_data(u64 status)
+{
+	union intel_x86_pebs_dse dse;
+	u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
+
+	dse.val = status;
+
+	/*
+	 * bit 4: TLB access
+	 * 1 = store missed 2nd level TLB
+	 *
+	 * so it either hit the walker or the OS
+	 * otherwise hit 2nd level TLB
+	 */
+	if (dse.st_stlb_miss)
+		val |= P(TLB, MISS);
+	else
+		val |= P(TLB, HIT);
+
+	/*
+	 * bit 0: hit L1 data cache
+	 * if not set, then all we know is that
+	 * it missed L1D
+	 */
+	if (dse.st_l1d_hit)
+		val |= P(LVL, HIT);
+	else
+		val |= P(LVL, MISS);
+
+	/*
+	 * bit 5: Locked prefix
+	 */
+	if (dse.st_locked)
+		val |= P(LOCK, LOCKED);
+
+	return val;
+}
+
 static u64 load_latency_data(u64 status)
 {
 	union intel_x86_pebs_dse dse;
@@ -486,6 +524,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
 	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
 	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
 	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
 	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -500,6 +539,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
         INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
         INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
         INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -537,6 +577,8 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
 	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
 		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
+	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
+		cpuc->pebs_enabled |= 1ULL << 63;
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -657,12 +699,13 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	struct perf_sample_data data;
 	struct pt_regs regs;
 	u64 sample_type;
-	int fll;
+	int fll, fst;
 
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
 	fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+	fst = event->hw.flags & PERF_X86_EVENT_PEBS_ST;
 
 	perf_sample_data_init(&data, 0, event->hw.last_period);
 
@@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	/*
 	 * if PEBS-LL or PreciseStore
 	 */
-	if (fll) {
+	if (fll || fst) {
 		if (sample_type & PERF_SAMPLE_ADDR)
 			data.addr = pebs->dla;
 
@@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 		if (sample_type & PERF_SAMPLE_DSRC) {
 			if (fll)
 				data.dsrc.val = load_latency_data(pebs->dse);
+			else if (fst)
+				data.dsrc.val = precise_store_data(pebs->dse);
 		}
 	}
 
-- 
1.7.9.5



* [Patch v1 07/10] perf tools: add mem access sampling core support
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (5 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 06/10] perf/x86: add support for PEBS Precise Store Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 16:55   ` Andi Kleen
  2012-10-31  5:51   ` Namhyung Kim
  2012-10-29 15:15 ` [Patch v1 08/10] perf report: add support for mem access profiling Stephane Eranian
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds the sorting and histogram support
functions to enable profiling of memory accesses.

The following sort orders are added (usage example below):
 - symbol_daddr: data address symbol (or raw address)
 - dso_daddr: data address shared object
 - cost: access cost
 - locked: access uses locked transaction
 - tlb: TLB access
 - mem: memory level of the access (L1, L2, L3, RAM, ...)
 - snoop: access snoop mode
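
For example, combining the new keys with the perf mem command added
later in this series should work along the lines of:

  $ perf mem -t load rep --sort=mem,tlb,snoop,locked --stdio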

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/util/event.h   |    2 +
 tools/perf/util/evsel.c   |    9 ++
 tools/perf/util/hist.c    |   66 ++++++++-
 tools/perf/util/hist.h    |   13 ++
 tools/perf/util/session.c |   44 ++++++
 tools/perf/util/session.h |    4 +
 tools/perf/util/sort.c    |  324 ++++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/sort.h    |   10 +-
 tools/perf/util/symbol.h  |    7 +
 9 files changed, 469 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index da97aff..b26b692 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -89,6 +89,8 @@ struct perf_sample {
 	u64 period;
 	u32 cpu;
 	u32 raw_size;
+	u64 cost;
+	u64 dsrc;
 	void *raw_data;
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 618d411..c82a271 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1020,6 +1020,15 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		}
 	}
 
+	if (type & PERF_SAMPLE_COST) {
+		data->cost = *array;
+		array++;
+	}
+
+	if (type & PERF_SAMPLE_DSRC) {
+		data->dsrc = *array;
+		array++;
+	}
 	return 0;
 }
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 277947a..0c21499 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -66,12 +66,16 @@ static void hists__set_unres_dso_col_len(struct hists *hists, int dso)
 void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 {
 	const unsigned int unresolved_col_width = BITS_PER_LONG / 4;
+	int symlen;
 	u16 len;
 
 	if (h->ms.sym)
 		hists__new_col_len(hists, HISTC_SYMBOL, h->ms.sym->namelen + 4);
-	else
+	else {
+		symlen = unresolved_col_width + 4 + 2;
+		hists__new_col_len(hists, HISTC_SYMBOL, symlen);
 		hists__set_unres_dso_col_len(hists, HISTC_DSO);
+	}
 
 	len = thread__comm_len(h->thread);
 	if (hists__new_col_len(hists, HISTC_COMM, len))
@@ -83,7 +87,6 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 	}
 
 	if (h->branch_info) {
-		int symlen;
 		/*
 		 * +4 accounts for '[x] ' priv level info
 		 * +2 account of 0x prefix on raw addresses
@@ -111,7 +114,32 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 			hists__new_col_len(hists, HISTC_SYMBOL_TO, symlen);
 			hists__set_unres_dso_col_len(hists, HISTC_DSO_TO);
 		}
+	} else if (h->mem_info) {
+		/*
+		 * +4 accounts for '[x] ' priv level info
+		 * +2 account of 0x prefix on raw addresses
+		 */
+		if (h->mem_info->daddr.sym) {
+			symlen = (int)h->mem_info->daddr.sym->namelen + 4;
+			hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+		} else {
+			symlen = unresolved_col_width + 4 + 2;
+			hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+		}
+		if (h->mem_info->daddr.map) {
+			symlen = dso__name_len(h->mem_info->daddr.map->dso);
+			hists__new_col_len(hists, HISTC_MEM_DADDR_DSO, symlen);
+		} else {
+			symlen = unresolved_col_width + 4 + 2;
+			hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
+		}
+		hists__new_col_len(hists, HISTC_MEM_COST, 7);
+		hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
+		hists__new_col_len(hists, HISTC_MEM_TLB, 22);
+		hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
+		hists__new_col_len(hists, HISTC_MEM_LVL, 21+3);
 	}
+
 }
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -235,7 +263,7 @@ void hists__decay_entries_threaded(struct hists *hists,
 static struct hist_entry *hist_entry__new(struct hist_entry *template)
 {
 	size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
-	struct hist_entry *he = malloc(sizeof(*he) + callchain_size);
+	struct hist_entry *he = calloc(1, sizeof(*he) + callchain_size);
 
 	if (he != NULL) {
 		*he = *template;
@@ -321,6 +349,35 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 	return he;
 }
 
+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+					  struct addr_location *al,
+					  struct symbol *sym_parent,
+					  struct mem_info *mi,
+					  u64 cost)
+{
+	struct hist_entry entry = {
+		.thread	= al->thread,
+		.ms = {
+			.map	= al->map,
+			.sym	= al->sym,
+		},
+		.cpu	= al->cpu,
+		.ip	= al->addr,
+		.level	= al->level,
+		.stat = {
+			.period = cost,
+			.nr_events = 1,
+		},
+		.parent = sym_parent,
+		.filtered = symbol__parent_filter(sym_parent),
+		.hists = self,
+		.mem_info = mi,
+		.branch_info = NULL,
+	};
+
+	return add_hist_entry(self, &entry, al, cost);
+}
+
 struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
@@ -344,6 +401,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.filtered = symbol__parent_filter(sym_parent),
 		.branch_info = bi,
 		.hists	= self,
+		.mem_info = NULL,
 	};
 
 	return add_hist_entry(self, &entry, al, period);
@@ -369,6 +427,8 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.hists	= self,
+		.branch_info = NULL,
+		.mem_info = NULL,
 	};
 
 	return add_hist_entry(self, &entry, al, period);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index c751624..3ae4f62 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -48,6 +48,13 @@ enum hist_column {
 	HISTC_DSO_FROM,
 	HISTC_DSO_TO,
 	HISTC_SRCLINE,
+	HISTC_MEM_DADDR_SYMBOL,
+	HISTC_MEM_DADDR_DSO,
+	HISTC_MEM_COST,
+	HISTC_MEM_LOCKED,
+	HISTC_MEM_TLB,
+	HISTC_MEM_LVL,
+	HISTC_MEM_SNOOP,
 	HISTC_NR_COLS, /* Last entry */
 };
 
@@ -85,6 +92,12 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct branch_info *bi,
 					     u64 period);
 
+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+					  struct addr_location *al,
+					  struct symbol *sym_parent,
+					  struct mem_info *mi,
+					  u64 period);
+
 void hists__output_resort(struct hists *self);
 void hists__output_resort_threaded(struct hists *hists);
 void hists__collapse_resort(struct hists *self);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15abe40..ea1541f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -273,6 +273,41 @@ static void ip__resolve_ams(struct machine *self, struct thread *thread,
 	ams->map = al.map;
 }
 
+static void ip__resolve_data(struct machine *self, struct thread *thread,
+			     u8 m,
+			    struct addr_map_symbol *ams,
+			    u64 addr)
+{
+	struct addr_location al;
+
+	memset(&al, 0, sizeof(al));
+
+	thread__find_addr_location(thread, self, m, MAP__VARIABLE, addr, &al, NULL);
+	ams->addr = addr;
+	ams->al_addr = al.addr;
+	ams->sym = al.sym;
+	ams->map = al.map;
+}
+
+struct mem_info *machine__resolve_mem(struct machine *self,
+				      struct thread *thr,
+				      struct perf_sample *sample,
+				      u8 cpumode)
+{
+	struct mem_info *mi;
+
+	mi = calloc(1, sizeof(struct mem_info));
+	if (!mi)
+		return NULL;
+
+	ip__resolve_ams(self, thr, &mi->iaddr, sample->ip);
+	ip__resolve_data(self, thr, cpumode, &mi->daddr, sample->addr);
+	mi->cost = sample->cost;
+	mi->dsrc.val = sample->dsrc;
+
+	return mi;
+}
+
 struct branch_info *machine__resolve_bstack(struct machine *self,
 					    struct thread *thr,
 					    struct branch_stack *bs)
@@ -1006,6 +1041,15 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_STACK_USER)
 		stack_user__printf(&sample->user_stack);
+
+	if (sample_type & PERF_SAMPLE_ADDR)
+		printf(" ..... data: 0x%"PRIx64"\n", sample->addr);
+
+	if (sample_type & PERF_SAMPLE_COST)
+		printf(" ..... cost: %"PRIu64"\n", sample->cost);
+
+	if (sample_type & PERF_SAMPLE_DSRC)
+		printf(" . data_src: 0x%"PRIx64"\n", sample->dsrc);
 }
 
 static struct machine *
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index dd64261..37ab693 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -72,6 +72,10 @@ struct branch_info *machine__resolve_bstack(struct machine *self,
 					    struct thread *thread,
 					    struct branch_stack *bs);
 
+struct mem_info *machine__resolve_mem(struct machine *self,
+				      struct thread *thread,
+				      struct perf_sample *sample, u8 cpumode);
+
 bool perf_session__has_traces(struct perf_session *self, const char *msg);
 
 void mem_bswap_64(void *src, int byte_size);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..e2e466d 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -182,11 +182,19 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
 	}
 
 	ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", level);
-	if (sym)
-		ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
-				       width - ret,
-				       sym->name);
-	else {
+	if (sym) {
+		if (map && map->type == MAP__VARIABLE) {
+			ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
+			ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
+					ip - sym->start);
+			ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+				       width - ret, "");
+		} else {
+			ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+					       width - ret,
+					       sym->name);
+		}
+	} else {
 		size_t len = BITS_PER_LONG / 4;
 		ret += repsep_snprintf(bf + ret, size - ret, "%-#.*llx",
 				       len, ip);
@@ -469,6 +477,238 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *self, char *bf,
 	return repsep_snprintf(bf, size, "%-*s", width, out);
 }
 
+/* --sort daddr_sym */
+static int64_t
+sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	struct addr_map_symbol *l = &left->mem_info->daddr;
+	struct addr_map_symbol *r = &right->mem_info->daddr;
+
+	return (int64_t)(r->addr - l->addr);
+}
+
+static int hist_entry__daddr_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	return _hist_entry__sym_snprintf(self->mem_info->daddr.map,
+					 self->mem_info->daddr.sym,
+					 self->mem_info->daddr.addr,
+					 self->level, bf, size, width);
+}
+
+static int64_t
+sort__dso_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return _sort__dso_cmp(left->mem_info->daddr.map, right->mem_info->daddr.map);
+}
+
+static int hist_entry__dso_daddr_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	return _hist_entry__dso_snprintf(self->mem_info->daddr.map, bf, size, width);
+}
+
+static int64_t
+sort__cost_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	u64 cost_l = left->mem_info->cost;
+	u64 cost_r = right->mem_info->cost;
+
+	return (int64_t)(cost_r - cost_l);
+}
+
+static int hist_entry__cost_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	if (self->mem_info->cost == 0)
+		return repsep_snprintf(bf, size, "%*s", width, "N/A");
+	return repsep_snprintf(bf, size, "%*"PRIu64, width, self->mem_info->cost);
+}
+
+static int64_t
+sort__locked_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_lock - dsrc_l.mem_lock);
+}
+
+static int hist_entry__locked_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	const char *out = "??";
+	u64 mask = self->mem_info->dsrc.mem_lock;
+
+	if (mask & PERF_MEM_LOCK_NA)
+		out = "N/A";
+	else if (mask & PERF_MEM_LOCK_LOCKED)
+		out = "Yes";
+	else
+		out = "No";
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__tlb_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_dtlb - dsrc_l.mem_dtlb);
+}
+
+static const char *tlb_access[] = {
+	"N/A",
+	"HIT",
+	"MISS",
+	"L1",
+	"L2",
+	"Walker",
+	"Fault",
+};
+#define NUM_TLB_ACCESS (sizeof(tlb_access)/sizeof(const char *))
+
+static int hist_entry__tlb_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t l = 0, i;
+	u64 m = self->mem_info->dsrc.mem_dtlb;
+	u64 hit, miss;
+
+	out[0] = '\0';
+
+	hit = m & PERF_MEM_TLB_HIT;
+	miss = m & PERF_MEM_TLB_MISS;
+
+	/* already taken care of */
+	m &= ~(PERF_MEM_TLB_HIT|PERF_MEM_TLB_MISS);
+
+	for (i = 0; m && i < NUM_TLB_ACCESS; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, tlb_access[i], sz - l);
+		l += strlen(tlb_access[i]);
+	}
+	if (hit)
+		strncat(out, " hit", sz - l);
+	if (miss)
+		strncat(out, " miss", sz - l);
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__lvl_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_lvl - dsrc_l.mem_lvl);
+}
+
+static const char *mem_lvl[] = {
+	"N/A",
+	"HIT",
+	"MISS",
+	"L1",
+	"LFB",
+	"L2",
+	"L3",
+	"Local RAM",
+	"Remote RAM (1 hop)",
+	"Remote RAM (2 hops)",
+	"Remote Cache (1 hop)",
+	"Remote Cache (2 hops)",
+	"I/O",
+	"Uncached",
+};
+#define NUM_MEM_LVL (sizeof(mem_lvl)/sizeof(const char *))
+
+static int hist_entry__lvl_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t i, l = 0;
+	u64 m = self->mem_info->dsrc.mem_lvl;
+	u64 hit, miss;
+
+	out[0] = '\0';
+
+	hit = m & PERF_MEM_LVL_HIT;
+	miss = m & PERF_MEM_LVL_MISS;
+
+	/* already taken care of */
+	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+
+	for (i = 0; m && i < NUM_MEM_LVL; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, mem_lvl[i], sz - l);
+		l += strlen(mem_lvl[i]);
+	}
+	if (hit)
+		strncat(out, " hit", sz - l);
+	if (miss)
+		strncat(out, " miss", sz - l);
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__snoop_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+	union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+	return (int64_t)(dsrc_r.mem_snoop - dsrc_l.mem_snoop);
+}
+
+static const char *snoop_access[] = {
+	"N/A",
+	"None",
+	"Miss",
+	"Hit",
+	"HitM",
+};
+#define NUM_SNOOP_ACCESS (sizeof(snoop_access)/sizeof(const char *))
+
+static int hist_entry__snoop_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	char out[64];
+	size_t sz = sizeof(out) - 1; /* -1 for null termination */
+	size_t i, l = 0;
+	u64 m = self->mem_info->dsrc.mem_snoop;
+
+	out[0] = '\0';
+
+	for (i = 0; m && i < NUM_SNOOP_ACCESS; i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		strncat(out, snoop_access[i], sz - l);
+		l += strlen(snoop_access[i]);
+	}
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
 struct sort_entry sort_mispredict = {
 	.se_header	= "Branch Mispredicted",
 	.se_cmp		= sort__mispredict_cmp,
@@ -476,6 +716,56 @@ struct sort_entry sort_mispredict = {
 	.se_width_idx	= HISTC_MISPREDICT,
 };
 
+struct sort_entry sort_mem_daddr_sym = {
+	.se_header	= "Data Symbol",
+	.se_cmp		= sort__daddr_cmp,
+	.se_snprintf	= hist_entry__daddr_snprintf,
+	.se_width_idx	= HISTC_MEM_DADDR_SYMBOL,
+};
+
+struct sort_entry sort_mem_daddr_dso = {
+	.se_header	= "Data Object",
+	.se_cmp		= sort__dso_daddr_cmp,
+	.se_snprintf	= hist_entry__dso_daddr_snprintf,
+	.se_width_idx	= HISTC_MEM_DADDR_DSO,
+};
+
+struct sort_entry sort_mem_cost = {
+	.se_header	= "Cost",
+	.se_cmp		= sort__cost_cmp,
+	.se_snprintf	= hist_entry__cost_snprintf,
+	.se_width_idx	= HISTC_MEM_COST,
+};
+
+struct sort_entry sort_mem_locked = {
+	.se_header	= "Locked",
+	.se_cmp		= sort__locked_cmp,
+	.se_snprintf	= hist_entry__locked_snprintf,
+	.se_width_idx	= HISTC_MEM_LOCKED,
+};
+
+struct sort_entry sort_mem_tlb = {
+	.se_header	= "TLB access",
+	.se_cmp		= sort__tlb_cmp,
+	.se_snprintf	= hist_entry__tlb_snprintf,
+	.se_width_idx	= HISTC_MEM_TLB,
+};
+
+struct sort_entry sort_mem_lvl = {
+	.se_header	= "Memory access",
+	.se_cmp		= sort__lvl_cmp,
+	.se_snprintf	= hist_entry__lvl_snprintf,
+	.se_width_idx	= HISTC_MEM_LVL,
+};
+
+struct sort_entry sort_mem_snoop = {
+	.se_header	= "Snoop",
+	.se_cmp		= sort__snoop_cmp,
+	.se_snprintf	= hist_entry__snoop_snprintf,
+	.se_width_idx	= HISTC_MEM_SNOOP,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -497,6 +787,13 @@ static struct sort_dimension sort_dimensions[] = {
 	DIM(SORT_CPU, "cpu", sort_cpu),
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_SRCLINE, "srcline", sort_srcline),
+	DIM(SORT_MEM_DADDR_SYMBOL, "symbol_daddr", sort_mem_daddr_sym),
+	DIM(SORT_MEM_DADDR_DSO, "dso_daddr", sort_mem_daddr_dso),
+	DIM(SORT_MEM_COST, "cost", sort_mem_cost),
+	DIM(SORT_MEM_LOCKED, "locked", sort_mem_locked),
+	DIM(SORT_MEM_TLB, "tlb", sort_mem_tlb),
+	DIM(SORT_MEM_LVL, "mem", sort_mem_lvl),
+	DIM(SORT_MEM_SNOOP, "snoop", sort_mem_snoop),
 };
 
 int sort_dimension__add(const char *tok)
@@ -520,7 +817,8 @@ int sort_dimension__add(const char *tok)
 			sort__has_parent = 1;
 		} else if (sd->entry == &sort_sym ||
 			   sd->entry == &sort_sym_from ||
-			   sd->entry == &sort_sym_to) {
+			   sd->entry == &sort_sym_to ||
+			   sd->entry == &sort_mem_daddr_sym) {
 			sort__has_sym = 1;
 		}
 
@@ -553,6 +851,20 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_DSO_TO;
 			else if (!strcmp(sd->name, "mispredict"))
 				sort__first_dimension = SORT_MISPREDICT;
+			else if (!strcmp(sd->name, "symbol_daddr"))
+				sort__first_dimension = SORT_MEM_DADDR_SYMBOL;
+			else if (!strcmp(sd->name, "dso_daddr"))
+				sort__first_dimension = SORT_MEM_DADDR_DSO;
+			else if (!strcmp(sd->name, "cost"))
+				sort__first_dimension = SORT_MEM_COST;
+			else if (!strcmp(sd->name, "locked"))
+				sort__first_dimension = SORT_MEM_LOCKED;
+			else if (!strcmp(sd->name, "tlb"))
+				sort__first_dimension = SORT_MEM_TLB;
+			else if (!strcmp(sd->name, "mem"))
+				sort__first_dimension = SORT_MEM_LVL;
+			else if (!strcmp(sd->name, "snoop"))
+				sort__first_dimension = SORT_MEM_SNOOP;
 		}
 
 		list_add_tail(&sd->entry->list, &hist_entry__sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 13761d8..175eb4c 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -102,7 +102,8 @@ struct hist_entry {
 	};
 	struct branch_info	*branch_info;
 	struct hists		*hists;
-	struct callchain_root	callchain[0];
+	struct mem_info		*mem_info;
+	struct callchain_root	callchain[0]; /* must be last member */
 };
 
 enum sort_type {
@@ -118,6 +119,13 @@ enum sort_type {
 	SORT_SYM_TO,
 	SORT_MISPREDICT,
 	SORT_SRCLINE,
+	SORT_MEM_DADDR_SYMBOL,
+	SORT_MEM_DADDR_DSO,
+	SORT_MEM_COST,
+	SORT_MEM_LOCKED,
+	SORT_MEM_TLB,
+	SORT_MEM_LVL,
+	SORT_MEM_SNOOP,
 };
 
 /*
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 8b6ef7f..4ccc729 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -153,6 +153,13 @@ struct branch_info {
 	struct branch_flags flags;
 };
 
+struct mem_info {
+	struct addr_map_symbol iaddr;
+	struct addr_map_symbol daddr;
+	u64	cost;
+	union perf_mem_dsrc dsrc;
+};
+
 struct addr_location {
 	struct thread *thread;
 	struct map    *map;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Patch v1 08/10] perf report: add support for mem access profiling
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (6 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 07/10] perf tools: add mem access sampling core support Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-31  6:01   ` Namhyung Kim
  2012-10-29 15:15 ` [Patch v1 09/10] perf record: " Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 10/10] perf tools: add new mem command for memory " Stephane Eranian
  9 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This patch adds the --mem-mode option to perf report.

This mode requires a perf.data file created with memory
access samples.
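
For example (hypothetical session; the record step relies on the -l
and -d options added later in this series):

$ perf record -l -d -e cpu/mem-loads/pp -- my_workload
$ perf report --mem-mode --stdio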

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-report.c |  131 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 90d1162..4d2e4f5 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -47,6 +47,7 @@ struct perf_report {
 	bool			show_full_info;
 	bool			show_threads;
 	bool			inverted_callchain;
+	bool			mem_mode;
 	struct perf_read_values	show_threads_values;
 	const char		*pretty_printing_style;
 	symbol_filter_t		annotate_init;
@@ -55,6 +56,96 @@ struct perf_report {
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 };
 
+static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
+					   struct addr_location *al,
+					   struct perf_sample *sample,
+					   struct perf_evsel *evsel,
+					   struct machine *machine,
+					   union perf_event *event)
+{
+	struct perf_report *rep = container_of(tool, struct perf_report, tool);
+	struct symbol *parent = NULL;
+	u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+	int err = 0;
+	struct hist_entry *he;
+	struct mem_info *mi, *mx;
+	uint64_t cost;
+
+	if ((sort__has_parent || symbol_conf.use_callchain)
+	    && sample->callchain) {
+		err = machine__resolve_callchain(machine, evsel, al->thread,
+						 sample, &parent);
+		if (err)
+			return err;
+	}
+
+	if (rep->hide_unresolved && !al->sym)
+		return 0;
+
+	mi = machine__resolve_mem(machine, al->thread, sample, cpumode);
+	if (!mi)
+		return -ENOMEM;
+
+	cost = mi->cost;
+	if (!cost)
+		cost = 1;
+
+	/*
+	 * The report shows the percentage of total cost captured, not
+	 * events sampled, so the access cost (defaulting to 1 when
+	 * unavailable) is used as the period. Only in the newt browser
+	 * do we do integrated annotation, so we don't allocate the
+	 * extra space needed because the stdio code will not use it.
+	 */
+	he = __hists__add_mem_entry(&evsel->hists, al, parent, mi,
+				    cost);
+	if (!he)
+		return -ENOMEM;
+
+	if (sort__has_sym && he->ms.sym && use_browser > 0) {
+		struct annotation *notes = symbol__annotation(he->ms.sym);
+
+		assert(evsel != NULL);
+
+		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+			goto out;
+
+		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+		if (err)
+			goto out;
+	}
+
+	if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {
+		struct annotation *notes;
+
+		mx = he->mem_info;
+
+		notes = symbol__annotation(mx->daddr.sym);
+		if (!notes->src
+		    && symbol__alloc_hist(mx->daddr.sym) < 0)
+			goto out;
+
+		err = symbol__inc_addr_samples(mx->daddr.sym,
+					       mx->daddr.map,
+					       evsel->idx,
+					       mx->daddr.al_addr);
+		if (err)
+			goto out;
+	}
+
+	evsel->hists.stats.total_period += sample->period;
+	hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+	err = 0;
+
+	if (symbol_conf.use_callchain) {
+		err = callchain_append(he->callchain,
+				       &callchain_cursor,
+				       sample->period);
+	}
+out:
+	return err;
+}
+
 static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 					struct addr_location *al,
 					struct perf_sample *sample,
@@ -210,6 +301,12 @@ static int process_sample_event(struct perf_tool *tool,
 			pr_debug("problem adding lbr entry, skipping event\n");
 			return -1;
 		}
+	} else if (rep->mem_mode == 1) {
+		if (perf_report__add_mem_hist_entry(tool, &al, sample,
+						    evsel, machine, event)) {
+			pr_debug("problem adding mem entry, skipping event\n");
+			return -1;
+		}
 	} else {
 		if (al.map != NULL)
 			al.map->dso->hit = 1;
@@ -293,7 +390,8 @@ static void sig_handler(int sig __maybe_unused)
 	session_done = 1;
 }
 
-static size_t hists__fprintf_nr_sample_events(struct hists *self,
+static size_t hists__fprintf_nr_sample_events(struct perf_report *rep,
+					      struct hists *self,
 					      const char *evname, FILE *fp)
 {
 	size_t ret;
@@ -306,7 +404,11 @@ static size_t hists__fprintf_nr_sample_events(struct hists *self,
 	if (evname != NULL)
 		ret += fprintf(fp, " of event '%s'", evname);
 
-	ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
+	if (rep->mem_mode) {
+		ret += fprintf(fp, "\n# Total cost : %" PRIu64, nr_events);
+		ret += fprintf(fp, "\n# Sort order : %s", sort_order);
+	} else
+		ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
 	return ret + fprintf(fp, "\n#\n");
 }
 
@@ -320,7 +422,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		struct hists *hists = &pos->hists;
 		const char *evname = perf_evsel__name(pos);
 
-		hists__fprintf_nr_sample_events(hists, evname, stdout);
+		hists__fprintf_nr_sample_events(rep, hists, evname, stdout);
 		hists__fprintf(hists, true, 0, 0, stdout);
 		fprintf(stdout, "\n\n");
 	}
@@ -596,7 +698,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Use the stdio interface"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-		   " dso_from, symbol_to, symbol_from, mispredict"),
+		   " dso_from, symbol_to, symbol_from, mispredict, mem, cost, symbol_daddr, dso_daddr, tlb, snoop, locked"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
@@ -642,6 +744,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "use branch records for histogram filling", parse_branch_mode),
 	OPT_STRING(0, "objdump", &objdump_path, "path",
 		   "objdump binary to use for disassembly and annotations"),
+	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
 	OPT_END()
 	};
 
@@ -693,6 +796,18 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 				     "dso_to,symbol_to";
 
 	}
+	if (report.mem_mode) {
+		if (sort__branch_mode == 1) {
+			fprintf(stderr, "branch and mem mode incompatible\n");
+			goto error;
+		}
+		/*
+		 * if no sort_order is provided, then specify
+		 * mem-mode specific order
+		 */
+		if (sort_order == default_sort_order)
+			sort_order = "cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
+	}
 
 	if (strcmp(report.input_name, "-") != 0)
 		setup_browser(true);
@@ -764,6 +879,14 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		sort_entry__setup_elide(&sort_sym_from, symbol_conf.sym_from_list, "sym_from", stdout);
 		sort_entry__setup_elide(&sort_sym_to, symbol_conf.sym_to_list, "sym_to", stdout);
 	} else {
+		if (report.mem_mode) {
+			sort_entry__setup_elide(&sort_mem_daddr_sym, symbol_conf.sym_list, "symbol_daddr", stdout);
+			sort_entry__setup_elide(&sort_mem_daddr_dso, symbol_conf.dso_list, "dso_daddr", stdout);
+			sort_entry__setup_elide(&sort_mem_lvl, symbol_conf.dso_list, "mem", stdout);
+			sort_entry__setup_elide(&sort_mem_cost, symbol_conf.dso_list, "cost", stdout);
+			sort_entry__setup_elide(&sort_mem_tlb, symbol_conf.dso_list, "tlb", stdout);
+			sort_entry__setup_elide(&sort_mem_snoop, symbol_conf.dso_list, "snoop", stdout);
+		}
 		sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
 		sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Patch v1 09/10] perf record: add support for mem access profiling
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (7 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 08/10] perf report: add support for mem access profiling Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-29 15:15 ` [Patch v1 10/10] perf tools: add new mem command for memory " Stephane Eranian
  9 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

Add the -l option to perf record to enable access cost
sampling.

Data address sampling is obtained via the -d option.
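
For example, to capture cost, data address, and data source for
loads (my_workload is just a placeholder):

$ perf record -l -d -e cpu/mem-loads/pp -- my_workload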

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-record.c |    2 ++
 tools/perf/perf.h           |    1 +
 tools/perf/util/evsel.c     |    6 ++++++
 3 files changed, 9 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 53c9892..cafe5d9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1040,6 +1040,8 @@ const struct option record_options[] = {
 	OPT_CALLBACK('j', "branch-filter", &record.opts.branch_stack,
 		     "branch filter mask", "branch stack filter modes",
 		     parse_branch_stack),
+	OPT_BOOLEAN('l', "cost", &record.opts.cost,
+		    "event cost"),
 	OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c50985e..ea2db82 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -236,6 +236,7 @@ struct perf_record_opts {
 	bool	     sample_id_all_missing;
 	bool	     exclude_guest_missing;
 	bool	     period;
+	bool         cost;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int user_freq;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c82a271..6f84fbc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -474,6 +474,12 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 		attr->sample_type	|= PERF_SAMPLE_CPU;
 	}
 
+	if (opts->cost)
+		attr->sample_type	|= PERF_SAMPLE_COST;
+
+	if (opts->sample_address)
+		attr->sample_type	|= PERF_SAMPLE_DSRC;
+
 	if (opts->no_delay) {
 		attr->watermark = 0;
 		attr->wakeup_events = 1;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [Patch v1 10/10] perf tools: add new mem command for memory access profiling
  2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
                   ` (8 preceding siblings ...)
  2012-10-29 15:15 ` [Patch v1 09/10] perf record: " Stephane Eranian
@ 2012-10-29 15:15 ` Stephane Eranian
  2012-10-31  6:57   ` Namhyung Kim
  9 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, ak, acme, jolsa, ming.m.lin

This new command is a wrapper on top of perf record and
perf report that makes them easier to configure for memory
access profiling.

To record loads:
$ perf mem -t load rec .....

To record stores:
$ perf mem -t store rec .....

To get the report:
$ perf mem -t load rep
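
Raw decoded samples can also be dumped, one per line, with an
optional custom field separator (see the documentation added by
this patch):

$ perf mem -t load rep -R -x ,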

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-mem.txt |   51 +++++++
 tools/perf/Makefile                   |    1 +
 tools/perf/builtin-mem.c              |  237 +++++++++++++++++++++++++++++++++
 tools/perf/builtin.h                  |    1 +
 tools/perf/command-list.txt           |    1 +
 tools/perf/perf.c                     |    1 +
 6 files changed, 292 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-mem.txt
 create mode 100644 tools/perf/builtin-mem.c

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
new file mode 100644
index 0000000..57ead4f
--- /dev/null
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -0,0 +1,51 @@
+perf-mem(1)
+===========
+
+NAME
+----
+perf-mem - Profile memory accesses
+
+SYNOPSIS
+--------
+[verse]
+'perf mem' -t load record <command>
+'perf mem' -t store record <command>
+'perf mem' -t load report
+'perf mem' -t store report
+
+DESCRIPTION
+-----------
+"perf mem -t <TYPE> record" runs a command and gathers memory operation data
+from it, into perf.data. Perf record options are accepted and are passed through.
+
+"perf mem -t <TYPE> report" displays the result. It invokes perf report with the
+right set of options to display a memory access profile.
+
+OPTIONS
+-------
+<command>...::
+	Any command you can specify in a shell.
+
+-t::
+--type=::
+	Select the memory operation type: load or store
+
+-R::
+--dump-raw-samples=::
+	Dump the raw decoded samples on the screen in a format that is easy to parse,
+	with one sample per line.
+
+-x::
+--field-separator::
+	Specify the field separator used when dumping raw samples (-R option). By
+	default, the separator is the space character.
+
+-C::
+--cpu-list::
+	Restrict the dump of raw samples to the CPUs provided via this option. Note
+	that the same option can be passed in record mode, where it is interpreted
+	the same way as by perf record.
+
+SEE ALSO
+--------
+linkperf:perf-record[1], linkperf:perf-report[1]
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 629fc6a..643b316 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -459,6 +459,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
 BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
 BUILTIN_OBJS += $(OUTPUT)builtin-test.o
 BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
+BUILTIN_OBJS += $(OUTPUT)builtin-mem.o
 
 PERFLIBS = $(LIB_FILE) $(LIBTRACEEVENT)
 
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
new file mode 100644
index 0000000..5334f7d
--- /dev/null
+++ b/tools/perf/builtin-mem.c
@@ -0,0 +1,237 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/tool.h"
+#include "util/session.h"
+
+#define MEM_OPERATION_LOAD	"load"
+#define MEM_OPERATION_STORE	"store"
+
+static char const	*input_name		= "perf.data";
+static const char	*mem_operation		= MEM_OPERATION_LOAD;
+static const char	*csv_sep		= NULL;
+
+struct perf_mem {
+	struct perf_tool	tool;
+	char const		*input_name;
+	symbol_filter_t		annotate_init;
+	bool			hide_unresolved;
+	bool			dump_raw;
+	const char		*cpu_list;
+	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+};
+
+static const char * const mem_usage[] = {
+	"perf mem [<options>] {record <command> | report}",
+	NULL
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+	int rec_argc, i = 0, j;
+	const char **rec_argv;
+	char event[64];
+	int ret;
+
+	rec_argc = argc + 4;
+	rec_argv = calloc(rec_argc + 1, sizeof(char *));
+	if (!rec_argv)
+		return -1;
+
+	rec_argv[i++] = strdup("record");
+	if (!strcmp(mem_operation, MEM_OPERATION_LOAD))
+		rec_argv[i++] = strdup("-l");
+	rec_argv[i++] = strdup("-d");
+	rec_argv[i++] = strdup("-e");
+
+	if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+		sprintf(event, "cpu/mem-stores/pp");
+	else
+		sprintf(event, "cpu/mem-loads/pp");
+
+	rec_argv[i++] = strdup(event);
+	for (j = 1; j < argc; j++, i++)
+		rec_argv[i] = argv[j];
+
+	ret = cmd_record(i, rec_argv, NULL);
+	free(rec_argv);
+	return ret;
+}
+
+static int
+dump_raw_samples(struct perf_tool *tool,
+		 union perf_event *event,
+		 struct perf_sample *sample,
+		 struct perf_evsel *evsel __maybe_unused,
+		 struct machine *machine)
+{
+	struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
+	struct addr_location al;
+	const char *fmt;
+
+	if (perf_event__preprocess_sample(event, machine, &al, sample,
+				mem->annotate_init) < 0) {
+		fprintf(stderr, "problem processing %d event, skipping it.\n",
+				event->header.type);
+		return -1;
+	}
+
+	if (al.filtered || (mem->hide_unresolved && al.sym == NULL))
+		return 0;
+
+	if (al.map != NULL)
+		al.map->dso->hit = 1;
+
+	if (csv_sep) {
+		fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64"%s0x%"PRIx64"%s%s:%s\n";
+	} else {
+		fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x%016"PRIx64"%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
+		csv_sep = " ";
+	}
+
+	printf(fmt,
+		sample->pid,
+		csv_sep,
+		sample->tid,
+		csv_sep,
+		event->ip.ip,
+		csv_sep,
+		sample->addr,
+		csv_sep,
+		sample->cost,
+		csv_sep,
+		sample->dsrc,
+		csv_sep,
+		al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
+		al.sym ? al.sym->name : "???");
+
+	return 0;
+}
+
+static int process_sample_event(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample,
+                                struct perf_evsel *evsel,
+				struct machine *machine)
+{
+	return dump_raw_samples(tool, event, sample, evsel, machine);
+}
+
+static int report_raw_events(struct perf_mem *mem)
+{
+	int err = -EINVAL;
+	int ret;
+	struct perf_session *session = perf_session__new(input_name, O_RDONLY,
+							 0, false, &mem->tool);
+
+	if (session == NULL)
+		return -ENOMEM;
+
+	if (mem->cpu_list) {
+		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
+					       mem->cpu_bitmap);
+		if (ret)
+			goto out_delete;
+	}
+
+	if (symbol__init() < 0)
+		return -1;
+
+	printf("# PID, TID, IP, ADDR, COST, DSRC, SYMBOL\n");
+
+	err = perf_session__process_events(session, &mem->tool);
+	if (err)
+		return err;
+
+	return 0;
+
+out_delete:
+	perf_session__delete(session);
+	return err;
+}
+
+static int report_events(int argc, const char **argv, struct perf_mem *mem)
+{
+	const char **rep_argv;
+	int ret, i = 0, j, rep_argc;
+
+	if (mem->dump_raw)
+		return report_raw_events(mem);
+
+	rep_argc = argc + 3;
+	rep_argv = calloc(rep_argc + 1, sizeof(char *));
+	if (!rep_argv)
+		return -1;
+
+	rep_argv[i++] = strdup("report");
+	rep_argv[i++] = strdup("--mem-mode");
+	rep_argv[i++] = strdup("-n"); /* display number of samples */
+
+	/*
+	 * there is no cost associated with stores, so don't print
+	 * the column
+	 */
+	if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+		rep_argv[i++] = strdup("--sort=mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked");
+
+	for (j = 1; j < argc; j++, i++)
+		rep_argv[i] = argv[j];
+
+	ret = cmd_report(i, rep_argv, NULL);
+	free(rep_argv);
+	return ret;
+}
+
+int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+	struct stat st;
+	struct perf_mem mem = {
+		.tool = {
+			.sample		= process_sample_event,
+			.mmap		= perf_event__process_mmap,
+			.comm		= perf_event__process_comm,
+			.lost		= perf_event__process_lost,
+			.fork		= perf_event__process_fork,
+			.build_id	= perf_event__process_build_id,
+			.ordered_samples= true,
+		},
+		.input_name		 = "perf.data",
+	};
+	const struct option mem_options[] = {
+		OPT_STRING('t', "type", &mem_operation,
+			   "type", "memory operations(load/store)"),
+		OPT_BOOLEAN('R', "dump-raw-samples", &mem.dump_raw,
+			    "dump raw samples in ASCII"),
+		OPT_BOOLEAN('U', "hide-unresolved", &mem.hide_unresolved,
+			    "Only display entries resolved to a symbol"),
+		OPT_STRING('i', "input", &input_name, "file", "input file name"),
+		OPT_STRING('C', "cpu", &mem.cpu_list, "cpu", "list of cpus to profile"),
+		OPT_STRING('x', "field-separator", &csv_sep, "separator",
+			   "dump raw samples with custom separator"),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, mem_options, mem_usage,
+                        PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!argc || !(strncmp(argv[0], "rec", 3) || mem_operation))
+		usage_with_options(mem_usage, mem_options);
+
+	if (!mem.input_name || !strlen(mem.input_name)) {
+		if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
+			mem.input_name = "-";
+		else
+			mem.input_name = "perf.data";
+	}
+
+        if (!strncmp(argv[0], "rec", 3))
+		return __cmd_record(argc, argv);
+	else if (!strncmp(argv[0], "rep", 3))
+		return report_events(argc, argv, &mem);
+	else
+		usage_with_options(mem_usage, mem_options);
+
+	return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 08143bd..b210d62 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -36,6 +36,7 @@ extern int cmd_kvm(int argc, const char **argv, const char *prefix);
 extern int cmd_test(int argc, const char **argv, const char *prefix);
 extern int cmd_trace(int argc, const char **argv, const char *prefix);
 extern int cmd_inject(int argc, const char **argv, const char *prefix);
+extern int cmd_mem(int argc, const char **argv, const char *prefix);
 
 extern int find_scripts(char **scripts_array, char **scripts_path_array);
 #endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 3e86bbd..2c5b621 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -24,3 +24,4 @@ perf-kmem			mainporcelain common
 perf-lock			mainporcelain common
 perf-kvm			mainporcelain common
 perf-test			mainporcelain common
+perf-mem			mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index d480d8a..15995fa 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -59,6 +59,7 @@ static struct cmd_struct commands[] = {
 	{ "trace",	cmd_trace,	0 },
 #endif
 	{ "inject",	cmd_inject,	0 },
+	{ "mem",	cmd_mem,	0 },
 };
 
 struct pager_config {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
@ 2012-10-29 15:23   ` Peter Zijlstra
  2012-10-29 15:24     ` Stephane Eranian
  2012-10-29 15:35   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 15:23 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
> +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); 

I haven't fully grokked the macro magic yet, but event=0x100b seems
wrong; event only takes 8 bits on Intel.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:23   ` Peter Zijlstra
@ 2012-10-29 15:24     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa, Lin Ming

On Mon, Oct 29, 2012 at 4:23 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x100b,umask=0x1,ldlat=3");
>> +EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
>
> I haven't fully grokked the macro magic yet, but event=0x100b seems
> wrong; event only takes 8 bits on Intel.

Yeah, my bad. I messed up on this one. It should be:
event=0x0b,umask=0x10,ldlat=3

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
  2012-10-29 15:23   ` Peter Zijlstra
@ 2012-10-29 15:35   ` Peter Zijlstra
  2012-10-29 15:39     ` Stephane Eranian
  2012-10-29 15:38   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 15:35 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +static u64 load_latency_data(u64 status)
> +{
> +       union intel_x86_pebs_dse dse;
> +       u64 val;
> +       int model = boot_cpu_data.x86_model;
> +       int fam = boot_cpu_data.x86;
> +
> +       dse.val = status;
> +
> +       /*
> +        * use the mapping table for bit 0-15
> +        */
> +       val = pebs_data_source[dse.ld_dse];
> +
> +       /*
> +        * Nehalem models do not support TLB, Lock infos
> +        */
> +       if (fam == 0x6 && (model == 26 || model == 30
> +           || model == 31 || model == 46)) {
> +               val |= P(TLB, NA) | P(LOCK, NA);
> +               return val;
> +       }

I'm so 100% sure we'll forget to add a nhm model number if we ever find
we missed one.

Could we either add a classification enum to x86_pmu that is set in the
big model switch on init, or do this with your new constraints flags,
where we have a different flag for NHM_LL vs SNB_LL or so?

Or if all else fails, add a quirk to the Intel Debugstore bits bitfield,
something like pebs_ll_nhm.
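
Something like this, say (PMU_FL_PEBS_LL_NHM is an invented name,
just to illustrate):

	/* set once from the model switch in the init code */
	if (x86_pmu.flags & PMU_FL_PEBS_LL_NHM) {
		val |= P(TLB, NA) | P(LOCK, NA);
		return val;
	}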

> +       /*
> +        * bit 4: TLB access
> +        * 0 = did not miss 2nd level TLB
> +        * 1 = missed 2nd level TLB
> +        */
> +       if (dse.ld_stlb_miss)
> +               val |= P(TLB, MISS) | P(TLB, L2);
> +       else
> +               val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
> +
> +       /*
> +        * bit 5: locked prefix
> +        */
> +       if (dse.ld_locked)
> +               val |= P(LOCK, LOCKED);
> +
> +       return val;
> +} 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
  2012-10-29 15:23   ` Peter Zijlstra
  2012-10-29 15:35   ` Peter Zijlstra
@ 2012-10-29 15:38   ` Peter Zijlstra
  2012-10-29 15:43     ` Stephane Eranian
  2012-10-29 19:42   ` Andi Kleen
  2012-10-30  8:43   ` Namhyung Kim
  4 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 15:38 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> +       fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
> +
>         perf_sample_data_init(&data, 0, event->hw.last_period);
>  
> +       data.period = event->hw.last_period;
> +       sample_type = event->attr.sample_type;
> +
> +       /*
> +        * if PEBS-LL or PreciseStore
> +        */
> +       if (fll) {
> +               if (sample_type & PERF_SAMPLE_ADDR)
> +                       data.addr = pebs->dla;
> +
> +               /*
> +                * Use latency for cost (only avail with PEBS-LL)
> +                */
> +               if (fll && (sample_type & PERF_SAMPLE_COST))
> +                       data.cost = pebs->lat;
> +
> +               /*
> +                * data.dsrc encodes the data source
> +                */
> +               if (sample_type & PERF_SAMPLE_DSRC) {
> +                       if (fll)
> +                               data.dsrc.val = load_latency_data(pebs->dse);
> +               }
> +       } 


Why does fl1 exist? The above looks really convoluted; all the fl1 checks
inside are pointless. Also 'fl1' is a bizarre name, 'pebs_ll' would be a
better name I guess.

What's wrong with:

	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) {
		if (sample_type & PERF_SAMPLE_ADDR)
			data.addr = pebs->dla;

		if (sample_type & PERF_SAMPLE_COST)
			data.cost = perf->lat;

		if (sample_type & PERF_SAMPLE_DSRC)
			data.dsrc.val = load_latency_data(pebs->dse);
	}



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:35   ` Peter Zijlstra
@ 2012-10-29 15:39     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:39 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Oct 29, 2012 at 4:35 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +static u64 load_latency_data(u64 status)
>> +{
>> +       union intel_x86_pebs_dse dse;
>> +       u64 val;
>> +       int model = boot_cpu_data.x86_model;
>> +       int fam = boot_cpu_data.x86;
>> +
>> +       dse.val = status;
>> +
>> +       /*
>> +        * use the mapping table for bit 0-15
>> +        */
>> +       val = pebs_data_source[dse.ld_dse];
>> +
>> +       /*
>> +        * Nehalem models do not support TLB, Lock infos
>> +        */
>> +       if (fam == 0x6 && (model == 26 || model == 30
>> +           || model == 31 || model == 46)) {
>> +               val |= P(TLB, NA) | P(LOCK, NA);
>> +               return val;
>> +       }
>
> I'm so 100% sure we'll forget to add a nhm model number if we ever find
> we missed one.
>
> Could we either add a classification enum to x86_pmu that is set in the
> big model switch on init, or do this with your new constraints flags,
> where we have a different flag for NHM_LL vs SNB_LL or so?
>
Let's add something in x86_pmu. We could, for instance, generalize
x86_pmu.er_flags into x86_pmu.flags instead of adding yet another
flag field.
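
Roughly like this (sketch only; the final flag names are to be
decided):

	#define PMU_FL_NO_HT_SHARING	0x1 /* was ERF_NO_HT_SHARING */
	#define PMU_FL_HAS_RSP_1	0x2 /* was ERF_HAS_RSP_1 */
	#define PMU_FL_PEBS_LL_NHM	0x4 /* NHM PEBS-LL: no TLB/lock info */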

> Or if all else fails, add a quirk to the Intel Debugstore bits bitfield,
> something like pebs_ll_nhm.
>
>> +       /*
>> +        * bit 4: TLB access
>> +        * 0 = did not miss 2nd level TLB
>> +        * 1 = missed 2nd level TLB
>> +        */
>> +       if (dse.ld_stlb_miss)
>> +               val |= P(TLB, MISS) | P(TLB, L2);
>> +       else
>> +               val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
>> +
>> +       /*
>> +        * bit 5: locked prefix
>> +        */
>> +       if (dse.ld_locked)
>> +               val |= P(LOCK, LOCKED);
>> +
>> +       return val;
>> +}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store
  2012-10-29 15:15 ` [Patch v1 06/10] perf/x86: add support for PEBS Precise Store Stephane Eranian
@ 2012-10-29 15:40   ` Peter Zijlstra
  2012-10-29 15:44     ` Stephane Eranian
  2012-10-31  5:21   ` Namhyung Kim
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 15:40 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
> -       if (fll) {
> +       if (fll || fst) {
>                 if (sample_type & PERF_SAMPLE_ADDR)
>                         data.addr = pebs->dla;
>  
> @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>                 if (sample_type & PERF_SAMPLE_DSRC) {
>                         if (fll)
>                                 data.dsrc.val = load_latency_data(pebs->dse);
> +                       else if (fst)
> +                               data.dsrc.val = precise_store_data(pebs->dse);
>                 }
>         }
>   

Ah, this is why I guess.. hrm.. ok, n/m then.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:38   ` Peter Zijlstra
@ 2012-10-29 15:43     ` Stephane Eranian
  2012-10-29 15:44       ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Oct 29, 2012 at 4:38 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> +       fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
>> +
>>         perf_sample_data_init(&data, 0, event->hw.last_period);
>>
>> +       data.period = event->hw.last_period;
>> +       sample_type = event->attr.sample_type;
>> +
>> +       /*
>> +        * if PEBS-LL or PreciseStore
>> +        */
>> +       if (fll) {
>> +               if (sample_type & PERF_SAMPLE_ADDR)
>> +                       data.addr = pebs->dla;
>> +
>> +               /*
>> +                * Use latency for cost (only avail with PEBS-LL)
>> +                */
>> +               if (fll && (sample_type & PERF_SAMPLE_COST))
>> +                       data.cost = pebs->lat;
>> +
>> +               /*
>> +                * data.dsrc encodes the data source
>> +                */
>> +               if (sample_type & PERF_SAMPLE_DSRC) {
>> +                       if (fll)
>> +                               data.dsrc.val = load_latency_data(pebs->dse);
>> +               }
>> +       }
>
>
> Why does fl1 exist? The above looks really convoluted; all the fl1 checks
> inside are pointless. Also 'fl1' is a bizarre name, 'pebs_ll' would be a
> better name I guess.
>
You meant fll instead, I think.

> What's wrong with:
>
>         if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) {
>                 if (sample_type & PERF_SAMPLE_ADDR)
>                         data.addr = pebs->dla;
>
>                 if (sample_type & PERF_SAMPLE_COST)
>                         data.cost = perf->lat;
>
>                 if (sample_type & PERF_SAMPLE_DSRC)
>                         data.dsrc.val = load_latency_data(pebs->dse);
>         }
>
Well, that would work too, but I am trying to factor the code together
with Precise Store, which comes in a later patch. But we could clone
your proposal and do:

>         if (event->hw.flags & PERF_X86_EVENT_PEBS_ST) {
>                 if (sample_type & PERF_SAMPLE_ADDR)
>                         data.addr = pebs->dla;
>
>                 if (sample_type & PERF_SAMPLE_DSRC)
>                         data.dsrc.val = precise_store_data(pebs->dse);
>         }

I am fine with that too.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:43     ` Stephane Eranian
@ 2012-10-29 15:44       ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 15:44 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, 2012-10-29 at 16:43 +0100, Stephane Eranian wrote:
> You meant fll instead, I think.

Oh, yes, too small font I guess.

> Well, that would work too, but I am trying to factor the code together
> with Precise Store, which comes in a later patch. 

Yeah, just found that; it's fine the way it is. Just looked funneh on
first reading.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store
  2012-10-29 15:40   ` Peter Zijlstra
@ 2012-10-29 15:44     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 15:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, mingo, ak, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Oct 29, 2012 at 4:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2012-10-29 at 16:15 +0100, Stephane Eranian wrote:
>> -       if (fll) {
>> +       if (fll || fst) {
>>                 if (sample_type & PERF_SAMPLE_ADDR)
>>                         data.addr = pebs->dla;
>>
>> @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>>                 if (sample_type & PERF_SAMPLE_DSRC) {
>>                         if (fll)
>>                                 data.dsrc.val = load_latency_data(pebs->dse);
>> +                       else if (fst)
>> +                               data.dsrc.val = precise_store_data(pebs->dse);
>>                 }
>>         }
>>
>
> Ah, this is why I guess.. hrm.. ok, n/m then.

Yes, it is. But your proposal and my extension for Precise Store look good too.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 07/10] perf tools: add mem access sampling core support
  2012-10-29 15:15 ` [Patch v1 07/10] perf tools: add mem access sampling core support Stephane Eranian
@ 2012-10-29 16:55   ` Andi Kleen
  2012-10-29 17:00     ` Stephane Eranian
  2012-10-31  5:51   ` Namhyung Kim
  1 sibling, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2012-10-29 16:55 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, acme, jolsa, ming.m.lin

On Mon, Oct 29, 2012 at 04:15:49PM +0100, Stephane Eranian wrote:
> This patch adds the sorting and histogram support
> functions to enable profiling of memory accesses.

I hope my weight patches get in soon.

Could you merge your cost with weight then?

Otherwise having both cost and weight would be a bit redundant.

-Andi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 07/10] perf tools: add mem access sampling core support
  2012-10-29 16:55   ` Andi Kleen
@ 2012-10-29 17:00     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 17:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa,
	Lin Ming

On Mon, Oct 29, 2012 at 5:55 PM, Andi Kleen <ak@linux.intel.com> wrote:
> On Mon, Oct 29, 2012 at 04:15:49PM +0100, Stephane Eranian wrote:
>> This patch adds the sorting and histogram support
>> functions to enable profiling of memory accesses.
>
> I hope my weight patches get in soon.
>
> Could you merge your cost with weight then?
>
Those are very similar changes indeed. I need
to investigate whether I can use yours directly.

> Otherwise having both cost and weight would be a bit redundant.
>
Yes, we don't want that.

I realized the title of my patch was confusing. It should read: cost support.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string
  2012-10-29 15:15 ` [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string Stephane Eranian
@ 2012-10-29 19:25   ` Andi Kleen
  0 siblings, 0 replies; 37+ messages in thread
From: Andi Kleen @ 2012-10-29 19:25 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, acme, jolsa, ming.m.lin

On Mon, Oct 29, 2012 at 04:15:43PM +0100, Stephane Eranian wrote:
> This patch extends Jiri's changes to make generic
> events mapping visible via sysfs. The patch extends
> the mechanism to non-generic events by allowing
> the mappings to be hardcoded in strings.
> 
> This mechanism will be used by the PEBS-LL patch
> later on.

I'm going to pull this patch into my Haswell patchkit to handle
the same problem. I originally had a different solution, but 
this works as well with the latest perf/core.

-Andi


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
                     ` (2 preceding siblings ...)
  2012-10-29 15:38   ` Peter Zijlstra
@ 2012-10-29 19:42   ` Andi Kleen
  2012-10-29 20:39     ` Stephane Eranian
  2012-10-30  8:43   ` Namhyung Kim
  4 siblings, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2012-10-29 19:42 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, acme, jolsa, ming.m.lin

> +
> +struct attribute *nhm_events_attrs[] = {
> +	EVENT_PTR(CPU_CYCLES),
> +	EVENT_PTR(INSTRUCTIONS),
> +	EVENT_PTR(CACHE_REFERENCES),
> +	EVENT_PTR(CACHE_MISSES),
> +	EVENT_PTR(BRANCH_INSTRUCTIONS),
> +	EVENT_PTR(BRANCH_MISSES),
> +	EVENT_PTR(BUS_CYCLES),
> +	EVENT_PTR(STALLED_CYCLES_FRONTEND),
> +	EVENT_PTR(STALLED_CYCLES_BACKEND),
> +	EVENT_PTR(REF_CPU_CYCLES),
> +	EVENT_PTR(mem_ld_nhm),
> +	NULL,
> +};

I thought Jiri's patch already exports all the generic ones?

Why do you need to replace the whole table?

BTW I still think my approach in the v4 Haswell patchkit 
is simpler and didn't rely on hardcoding these events.

-Andi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 19:42   ` Andi Kleen
@ 2012-10-29 20:39     ` Stephane Eranian
  2012-10-29 20:44       ` Peter Zijlstra
  2012-10-29 21:16       ` Andi Kleen
  0 siblings, 2 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 20:39 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Oct 29, 2012 at 8:42 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> +
>> +struct attribute *nhm_events_attrs[] = {
>> +     EVENT_PTR(CPU_CYCLES),
>> +     EVENT_PTR(INSTRUCTIONS),
>> +     EVENT_PTR(CACHE_REFERENCES),
>> +     EVENT_PTR(CACHE_MISSES),
>> +     EVENT_PTR(BRANCH_INSTRUCTIONS),
>> +     EVENT_PTR(BRANCH_MISSES),
>> +     EVENT_PTR(BUS_CYCLES),
>> +     EVENT_PTR(STALLED_CYCLES_FRONTEND),
>> +     EVENT_PTR(STALLED_CYCLES_BACKEND),
>> +     EVENT_PTR(REF_CPU_CYCLES),
>> +     EVENT_PTR(mem_ld_nhm),
>> +     NULL,
>> +};
>
> I thought Jiri's patch already exports all the generic ones?
>
> Why do you need to replace the whole table?
>
Because I am extending them with one or two events based on cpu
model. That was the easiest way of doing this instead of playing
some kind of malloc+copy trick.

> BTW I still think my approach in the v4 Haswell patchkit
> is simpler and didn't rely on hardcoding these events.
>
I don't care about those events. As I found out, they are not even
used by perf because they are all hardcoded and that's what gets
used. I assume they are exposed for reference only. I don't object
to that. But I think the right mechanism would be one where you
can add events at boot time based on CPU model. It could be used
to add the common events as well in the common part of the init
code.
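
An untested sketch of what such a mechanism could look like (the
helper and the generic table name are made up):

/* concatenate two NULL-terminated attribute lists */
static struct attribute **merge_attr(struct attribute **a,
				     struct attribute **b)
{
	struct attribute **new;
	int i, j = 0, n = 0;

	for (i = 0; a[i]; i++)
		n++;
	for (i = 0; b[i]; i++)
		n++;

	new = kmalloc(sizeof(*new) * (n + 1), GFP_KERNEL);
	if (!new)
		return NULL;

	for (i = 0; a[i]; i++)
		new[j++] = a[i];
	for (i = 0; b[i]; i++)
		new[j++] = b[i];
	new[j] = NULL;

	return new;
}

The common events would then be registered once in the common init
path, and each model switch case would only append its extras:

	attrs = merge_attr(generic_events_attrs, nhm_events_attrs);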

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 20:39     ` Stephane Eranian
@ 2012-10-29 20:44       ` Peter Zijlstra
  2012-10-29 21:16       ` Andi Kleen
  1 sibling, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2012-10-29 20:44 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Andi Kleen, LKML, mingo, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, 2012-10-29 at 21:39 +0100, Stephane Eranian wrote:
> But I think the right mechanism would be one where you
> can add events at boot time based on CPU model. It could be used
> to add the common events as well in the common part of the init
> code. 

mlin once posted something like that, it did some scary things with
sysfs without using the driver model -- which might or might not be
'right' (TM).

If someone has the stomach to stare at that code and then convince
gregkh that its not 'wrong' I'm all for it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 20:39     ` Stephane Eranian
  2012-10-29 20:44       ` Peter Zijlstra
@ 2012-10-29 21:16       ` Andi Kleen
  2012-10-29 21:32         ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Andi Kleen @ 2012-10-29 21:16 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa

> > Why do you need to replace the whole table?
> >
> Because I am extending them with one or two events based on cpu
> model. That was the easiest way of doing this instead of playing
> some kind of malloc+copy trick.

I did malloc and copy.

> 
> > BTW I still think my approach in the v4 Haswell patchkit
> > is simpler and didn't rely on hardcoding these events.
> >
> I don't care about those events. As I found out, they are not even
> used by perf because they are all hardcoded and that's what gets
> used. I assume they are exposed for reference only. I don't object
> to that. But I think the right mechanism would be one where you
> can add events at boot time based on CPU model. It could be used
> to add the common events as well in the common part of the init
> code.

Yes that's what I did. 

I don't think copying everything for everything new is a good 
approach.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 21:16       ` Andi Kleen
@ 2012-10-29 21:32         ` Stephane Eranian
  2012-10-29 21:56           ` Andi Kleen
  0 siblings, 1 reply; 37+ messages in thread
From: Stephane Eranian @ 2012-10-29 21:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Oct 29, 2012 at 10:16 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> > Why do you need to replace the whole table?
>> >
>> Because I am extending them with one or two events based on cpu
>> model. That was the easiest way of doing this instead of playing
>> some kind of malloc+copy trick.
>
> I did malloc and copy.
>
>>
>> > BTW I still think my approach in the v4 Haswell patchkit
>> > is simpler and didn't rely on hardcoding these events.
>> >
>> I don't care about those events. As I found out, they are not even
>> used by perf because they are all hardcoded and that's what gets
>> used. I assume they are exposed for reference only. I don't object
>> to that. But I think the right mechanism would be one where you
>> can add events at boot time based on CPU model. It could be used
>> to add the common events as well in the common part of the init
>> code.
>
> Yes, that's what I did.
>
> I don't think copying everything for every new model is a good
> approach.
>
I agree. I did that because it was the easiest thing I could think of.
I discovered I had to deal with all of this just two days ago when I
rebased, and wasn't too happy to have to deal with it at the last minute.

In any case, if you have something, then it looks like I need to wait
until it's in before I can adjust this patch set.


> -Andi
>
> --
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 21:32         ` Stephane Eranian
@ 2012-10-29 21:56           ` Andi Kleen
  0 siblings, 0 replies; 37+ messages in thread
From: Andi Kleen @ 2012-10-29 21:56 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, Peter Zijlstra, mingo, Arnaldo Carvalho de Melo, Jiri Olsa

> I agree. I did that because it was the easiest thing I could think of.
> I discovered I had to deal with all of this just two days ago when I
> rebased, and wasn't too happy to have to deal with it at the last minute.

I'll repost as soon as I've debugged why Jiri's version of my 31/33
parser patch doesn't work.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency
  2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
                     ` (3 preceding siblings ...)
  2012-10-29 19:42   ` Andi Kleen
@ 2012-10-30  8:43   ` Namhyung Kim
  4 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2012-10-30  8:43 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, ming.m.lin

Hi Stephane,

On Mon, 29 Oct 2012 16:15:46 +0100, Stephane Eranian wrote:
> +	/*
> +	 * use the mapping table for bit 0-15
> +	 */
> +	val = pebs_data_source[dse.ld_dse];

I guess you meant bit 0-3, right?
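
For reference, a sketch of the decode as I read it; since ld_dse is a
4-bit field, only bits 0-3 index the 16-entry table (field widths
assumed from the patch context, for illustration only):

#include <linux/types.h>

union intel_x86_pebs_dse {
	u64 val;
	struct {
		unsigned int ld_dse:4;		/* data source encoding    */
		unsigned int ld_stlb_miss:1;	/* STLB miss on the access */
		unsigned int ld_locked:1;	/* locked transaction      */
		unsigned int ld_reserved:26;
	};
};

static u64 pebs_data_source[1 << 4];	/* 16 entries, set up at init */

static u64 load_latency_data(u64 status)
{
	union intel_x86_pebs_dse dse;

	dse.val = status;
	/* bits 0-3 select one of the 16 precomputed encodings */
	return pebs_data_source[dse.ld_dse];
}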

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store
  2012-10-29 15:15 ` [Patch v1 06/10] perf/x86: add support for PEBS Precise Store Stephane Eranian
  2012-10-29 15:40   ` Peter Zijlstra
@ 2012-10-31  5:21   ` Namhyung Kim
  2012-10-31 13:28     ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2012-10-31  5:21 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 29 Oct 2012 16:15:48 +0100, Stephane Eranian wrote:
> This patch adds support for PEBS Precise Store
> which is available on Intel Sandy Bridge and
> Ivy Bridge processors.
>
> To use Precise store, the proper PEBS event
> must be used: mem_trans_retired:precise_stores.
> For the perf tool, the generic mem-stores event
> exported via sysfs can be used directly.

Just trivial nitpicks..

>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
[snip]
> @@ -486,6 +524,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
>  	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
>  	INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
>  	INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
> +	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
>  	INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
>  	INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
>  	INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
> @@ -500,6 +539,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
>          INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
>          INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
>          INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
> +	INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */

White-space damaged?  Oh, it seems already broken with spaces.


>          INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
>          INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
>          INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
[snip]
> @@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  	/*
>  	 * if PEBS-LL or PreciseStore
>  	 */
> -	if (fll) {
> +	if (fll || fst) {
>  		if (sample_type & PERF_SAMPLE_ADDR)
>  			data.addr = pebs->dla;
>  
> @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  		if (sample_type & PERF_SAMPLE_DSRC) {
>  			if (fll)
>  				data.dsrc.val = load_latency_data(pebs->dse);
> +			else if (fst)

Looks like it can be converted to a plain 'else'.
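
That is, assuming the store side fills dsrc via a precise_store_data()
helper, something like:

		if (sample_type & PERF_SAMPLE_DSRC) {
			if (fll)
				data.dsrc.val = load_latency_data(pebs->dse);
			else	/* fst: cannot be both PEBS-LL and precise store */
				data.dsrc.val = precise_store_data(pebs->dse);
		}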

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 07/10] perf tools: add mem access sampling core support
  2012-10-29 15:15 ` [Patch v1 07/10] perf tools: add mem access sampling core support Stephane Eranian
  2012-10-29 16:55   ` Andi Kleen
@ 2012-10-31  5:51   ` Namhyung Kim
  2012-10-31 13:30     ` Stephane Eranian
  1 sibling, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2012-10-31  5:51 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 29 Oct 2012 16:15:49 +0100, Stephane Eranian wrote:
> This patch adds the sorting and histogram support
> functions to enable profiling of memory accesses.
>
> The following sorting orders are added:
>  - symbol_daddr: data address symbol (or raw address)
>  - dso_daddr: data address shared object
>  - cost: access cost
>  - locked: access uses locked transaction
>  - tlb : TLB access
>  - mem : memory level of the access (L1, L2, L3, RAM, ...)
>  - snoop: access snoop mode
>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
[snip]
> +/* --sort daddr_sym */
> +static int64_t
> +sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
> +{
> +	struct addr_map_symbol *l = &left->mem_info->daddr;
> +	struct addr_map_symbol *r = &right->mem_info->daddr;
> +
> +	return (int64_t)(r->addr - l->addr);
> +}

Doesn't it need to compare the symbol start addresses, if any, before
falling back to the raw addresses?
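
Something along these lines, perhaps (a sketch, assuming the sym
pointer in struct addr_map_symbol carries the symbol start address):

static int64_t
sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
{
	struct addr_map_symbol *l = &left->mem_info->daddr;
	struct addr_map_symbol *r = &right->mem_info->daddr;
	u64 laddr = l->sym ? l->sym->start : l->addr;
	u64 raddr = r->sym ? r->sym->start : r->addr;

	/* group by data symbol first, fall back to the raw address */
	return (int64_t)(raddr - laddr);
}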

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 08/10] perf report: add support for mem access profiling
  2012-10-29 15:15 ` [Patch v1 08/10] perf report: add support for mem access profiling Stephane Eranian
@ 2012-10-31  6:01   ` Namhyung Kim
  0 siblings, 0 replies; 37+ messages in thread
From: Namhyung Kim @ 2012-10-31  6:01 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 29 Oct 2012 16:15:50 +0100, Stephane Eranian wrote:
> This patch adds the --mem-mode option to perf report.
>
> This mode requires a perf.data file created with memory
> access samples.
>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
[snip]
> +	cost = mi->cost;
> +	if (!cost)
> +		cost = 1;
> +
> +	/*
> +	 * The report shows the percentage of total branches captured
> +	 * and not events sampled. Thus we use a pseudo period of 1.

But cost won't be 1 anymore if PERF_SAMPLE_COST is set, right?


> +	 * Only in the newt browser we are doing integrated annotation,
> +	 * so we don't allocate the extra space needed because the stdio
> +	 * code will not use it.

Yes, and gtk too.


> +	 */
> +	he = __hists__add_mem_entry(&evsel->hists, al, parent, mi,
> +				    cost);
> +	if (!he)
> +		return -ENOMEM;
> +
> +	if (sort__has_sym && he->ms.sym && use_browser > 0) {

So I'd rather write 'use_browser == 1' instead of '> 0'.


> +		struct annotation *notes = symbol__annotation(he->ms.sym);
> +
> +		assert(evsel != NULL);
> +
> +		if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
> +			goto out;
> +
> +		err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
> +		if (err)
> +			goto out;
> +	}
> +
> +	if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {

Ditto.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 10/10] perf tools: add new mem command for memory access profiling
  2012-10-29 15:15 ` [Patch v1 10/10] perf tools: add new mem command for memory " Stephane Eranian
@ 2012-10-31  6:57   ` Namhyung Kim
  2012-10-31 14:23     ` Stephane Eranian
  0 siblings, 1 reply; 37+ messages in thread
From: Namhyung Kim @ 2012-10-31  6:57 UTC (permalink / raw)
  To: Stephane Eranian; +Cc: linux-kernel, peterz, mingo, ak, acme, jolsa, ming.m.lin

On Mon, 29 Oct 2012 16:15:52 +0100, Stephane Eranian wrote:
> This new command is a wrapper on top of perf record and
> perf report to make it easier to configure for memory
> access profiling.

So this new command will be run only on specific (PEBS > 2?) Intel
machines, right?  Is there anything we can do for others?  Or at least
it might emit a warning message...

>
> To record loads:
> $ perf mem -t load rec .....
>
> To record stores:
> $ perf mem -t store rec .....
>
> To get the report:
> $ perf mem -t load rep
>
> Signed-off-by: Stephane Eranian <eranian@google.com>
> ---
[snip]
> +perf-mem(1)
> +===========
> +
> +NAME
> +----
> +perf-mem - Profile memory accesses
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'perf mem' -t load record <command>
> +'perf mem' -t store record <command>
> +'perf mem' -t load report
> +'perf mem' -t store report

Is the '-t' option mandatory?  AFAICS it seems optional and defaults to load.

And is <command> for record also mandatory?  Doesn't 'perf record' make
it optional?

If so, the above can be written like you did in 'mem_usage':

'perf mem' [<options>] (record [<command>] | report)


> +
> +DESCRIPTION
> +-----------
> +"perf mem -t <TYPE> record" runs a command and gathers memory operation data
> +from it, into perf.data. Perf record options are accepted and are passed through.
> +
> +"perf mem -t <TYPE> report" displays the result. It invokes perf report with the
> +right set of options to display a memory access profile.
> +
> +OPTIONS
> +-------
> +<command>...::
> +	Any command you can specify in a shell.
> +
> +-t::
> +--type=::
> +	Select the memory operation type: load or store

It'd be better to say that it defaults to load.

> +
> +-R::
> +--dump-raw-samples=::
> +	Dump the raw decoded samples on the screen in a format that is easy to parse with
> +	one sample per line.

Don't we usually use the -D switch for this?

> +
> +-x::
> +--field-separator::
> +	Specify the field separator used when dumping raw samples (-R option). By default,
> +	the separator is the space character.

And using -t for this will make it consistent with perf report IMHO.

> +
> +-C::
> +--cpu-list::
> +	Restrict dump of raw samples to those provided via this option. Note that the same
> +	option can be passed in record mode. It will be interpreted the same way as perf
> +	record.
> +
> +SEE ALSO
> +--------
> +linkperf:perf-record[1], linkperf:perf-report[1]
[snip]
> +#define MEM_OPERATION_LOAD	"load"
> +#define MEM_OPERATION_STORE	"store"
> +
> +static char const	*input_name		= "perf.data";

We have a global 'input_name' as of commit 70cb4e963f77 ("perf tools:
Add a global variable 'const char *input_name'").


> +static const char	*mem_operation		= MEM_OPERATION_LOAD;
> +static const char	*csv_sep		= NULL;

Why not use symbol_conf.field_sep?

> +
> +struct perf_mem {
> +	struct perf_tool	tool;
> +	char const		*input_name;
> +	symbol_filter_t		annotate_init;
> +	bool			hide_unresolved;
> +	bool			dump_raw;
> +	const char		*cpu_list;
> +	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
> +};
> +
> +static const char * const mem_usage[] = {
> +	"perf mem [<options>] {record <command> |report}",
> +	NULL
> +};
[snip]
> +static int report_raw_events(struct perf_mem *mem)
> +{
> +	int err = -EINVAL;
> +	int ret;
> +	struct perf_session *session = perf_session__new(input_name, O_RDONLY,
> +							 0, false, &mem->tool);
> +
> +	if (mem->cpu_list) {
> +		ret = perf_session__cpu_bitmap(session, mem->cpu_list,
> +					       mem->cpu_bitmap);
> +		if (ret)
> +			goto out_delete;
> +	}
> +
> +	if (symbol__init() < 0)
> +		return -1;
> +
> +	if (session == NULL)
> +		return -ENOMEM;

This check should be moved before the perf_session__cpu_bitmap() call.

Thanks,
Namhyung

> +
> +	printf("# PID, TID, IP, ADDR, COST, DSRC, SYMBOL\n");
> +
> +	err = perf_session__process_events(session, &mem->tool);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +
> +out_delete:
> +	perf_session__delete(session);
> +	return err;
> +}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 06/10] perf/x86: add support for PEBS Precise Store
  2012-10-31  5:21   ` Namhyung Kim
@ 2012-10-31 13:28     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-31 13:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Peter Zijlstra, mingo, ak, Arnaldo Carvalho de Melo,
	Jiri Olsa, Lin Ming

On Wed, Oct 31, 2012 at 6:21 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> On Mon, 29 Oct 2012 16:15:48 +0100, Stephane Eranian wrote:
>> This patch adds support for PEBS Precise Store
>> which is available on Intel Sandy Bridge and
>> Ivy Bridge processors.
>>
>> To use Precise store, the proper PEBS event
>> must be used: mem_trans_retired:precise_stores.
>> For the perf tool, the generic mem-stores event
>> exported via sysfs can be used directly.
>
> Just trivial nitpicks..
>
>>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> ---
> [snip]
>> @@ -486,6 +524,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
>>       INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
>>       INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
>>       INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
>> +     INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
>>       INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
>>       INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
>>       INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
>> @@ -500,6 +539,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
>>          INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
>>          INTEL_EVENT_CONSTRAINT(0xc5, 0xf),    /* BR_MISP_RETIRED.* */
>>          INTEL_PLD_CONSTRAINT(0x01cd, 0x8),    /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
>> +     INTEL_PST_CONSTRAINT(0x02cd, 0x8),    /* MEM_TRANS_RETIRED.PRECISE_STORES */
>
> White-space damaged?  Oh, it seems already broken with spaces.
>
Yes, it was already damaged with white spaces.

>
>>          INTEL_EVENT_CONSTRAINT(0xd0, 0xf),    /* MEM_UOP_RETIRED.* */
>>          INTEL_EVENT_CONSTRAINT(0xd1, 0xf),    /* MEM_LOAD_UOPS_RETIRED.* */
>>          INTEL_EVENT_CONSTRAINT(0xd2, 0xf),    /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
> [snip]
>> @@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>>       /*
>>        * if PEBS-LL or PreciseStore
>>        */
>> -     if (fll) {
>> +     if (fll || fst) {
>>               if (sample_type & PERF_SAMPLE_ADDR)
>>                       data.addr = pebs->dla;
>>
>> @@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>>               if (sample_type & PERF_SAMPLE_DSRC) {
>>                       if (fll)
>>                               data.dsrc.val = load_latency_data(pebs->dse);
>> +                     else if (fst)
>
> Looks like it can be converted to a plain 'else'.
>
Yeah. Cannot have fll && fst.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 07/10] perf tools: add mem access sampling core support
  2012-10-31  5:51   ` Namhyung Kim
@ 2012-10-31 13:30     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-31 13:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Peter Zijlstra, mingo, ak, Arnaldo Carvalho de Melo,
	Jiri Olsa, Lin Ming

On Wed, Oct 31, 2012 at 6:51 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> On Mon, 29 Oct 2012 16:15:49 +0100, Stephane Eranian wrote:
>> This patch adds the sorting and histogram support
>> functions to enable profiling of memory accesses.
>>
>> The following sorting orders are added:
>>  - symbol_daddr: data address symbol (or raw address)
>>  - dso_daddr: data address shared object
>>  - cost: access cost
>>  - locked: access uses locked transaction
>>  - tlb : TLB access
>>  - mem : memory level of the access (L1, L2, L3, RAM, ...)
>>  - snoop: access snoop mode
>>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> ---
> [snip]
>> +/* --sort daddr_sym */
>> +static int64_t
>> +sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
>> +{
>> +     struct addr_map_symbol *l = &left->mem_info->daddr;
>> +     struct addr_map_symbol *r = &right->mem_info->daddr;
>> +
>> +     return (int64_t)(r->addr - l->addr);
>> +}
>
> Doesn't it need to compare the symbol start addresses, if any, before
> falling back to the raw addresses?
>
Possibly, but we can't get to data symbols at the moment due to limitations
in the kernel + perf tool.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [Patch v1 10/10] perf tools: add new mem command for memory access profiling
  2012-10-31  6:57   ` Namhyung Kim
@ 2012-10-31 14:23     ` Stephane Eranian
  0 siblings, 0 replies; 37+ messages in thread
From: Stephane Eranian @ 2012-10-31 14:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: LKML, Peter Zijlstra, mingo, ak, Arnaldo Carvalho de Melo,
	Jiri Olsa, Lin Ming

On Wed, Oct 31, 2012 at 7:57 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> On Mon, 29 Oct 2012 16:15:52 +0100, Stephane Eranian wrote:
>> This new command is a wrapper on top of perf record and
>> perf report to make it easier to configure for memory
>> access profiling.
>
> So this new command will be run only on specific (PEBS > 2?) Intel
> machines, right?  Is there anything we can do for others?  Or at least
> it might emit a warning message...
>
>>
>> To record loads:
>> $ perf mem -t load rec .....
>>
>> To record stores:
>> $ perf mem -t store rec .....
>>
>> To get the report:
>> $ perf mem -t load rep
>>
>> Signed-off-by: Stephane Eranian <eranian@google.com>
>> ---
> [snip]
>> +perf-mem(1)
>> +===========
>> +
>> +NAME
>> +----
>> +perf-mem - Profile memory accesses
>> +
>> +SYNOPSIS
>> +--------
>> +[verse]
>> +'perf mem' -t load record <command>
>> +'perf mem' -t store record <command>
>> +'perf mem' -t load report
>> +'perf mem' -t store report
>
> Is the '-t' option mandatory?  AFAICS it seems optional and defaults to load.
>
Yes, defaults to load. Fixed.

> And is <command> for record also mandatory?  Doesn't 'perf record' make
> it optional?
>
> If so, the above can be written like you did in 'mem_usage':
>
> 'perf mem' [<options>] (record [<command>] | report)
>
Fixed.

>
>> +
>> +DESCRIPTION
>> +-----------
>> +"perf mem -t <TYPE> record" runs a command and gathers memory operation data
>> +from it, into perf.data. Perf record options are accepted and are passed through.
>> +
>> +"perf mem -t <TYPE> report" displays the result. It invokes perf report with the
>> +right set of options to display a memory access profile.
>> +
>> +OPTIONS
>> +-------
>> +<command>...::
>> +     Any command you can specify in a shell.
>> +
>> +-t::
>> +--type=::
>> +     Select the memory operation type: load or store
>
> It'd be better to say that it defaults to load.
>
Done.

>> +
>> +-R::
>> +--dump-raw-samples=::
>> +     Dump the raw decoded samples on the screen in a format that is easy to parse with
>> +     one sample per line.
>
> Don't we usually use the -D switch for this?
>
Using -D to be consistent with report.

>> +
>> +-x::
>> +--field-separator::
>> +     Specify the field separator used when dumping raw samples (-R option). By default,
>> +     the separator is the space character.
>
> And using -t for this will make it consistent with perf report IMHO.
>
>> +
No, it's -x there too.

>> +-C::
>> +--cpu-list::
>> +     Restrict dump of raw samples to those provided via this option. Note that the same
>> +     option can be passed in record mode. It will be interpreted the same way as perf
>> +     record.
>> +
>> +SEE ALSO
>> +--------
>> +linkperf:perf-record[1], linkperf:perf-report[1]
> [snip]
>> +#define MEM_OPERATION_LOAD   "load"
>> +#define MEM_OPERATION_STORE  "store"
>> +
>> +static char const    *input_name             = "perf.data";
>
> We have a global 'input_name' as of commit 70cb4e963f77 ("perf tools:
> Add a global variable 'const char *input_name'").
>
>
Yes. Fixed.

>> +static const char    *mem_operation          = MEM_OPERATION_LOAD;
>> +static const char    *csv_sep                = NULL;
>
> Why not use symbol_conf.field_sep?
>
Done.

>> +
>> +struct perf_mem {
>> +     struct perf_tool        tool;
>> +     char const              *input_name;
>> +     symbol_filter_t         annotate_init;
>> +     bool                    hide_unresolved;
>> +     bool                    dump_raw;
>> +     const char              *cpu_list;
>> +     DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
>> +};
>> +
>> +static const char * const mem_usage[] = {
>> +     "perf mem [<options>] {record <command> |report}",
>> +     NULL
>> +};
> [snip]
>> +static int report_raw_events(struct perf_mem *mem)
>> +{
>> +     int err = -EINVAL;
>> +     int ret;
>> +     struct perf_session *session = perf_session__new(input_name, O_RDONLY,
>> +                                                      0, false, &mem->tool);
>> +
>> +     if (mem->cpu_list) {
>> +             ret = perf_session__cpu_bitmap(session, mem->cpu_list,
>> +                                            mem->cpu_bitmap);
>> +             if (ret)
>> +                     goto out_delete;
>> +     }
>> +
>> +     if (symbol__init() < 0)
>> +             return -1;
>> +
>> +     if (session == NULL)
>> +             return -ENOMEM;
>
> This check should be moved before the perf_session__cpu_bitmap() call.
>
Yes.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

Thread overview: 37+ messages
2012-10-29 15:15 [Patch v1 00/10] perf: add memory access sampling support Stephane Eranian
2012-10-29 15:15 ` [Patch v1 01/10] perf/x86: improve sysfs event mapping with event string Stephane Eranian
2012-10-29 19:25   ` Andi Kleen
2012-10-29 15:15 ` [Patch v1 02/10] perf/x86: add flags to event constraints Stephane Eranian
2012-10-29 15:15 ` [Patch v1 03/10] perf: add generic memory sampling interface Stephane Eranian
2012-10-29 15:15 ` [Patch v1 04/10] perf/x86: add memory profiling via PEBS Load Latency Stephane Eranian
2012-10-29 15:23   ` Peter Zijlstra
2012-10-29 15:24     ` Stephane Eranian
2012-10-29 15:35   ` Peter Zijlstra
2012-10-29 15:39     ` Stephane Eranian
2012-10-29 15:38   ` Peter Zijlstra
2012-10-29 15:43     ` Stephane Eranian
2012-10-29 15:44       ` Peter Zijlstra
2012-10-29 19:42   ` Andi Kleen
2012-10-29 20:39     ` Stephane Eranian
2012-10-29 20:44       ` Peter Zijlstra
2012-10-29 21:16       ` Andi Kleen
2012-10-29 21:32         ` Stephane Eranian
2012-10-29 21:56           ` Andi Kleen
2012-10-30  8:43   ` Namhyung Kim
2012-10-29 15:15 ` [Patch v1 05/10] perf/x86: export PEBS load latency threshold register to sysfs Stephane Eranian
2012-10-29 15:15 ` [Patch v1 06/10] perf/x86: add support for PEBS Precise Store Stephane Eranian
2012-10-29 15:40   ` Peter Zijlstra
2012-10-29 15:44     ` Stephane Eranian
2012-10-31  5:21   ` Namhyung Kim
2012-10-31 13:28     ` Stephane Eranian
2012-10-29 15:15 ` [Patch v1 07/10] perf tools: add mem access sampling core support Stephane Eranian
2012-10-29 16:55   ` Andi Kleen
2012-10-29 17:00     ` Stephane Eranian
2012-10-31  5:51   ` Namhyung Kim
2012-10-31 13:30     ` Stephane Eranian
2012-10-29 15:15 ` [Patch v1 08/10] perf report: add support for mem access profiling Stephane Eranian
2012-10-31  6:01   ` Namhyung Kim
2012-10-29 15:15 ` [Patch v1 09/10] perf record: " Stephane Eranian
2012-10-29 15:15 ` [Patch v1 10/10] perf tools: add new mem command for memory " Stephane Eranian
2012-10-31  6:57   ` Namhyung Kim
2012-10-31 14:23     ` Stephane Eranian
