linux-kernel.vger.kernel.org archive mirror
* perf PMU support for Haswell v4
@ 2012-10-26 20:29 Andi Kleen
  2012-10-26 20:29 ` [PATCH 01/33] perf, x86: Add PEBSv2 record support Andi Kleen
                   ` (32 more replies)
  0 siblings, 33 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo

[Updated version for the latest master tree and various fixes.
See end for details. This should be ready for merging now I hope.

Arnaldo, especially needs attention from you for the user space part.]

This adds perf PMU support for the upcoming Haswell core. The patchkit
is fairly large, mainly due to various enhancements for TSX. TSX tuning
relies heavily on the PMU, so I tried hard to make all its facilities
easily available. In addition there are some other enhancements.

This includes changes to the core perf code, to the x86-specific part,
to the perf userland tools, and to KVM.

High level overview:

- Basic Haswell PMU support
- Easy high level TSX measurement in perf stat -T
- Transaction events and attributes implemented with sysfs enumeration
- Export arch perfmon events in sysfs 
- Generic weighted profiling for memory latency and transaction abort costs.
- Support for address profiling
- Support for filtering events inside/outside transactions
- KVM support to do this from guests
- Support for filtering/sorting/bucketing transaction abort types based on
  PEBS information
- LBR support for transactions

For more details on the Haswell PMU please see the SDM. For more details on TSX
please see http://halobates.de/adding-lock-elision-to-linux.pdf

Some of the added features could be added to older CPUs too. I plan
to do this, but in separate patches.

Review appreciated.

v2: Removed generic transaction events and qualifiers and switched to sysfs
enumeration. Also export arch perfmon events, so that the qualifiers work.
Fixed various issues this exposed. Don't use a special macro for the
TSX constraints anymore. Address other review feedback.
Added pdir event in sysfs.

v3: Fix various bugs and address review comments.
tx-aborts instead of cpu/tx-aborts/ works now (with some limitations)
cpu/instructions,intx=1/ works now

v4:
Addressed all review feedback (I hope). See changelog in individual patches.
KVM support now works again with more changes.
Forbid some more flag combinations that don't work well.

-Andi


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/33] perf, x86: Add PEBSv2 record support
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-29 10:08   ` Namhyung Kim
  2012-10-26 20:29 ` [PATCH 02/33] perf, x86: Basic Haswell PMU support v2 Andi Kleen
                   ` (31 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add support for the v2 PEBS format. It has a superset of the v1 PEBS
fields, but the record is longer, so the code paths need to be adjusted.

The main advantage is the new "EventingRip" field, which directly
gives the address of the instruction that caused the event, rather than
the instruction after it (the classic off-by-one). So with precise == 2
we use it directly and no longer need to use the LBRs and walk basic
blocks to correct the skid. This lowers the overhead significantly.

Some other features are added in later patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.c          |    2 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |  101 ++++++++++++++++++++++-------
 2 files changed, 79 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 3373f84..81b5e65 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -401,7 +401,7 @@ int x86_pmu_hw_config(struct perf_event *event)
 		 * check that PEBS LBR correction does not conflict with
 		 * whatever the user is asking with attr->branch_sample_type
 		 */
-		if (event->attr.precise_ip > 1) {
+		if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
 			u64 *br_type = &event->attr.branch_sample_type;
 
 			if (has_branch_stack(event)) {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..9d0dae0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -41,6 +41,12 @@ struct pebs_record_nhm {
 	u64 status, dla, dse, lat;
 };
 
+struct pebs_record_v2 {
+	struct pebs_record_nhm nhm;
+	u64 eventingrip;
+	u64 tsx_tuning;
+};
+
 void init_debug_store_on_cpu(int cpu)
 {
 	struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -559,8 +565,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 {
 	/*
 	 * We cast to pebs_record_core since that is a subset of
-	 * both formats and we don't use the other fields in this
-	 * routine.
+	 * both formats.
 	 */
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct pebs_record_core *pebs = __pebs;
@@ -588,7 +593,10 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	regs.bp = pebs->bp;
 	regs.sp = pebs->sp;
 
-	if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
+	if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
+		regs.ip = ((struct pebs_record_v2 *)pebs)->eventingrip;
+		regs.flags |= PERF_EFLAGS_EXACT;
+	} else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
 		regs.flags |= PERF_EFLAGS_EXACT;
 	else
 		regs.flags &= ~PERF_EFLAGS_EXACT;
@@ -641,35 +649,21 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
 	__intel_pmu_pebs_event(event, iregs, at);
 }
 
-static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+static void intel_pmu_drain_pebs_common(struct pt_regs *iregs, void *at,
+					void *top)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct debug_store *ds = cpuc->ds;
-	struct pebs_record_nhm *at, *top;
 	struct perf_event *event = NULL;
 	u64 status = 0;
-	int bit, n;
-
-	if (!x86_pmu.pebs_active)
-		return;
-
-	at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
-	top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+	int bit;
 
 	ds->pebs_index = ds->pebs_buffer_base;
 
-	n = top - at;
-	if (n <= 0)
-		return;
+	for ( ; at < top; at += x86_pmu.pebs_record_size) {
+		struct pebs_record_nhm *p = at;
 
-	/*
-	 * Should not happen, we program the threshold at 1 and do not
-	 * set a reset value.
-	 */
-	WARN_ONCE(n > x86_pmu.max_pebs_events, "Unexpected number of pebs records %d\n", n);
-
-	for ( ; at < top; at++) {
-		for_each_set_bit(bit, (unsigned long *)&at->status, x86_pmu.max_pebs_events) {
+		for_each_set_bit(bit, (unsigned long *)&p->status, x86_pmu.max_pebs_events) {
 			event = cpuc->events[bit];
 			if (!test_bit(bit, cpuc->active_mask))
 				continue;
@@ -692,6 +686,61 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 	}
 }
 
+static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct debug_store *ds = cpuc->ds;
+	struct pebs_record_nhm *at, *top;
+	int n;
+
+	if (!x86_pmu.pebs_active)
+		return;
+
+	at  = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+	top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+
+	ds->pebs_index = ds->pebs_buffer_base;
+
+	n = top - at;
+	if (n <= 0)
+		return;
+
+	/*
+	 * Should not happen, we program the threshold at 1 and do not
+	 * set a reset value.
+	 */
+	WARN_ONCE(n > x86_pmu.max_pebs_events,
+		  "Unexpected number of pebs records %d\n", n);
+
+	return intel_pmu_drain_pebs_common(iregs, at, top);
+}
+
+static void intel_pmu_drain_pebs_v2(struct pt_regs *iregs)
+{
+	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+	struct debug_store *ds = cpuc->ds;
+	struct pebs_record_v2 *at, *top;
+	int n;
+
+	if (!x86_pmu.pebs_active)
+		return;
+
+	at  = (struct pebs_record_v2 *)(unsigned long)ds->pebs_buffer_base;
+	top = (struct pebs_record_v2 *)(unsigned long)ds->pebs_index;
+
+	n = top - at;
+	if (n <= 0)
+		return;
+	/*
+	 * Should not happen, we program the threshold at 1 and do not
+	 * set a reset value.
+	 */
+	WARN_ONCE(n > x86_pmu.max_pebs_events,
+		  "Unexpected number of pebs records %d\n", n);
+
+	return intel_pmu_drain_pebs_common(iregs, at, top);
+}
+
 /*
  * BTS, PEBS probe and setup
  */
@@ -723,6 +772,12 @@ void intel_ds_init(void)
 			x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
 			break;
 
+		case 2:
+			printk(KERN_CONT "PEBS fmt2%c, ", pebs_type);
+			x86_pmu.pebs_record_size = sizeof(struct pebs_record_v2);
+			x86_pmu.drain_pebs = intel_pmu_drain_pebs_v2;
+			break;
+
 		default:
 			printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
 			x86_pmu.pebs = 0;
-- 
1.7.7.6



* [PATCH 02/33] perf, x86: Basic Haswell PMU support v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
  2012-10-26 20:29 ` [PATCH 01/33] perf, x86: Add PEBSv2 record support Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 03/33] perf, x86: Basic Haswell PEBS support v3 Andi Kleen
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add basic Haswell PMU support.

Similar to SandyBridge, but with a few new events. Further
differences are handled in follow-on patches.

There are some new counter flags that need to be prevented
from being set on fixed counters.

Contains fixes from Stephane Eranian.

v2: Folded TSX bits into standard FIXED_EVENT_CONSTRAINTS
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/perf_event.h      |    3 +++
 arch/x86/kernel/cpu/perf_event.h       |    5 ++++-
 arch/x86/kernel/cpu/perf_event_intel.c |   29 +++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 4fabcdf..4003bb6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF000000ULL
 
+#define HSW_INTX					(1ULL << 32)
+#define HSW_INTX_CHECKPOINTED				(1ULL << 33)
+
 #define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
 #define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 271d257..a9cac71 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -219,11 +219,14 @@ struct cpu_hw_events {
  *  - inv
  *  - edge
  *  - cnt-mask
+ *  - intx
+ *  - intx_cp
  *  The other filters are supported by fixed counters.
  *  The any-thread option is supported starting with v3.
  */
+#define FIXED_EVENT_FLAGS (X86_RAW_EVENT_MASK|HSW_INTX|HSW_INTX_CHECKPOINTED)
 #define FIXED_EVENT_CONSTRAINT(c, n)	\
-	EVENT_CONSTRAINT(c, (1ULL << (32+n)), X86_RAW_EVENT_MASK)
+	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
 /*
  * Constraint on the Event code + UMask
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 324bb52..b903eb0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -133,6 +133,17 @@ static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
 	EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_hsw_event_constraints[] =
+{
+	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+	FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+	FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
+	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.PENDING */
+	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+	INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+	EVENT_CONSTRAINT_END
+};
+
 static u64 intel_pmu_event_map(int hw_event)
 {
 	return intel_perfmon_event_map[hw_event];
@@ -2098,6 +2109,24 @@ __init int intel_pmu_init(void)
 		break;
 
 
+	case 60: /* Haswell Client */
+	case 70:
+	case 71:
+		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+		       sizeof(hw_cache_event_ids));
+
+		intel_pmu_lbr_init_nhm();
+
+		x86_pmu.event_constraints = intel_hsw_event_constraints;
+
+		x86_pmu.extra_regs = intel_snb_extra_regs;
+		/* all extra regs are per-cpu when HT is on */
+		x86_pmu.er_flags |= ERF_HAS_RSP_1;
+		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+		pr_cont("Haswell events, ");
+		break;
+
 	default:
 		switch (x86_pmu.version) {
 		case 1:
-- 
1.7.7.6



* [PATCH 03/33] perf, x86: Basic Haswell PEBS support v3
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
  2012-10-26 20:29 ` [PATCH 01/33] perf, x86: Add PEBSv2 record support Andi Kleen
  2012-10-26 20:29 ` [PATCH 02/33] perf, x86: Basic Haswell PMU support v2 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 04/33] perf, x86: Support the TSX intx/intx_cp qualifiers v2 Andi Kleen
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add basic PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new events.

v2: Re-add missing pebs_aliases
v3: Re-add missing hunk. Fix some constraints.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.h          |    2 ++
 arch/x86/kernel/cpu/perf_event_intel.c    |    6 ++++--
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   29 +++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a9cac71..e5da138 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -591,6 +591,8 @@ extern struct event_constraint intel_snb_pebs_event_constraints[];
 
 extern struct event_constraint intel_ivb_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_pebs_event_constraints[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b903eb0..1770fb0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -826,7 +826,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
 		return true;
 
 	/* implicit branch sampling to correct PEBS skid */
-	if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1)
+	if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
+	    x86_pmu.intel_cap.pebs_format < 2)
 		return true;
 
 	return false;
@@ -2118,8 +2119,9 @@ __init int intel_pmu_init(void)
 		intel_pmu_lbr_init_nhm();
 
 		x86_pmu.event_constraints = intel_hsw_event_constraints;
-
+		x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_snb_extra_regs;
+		x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
 		/* all extra regs are per-cpu when HT is on */
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 9d0dae0..16d7c58 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -427,6 +427,35 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
         EVENT_CONSTRAINT_END
 };
 
+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+	INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+	INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+	INTEL_EVENT_CONSTRAINT(0xc4, 0xf),    /* BR_INST_RETIRED.* */
+	INTEL_UEVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
+	INTEL_UEVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
+	INTEL_UEVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.NEAR_TAKEN */
+	INTEL_EVENT_CONSTRAINT(0xcd, 0x8),    /* MEM_TRANS_RETIRED.* */
+	INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+	INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
+	INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+	INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+	INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
+	INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+	INTEL_UEVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+	INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+	INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+	INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+	INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB */
+	INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+	INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+	INTEL_UEVENT_CONSTRAINT(0x02d3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
+	INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.Abort */
+	INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.Abort */
+
+	EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
 	struct event_constraint *c;
-- 
1.7.7.6



* [PATCH 04/33] perf, x86: Support the TSX intx/intx_cp qualifiers v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (2 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 03/33] perf, x86: Basic Haswell PEBS support v3 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3 Andi Kleen
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Implement the TSX transaction and checkpointed-transaction qualifiers for
Haswell. This allows, for example, profiling the number of cycles spent in
transactions.

The checkpointed qualifier requires forcing the event to
counter 2, implement this with a custom constraint for Haswell.

Also add sysfs format attributes for intx/intx_cp.

[Updated from earlier version that used generic attributes, now does
raw + sysfs formats]
v2: Moved bad hunk. Forbid some bad combinations.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   61 ++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 1770fb0..9502c19 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -13,6 +13,7 @@
 #include <linux/slab.h>
 #include <linux/export.h>
 
+#include <asm/cpufeature.h>
 #include <asm/hardirq.h>
 #include <asm/apic.h>
 
@@ -1604,6 +1605,8 @@ PMU_FORMAT_ATTR(pc,	"config:19"	);
 PMU_FORMAT_ATTR(any,	"config:21"	); /* v3 + */
 PMU_FORMAT_ATTR(inv,	"config:23"	);
 PMU_FORMAT_ATTR(cmask,	"config:24-31"	);
+PMU_FORMAT_ATTR(intx,	"config:32"	);
+PMU_FORMAT_ATTR(intx_cp,"config:33"	);
 
 static struct attribute *intel_arch_formats_attr[] = {
 	&format_attr_event.attr,
@@ -1615,6 +1618,44 @@ static struct attribute *intel_arch_formats_attr[] = {
 	NULL,
 };
 
+static int hsw_hw_config(struct perf_event *event)
+{
+	int ret = intel_pmu_hw_config(event);
+
+	if (ret)
+		return ret;
+	if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
+		return 0;
+	event->hw.config |= event->attr.config & (HSW_INTX|HSW_INTX_CHECKPOINTED);
+
+	/* 
+ 	 * INTX/INTX-CP do not play well with PEBS or ANY thread mode.
+ 	 */
+	if ((event->hw.config & (HSW_INTX|HSW_INTX_CHECKPOINTED)) &&
+	     ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
+	      event->attr.precise_ip > 0))
+		return -EIO;
+	return 0;
+}
+
+static struct event_constraint counter2_constraint = 
+			EVENT_CONSTRAINT(0, 0x4, 0);
+
+static struct event_constraint *
+hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+	struct event_constraint *c = intel_get_event_constraints(cpuc, event);
+
+	/* Handle special quirk on intx_checkpointed only in counter 2 */
+	if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+		if (c->idxmsk64 & (1U << 2))
+			return &counter2_constraint;
+		return &emptyconstraint;
+	}
+
+	return c;
+}
+
 static __initconst const struct x86_pmu core_pmu = {
 	.name			= "core",
 	.handle_irq		= x86_pmu_handle_irq,
@@ -1753,6 +1794,23 @@ static struct attribute *intel_arch3_formats_attr[] = {
 	NULL,
 };
 
+/* Arch3 + TSX support */
+static struct attribute *intel_hsw_formats_attr[] __read_mostly = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_pc.attr,
+	&format_attr_any.attr,
+	&format_attr_inv.attr,
+	&format_attr_cmask.attr,
+	&format_attr_intx.attr,
+	&format_attr_intx_cp.attr,
+
+	&format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
+	NULL,
+};
+
+
 static __initconst const struct x86_pmu intel_pmu = {
 	.name			= "Intel",
 	.handle_irq		= intel_pmu_handle_irq,
@@ -2126,6 +2184,9 @@ __init int intel_pmu_init(void)
 		x86_pmu.er_flags |= ERF_HAS_RSP_1;
 		x86_pmu.er_flags |= ERF_NO_HT_SHARING;
 
+		x86_pmu.hw_config = hsw_hw_config;
+		x86_pmu.get_event_constraints = hsw_get_event_constraints;
+		x86_pmu.format_attrs = intel_hsw_formats_attr;
 		pr_cont("Haswell events, ");
 		break;
 
-- 
1.7.7.6



* [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (3 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 04/33] perf, x86: Support the TSX intx/intx_cp qualifiers v2 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-30  9:25   ` Gleb Natapov
  2012-10-26 20:29 ` [PATCH 06/33] perf, x86: Support PERF_SAMPLE_ADDR on Haswell Andi Kleen
                   ` (27 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen, avi, gleb

From: Andi Kleen <ak@linux.intel.com>

The intx/intx_cp bits are not arch perfmon, but older CPUs will simply
ignore them. This makes it possible to do at least some TSX measurements
from a KVM guest.

Cc: avi@redhat.com
Cc: gleb@redhat.com
v2: Various fixes to address review feedback
v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
Cc: gleb@redhat.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/pmu.c              |   34 ++++++++++++++++++++++++++--------
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b2e11f4..6783289 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -318,6 +318,7 @@ struct kvm_pmu {
 	u64 global_ovf_ctrl;
 	u64 counter_bitmask[2];
 	u64 global_ctrl_mask;
+	u64 cpuid_word9;
 	u8 version;
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index cfc258a..8bc954a 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
 
 static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		unsigned config, bool exclude_user, bool exclude_kernel,
-		bool intr)
+		bool intr, bool intx, bool intx_cp)
 {
 	struct perf_event *event;
 	struct perf_event_attr attr = {
@@ -173,6 +173,11 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
 	};
+	/* Will be ignored on CPUs that don't support this. */
+	if (intx)
+		attr.config |= HSW_INTX;
+	if (intx_cp)
+		attr.config |= HSW_INTX_CHECKPOINTED;
 
 	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
 
@@ -206,7 +211,8 @@ static unsigned find_arch_event(struct kvm_pmu *pmu, u8 event_select,
 	return arch_events[i].event_type;
 }
 
-static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
+static void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, 
+				 u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
 	u8 event_select, unit_mask;
@@ -224,9 +230,16 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
 	unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
 
+	if (!(boot_cpu_has(X86_FEATURE_HLE) ||
+	      boot_cpu_has(X86_FEATURE_RTM)) ||
+	    !(pmu->cpuid_word9 & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
+		eventsel &= ~(HSW_INTX|HSW_INTX_CHECKPOINTED);
+
 	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
 				ARCH_PERFMON_EVENTSEL_INV |
-				ARCH_PERFMON_EVENTSEL_CMASK))) {
+				ARCH_PERFMON_EVENTSEL_CMASK |
+				HSW_INTX |
+				HSW_INTX_CHECKPOINTED))) {
 		config = find_arch_event(&pmc->vcpu->arch.pmu, event_select,
 				unit_mask);
 		if (config != PERF_COUNT_HW_MAX)
@@ -239,7 +252,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	reprogram_counter(pmc, type, config,
 			!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
 			!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			eventsel & ARCH_PERFMON_EVENTSEL_INT);
+			eventsel & ARCH_PERFMON_EVENTSEL_INT,
+			(eventsel & HSW_INTX),
+			(eventsel & HSW_INTX_CHECKPOINTED));
 }
 
 static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
@@ -256,7 +271,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
 			arch_events[fixed_pmc_events[idx]].event_type,
 			!(en & 0x2), /* exclude user */
 			!(en & 0x1), /* exclude kernel */
-			pmi);
+			pmi, false, false);
 }
 
 static inline u8 fixed_en_pmi(u64 ctrl, int idx)
@@ -289,7 +304,7 @@ static void reprogram_idx(struct kvm_pmu *pmu, int idx)
 		return;
 
 	if (pmc_is_gp(pmc))
-		reprogram_gp_counter(pmc, pmc->eventsel);
+		reprogram_gp_counter(pmu, pmc, pmc->eventsel);
 	else {
 		int fidx = idx - INTEL_PMC_IDX_FIXED;
 		reprogram_fixed_counter(pmc,
@@ -400,8 +415,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
 			if (data == pmc->eventsel)
 				return 0;
-			if (!(data & 0xffffffff00200000ull)) {
-				reprogram_gp_counter(pmc, data);
+			if (!(data & 0xfffffffc00200000ull)) {
+				reprogram_gp_counter(pmu, pmc, data);
 				return 0;
 			}
 		}
@@ -470,6 +485,9 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
 	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
+
+	entry = kvm_find_cpuid_entry(vcpu, 7, 0);
+	pmu->cpuid_word9 = entry ? entry->ebx : 0;
 }
 
 void kvm_pmu_init(struct kvm_vcpu *vcpu)
-- 
1.7.7.6



* [PATCH 06/33] perf, x86: Support PERF_SAMPLE_ADDR on Haswell
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (4 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 07/33] perf, x86: Support Haswell v4 LBR format Andi Kleen
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Haswell supplies the data linear address for every PEBS memory event, so
always fill it in when the user requests it. It will be 0 when not useful
(no memory access).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 16d7c58..aa0f5fa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -630,6 +630,10 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	else
 		regs.flags &= ~PERF_EFLAGS_EXACT;
 
+	if ((event->attr.sample_type & PERF_SAMPLE_ADDR) &&
+		x86_pmu.intel_cap.pebs_format >= 2)
+		data.addr = ((struct pebs_record_v2 *)pebs)->nhm.dla;
+
 	if (has_branch_stack(event))
 		data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



* [PATCH 07/33] perf, x86: Support Haswell v4 LBR format
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (5 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 06/33] perf, x86: Support PERF_SAMPLE_ADDR on Haswell Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 08/33] perf, x86: Disable LBR recording for unknown LBR_FMT Andi Kleen
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Haswell has two additional LBR FROM flags for TSX: intx and abort,
implemented as a new v4 version of the LBR format.

Handle them and adjust the sign-extension code so that addresses still
extend correctly. The flags are exported in the LBR record similarly to
the existing misprediction flag.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   18 +++++++++++++++---
 include/linux/perf_event.h                 |    7 ++++++-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
 	LBR_FORMAT_LIP		= 0x01,
 	LBR_FORMAT_EIP		= 0x02,
 	LBR_FORMAT_EIP_FLAGS	= 0x03,
+	LBR_FORMAT_EIP_FLAGS2	= 0x04,
 };
 
 /*
@@ -56,6 +57,8 @@ enum {
 	 LBR_FAR)
 
 #define LBR_FROM_FLAG_MISPRED  (1ULL << 63)
+#define LBR_FROM_FLAG_INTX     (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT    (1ULL << 61)
 
 #define for_each_branch_sample_type(x) \
 	for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		unsigned long lbr_idx = (tos - i) & mask;
-		u64 from, to, mis = 0, pred = 0;
+		u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;
 
 		rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
 		rdmsrl(x86_pmu.lbr_to   + lbr_idx, to);
 
-		if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+		if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+		    lbr_format == LBR_FORMAT_EIP_FLAGS2) {
 			mis = !!(from & LBR_FROM_FLAG_MISPRED);
 			pred = !mis;
-			from = (u64)((((s64)from) << 1) >> 1);
+			if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+				from = (u64)((((s64)from) << 1) >> 1);
+			else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+				intx = !!(from & LBR_FROM_FLAG_INTX);
+				abort = !!(from & LBR_FROM_FLAG_ABORT);
+				from = (u64)((((s64)from) << 3) >> 3);
+			}
 		}
 
 		cpuc->lbr_entries[i].from	= from;
 		cpuc->lbr_entries[i].to		= to;
 		cpuc->lbr_entries[i].mispred	= mis;
 		cpuc->lbr_entries[i].predicted	= pred;
+		cpuc->lbr_entries[i].intx	= intx;
+		cpuc->lbr_entries[i].abort	= abort;
 		cpuc->lbr_entries[i].reserved	= 0;
 	}
 	cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e90235..0e528fc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -74,13 +74,18 @@ struct perf_raw_record {
  *
  * support for mispred, predicted is optional. In case it
  * is not supported mispred = predicted = 0.
+ *
+ *     intx: running in a hardware transaction
+ *     abort: aborting a hardware transaction
  */
 struct perf_branch_entry {
 	__u64	from;
 	__u64	to;
 	__u64	mispred:1,  /* target mispredicted */
 		predicted:1,/* target predicted */
-		reserved:62;
+		intx:1,	    /* in transaction */
+		abort:1,    /* transaction abort */
+		reserved:60;
 };
 
 /*
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/33] perf, x86: Disable LBR recording for unknown LBR_FMT
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (6 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 07/33] perf, x86: Support Haswell v4 LBR format Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 09/33] perf, x86: Support LBR filtering by INTX/NOTX/ABORT v2 Andi Kleen
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When the LBR format is unknown, disable LBR recording. This prevents
crashes when the LBR address would otherwise be mis-decoded and
incorrectly sign-extended.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index 2af6695b..ad5af13 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -13,6 +13,7 @@ enum {
 	LBR_FORMAT_EIP		= 0x02,
 	LBR_FORMAT_EIP_FLAGS	= 0x03,
 	LBR_FORMAT_EIP_FLAGS2	= 0x04,
+	LBR_FORMAT_MAX_KNOWN	= LBR_FORMAT_EIP_FLAGS2,
 };
 
 /*
@@ -392,7 +393,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
 	/*
 	 * no LBR on this PMU
 	 */
-	if (!x86_pmu.lbr_nr)
+	if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
 		return -EOPNOTSUPP;
 
 	/*
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 09/33] perf, x86: Support LBR filtering by INTX/NOTX/ABORT v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (7 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 08/33] perf, x86: Disable LBR recording for unknown LBR_FMT Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2 Andi Kleen
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add LBR filtering for branches inside a transaction, branches outside a
transaction, and transaction aborts. These are exposed as new branch
sample types.
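
From user space the new filters are requested through
perf_event_attr.branch_sample_type. A minimal sketch (the bit values are
copied from this patch's uapi hunk and defined locally, since a pre-Haswell
linux/perf_event.h does not have them yet):

```c
#include <assert.h>
#include <stdint.h>

/* Bit values from this patch's perf_event.h hunk; defined locally because
 * older headers lack the transaction bits. */
#define PERF_SAMPLE_BRANCH_USER		(1U << 0)
#define PERF_SAMPLE_BRANCH_ANY		(1U << 3)
#define PERF_SAMPLE_BRANCH_ABORTTX	(1U << 7)
#define PERF_SAMPLE_BRANCH_INTX		(1U << 8)
#define PERF_SAMPLE_BRANCH_NOTX		(1U << 9)

/* Filter mask for user-level branches that executed inside a
 * transaction, plus transaction aborts. */
static uint64_t tx_branch_filter(void)
{
	return PERF_SAMPLE_BRANCH_USER |
	       PERF_SAMPLE_BRANCH_ANY  |
	       PERF_SAMPLE_BRANCH_INTX |
	       PERF_SAMPLE_BRANCH_ABORTTX;
}
```

The resulting mask would be stored into attr.branch_sample_type before
calling perf_event_open() with PERF_SAMPLE_BRANCH_STACK sampling enabled.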

v2: Rename ABORT to ABORTTX
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_lbr.c |   31 +++++++++++++++++++++++++--
 include/uapi/linux/perf_event.h            |    5 +++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index ad5af13..5455a00 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -85,9 +85,13 @@ enum {
 	X86_BR_JMP      = 1 << 9, /* jump */
 	X86_BR_IRQ      = 1 << 10,/* hw interrupt or trap or fault */
 	X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+	X86_BR_ABORT    = 1 << 12,/* transaction abort */
+	X86_BR_INTX     = 1 << 13,/* in transaction */
+	X86_BR_NOTX     = 1 << 14,/* not in transaction */
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
+#define X86_BR_ANYTX (X86_BR_NOTX | X86_BR_INTX)
 
 #define X86_BR_ANY       \
 	(X86_BR_CALL    |\
@@ -99,6 +103,7 @@ enum {
 	 X86_BR_JCC     |\
 	 X86_BR_JMP	 |\
 	 X86_BR_IRQ	 |\
+	 X86_BR_ABORT	 |\
 	 X86_BR_IND_CALL)
 
 #define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
@@ -347,6 +352,16 @@ static void intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 
 	if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
 		mask |= X86_BR_IND_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_ABORTTX)
+		mask |= X86_BR_ABORT;
+
+	if (br_type & PERF_SAMPLE_BRANCH_INTX)
+		mask |= X86_BR_INTX;
+
+	if (br_type & PERF_SAMPLE_BRANCH_NOTX)
+		mask |= X86_BR_NOTX;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -393,7 +408,8 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
 	/*
 	 * no LBR on this PMU
 	 */
-	if (!x86_pmu.lbr_nr || x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
+	if (!x86_pmu.lbr_nr ||
+	    x86_pmu.intel_cap.lbr_format > LBR_FORMAT_MAX_KNOWN)
 		return -EOPNOTSUPP;
 
 	/*
@@ -421,7 +437,7 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
  * decoded (e.g., text page not present), then X86_BR_NONE is
  * returned.
  */
-static int branch_type(unsigned long from, unsigned long to)
+static int branch_type(unsigned long from, unsigned long to, int abort)
 {
 	struct insn insn;
 	void *addr;
@@ -441,6 +457,9 @@ static int branch_type(unsigned long from, unsigned long to)
 	if (from == 0 || to == 0)
 		return X86_BR_NONE;
 
+	if (abort)
+		return X86_BR_ABORT | to_plm;
+
 	if (from_plm == X86_BR_USER) {
 		/*
 		 * can happen if measuring at the user level only
@@ -577,7 +596,13 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 		from = cpuc->lbr_entries[i].from;
 		to = cpuc->lbr_entries[i].to;
 
-		type = branch_type(from, to);
+		type = branch_type(from, to, cpuc->lbr_entries[i].abort);
+		if (type != X86_BR_NONE && (br_sel & X86_BR_ANYTX)) {
+			if (cpuc->lbr_entries[i].intx)
+				type |= X86_BR_INTX;
+			else
+				type |= X86_BR_NOTX;
+		}
 
 		/* if type does not correspond, then discard */
 		if (type == X86_BR_NONE || (br_sel & type) != type) {
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..8e38823 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -155,8 +155,11 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_ANY_CALL	= 1U << 4, /* any call branch */
 	PERF_SAMPLE_BRANCH_ANY_RETURN	= 1U << 5, /* any return branch */
 	PERF_SAMPLE_BRANCH_IND_CALL	= 1U << 6, /* indirect calls */
+	PERF_SAMPLE_BRANCH_ABORTTX	= 1U << 7, /* transaction aborts */
+	PERF_SAMPLE_BRANCH_INTX		= 1U << 8, /* in transaction (flag) */
+	PERF_SAMPLE_BRANCH_NOTX		= 1U << 9, /* not in transaction (flag) */
 
-	PERF_SAMPLE_BRANCH_MAX		= 1U << 7, /* non-ABI */
+	PERF_SAMPLE_BRANCH_MAX		= 1U << 10, /* non-ABI */
 };
 
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (8 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 09/33] perf, x86: Support LBR filtering by INTX/NOTX/ABORT v2 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-29 10:19   ` Namhyung Kim
  2012-10-26 20:29 ` [PATCH 11/33] perf, tools: Support sorting by intx, abort branch flags Andi Kleen
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Make perf record -j aware of the new intx, notx and aborttx branch qualifiers.

v2: ABORT -> ABORTTX
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt |    3 +++
 tools/perf/builtin-record.c              |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b38a1f9..4b9f477 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -172,6 +172,9 @@ following filters are defined:
         - u:  only when the branch target is at the user level
         - k: only when the branch target is in the kernel
         - hv: only when the target is at the hypervisor level
+	- intx: only when the target is in a hardware transaction
+	- notx: only when the target is not in a hardware transaction
+	- aborttx: only when the target is a hardware transaction abort
 
 +
 The option requires at least one branch type among any, any_call, any_ret, ind_call.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e9231659..88ecbbd 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -725,6 +725,9 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
 	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
 	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
+	BRANCH_OPT("aborttx", PERF_SAMPLE_BRANCH_ABORTTX),
+	BRANCH_OPT("intx", PERF_SAMPLE_BRANCH_INTX),
+	BRANCH_OPT("notx", PERF_SAMPLE_BRANCH_NOTX),
 	BRANCH_END
 };
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 11/33] perf, tools: Support sorting by intx, abort branch flags
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (9 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 12/33] perf, x86: Support full width counting Andi Kleen
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Extend the perf branch sorting code to support sorting by intx
or abort qualifiers. Also print out those qualifiers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-report.c |    3 +-
 tools/perf/builtin-top.c    |    4 ++-
 tools/perf/perf.h           |    4 ++-
 tools/perf/util/hist.h      |    2 +
 tools/perf/util/sort.c      |   55 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h      |    2 +
 6 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a61725d..d46f887 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -595,7 +595,8 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Use the stdio interface"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
-		   " dso_from, symbol_to, symbol_from, mispredict"),
+		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+		   " abort, intx"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ff6db80..3861118 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1221,7 +1221,9 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_INCR('v', "verbose", &verbose,
 		    "be more verbose (show counter open errors, etc)"),
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
-		   "sort by key(s): pid, comm, dso, symbol, parent"),
+		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
+		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
+		   " abort, intx"),
 	OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
 		    "Show a column with the number of samples"),
 	OPT_CALLBACK_DEFAULT('G', "call-graph", &top, "output_type,min_percent, call_order",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c50985e..22a5502 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -194,7 +194,9 @@ struct ip_callchain {
 struct branch_flags {
 	u64 mispred:1;
 	u64 predicted:1;
-	u64 reserved:62;
+	u64 intx:1;
+	u64 abort:1;
+	u64 reserved:60;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 66cb31f..d918a1a 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -43,6 +43,8 @@ enum hist_column {
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_MISPREDICT,
+	HISTC_INTX,
+	HISTC_ABORT,
 	HISTC_SYMBOL_FROM,
 	HISTC_SYMBOL_TO,
 	HISTC_DSO_FROM,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..a8d1f1a 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -476,6 +476,55 @@ struct sort_entry sort_mispredict = {
 	.se_width_idx	= HISTC_MISPREDICT,
 };
 
+static int64_t
+sort__abort_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.abort !=
+		right->branch_info->flags.abort;
+}
+
+static int hist_entry__abort_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	static const char *out = ".";
+
+	if (self->branch_info->flags.abort)
+		out = "A";
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_abort = {
+	.se_header	= "Transaction abort",
+	.se_cmp		= sort__abort_cmp,
+	.se_snprintf	= hist_entry__abort_snprintf,
+	.se_width_idx	= HISTC_ABORT,
+};
+
+static int64_t
+sort__intx_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->branch_info->flags.intx !=
+		right->branch_info->flags.intx;
+}
+
+static int hist_entry__intx_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	static const char *out = ".";
+
+	if (self->branch_info->flags.intx)
+		out = "T";
+
+	return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+struct sort_entry sort_intx = {
+	.se_header	= "Branch in transaction",
+	.se_cmp		= sort__intx_cmp,
+	.se_snprintf	= hist_entry__intx_snprintf,
+	.se_width_idx	= HISTC_INTX,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -497,6 +546,8 @@ static struct sort_dimension sort_dimensions[] = {
 	DIM(SORT_CPU, "cpu", sort_cpu),
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_SRCLINE, "srcline", sort_srcline),
+	DIM(SORT_ABORT, "abort", sort_abort),
+	DIM(SORT_INTX, "intx", sort_intx)
 };
 
 int sort_dimension__add(const char *tok)
@@ -553,6 +604,10 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_DSO_TO;
 			else if (!strcmp(sd->name, "mispredict"))
 				sort__first_dimension = SORT_MISPREDICT;
+			else if (!strcmp(sd->name, "intx"))
+				sort__first_dimension = SORT_INTX;
+			else if (!strcmp(sd->name, "abort"))
+				sort__first_dimension = SORT_ABORT;
 		}
 
 		list_add_tail(&sd->entry->list, &hist_entry__sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 5786f32..10cf155 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -100,6 +100,8 @@ enum sort_type {
 	SORT_SYM_TO,
 	SORT_MISPREDICT,
 	SORT_SRCLINE,
+	SORT_ABORT,
+	SORT_INTX,
 };
 
 /*
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 12/33] perf, x86: Support full width counting
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (10 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 11/33] perf, tools: Support sorting by intx, abort branch flags Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 13/33] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3 Andi Kleen
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Recent Intel CPUs have a new alternative MSR range for perfctrs that allows
writing the full counter width. Enable this range if the hardware reports it
using a new capability bit. This slightly lowers the overhead of perf stat
because fewer interrupts are needed to accumulate the counter value. On
Haswell it also avoids some problems with TSX aborting when the end of the
counter range is reached.
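
The practical effect can be sketched with simple arithmetic (illustration
only, not kernel code): legacy perfctr MSR writes are sign-extended from
bit 31, so the sampling period is capped at 2^31-1 even on 48-bit counters,
while the fw_write range makes the full width writable and so far fewer
overflow interrupts are needed for the same count.

```c
#include <assert.h>
#include <stdint.h>

/* Number of overflow interrupts needed to accumulate `events` events when
 * each counter period can cover at most `max_period` events. */
static uint64_t pmis_needed(uint64_t events, uint64_t max_period)
{
	return (events + max_period - 1) / max_period;
}
```

For 2^40 events, a 31-bit-capped period needs hundreds of PMIs while a
full-width 48-bit period needs a single one.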

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h       |    3 +++
 arch/x86/kernel/cpu/perf_event.h       |    1 +
 arch/x86/kernel/cpu/perf_event_intel.c |    6 ++++++
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7f0edce..2070f46 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -126,6 +126,9 @@
 #define MSR_KNC_EVNTSEL0               0x00000028
 #define MSR_KNC_EVNTSEL1               0x00000029
 
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0			0x000004c1
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
    complete list. */
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index e5da138..17cb08f 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -278,6 +278,7 @@ union perf_capabilities {
 		u64	pebs_arch_reg:1;
 		u64	pebs_format:4;
 		u64	smm_freeze:1;
+		u64	fw_write:1;
 	};
 	u64	capabilities;
 };
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9502c19..9bff694 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2238,5 +2238,11 @@ __init int intel_pmu_init(void)
 		}
 	}
 
+	/* Support full width counters using alternative MSR range */
+	if (x86_pmu.intel_cap.fw_write) {
+		x86_pmu.max_period = x86_pmu.cntval_mask;
+		x86_pmu.perfctr = MSR_IA32_PMC0;
+	}
+
 	return 0;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 13/33] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (11 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 12/33] perf, x86: Support full width counting Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 14/33] perf, core: Add a concept of a weighted sample Andi Kleen
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

With checkpointed counters there can be a situation where the counter
overflows, aborts the transaction, is set back to a non-overflowing
checkpointed value, and then raises an interrupt. The interrupt handler
doesn't see the overflow because it has been checkpointed. This results
in a spurious PMI, typically with an ugly NMI message. It can also lead
to excessive aborts.

Avoid this problem by:
- Using the full counter width for counting counters (previous patch)
- Forbid sampling for checkpointed counters. It's not too useful anyway;
checkpointing is mainly for counting.
- On a PMI always set back checkpointed counters to zero.

v2: Add unlikely. Add comment
v3: Allow large sampling periods with CP for KVM
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   33 ++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 9bff694..bbd00cc 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1079,6 +1079,17 @@ static void intel_pmu_enable_event(struct perf_event *event)
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
 	x86_perf_event_update(event);
+	/*
+	 * For a checkpointed counter always reset back to 0.  This
+	 * avoids a situation where the counter overflows, aborts the
+	 * transaction and is then set back to shortly before the
+	 * overflow, and overflows and aborts again.
+	 */
+	if (unlikely(event->hw.config & HSW_INTX_CHECKPOINTED)) {
+		/* No race with NMIs because the counter should not be armed */
+		wrmsrl(event->hw.event_base, 0);
+		local64_set(&event->hw.prev_count, 0);
+	}
 	return x86_perf_event_set_period(event);
 }
 
@@ -1162,6 +1173,15 @@ again:
 		x86_pmu.drain_pebs(regs);
 	}
 
+	/*
+ 	 * To avoid spurious interrupts with perf stat always reset checkpointed
+ 	 * counters.
+ 	 *
+	 * XXX move somewhere else.
+	 */
+	if (cpuc->events[2] && (cpuc->events[2]->hw.config & HSW_INTX_CHECKPOINTED))
+		status |= (1ULL << 2);
+
 	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
 		struct perf_event *event = cpuc->events[bit];
 
@@ -1635,6 +1655,19 @@ static int hsw_hw_config(struct perf_event *event)
 	     ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
 	      event->attr.precise_ip > 0))
 		return -EIO;
+	if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+		/*
+		 * Sampling of checkpointed events can cause situations where
+		 * the CPU constantly aborts because of a overflow, which is
+		 * then checkpointed back and ignored. Forbid checkpointing
+		 * for sampling.
+		 *
+		 * But still allow a long sampling period, so that perf stat
+		 * from KVM works.
+		 */
+		if (event->attr.sample_period < 0x7fffffff)
+			return -EIO;
+	}
 	return 0;
 }
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 14/33] perf, core: Add a concept of a weighted sample
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (12 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 13/33] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 15/33] perf, x86: Support weight samples for PEBS Andi Kleen
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

For some events it's useful to weight samples with a hardware-provided
number. This expresses how expensive the action the sample represents
was. It allows the profiler to scale the samples to be more informative
to the programmer.

There is already the period, which is used similarly, but it means
something different, so I chose not to overload it. Instead a new
sample type for WEIGHT is added.

This can be used for multiple things. Initially it is used for TSX abort
costs and for profiling by memory latencies (to make expensive loads
appear higher up in the histograms). The concept is quite generic and can
be extended to many other kinds of events or architectures, as long as
the hardware provides suitable auxiliary values. In principle it could
also be used for software tracepoints.

This adds the generic glue: a new optional sample format for a 64-bit
weight value.
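
On the consumer side the layout can be sketched with a toy parser
(hypothetical, covering only three fields; per the header comment in this
patch the weight word follows the period word in the sample record):

```c
#include <assert.h>
#include <stdint.h>

#define PERF_SAMPLE_IP		(1U << 0)
#define PERF_SAMPLE_PERIOD	(1U << 8)
#define PERF_SAMPLE_WEIGHT	(1U << 14)	/* added by this patch */

struct sample { uint64_t ip, period, weight; };

/* Sample records are a sequence of u64 words in enum bit order; a reader
 * consumes one word for each bit it set in sample_type. */
static void parse_sample(const uint64_t *p, uint64_t type, struct sample *s)
{
	if (type & PERF_SAMPLE_IP)
		s->ip = *p++;
	if (type & PERF_SAMPLE_PERIOD)
		s->period = *p++;
	if (type & PERF_SAMPLE_WEIGHT)
		s->weight = *p++;
}
```

A real reader would of course handle every sample_type bit it requested, in
the order documented in the perf_event_type comment.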

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/perf_event.h      |    2 ++
 include/uapi/linux/perf_event.h |    8 ++++++--
 kernel/events/core.c            |    6 ++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0e528fc..f4ded17 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -588,6 +588,7 @@ struct perf_sample_data {
 	struct perf_branch_stack	*br_stack;
 	struct perf_regs_user		regs_user;
 	u64				stack_user_size;
+	u64				weight;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -601,6 +602,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 	data->regs_user.regs = NULL;
 	data->stack_user_size = 0;
+	data->weight = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 8e38823..809a5fd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_BRANCH_STACK		= 1U << 11,
 	PERF_SAMPLE_REGS_USER			= 1U << 12,
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
+	PERF_SAMPLE_WEIGHT			= 1U << 14,
+
+	PERF_SAMPLE_MAX = 1U << 15,		/* non-ABI */
 
-	PERF_SAMPLE_MAX = 1U << 14,		/* non-ABI */
 };
 
 /*
@@ -201,8 +203,9 @@ enum perf_event_read_format {
 	PERF_FORMAT_TOTAL_TIME_RUNNING		= 1U << 1,
 	PERF_FORMAT_ID				= 1U << 2,
 	PERF_FORMAT_GROUP			= 1U << 3,
+	PERF_FORMAT_WEIGHT			= 1U << 4,
 
-	PERF_FORMAT_MAX = 1U << 4,		/* non-ABI */
+	PERF_FORMAT_MAX = 1U << 5,		/* non-ABI */
 };
 
 #define PERF_ATTR_SIZE_VER0	64	/* sizeof first published struct */
@@ -562,6 +565,7 @@ enum perf_event_type {
 	 *	{ u64			stream_id;} && PERF_SAMPLE_STREAM_ID
 	 *	{ u32			cpu, res; } && PERF_SAMPLE_CPU
 	 *	{ u64			period;   } && PERF_SAMPLE_PERIOD
+	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
 	 *
 	 *	{ struct read_format	values;	  } && PERF_SAMPLE_READ
 	 *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..d633581 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_PERIOD)
 		size += sizeof(data->period);
 
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		size += sizeof(data->weight);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		size += event->read_size;
 
@@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_PERIOD)
 		perf_output_put(handle, data->period);
 
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		perf_output_put(handle, data->weight);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		perf_output_read(handle, event);
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 15/33] perf, x86: Support weight samples for PEBS
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (13 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 14/33] perf, core: Add a concept of a weighted sample Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:29 ` [PATCH 16/33] perf, tools: Add support for weight v2 Andi Kleen
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When a weighted sample is requested, first try to report the TSX abort cost
on Haswell. If that is not available, report the memory latency. This
allows profiling both by abort cost and by memory latency.

Memory latency profiling requires enabling a different PEBS mode (LL).
When both address and weight are requested, address wins.

The LL mode only works for memory related PEBS events, so add a
separate event constraint table for those.

I only did this for Haswell for now, but it could be added
for several other Intel CPUs too by just adding the right
table for them.
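
The weight selection boils down to a two-level fallback, which can be
sketched in isolation (a hypothetical helper mirroring the
__intel_pmu_pebs_event() hunk below):

```c
#include <assert.h>
#include <stdint.h>

/* Prefer the TSX abort cost (low 32 bits of the PEBS tsx_tuning field);
 * fall back to the PEBS-LL memory latency when no abort cost was
 * recorded. */
static uint64_t pebs_sample_weight(uint64_t tsx_tuning, uint64_t mem_lat)
{
	uint64_t w = tsx_tuning & 0xffffffffULL;

	return w ? w : mem_lat;
}
```

So a sample taken inside an aborting transaction is weighted by its abort
cost, and an ordinary memory sample by its load latency.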

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.h          |    4 ++
 arch/x86/kernel/cpu/perf_event_intel.c    |    4 ++
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   47 +++++++++++++++++++++++++++-
 3 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 17cb08f..89394e1 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -168,6 +168,7 @@ struct cpu_hw_events {
 	u64				perf_ctr_virt_mask;
 
 	void				*kfree_on_online;
+	u8				*memory_latency_events;
 };
 
 #define __EVENT_CONSTRAINT(c, n, m, w, o) {\
@@ -388,6 +389,7 @@ struct x86_pmu {
 	struct event_constraint *pebs_constraints;
 	void		(*pebs_aliases)(struct perf_event *event);
 	int 		max_pebs_events;
+	struct event_constraint *memory_lat_events;
 
 	/*
 	 * Intel LBR
@@ -594,6 +596,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];
 
 extern struct event_constraint intel_hsw_pebs_event_constraints[];
 
+extern struct event_constraint intel_hsw_memory_latency_events[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index bbd00cc..3a7b962 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1644,6 +1644,9 @@ static int hsw_hw_config(struct perf_event *event)
 
 	if (ret)
 		return ret;
+	/* PEBS cannot capture both */
+	if (event->attr.sample_type & PERF_SAMPLE_ADDR)
+		event->attr.sample_type &= ~PERF_SAMPLE_WEIGHT;
 	if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
 		return 0;
 	event->hw.config |= event->attr.config & (HSW_INTX|HSW_INTX_CHECKPOINTED);
@@ -2220,6 +2223,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.format_attrs = intel_hsw_formats_attr;
+		x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
 		pr_cont("Haswell events, ");
 		break;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index aa0f5fa..3094caa 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -456,6 +456,17 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
 	EVENT_CONSTRAINT_END
 };
 
+/* Subset of PEBS events supporting memory latency. Not used for scheduling */
+
+struct event_constraint intel_hsw_memory_latency_events[] = {
+	INTEL_EVENT_CONSTRAINT(0xcd, 0), /* MEM_TRANS_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xd0, 0), /* MEM_UOPS_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xd1, 0), /* MEM_LOAD_UOPS_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xd2, 0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+	INTEL_EVENT_CONSTRAINT(0xd3, 0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+	EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
 	struct event_constraint *c;
@@ -473,6 +484,21 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 	return &emptyconstraint;
 }
 
+static bool is_memory_lat_event(struct perf_event *event)
+{
+	struct event_constraint *c;
+
+	if (x86_pmu.intel_cap.pebs_format < 1)
+		return false;
+	if (!x86_pmu.memory_lat_events)
+		return false;
+	for_each_event_constraint(c, x86_pmu.memory_lat_events) {
+		if ((event->hw.config & c->cmask) == c->code)
+			return true;
+	}
+	return false;
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
@@ -480,7 +506,12 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
 	hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;
 
-	cpuc->pebs_enabled |= 1ULL << hwc->idx;
+	/* When weight is requested enable LL instead of normal PEBS */
+	if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) &&
+		is_memory_lat_event(event))
+		cpuc->pebs_enabled |= 1ULL << (32 + hwc->idx);
+	else
+		cpuc->pebs_enabled |= 1ULL << hwc->idx;
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -488,7 +519,11 @@ void intel_pmu_pebs_disable(struct perf_event *event)
 	struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 
-	cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
+	if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) &&
+		is_memory_lat_event(event))
+		cpuc->pebs_enabled &= ~(1ULL << (32 + hwc->idx));
+	else
+		cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
 	if (cpuc->enabled)
 		wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
 
@@ -634,6 +669,14 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 		x86_pmu.intel_cap.pebs_format >= 2)
 		data.addr = ((struct pebs_record_v2 *)pebs)->nhm.dla;
 
+	if ((event->attr.sample_type & PERF_SAMPLE_WEIGHT) &&
+	    x86_pmu.intel_cap.pebs_format >= 2) {
+		data.weight = ((struct pebs_record_v2 *)pebs)->tsx_tuning &
+				0xffffffff;
+		if (!data.weight)
+			data.weight = ((struct pebs_record_v2 *)pebs)->nhm.lat;
+	}
+
 	if (has_branch_stack(event))
 		data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 16/33] perf, tools: Add support for weight v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (14 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 15/33] perf, x86: Support weight samples for PEBS Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-29 10:44   ` Namhyung Kim
  2012-10-26 20:29 ` [PATCH 17/33] perf, tools: Handle XBEGIN like a jump Andi Kleen
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

perf record has a new option -W that enables weighted sampling.

Add sorting support in top/report for the average weight per sample and the
total weight sum. This allows comparing both the relative cost per event
and the total cost over the measurement period.

Add the necessary glue to perf report, record and the library.

v2: Merge with new hist refactoring.
Rename global_weight to weight and weight to local_weight.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt |    6 +++
 tools/perf/builtin-annotate.c            |    2 +-
 tools/perf/builtin-diff.c                |    7 ++--
 tools/perf/builtin-record.c              |    2 +
 tools/perf/builtin-report.c              |    7 ++--
 tools/perf/builtin-top.c                 |    5 ++-
 tools/perf/perf.h                        |    1 +
 tools/perf/util/event.h                  |    1 +
 tools/perf/util/evsel.c                  |   10 ++++++
 tools/perf/util/hist.c                   |   25 ++++++++++----
 tools/perf/util/hist.h                   |    8 +++-
 tools/perf/util/session.c                |    3 ++
 tools/perf/util/sort.c                   |   51 +++++++++++++++++++++++++++++-
 tools/perf/util/sort.h                   |    3 ++
 14 files changed, 112 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 4b9f477..0ffb436 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -185,6 +185,12 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
+-W::
+--weight::
+Enable weighted sampling. When the event supports an additional weight per sample,
+scale the histogram by this weight. This currently works for TSX abort events and
+some memory events in precise mode on modern Intel CPUs.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 9ea3854..8f144ad 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1);
+	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index a0b531c..1a41949 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -26,9 +26,10 @@ static bool  force;
 static bool show_displacement;
 
 static int hists__add_entry(struct hists *self,
-			    struct addr_location *al, u64 period)
+			    struct addr_location *al, u64 period,
+			    u64 weight)
 {
-	if (__hists__add_entry(self, al, NULL, period) != NULL)
+	if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
 		return 0;
 	return -ENOMEM;
 }
@@ -50,7 +51,7 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused,
 	if (al.filtered || al.sym == NULL)
 		return 0;
 
-	if (hists__add_entry(&evsel->hists, &al, sample->period)) {
+	if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight)) {
 		pr_warning("problem incrementing symbol period, skipping event\n");
 		return -1;
 	}
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 88ecbbd..9a25116 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1006,6 +1006,8 @@ const struct option record_options[] = {
 	OPT_CALLBACK('j', "branch-filter", &record.opts.branch_stack,
 		     "branch filter mask", "branch stack filter modes",
 		     parse_branch_stack),
+	OPT_BOOLEAN('W', "weight", &record.opts.sample_weight,
+		    "sample by weight (on special events only)"),
 	OPT_END()
 };
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d46f887..833821b 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -88,7 +88,7 @@ static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
 		 * and not events sampled. Thus we use a pseudo period of 1.
 		 */
 		he = __hists__add_branch_entry(&evsel->hists, al, parent,
-				&bi[i], 1);
+				&bi[i], 1, 1);
 		if (he) {
 			struct annotation *notes;
 			err = -ENOMEM;
@@ -146,7 +146,8 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 			return err;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, parent, sample->period);
+	he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
+					sample->weight);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -596,7 +597,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
 		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-		   " abort, intx"),
+		   " abort, intx, weight, local_weight"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 3861118..cd0bdeb 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -270,7 +270,8 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 {
 	struct hist_entry *he;
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period);
+	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
+				sample->weight);
 	if (he == NULL)
 		return NULL;
 
@@ -1223,7 +1224,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
 		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-		   " abort, intx"),
+		   " abort, intx, weight, local_weight"),
 	OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
 		    "Show a column with the number of samples"),
 	OPT_CALLBACK_DEFAULT('G', "call-graph", &top, "output_type,min_percent, call_order",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 22a5502..2365abf 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -234,6 +234,7 @@ struct perf_record_opts {
 	bool	     pipe_output;
 	bool	     raw_samples;
 	bool	     sample_address;
+	bool	     sample_weight;
 	bool	     sample_time;
 	bool	     sample_id_all_missing;
 	bool	     exclude_guest_missing;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 21b99e7..d60015b 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -87,6 +87,7 @@ struct perf_sample {
 	u64 id;
 	u64 stream_id;
 	u64 period;
+	u64 weight;
 	u32 cpu;
 	u32 raw_size;
 	void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 618d411..3800fb5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -445,6 +445,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 		attr->mmap_data = track;
 	}
 
+	if (opts->sample_weight)
+		attr->sample_type	|= PERF_SAMPLE_WEIGHT;
+
 	if (opts->call_graph) {
 		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
 
@@ -870,6 +873,7 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 	data->cpu = data->pid = data->tid = -1;
 	data->stream_id = data->id = data->time = -1ULL;
 	data->period = 1;
+	data->weight = 0;
 
 	if (event->header.type != PERF_RECORD_SAMPLE) {
 		if (!evsel->attr.sample_id_all)
@@ -941,6 +945,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->weight = 0;
+	if (type & PERF_SAMPLE_WEIGHT) {
+		data->weight = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_READ) {
 		fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
 		return -1;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 277947a..c17e273 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -151,9 +151,11 @@ static void hist_entry__add_cpumode_period(struct hist_entry *he,
 	}
 }
 
-static void he_stat__add_period(struct he_stat *he_stat, u64 period)
+static void he_stat__add_period(struct he_stat *he_stat, u64 period,
+				u64 weight)
 {
 	he_stat->period		+= period;
+	he_stat->weight		+= weight;
 	he_stat->nr_events	+= 1;
 }
 
@@ -165,12 +167,14 @@ static void he_stat__add_stat(struct he_stat *dest, struct he_stat *src)
 	dest->period_guest_sys	+= src->period_guest_sys;
 	dest->period_guest_us	+= src->period_guest_us;
 	dest->nr_events		+= src->nr_events;
+	dest->weight		+= src->weight;
 }
 
 static void hist_entry__decay(struct hist_entry *he)
 {
 	he->stat.period = (he->stat.period * 7) / 8;
 	he->stat.nr_events = (he->stat.nr_events * 7) / 8;
+	/* XXX need decay for weight too? */
 }
 
 static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
@@ -268,13 +272,17 @@ static u8 symbol__parent_filter(const struct symbol *parent)
 static struct hist_entry *add_hist_entry(struct hists *hists,
 				      struct hist_entry *entry,
 				      struct addr_location *al,
-				      u64 period)
+				      u64 period,
+				      u64 weight)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	int cmp;
 
+	if (weight == 0)
+		weight = 1;
+
 	pthread_mutex_lock(&hists->lock);
 
 	p = &hists->entries_in->rb_node;
@@ -286,7 +294,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 		cmp = hist_entry__cmp(entry, he);
 
 		if (!cmp) {
-			he_stat__add_period(&he->stat, period);
+			he_stat__add_period(&he->stat, period, weight);
 
 			/* If the map of an existing hist_entry has
 			 * become out-of-date due to an exec() or
@@ -314,6 +322,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
 
 	rb_link_node(&he->rb_node_in, parent, p);
 	rb_insert_color(&he->rb_node_in, hists->entries_in);
+	he->stat.weight += weight;
 out:
 	hist_entry__add_cpumode_period(he, al->cpumode, period);
 out_unlock:
@@ -325,7 +334,8 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
 					     struct branch_info *bi,
-					     u64 period)
+					     u64 period,
+					     u64 weight)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -346,12 +356,13 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 		.hists	= self,
 	};
 
-	return add_hist_entry(self, &entry, al, period);
+	return add_hist_entry(self, &entry, al, period, weight);
 }
 
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
-				      struct symbol *sym_parent, u64 period)
+				      struct symbol *sym_parent, u64 period,
+				      u64 weight)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -371,7 +382,7 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.hists	= self,
 	};
 
-	return add_hist_entry(self, &entry, al, period);
+	return add_hist_entry(self, &entry, al, period, weight);
 }
 
 int64_t
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index d918a1a..4a01553 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -50,6 +50,8 @@ enum hist_column {
 	HISTC_DSO_FROM,
 	HISTC_DSO_TO,
 	HISTC_SRCLINE,
+	HISTC_WEIGHT,
+	HISTC_GLOBAL_WEIGHT,
 	HISTC_NR_COLS, /* Last entry */
 };
 
@@ -74,7 +76,8 @@ struct hists {
 
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
-				      struct symbol *parent, u64 period);
+				      struct symbol *parent, u64 period,
+				      u64 weight);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size,
@@ -85,7 +88,8 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 					     struct addr_location *al,
 					     struct symbol *sym_parent,
 					     struct branch_info *bi,
-					     u64 period);
+					     u64 period,
+					     u64 weight);
 
 void hists__output_resort(struct hists *self);
 void hists__output_resort_threaded(struct hists *hists);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8cdd232..2009665 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1006,6 +1006,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_STACK_USER)
 		stack_user__printf(&sample->user_stack);
+
+	if (sample_type & PERF_SAMPLE_WEIGHT)
+		printf("... weight: %" PRIu64 "\n", sample->weight);
 }
 
 static struct machine *
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index a8d1f1a..a7a9691 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -525,6 +525,49 @@ struct sort_entry sort_intx = {
 	.se_width_idx	= HISTC_INTX,
 };
 
+static u64 he_weight(struct hist_entry *he)
+{
+	return he->stat.nr_events ? he->stat.weight / he->stat.nr_events : 0;
+}
+
+static int64_t
+sort__local_weight_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return he_weight(left) - he_weight(right);
+}
+
+static int hist_entry__local_weight_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*llu", width, he_weight(self));
+}
+
+struct sort_entry sort_local_weight = {
+	.se_header	= "Local Weight",
+	.se_cmp		= sort__local_weight_cmp,
+	.se_snprintf	= hist_entry__local_weight_snprintf,
+	.se_width_idx	= HISTC_WEIGHT,
+};
+
+static int64_t
+sort__global_weight_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->stat.weight - right->stat.weight;
+}
+
+static int hist_entry__global_weight_snprintf(struct hist_entry *self, char *bf,
+					      size_t size, unsigned int width)
+{
+	return repsep_snprintf(bf, size, "%-*llu", width, self->stat.weight);
+}
+
+struct sort_entry sort_global_weight = {
+	.se_header	= "Weight",
+	.se_cmp		= sort__global_weight_cmp,
+	.se_snprintf	= hist_entry__global_weight_snprintf,
+	.se_width_idx	= HISTC_GLOBAL_WEIGHT,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -547,7 +590,9 @@ static struct sort_dimension sort_dimensions[] = {
 	DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
 	DIM(SORT_SRCLINE, "srcline", sort_srcline),
 	DIM(SORT_ABORT, "abort", sort_abort),
-	DIM(SORT_INTX, "intx", sort_intx)
+	DIM(SORT_INTX, "intx", sort_intx),
+	DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight),
+	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
 };
 
 int sort_dimension__add(const char *tok)
@@ -608,6 +653,10 @@ int sort_dimension__add(const char *tok)
 				sort__first_dimension = SORT_INTX;
 			else if (!strcmp(sd->name, "abort"))
 				sort__first_dimension = SORT_ABORT;
+			else if (!strcmp(sd->name, "weight"))
+				sort__first_dimension = SORT_GLOBAL_WEIGHT;
+			else if (!strcmp(sd->name, "local_weight"))
+				sort__first_dimension = SORT_LOCAL_WEIGHT;
 		}
 
 		list_add_tail(&sd->entry->list, &hist_entry__sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 10cf155..e278012 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -49,6 +49,7 @@ struct he_stat {
 	u64			period_us;
 	u64			period_guest_sys;
 	u64			period_guest_us;
+	u64			weight;
 	u32			nr_events;
 };
 
@@ -102,6 +103,8 @@ enum sort_type {
 	SORT_SRCLINE,
 	SORT_ABORT,
 	SORT_INTX,
+	SORT_LOCAL_WEIGHT,
+	SORT_GLOBAL_WEIGHT,
 };
 
 /*
-- 
1.7.7.6



* [PATCH 17/33] perf, tools: Handle XBEGIN like a jump
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (15 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 16/33] perf, tools: Add support for weight v2 Andi Kleen
@ 2012-10-26 20:29 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 18/33] perf, x86: Support for printing PMU state on spurious PMIs v3 Andi Kleen
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

So that the browser still shows the abort label.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/annotate.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f0a9103..a34a1ae 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -400,6 +400,8 @@ static struct ins instructions[] = {
 	{ .name = "testb", .ops  = &mov_ops, },
 	{ .name = "testl", .ops  = &mov_ops, },
 	{ .name = "xadd",  .ops  = &mov_ops, },
+	{ .name = "xbeginl", .ops  = &jump_ops, },
+	{ .name = "xbeginq", .ops  = &jump_ops, },
 };
 
 static int ins__cmp(const void *name, const void *insp)
-- 
1.7.7.6



* [PATCH 18/33] perf, x86: Support for printing PMU state on spurious PMIs v3
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (16 preceding siblings ...)
  2012-10-26 20:29 ` [PATCH 17/33] perf, tools: Handle XBEGIN like a jump Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 19/33] perf, core: Add generic transaction flags Andi Kleen
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

I had some problems with spurious PMIs, so print the PMU state
on a spurious one. This will not interact well with other NMI users,
so it is disabled by default and has to be explicitly enabled through sysfs.

Optional, but useful for debugging.

v2: Move to /sys/devices/cpu
v3: Print in more cases
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.c       |    3 +++
 arch/x86/kernel/cpu/perf_event.h       |    2 ++
 arch/x86/kernel/cpu/perf_event_intel.c |   11 ++++++-----
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 81b5e65..4a35eef 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -38,6 +38,7 @@
 #include "perf_event.h"
 
 struct x86_pmu x86_pmu __read_mostly;
+int	       print_spurious_pmi __read_mostly;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.enabled = 1,
@@ -1636,9 +1637,11 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
 }
 
 static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+static DEVICE_INT_ATTR(print_spurious_pmi, 0644, print_spurious_pmi);
 
 static struct attribute *x86_pmu_attrs[] = {
 	&dev_attr_rdpmc.attr,
+	&dev_attr_print_spurious_pmi.attr.attr,
 	NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 89394e1..7b43503 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -659,3 +659,5 @@ static inline struct intel_shared_regs *allocate_shared_regs(int cpu)
 }
 
 #endif /* CONFIG_CPU_SUP_INTEL */
+
+extern int print_spurious_pmi;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 3a7b962..bb1a539 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1146,11 +1146,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	intel_pmu_disable_all();
 	handled = intel_pmu_drain_bts_buffer();
 	status = intel_pmu_get_status();
-	if (!status) {
-		intel_pmu_enable_all(0);
-		return handled;
-	}
-
+	if (!status)
+		goto done;
 	loops = 0;
 again:
 	intel_pmu_ack_status(status);
@@ -1210,6 +1207,10 @@ again:
 		goto again;
 
 done:
+	if (!handled && print_spurious_pmi) {
+		pr_debug("Spurious PMI\n");
+		perf_event_print_debug();
+	}
 	intel_pmu_enable_all(0);
 	return handled;
 }
-- 
1.7.7.6



* [PATCH 19/33] perf, core: Add generic transaction flags
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (17 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 18/33] perf, x86: Support for printing PMU state on spurious PMIs v3 Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 20/33] perf, x86: Add Haswell specific transaction flag reporting Andi Kleen
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a generic qualifier for transaction events, as a new sample
type that returns a flag word. This is particularly useful
for qualifying aborts: it distinguishes aborts caused by
asynchronous events (like conflicts triggered by another
CPU) from those caused directly by an instruction.

The tuning strategies are very different for those cases,
so it's important to distinguish them easily and early.

Since it's inconvenient and inflexible to filter for this
in the kernel, we report all the events and allow
post-processing in user space.

The flags are based on the Intel TSX events, but should be fairly
generic and mostly applicable to other architectures too. In addition
to the various flag words there is also reserved space to report a
program-supplied abort code. For TSX this is used to distinguish specific
classes of aborts, like a lock-busy abort when doing lock elision.

This adds the perf core glue needed for reporting the new flag word out.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/perf_event.h      |    2 ++
 include/uapi/linux/perf_event.h |   24 ++++++++++++++++++++++--
 kernel/events/core.c            |    6 ++++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f4ded17..7e6a4b6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -589,6 +589,7 @@ struct perf_sample_data {
 	struct perf_regs_user		regs_user;
 	u64				stack_user_size;
 	u64				weight;
+	u64				transaction;
 };
 
 static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -603,6 +604,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->regs_user.regs = NULL;
 	data->stack_user_size = 0;
 	data->weight = 0;
+	data->transaction = 0;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 809a5fd..7155205 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -133,9 +133,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_REGS_USER			= 1U << 12,
 	PERF_SAMPLE_STACK_USER			= 1U << 13,
 	PERF_SAMPLE_WEIGHT			= 1U << 14,
+	PERF_SAMPLE_TRANSACTION			= 1U << 15,
 
-	PERF_SAMPLE_MAX = 1U << 15,		/* non-ABI */
-
+	PERF_SAMPLE_MAX = 1U << 16,		/* non-ABI */
 };
 
 /*
@@ -179,6 +179,26 @@ enum perf_sample_regs_abi {
 };
 
 /*
+ * Values for the transaction event qualifier, mostly for abort events.
+ */
+enum {
+	PERF_SAMPLE_TXN_ELISION     = (1 << 0), /* From elision */
+	PERF_SAMPLE_TXN_TRANSACTION = (1 << 1), /* From transaction */
+	PERF_SAMPLE_TXN_SYNC        = (1 << 2), /* Instruction is related */
+	PERF_SAMPLE_TXN_ASYNC       = (1 << 3), /* Instruction not related */
+	PERF_SAMPLE_TXN_RETRY       = (1 << 4), /* Retry possible */
+	PERF_SAMPLE_TXN_CONFLICT    = (1 << 5), /* Conflict abort */
+	PERF_SAMPLE_TXN_CAPACITY    = (1 << 6), /* Capacity abort */
+
+	PERF_SAMPLE_TXN_MAX	    = (1 << 7),  /* non-ABI */
+
+	/* bits 24..31 are reserved for the abort code */
+
+	PERF_SAMPLE_TXN_ABORT_MASK  = 0xff000000,
+	PERF_SAMPLE_TXN_ABORT_SHIFT = 24,
+};
+
+/*
  * The format of the data returned by read() on a perf event fd,
  * as specified by attr.read_format:
  *
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d633581..534810d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -955,6 +955,9 @@ static void perf_event__header_size(struct perf_event *event)
 	if (sample_type & PERF_SAMPLE_WEIGHT)
 		size += sizeof(data->weight);
 
+	if (sample_type & PERF_SAMPLE_TRANSACTION)
+		size += sizeof(data->transaction);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		size += event->read_size;
 
@@ -4086,6 +4089,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_WEIGHT)
 		perf_output_put(handle, data->weight);
 
+	if (sample_type & PERF_SAMPLE_TRANSACTION)
+		perf_output_put(handle, data->transaction);
+
 	if (sample_type & PERF_SAMPLE_READ)
 		perf_output_read(handle, event);
 
-- 
1.7.7.6



* [PATCH 20/33] perf, x86: Add Haswell specific transaction flag reporting
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (18 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 19/33] perf, core: Add generic transaction flags Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 21/33] perf, tools: Add support for record transaction flags Andi Kleen
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

In the PEBS handler, report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 3094caa..4b657c2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -677,6 +677,15 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 			data.weight = ((struct pebs_record_v2 *)pebs)->nhm.lat;
 	}
 
+	if ((event->attr.sample_type & PERF_SAMPLE_TRANSACTION) &&
+	    x86_pmu.intel_cap.pebs_format >= 2) {
+		data.transaction =
+		     ((struct pebs_record_v2 *)pebs)->tsx_tuning >> 32;
+		if ((data.transaction & PERF_SAMPLE_TXN_TRANSACTION) &&
+		    (pebs->ax & 1))
+			data.transaction |= pebs->ax & 0xff000000;
+	}
+
 	if (has_branch_stack(event))
 		data.br_stack = &cpuc->lbr_stack;
 
-- 
1.7.7.6



* [PATCH 21/33] perf, tools: Add support for record transaction flags
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (19 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 20/33] perf, x86: Add Haswell specific transaction flag reporting Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-29 10:49   ` Namhyung Kim
  2012-10-26 20:30 ` [PATCH 22/33] perf, tools: Point --sort documentation to --help Andi Kleen
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add the glue in the user tools to record transaction flags with
--transaction (-T was already taken) and dump them.

Followon patches will use them.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt |    5 ++++-
 tools/perf/builtin-record.c              |    2 ++
 tools/perf/perf.h                        |    1 +
 tools/perf/util/event.h                  |    1 +
 tools/perf/util/evsel.c                  |    9 +++++++++
 tools/perf/util/session.c                |    3 +++
 6 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 0ffb436..34f4f1a 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -185,12 +185,15 @@ is enabled for all the sampling events. The sampled branch type is the same for
 The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
 Note that this feature may not be available on all processors.
 
--W::
 --weight::
 Enable weighted sampling. When the event supports an additional weight per sample,
 scale the histogram by this weight. This currently works for TSX abort events and
 some memory events in precise mode on modern Intel CPUs.
 
+-T::
+--transaction::
+Record transaction flags for transaction related events.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 9a25116..49de48e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1008,6 +1008,8 @@ const struct option record_options[] = {
 		     parse_branch_stack),
 	OPT_BOOLEAN('W', "weight", &record.opts.sample_weight,
 		    "sample by weight (on special events only)"),
+	OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction,
+		    "sample transaction flags (special events only)"),
 	OPT_END()
 };
 
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 2365abf..395d216 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -246,6 +246,7 @@ struct perf_record_opts {
 	u64	     default_interval;
 	u64	     user_interval;
 	u16	     stack_dump_size;
+	bool	     sample_transaction;
 };
 
 #endif
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index d60015b..28fd2eb 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -88,6 +88,7 @@ struct perf_sample {
 	u64 stream_id;
 	u64 period;
 	u64 weight;
+	u64 transaction;
 	u32 cpu;
 	u32 raw_size;
 	void *raw_data;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 3800fb5..5c9790d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -448,6 +448,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
 	if (opts->sample_weight)
 		attr->sample_type	|= PERF_SAMPLE_WEIGHT;
 
+	if (opts->sample_transaction)
+		attr->sample_type	|= PERF_SAMPLE_TRANSACTION;
+
 	if (opts->call_graph) {
 		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
 
@@ -951,6 +954,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	data->transaction = 0;
+	if (type & PERF_SAMPLE_TRANSACTION) {
+		data->transaction = *array;
+		array++;
+	}
+
 	if (type & PERF_SAMPLE_READ) {
 		fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
 		return -1;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 2009665..316dd91 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1009,6 +1009,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
 
 	if (sample_type & PERF_SAMPLE_WEIGHT)
 		printf("... weight: %" PRIu64 "\n", sample->weight);
+
+	if (sample_type & PERF_SAMPLE_TRANSACTION)
+		printf("... transaction: %" PRIx64 "\n", sample->transaction);
 }
 
 static struct machine *
-- 
1.7.7.6
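The parse-side change above can be sketched in plain userspace C. The `PERF_SAMPLE_*` bit values below are illustrative assumptions, and `parse_tail()` is a hypothetical helper mirroring the tail of `perf_evsel__parse_sample()`, where the transaction word is consumed right after the weight:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sample_type bits -- the real values live in the kernel's
 * linux/perf_event.h; PERF_SAMPLE_TRANSACTION is added by this series. */
#define PERF_SAMPLE_WEIGHT      (1U << 14)
#define PERF_SAMPLE_TRANSACTION (1U << 17)

struct sample {
	uint64_t weight;
	uint64_t transaction;
};

/* Walk the tail of the variable-length sample array in the same order
 * as perf_evsel__parse_sample(): weight first, then transaction. */
static const uint64_t *parse_tail(const uint64_t *array, uint64_t type,
				  struct sample *data)
{
	data->weight = 0;
	if (type & PERF_SAMPLE_WEIGHT)
		data->weight = *array++;

	data->transaction = 0;
	if (type & PERF_SAMPLE_TRANSACTION)
		data->transaction = *array++;

	return array;
}
```

Because the sample record is just a flat u64 array, the parse order must match the order the kernel wrote the fields in, which is why the new field slots in at a fixed position.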


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 22/33] perf, tools: Point --sort documentation to --help
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (20 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 21/33] perf, tools: Add support for record transaction flags Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 23/33] perf, tools: Add browser support for transaction flags Andi Kleen
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The --sort documentation for top and report was hopelessly out of date.
Instead of having two more places that would need to be updated,
just point to --help.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |    2 +-
 tools/perf/Documentation/perf-top.txt    |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f4d91be..7cd5d0a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -57,7 +57,7 @@ OPTIONS
 
 -s::
 --sort=::
-	Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+	Sort by key(s): See --help for a full list.
 
 -p::
 --parent=<regex>::
diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 5b80d84..0f0fa3e 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -112,7 +112,7 @@ Default is to monitor all CPUS.
 
 -s::
 --sort::
-	Sort by key(s): pid, comm, dso, symbol, parent, srcline.
+	Sort by key(s): see --help for a full list.
 
 -n::
 --show-nr-samples::
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 23/33] perf, tools: Add browser support for transaction flags
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (21 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 22/33] perf, tools: Point --sort documentation to --help Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options Andi Kleen
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add histogram support for the transaction flags. Each flags instance becomes
a separate histogram. Support sorting and displaying the flags in report
and top.

The patch is fairly large, but it's really mostly just plumbing to pass the
flags around.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-annotate.c |    2 +-
 tools/perf/builtin-diff.c     |    8 ++++--
 tools/perf/builtin-report.c   |    4 +-
 tools/perf/builtin-top.c      |    4 +-
 tools/perf/util/hist.c        |    3 +-
 tools/perf/util/hist.h        |    3 +-
 tools/perf/util/sort.c        |   50 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h        |    2 +
 8 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 8f144ad..e91a01c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -62,7 +62,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		return 0;
 	}
 
-	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1);
+	he = __hists__add_entry(&evsel->hists, al, NULL, 1, 1, 0);
 	if (he == NULL)
 		return -ENOMEM;
 
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 1a41949..8a69268 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -27,9 +27,10 @@ static bool show_displacement;
 
 static int hists__add_entry(struct hists *self,
 			    struct addr_location *al, u64 period,
-			    u64 weight)
+			    u64 weight, u64 transaction)
 {
-	if (__hists__add_entry(self, al, NULL, period, weight) != NULL)
+	if (__hists__add_entry(self, al, NULL, period, weight, transaction)
+	    != NULL)
 		return 0;
 	return -ENOMEM;
 }
@@ -51,7 +52,8 @@ static int diff__process_sample_event(struct perf_tool *tool __maybe_unused,
 	if (al.filtered || al.sym == NULL)
 		return 0;
 
-	if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight)) {
+	if (hists__add_entry(&evsel->hists, &al, sample->period, sample->weight,
+			     sample->transaction)) {
 		pr_warning("problem incrementing symbol period, skipping event\n");
 		return -1;
 	}
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 833821b..290b1cc 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -147,7 +147,7 @@ static int perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	}
 
 	he = __hists__add_entry(&evsel->hists, al, parent, sample->period,
-					sample->weight);
+				sample->weight, sample->transaction);
 	if (he == NULL)
 		return -ENOMEM;
 
@@ -597,7 +597,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
 		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-		   " abort, intx,  weight, local_weight"),
+		   " abort, intx,  weight, local_weight, transaction"),
 	OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
 		    "Show sample percentage for different cpu modes"),
 	OPT_STRING('p', "parent", &parent_pattern, "regex",
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index cd0bdeb..0188dc0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -271,7 +271,7 @@ static struct hist_entry *perf_evsel__add_hist_entry(struct perf_evsel *evsel,
 	struct hist_entry *he;
 
 	he = __hists__add_entry(&evsel->hists, al, NULL, sample->period,
-				sample->weight);
+				sample->weight, sample->transaction);
 	if (he == NULL)
 		return NULL;
 
@@ -1224,7 +1224,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
 		   "sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
 		   " dso_from, symbol_to, symbol_from, mispredict, srcline,"
-		   " abort, intx, weight, local_weight"),
+		   " abort, intx, weight, local_weight, transaction"),
 	OPT_BOOLEAN('n', "show-nr-samples", &symbol_conf.show_nr_samples,
 		    "Show a column with the number of samples"),
 	OPT_CALLBACK_DEFAULT('G', "call-graph", &top, "output_type,min_percent, call_order",
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c17e273..e143b4e 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -362,7 +362,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
 				      struct symbol *sym_parent, u64 period,
-				      u64 weight)
+				      u64 weight, u64 transaction)
 {
 	struct hist_entry entry = {
 		.thread	= al->thread,
@@ -380,6 +380,7 @@ struct hist_entry *__hists__add_entry(struct hists *self,
 		.parent = sym_parent,
 		.filtered = symbol__parent_filter(sym_parent),
 		.hists	= self,
+		.transaction = transaction,
 	};
 
 	return add_hist_entry(self, &entry, al, period, weight);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 4a01553..196839f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -52,6 +52,7 @@ enum hist_column {
 	HISTC_SRCLINE,
 	HISTC_WEIGHT,
 	HISTC_GLOBAL_WEIGHT,
+	HISTC_TRANSACTION,
 	HISTC_NR_COLS, /* Last entry */
 };
 
@@ -77,7 +78,7 @@ struct hists {
 struct hist_entry *__hists__add_entry(struct hists *self,
 				      struct addr_location *al,
 				      struct symbol *parent, u64 period,
-				      u64 weight);
+				      u64 weight, u64 transaction);
 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t hist_entry__collapse(struct hist_entry *left, struct hist_entry *right);
 int hist_entry__sort_snprintf(struct hist_entry *self, char *bf, size_t size,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index a7a9691..6bf7a2b 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -568,6 +568,55 @@ struct sort_entry sort_global_weight = {
 	.se_width_idx	= HISTC_GLOBAL_WEIGHT,
 };
 
+static int64_t
+sort__transaction_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	return left->transaction - right->transaction;
+}
+
+static inline char *add_str(char *p, const char *str)
+{
+	strcpy(p, str);
+	return p + strlen(str);
+}
+
+static int hist_entry__transaction_snprintf(struct hist_entry *self, char *bf,
+				    size_t size, unsigned int width)
+{
+	u64 t = self->transaction;
+	char buf[128];
+	char *p = buf;
+
+	if (t & PERF_SAMPLE_TXN_ELISION)
+		*p++ = 'E';
+	if (t & PERF_SAMPLE_TXN_TRANSACTION)
+		*p++ = 'T';
+	if (t & PERF_SAMPLE_TXN_SYNC)
+		*p++ = 'I';
+	if (t & PERF_SAMPLE_TXN_RETRY)
+		*p++ = 'R';
+	*p = 0;
+	if (t & PERF_SAMPLE_TXN_CONFLICT)
+		p = add_str(p, ":con");
+	if (t & PERF_SAMPLE_TXN_CAPACITY)
+		p = add_str(p, ":cap");
+	if (t & PERF_SAMPLE_TXN_ABORT_MASK) {
+		sprintf(p, ":%" PRIx64,
+			(t & PERF_SAMPLE_TXN_ABORT_MASK) >>
+			PERF_SAMPLE_TXN_ABORT_SHIFT);
+		p += strlen(p);
+	}
+
+	return repsep_snprintf(bf, size, "%-*s", width, buf);
+}
+
+struct sort_entry sort_transaction = {
+	.se_header	= "Transaction",
+	.se_cmp		= sort__transaction_cmp,
+	.se_snprintf	= hist_entry__transaction_snprintf,
+	.se_width_idx	= HISTC_TRANSACTION,
+};
+
 struct sort_dimension {
 	const char		*name;
 	struct sort_entry	*entry;
@@ -593,6 +642,7 @@ static struct sort_dimension sort_dimensions[] = {
 	DIM(SORT_INTX, "intx", sort_intx),
 	DIM(SORT_LOCAL_WEIGHT, "local_weight", sort_local_weight),
 	DIM(SORT_GLOBAL_WEIGHT, "weight", sort_global_weight),
+	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 };
 
 int sort_dimension__add(const char *tok)
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index e278012..b0e43f0 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -66,6 +66,7 @@ struct hist_entry {
 	struct map_symbol	ms;
 	struct thread		*thread;
 	u64			ip;
+	u64			transaction;
 	s32			cpu;
 
 	/* XXX These two should move to some tree widget lib */
@@ -105,6 +106,7 @@ enum sort_type {
 	SORT_INTX,
 	SORT_LOCAL_WEIGHT,
 	SORT_GLOBAL_WEIGHT,
+	SORT_TRANSACTION,
 };
 
 /*
-- 
1.7.7.6
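The flag-to-string rendering in sort.c can be exercised standalone. The `TXN_*` bit positions below are assumptions for illustration (the real `PERF_SAMPLE_TXN_*` values come from the kernel header added earlier in the series), and `txn_str()` is a hypothetical stand-in for `hist_entry__transaction_snprintf()`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed flag layout, for illustration only; the real PERF_SAMPLE_TXN_*
 * definitions come from the kernel header added earlier in this series. */
#define TXN_ELISION     (1ULL << 0)
#define TXN_TRANSACTION (1ULL << 1)
#define TXN_SYNC        (1ULL << 2)
#define TXN_RETRY       (1ULL << 3)
#define TXN_CONFLICT    (1ULL << 4)
#define TXN_CAPACITY    (1ULL << 5)
#define TXN_ABORT_MASK  (0xffffULL << 32)
#define TXN_ABORT_SHIFT 32

/* Render flags the way the report/top browser does: one letter per mode
 * bit, then ":con"/":cap" suffixes and a hex raw abort code. */
static void txn_str(uint64_t t, char *buf)
{
	char *p = buf;

	if (t & TXN_ELISION)
		*p++ = 'E';
	if (t & TXN_TRANSACTION)
		*p++ = 'T';
	if (t & TXN_SYNC)
		*p++ = 'I';
	if (t & TXN_RETRY)
		*p++ = 'R';
	*p = 0;
	if (t & TXN_CONFLICT)
		p += sprintf(p, ":con");
	if (t & TXN_CAPACITY)
		p += sprintf(p, ":cap");
	if (t & TXN_ABORT_MASK)
		sprintf(p, ":%llx", (unsigned long long)
			((t & TXN_ABORT_MASK) >> TXN_ABORT_SHIFT));
}
```

Since each distinct flag word becomes its own histogram bucket, a compact fixed-order string like this doubles as a stable sort key for display.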


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (22 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 23/33] perf, tools: Add browser support for transaction flags Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 19:08   ` Jiri Olsa
  2012-10-30 11:58   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  2012-10-26 20:30 ` [PATCH 25/33] perf, tools: Support events with - in the name Andi Kleen
                   ` (8 subsequent siblings)
  32 siblings, 2 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The callers of parse_events usually have their own error handling.
Move the fprintf for a bad event to parse_events_option, which
is the only caller that should need it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 75c7b0f..409da3e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -827,8 +827,6 @@ int parse_events(struct perf_evlist *evlist, const char *str,
 	 * Both call perf_evlist__delete in case of error, so we dont
 	 * need to bother.
 	 */
-	fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
-	fprintf(stderr, "Run 'perf list' for a list of valid events\n");
 	return ret;
 }
 
@@ -836,7 +834,13 @@ int parse_events_option(const struct option *opt, const char *str,
 			int unset __maybe_unused)
 {
 	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
-	return parse_events(evlist, str, unset);
+	int ret = parse_events(evlist, str, unset);
+
+	if (ret) {
+		fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
+		fprintf(stderr, "Run 'perf list' for a list of valid events\n");
+	}
+	return ret;
 }
 
 int parse_filter(const struct option *opt, const char *str,
-- 
1.7.7.6
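A minimal sketch of the control flow after this patch, with hypothetical `*_sketch()` stand-ins; the toy parser that accepts only strings containing '/' is purely illustrative:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy parser: only strings containing '/' "parse" -- purely
 * illustrative, standing in for parse_events(). It just returns an
 * error code and prints nothing itself. */
static int parse_events_sketch(const char *str)
{
	return strchr(str, '/') ? 0 : -1;
}

/* The option callback, not the parser, owns the user-facing message. */
static int parse_events_option_sketch(const char *str)
{
	int ret = parse_events_sketch(str);

	if (ret) {
		fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
		fprintf(stderr, "Run 'perf list' for a list of valid events\n");
	}
	return ret;
}
```

Keeping the printing in one place lets other callers of the parser handle failures silently or with their own diagnostics.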


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 25/33] perf, tools: Support events with - in the name
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (23 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 19:32   ` Jiri Olsa
  2012-10-26 20:30 ` [PATCH 26/33] perf, x86: Report the arch perfmon events in sysfs Andi Kleen
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

'-' looks nicer than '_', so allow '-' in event names. This is used
for various arch perfmon and Haswell events.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.l |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c87efc1..ef602f0 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -80,7 +80,7 @@ event		[^,{}/]+
 num_dec		[0-9]+
 num_hex		0x[a-fA-F0-9]+
 num_raw_hex	[a-fA-F0-9]+
-name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
+name		[a-zA-Z_*?][a-zA-Z0-9\-_*?]*
 modifier_event	[ukhpGH]{1,8}
 modifier_bp	[rwx]{1,3}
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 26/33] perf, x86: Report the arch perfmon events in sysfs
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (24 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 25/33] perf, tools: Support events with - in the name Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 27/33] tools, perf: Add a precise event qualifier Andi Kleen
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Report all the supported arch perfmon events as event aliases in
/sys/devices/cpu/...

This is needed to use the TSX intx,intx_cp attributes with
symbolic events, at least for these basic events.

Currently cpu/instructions/ doesn't work because instructions
is also a generic event. It works for all events that are not
also generic events, though.

This probably needs to be fixed in the perf events parser.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.c       |    7 ++++
 arch/x86/kernel/cpu/perf_event.h       |    1 +
 arch/x86/kernel/cpu/perf_event_intel.c |   56 ++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4a35eef..08e61a6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1315,6 +1315,11 @@ static struct attribute_group x86_pmu_format_group = {
 	.attrs = NULL,
 };
 
+static struct attribute_group x86_pmu_events_group = {
+	.name = "events",
+	.attrs = NULL,
+};
+
 static int __init init_hw_perf_events(void)
 {
 	struct x86_pmu_quirk *quirk;
@@ -1360,6 +1365,7 @@ static int __init init_hw_perf_events(void)
 
 	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
 	x86_pmu_format_group.attrs = x86_pmu.format_attrs;
+	x86_pmu_events_group.attrs = x86_pmu.events_attrs;
 
 	pr_info("... version:                %d\n",     x86_pmu.version);
 	pr_info("... bit width:              %d\n",     x86_pmu.cntval_bits);
@@ -1652,6 +1658,7 @@ static struct attribute_group x86_pmu_attr_group = {
 static const struct attribute_group *x86_pmu_attr_groups[] = {
 	&x86_pmu_attr_group,
 	&x86_pmu_format_group,
+	&x86_pmu_events_group,
 	NULL,
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 7b43503..d3b3bb7 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -358,6 +358,7 @@ struct x86_pmu {
 	 */
 	int		attr_rdpmc;
 	struct attribute **format_attrs;
+	struct attribute **events_attrs;
 
 	/*
 	 * CPU Hotplug hooks
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index bb1a539..c3beee1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -34,6 +34,18 @@ static u64 intel_perfmon_event_map[PERF_COUNT_HW_MAX] __read_mostly =
 	[PERF_COUNT_HW_REF_CPU_CYCLES]		= 0x0300, /* pseudo-encoding */
 };
 
+static const char *intel_perfmon_names[PERF_COUNT_HW_MAX] __read_mostly =
+{
+	[PERF_COUNT_HW_CPU_CYCLES]		= "cycles",
+	[PERF_COUNT_HW_INSTRUCTIONS]		= "instructions",
+	[PERF_COUNT_HW_CACHE_REFERENCES]	= "cache-references",
+	[PERF_COUNT_HW_CACHE_MISSES]		= "cache-misses",
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS]	= "branches",
+	[PERF_COUNT_HW_BRANCH_MISSES]		= "branch-misses",
+	[PERF_COUNT_HW_BUS_CYCLES]		= "bus-cycles",
+	[PERF_COUNT_HW_REF_CPU_CYCLES]		= "ref-cycles"
+};
+
 static struct event_constraint intel_core_event_constraints[] __read_mostly =
 {
 	INTEL_EVENT_CONSTRAINT(0x11, 0x2), /* FP_ASSIST */
@@ -2000,6 +2012,48 @@ static __init void intel_nehalem_quirk(void)
 	}
 }
 
+static struct attribute *intel_arch_events[PERF_COUNT_HW_MAX + 1] __read_mostly;
+
+struct event_attribute {
+	struct device_attribute		attr;
+	u64				config;
+};
+
+static struct event_attribute intel_arch_event_attr[PERF_COUNT_HW_MAX];
+
+static ssize_t show_event(struct device *dev,
+			  struct device_attribute *attr,
+			  char *page)
+{
+	struct event_attribute *e = container_of(attr, struct event_attribute, attr);
+
+	return sprintf(page, "event=%#llx,umask=%#llx",
+		       e->config & 0xff,
+		       (e->config >> 8) & 0xff);
+}
+
+static __init void intel_gen_arch_events(void)
+{
+	int j, i;
+
+	j = 0;
+	for_each_clear_bit(i, x86_pmu.events_mask, ARRAY_SIZE(intel_arch_events_map)) {
+		struct event_attribute *e = intel_arch_event_attr + j;
+		struct device_attribute *d = &e->attr;
+		struct attribute *a = &d->attr;
+		int id = intel_arch_events_map[i].id;
+
+		e->config = intel_perfmon_event_map[id];
+		intel_arch_events[j] = a;
+		a->name = intel_perfmon_names[id];
+		a->mode = 0444;
+		d->show = show_event;
+		j++;
+	}
+	intel_arch_events[j] = NULL;
+	x86_pmu.events_attrs = intel_arch_events;
+}
+
 __init int intel_pmu_init(void)
 {
 	union cpuid10_edx edx;
@@ -2045,6 +2099,8 @@ __init int intel_pmu_init(void)
 
 	x86_pmu.max_pebs_events		= min_t(unsigned, MAX_PEBS_EVENTS, x86_pmu.num_counters);
 
+	intel_gen_arch_events();
+
 	/*
 	 * Quirk: v2 perfmon does not report fixed-purpose events, so
 	 * assume at least 3 events:
-- 
1.7.7.6
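The event=/umask= encoding emitted by show_event() can be reproduced in userspace; `format_event()` is a hypothetical helper using the same low-byte/next-byte split of the raw config:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reproduce show_event()'s formatting: the low byte of the raw config
 * is the event select, the next byte is the unit mask. */
static int format_event(uint64_t config, char *page)
{
	return sprintf(page, "event=%#llx,umask=%#llx",
		       (unsigned long long)(config & 0xff),
		       (unsigned long long)((config >> 8) & 0xff));
}
```

This is the string the tools later parse back from /sys/devices/cpu/events/<name>, so both sides agree on the field layout.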


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 27/33] tools, perf: Add a precise event qualifier
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (25 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 26/33] perf, x86: Report the arch perfmon events in sysfs Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 19:35   ` Jiri Olsa
  2012-10-26 20:30 ` [PATCH 28/33] perf, x86: Add Haswell TSX event aliases Andi Kleen
                   ` (5 subsequent siblings)
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a precise qualifier, like cpu/event=0x3c,precise=1/

This is needed so that the kernel's sysfs event aliases can request
PEBS for TSX events. The parser bails out on any sysfs parse error,
so the tools need this qualifier in any case to handle any event on
a TSX perf kernel.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.c |    6 ++++++
 tools/perf/util/parse-events.h |    1 +
 tools/perf/util/parse-events.l |    1 +
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 409da3e..f800765 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -526,6 +526,12 @@ do {								\
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 		CHECK_TYPE_VAL(STR);
 		break;
+	case PARSE_EVENTS__TERM_TYPE_PRECISE:
+		CHECK_TYPE_VAL(NUM);
+		if ((unsigned)term->val.num > 2)
+			return -EINVAL;
+		attr->precise_ip = term->val.num;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 839230c..0c78bb8 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -49,6 +49,7 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_NAME,
 	PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
 	PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
+	PARSE_EVENTS__TERM_TYPE_PRECISE,
 };
 
 struct parse_events__term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index ef602f0..c2e5142 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -168,6 +168,7 @@ period			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
+precise			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PRECISE); }
 }
 
 mem:			{ BEGIN(mem); return PE_PREFIX_MEM; }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 28/33] perf, x86: Add Haswell TSX event aliases
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (26 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 27/33] tools, perf: Add a precise event qualifier Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 29/33] perf, tools: Add perf stat --transaction v2 Andi Kleen
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add infrastructure to generate event aliases in /sys/devices/cpu/events/

And use this to set up user friendly aliases for the common TSX events.
TSX tuning relies heavily on the PMU, so it's important to be user friendly.

This replaces the generic transaction events in an earlier version
of this patchkit.

tx-start/commit/abort  to count RTM transactions
el-start/commit/abort  to count HLE ("elision") transactions
tx-conflict/overflow   to count conflict/overflow for both combined.

The general abort events exist in precise and non-precise variants,
since the common case is sampling plain "tx-aborts" in precise mode.

This is very important because abort sampling only really works
with PEBS enabled; otherwise it would report the IP after the abort,
not the abort point. But counting with PEBS has more overhead,
so there are also tx/el-abort-count aliases that do not enable PEBS,
for use with perf stat.

It would be nice to switch automatically between those two, like in the
previous version, but that would need more new infrastructure for sysfs
first.

There is a tx-abort<->tx-aborts alias too, because I found myself
using both variants.

Also added friendly aliases for cpu/cycles,intx=1/ and
cpu/cycles,intx=1,intx_cp=1/ and the same for instructions.
These will be used by perf stat -T, and are also useful for users directly.

So for example, to get transactional cycles one can use "perf stat -e cycles-t".

Some of the sysfs macros/functions could probably move to generic code, but
I left it in the Intel code for now until there are more users.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |   95 ++++++++++++++++++++++++++++++++
 1 files changed, 95 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c3beee1..e9706f0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2054,6 +2054,99 @@ static __init void intel_gen_arch_events(void)
 	x86_pmu.events_attrs = intel_arch_events;
 }
 
+struct sevent_attribute {
+	struct device_attribute		attr;
+	const char			*val;
+};
+
+#define PMU_EVENT(_name, _id, _val)					\
+	static struct sevent_attribute attr_ ## _name =			\
+	{ .attr =							\
+	  { .attr = { .name = _id, .mode = 0444 },			\
+	    .show = show_sevent },					\
+	  .val = _val }
+
+static ssize_t show_sevent(struct device *dev,
+			  struct device_attribute *attr,
+			  char *page)
+{
+	struct sevent_attribute *e = container_of(attr, struct sevent_attribute, attr);
+
+	return sprintf(page, "%s", e->val);
+}
+
+/* Haswell special events */
+PMU_EVENT(tx_start,       "tx-start",       "event=0xc9,umask=0x1");
+PMU_EVENT(tx_commit,      "tx-commit",      "event=0xc9,umask=0x2");
+PMU_EVENT(tx_abort,       "tx-abort",       "event=0xc9,umask=0x4,precise=2");
+PMU_EVENT(tx_abort_count, "tx-abort-count", "event=0xc9,umask=0x4");
+/* alias */
+PMU_EVENT(tx_aborts,      "tx-aborts",      "event=0xc9,umask=0x4,precise=2");
+PMU_EVENT(tx_capacity,    "tx-capacity",    "event=0x54,umask=0x2");
+PMU_EVENT(tx_conflict,    "tx-conflict",    "event=0x54,umask=0x1");
+PMU_EVENT(el_start,       "el-start",       "event=0xc8,umask=0x1");
+PMU_EVENT(el_commit,      "el-commit",      "event=0xc8,umask=0x2");
+PMU_EVENT(el_abort,       "el-abort",       "event=0xc8,umask=0x4,precise=2");
+PMU_EVENT(el_abort_count, "el-abort-count", "event=0xc8,umask=0x4");
+/* alias */
+PMU_EVENT(el_aborts,      "el-aborts",      "event=0xc8,umask=0x4,precise=2");
+/* shared with tx-* */
+PMU_EVENT(el_capacity,    "el-capacity",    "event=0x54,umask=0x2");
+/* shared with tx-* */
+PMU_EVENT(el_conflict,    "el-conflict",    "event=0x54,umask=0x1");
+PMU_EVENT(cycles_t,       "cycles-t",       "event=0x3c,intx=1");
+PMU_EVENT(cycles_ct,      "cycles-ct",      "event=0x3c,intx=1,intx_cp=1");
+PMU_EVENT(insns_t,        "instructions-t", "event=0xc0,intx=1");
+PMU_EVENT(insns_ct,       "instructions-ct","event=0xc0,intx=1,intx_cp=1");
+
+#define PMU_EVENT_PTR(x) &attr_ ## x .attr.attr
+
+static struct attribute *hsw_events_attrs[] = {
+	PMU_EVENT_PTR(tx_start),
+	PMU_EVENT_PTR(tx_commit),
+	PMU_EVENT_PTR(tx_abort),
+	PMU_EVENT_PTR(tx_aborts),
+	PMU_EVENT_PTR(tx_abort_count),
+	PMU_EVENT_PTR(tx_capacity),
+	PMU_EVENT_PTR(tx_conflict),
+	PMU_EVENT_PTR(el_start),
+	PMU_EVENT_PTR(el_commit),
+	PMU_EVENT_PTR(el_abort),
+	PMU_EVENT_PTR(el_aborts),
+	PMU_EVENT_PTR(el_abort_count),
+	PMU_EVENT_PTR(el_capacity),
+	PMU_EVENT_PTR(el_conflict),
+	PMU_EVENT_PTR(cycles_t),
+	PMU_EVENT_PTR(cycles_ct),
+	PMU_EVENT_PTR(insns_t),
+	PMU_EVENT_PTR(insns_ct),
+	NULL
+};
+
+/* Merge two pointer arrays */
+static __init struct attribute **merge_attr(struct attribute **a,
+					    struct attribute **b)
+{
+	struct attribute **new;
+	int j, i;
+
+	for (j = 0; a[j]; j++)
+		;
+	for (i = 0; b[i]; i++)
+		j++;
+	j++;
+	new = kmalloc(sizeof(struct attribute *) * j, GFP_KERNEL);
+	if (!new)
+		return a;
+	j = 0;
+	for (i = 0; a[i]; i++)
+		new[j++] = a[i];
+	for (i = 0; b[i]; i++)
+		new[j++] = b[i];
+	new[j] = NULL;
+	return new;
+}
+
 __init int intel_pmu_init(void)
 {
 	union cpuid10_edx edx;
@@ -2281,6 +2374,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
 		x86_pmu.format_attrs = intel_hsw_formats_attr;
 		x86_pmu.memory_lat_events = intel_hsw_memory_latency_events;
+		x86_pmu.events_attrs = merge_attr(x86_pmu.events_attrs,
+						  hsw_events_attrs);
 		pr_cont("Haswell events, ");
 		break;
 
-- 
1.7.7.6
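merge_attr() is plain NULL-terminated array concatenation; here is a userspace sketch with strings in place of struct attribute pointers (`merge()` is a hypothetical stand-in, and kmalloc becomes malloc):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate two NULL-terminated pointer arrays into a freshly
 * allocated one, falling back to the first array on allocation
 * failure -- the same shape as merge_attr() in the patch. */
static const char **merge(const char **a, const char **b)
{
	const char **new;
	int i, j = 0;

	for (i = 0; a[i]; i++)		/* count entries in a */
		j++;
	for (i = 0; b[i]; i++)		/* count entries in b */
		j++;
	new = malloc(sizeof(*new) * (j + 1));
	if (!new)
		return a;
	j = 0;
	for (i = 0; a[i]; i++)
		new[j++] = a[i];
	for (i = 0; b[i]; i++)
		new[j++] = b[i];
	new[j] = NULL;			/* keep the terminator */
	return new;
}
```

The fallback on allocation failure means the caller always gets a usable array, just possibly without the second half.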


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 29/33] perf, tools: Add perf stat --transaction v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (27 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 28/33] perf, x86: Add Haswell TSX event aliases Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 30/33] perf, x86: Add a Haswell precise instructions event Andi Kleen
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add support to perf stat to print the basic transactional execution statistics:
Total cycles, Cycles in Transaction, and Cycles in aborted transactions
(using the intx and intx_checkpoint qualifiers), plus
Transaction Starts and Elision Starts, to compute the average transaction length.

This gives a reasonable overview of how successful the transactions are.

Enable with a new --transaction / -T option.

This requires measuring these events in a group, since they depend on each
other.

This is implemented using TM sysfs events exported by the kernel.

v2: Only print the extended statistics when the option is enabled.
This avoids negative output when the user specifies the -T events
in separate groups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |    3 +
 tools/perf/builtin-stat.c              |  101 +++++++++++++++++++++++++++++++-
 tools/perf/util/evsel.h                |    6 ++
 3 files changed, 107 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 2fa173b..653bdbd 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -108,7 +108,10 @@ with it.  --append may be used here.  Examples:
      3>results  perf stat --log-fd 3          -- $cmd
      3>>results perf stat --log-fd 3 --append -- $cmd
 
+-T::
+--transaction::
 
+Print statistics of transactional execution if supported.
 
 EXAMPLES
 --------
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 93b9011..a451490 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -64,6 +64,29 @@
 #define CNTR_NOT_SUPPORTED	"<not supported>"
 #define CNTR_NOT_COUNTED	"<not counted>"
 
+static const char *transaction_attrs[] = {
+	"task-clock",
+	"{"
+	"instructions,"
+	"cycles,"
+	"cpu/cycles-t/,"
+	"cpu/cycles-ct/,"
+	"cpu/tx-start/,"
+	"cpu/el-start/"
+	"}"
+};
+
+/* must match the transaction_attrs above */
+enum {
+	T_TASK_CLOCK,
+	T_INSTRUCTIONS,
+	T_CYCLES,
+	T_CYCLES_INTX,
+	T_CYCLES_INTX_CP,
+	T_TRANSACTION_START,
+	T_ELISION_START
+};
+
 static struct perf_evlist	*evsel_list;
 
 static struct perf_target	target = {
@@ -77,6 +100,7 @@ static bool			no_aggr				= false;
 static pid_t			child_pid			= -1;
 static bool			null_run			=  false;
 static int			detailed_run			=  0;
+static bool			transaction_run			=  false;
 static bool			big_num				=  true;
 static int			big_num_opt			=  -1;
 static const char		*csv_sep			= NULL;
@@ -123,7 +147,11 @@ static struct stats runtime_l1_icache_stats[MAX_NR_CPUS];
 static struct stats runtime_ll_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_itlb_cache_stats[MAX_NR_CPUS];
 static struct stats runtime_dtlb_cache_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intx_stats[MAX_NR_CPUS];
+static struct stats runtime_cycles_intxcp_stats[MAX_NR_CPUS];
 static struct stats walltime_nsecs_stats;
+static struct stats runtime_transaction_stats[MAX_NR_CPUS];
+static struct stats runtime_elision_stats[MAX_NR_CPUS];
 
 static int create_perf_stat_counter(struct perf_evsel *evsel,
 				    struct perf_evsel *first)
@@ -183,6 +211,18 @@ static inline int nsec_counter(struct perf_evsel *evsel)
 	return 0;
 }
 
+static struct perf_evsel *nth_evsel(int n)
+{
+	struct perf_evsel *ev;
+	int j;
+
+	j = 0;
+	list_for_each_entry (ev, &evsel_list->entries, node)
+		if (j++ == n)
+			return ev;
+	return NULL;
+}
+
 /*
  * Update various tracking values we maintain to print
  * more semantic information such as miss/hit ratios,
@@ -194,8 +234,14 @@ static void update_shadow_stats(struct perf_evsel *counter, u64 *count)
 		update_stats(&runtime_nsecs_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_CPU_CYCLES))
 		update_stats(&runtime_cycles_stats[0], count[0]);
-	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
-		update_stats(&runtime_stalled_cycles_front_stats[0], count[0]);
+	else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX)))
+		update_stats(&runtime_cycles_intx_stats[0], count[0]);
+	else if (perf_evsel__cmp(counter, nth_evsel(T_CYCLES_INTX_CP)))
+		update_stats(&runtime_cycles_intxcp_stats[0], count[0]);
+	else if (perf_evsel__cmp(counter, nth_evsel(T_TRANSACTION_START)))
+		update_stats(&runtime_transaction_stats[0], count[0]);
+	else if (perf_evsel__cmp(counter, nth_evsel(T_ELISION_START)))
+		update_stats(&runtime_elision_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
 		update_stats(&runtime_stalled_cycles_back_stats[0], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_BRANCH_INSTRUCTIONS))
@@ -633,7 +679,7 @@ static void print_ll_cache_misses(int cpu,
 
 static void abs_printout(int cpu, struct perf_evsel *evsel, double avg)
 {
-	double total, ratio = 0.0;
+	double total, ratio = 0.0, total2;
 	char cpustr[16] = { '\0', };
 	const char *fmt;
 
@@ -733,6 +779,41 @@ static void abs_printout(int cpu, struct perf_evsel *evsel, double avg)
 			ratio = 1.0 * avg / total;
 
 		fprintf(output, " # %8.3f GHz                    ", ratio);
+	} else if (perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_INTX)) && 
+		   transaction_run) {
+		total = avg_stats(&runtime_cycles_stats[cpu]);
+		if (total)
+			fprintf(output,
+				" #   %5.2f%% transactional cycles   ",
+				100.0 * (avg / total));
+	} else if (perf_evsel__cmp(evsel, nth_evsel(T_CYCLES_INTX_CP)) &&
+		   transaction_run) {
+		total = avg_stats(&runtime_cycles_stats[cpu]);
+		total2 = avg_stats(&runtime_cycles_intx_stats[cpu]);
+		if (total)
+			fprintf(output,
+				" #   %5.2f%% aborted cycles         ",
+				100.0 * ((total2-avg) / total));
+	} else if (perf_evsel__cmp(evsel, nth_evsel(T_TRANSACTION_START)) &&
+		   avg > 0 &&
+		   runtime_cycles_intx_stats[cpu].n != 0 &&
+		   transaction_run) {
+		total = avg_stats(&runtime_cycles_intx_stats[cpu]);
+
+		if (total)
+			ratio = total / avg;
+
+		fprintf(output, " # %8.0f cycles / transaction ", ratio);
+	} else if (perf_evsel__cmp(evsel, nth_evsel(T_ELISION_START)) &&
+		   avg > 0 &&
+		   runtime_cycles_intx_stats[cpu].n != 0 &&
+		   transaction_run) {
+		total = avg_stats(&runtime_cycles_intx_stats[cpu]);
+
+		if (total)
+			ratio = total / avg;
+
+		fprintf(output, " # %8.0f cycles / elision     ", ratio);
 	} else if (runtime_nsecs_stats[cpu].n != 0) {
 		char unit = 'M';
 
@@ -1039,6 +1120,18 @@ static int add_default_attributes(void)
 	if (null_run)
 		return 0;
 
+	if (transaction_run) {
+		unsigned i;
+
+		for (i = 0; i < ARRAY_SIZE(transaction_attrs); i++) {
+			if (parse_events(evsel_list, transaction_attrs[i], 0)) {
+				fprintf(stderr, "Cannot set up transaction events\n");
+				return -1;
+			}
+		}
+		return 0;
+	}
+
 	if (!evsel_list->nr_entries) {
 		if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0)
 			return -1;
@@ -1114,6 +1207,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "append", &append_file, "append to the output file"),
 	OPT_INTEGER(0, "log-fd", &output_fd,
 		    "log output to fd, instead of stderr"),
+	OPT_BOOLEAN('T', "transaction", &transaction_run,
+		    "hardware transaction statistics"),
 	OPT_END()
 	};
 	const char * const stat_usage[] = {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 6f94d6d..418405e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -158,6 +158,12 @@ static inline bool perf_evsel__match2(struct perf_evsel *e1,
 	       (e1->attr.config == e2->attr.config);
 }
 
+#define perf_evsel__cmp(a, b)			\
+	((a) &&					\
+	 (b) &&					\
+	 (a)->attr.type == (b)->attr.type &&	\
+	 (a)->attr.config == (b)->attr.config)
+
 int __perf_evsel__read_on_cpu(struct perf_evsel *evsel,
 			      int cpu, int thread, bool scale);
 
-- 
1.7.7.6



* [PATCH 30/33] perf, x86: Add a Haswell precise instructions event
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (28 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 29/33] perf, tools: Add perf stat --transaction v2 Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-26 20:30 ` [PATCH 31/33] perf, tools: Support generic events as pmu event names v2 Andi Kleen
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add an instructions-p event alias that uses the PDIR randomized instruction
retirement event. This is useful to avoid some systematic sampling shadow
problems. Normally PEBS sampling has a systematic shadow. With PDIR enabled
the hardware adds some randomization that statistically avoids this problem:
an individual sample can be less precise, but the event is more precise over
a whole sampling interval, which is what matters when sampling.

This could be expressed with the explicit event code syntax, but an
"instructions-p" alias is easier and more user friendly. I expect
this will eventually become a common use case.

Right now this is for Haswell only; Ivy Bridge support will be added later.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e9706f0..707fd5c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2098,6 +2098,7 @@ PMU_EVENT(cycles_t,       "cycles-t",       "event=0x3c,intx=1");
 PMU_EVENT(cycles_ct,      "cycles-ct",      "event=0x3c,intx=1,intx_cp=1");
 PMU_EVENT(insns_t,        "instructions-t", "event=0xc0,intx=1");
 PMU_EVENT(insns_ct,       "instructions-ct","event=0xc0,intx=1,intx_cp=1");
+PMU_EVENT(insns_prec,     "instructions-p", "event=0xc0,umask=0x01,precise=2");
 
 #define PMU_EVENT_PTR(x) &attr_ ## x .attr.attr
 
@@ -2120,6 +2121,7 @@ static struct attribute *hsw_events_attrs[] = {
 	PMU_EVENT_PTR(cycles_ct),
 	PMU_EVENT_PTR(insns_t),
 	PMU_EVENT_PTR(insns_ct),
+	PMU_EVENT_PTR(insns_prec),
 	NULL
 };
 
-- 
1.7.7.6



* [PATCH 31/33] perf, tools: Support generic events as pmu event names v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (29 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 30/33] perf, x86: Add a Haswell precise instructions event Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 19:42   ` Jiri Olsa
  2012-10-26 20:30 ` [PATCH 32/33] perf, tools: Default to cpu// for events v2 Andi Kleen
  2012-10-26 20:30 ` [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2 Andi Kleen
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Extend the parser/lexer to allow generic event names like
"instructions" as a sysfs supplied PMU event name.

This resolves the problem that cpu/instructions/ gives a parse
error, even when the kernel supplies an "instructions" event.

This is useful for adding sysfs-specified qualifiers to these
events, for example cpu/instructions,intx=1/, and is needed
for the TSX events.

Simply extend the grammar to handle this case. The lexer
needs minor changes to save the original string.

v2: Remove bogus returns in grammar.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.l |    3 ++-
 tools/perf/util/parse-events.y |   27 +++++++++++++++++++++++----
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index c2e5142..dd3a901 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -55,7 +55,8 @@ static int sym(yyscan_t scanner, int type, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 
-	yylval->num = (type << 16) + config;
+	yylval->namenum.num = (type << 16) + config;
+	yylval->namenum.name = strdup(parse_events_get_text(scanner));
 	return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
 }
 
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index cd88209..25f0123 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -34,8 +34,8 @@ do { \
 %token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
 %token PE_ERROR
 %type <num> PE_VALUE
-%type <num> PE_VALUE_SYM_HW
-%type <num> PE_VALUE_SYM_SW
+%type <namenum> PE_VALUE_SYM_HW
+%type <namenum> PE_VALUE_SYM_SW
 %type <num> PE_RAW
 %type <num> PE_TERM
 %type <str> PE_NAME
@@ -65,6 +65,7 @@ do { \
 
 %union
 {
+	struct { char *name; u64 num; } namenum;
 	char *str;
 	u64 num;
 	struct list_head *head;
@@ -195,9 +196,9 @@ PE_NAME '/' event_config '/'
 }
 
 value_sym:
-PE_VALUE_SYM_HW
+PE_VALUE_SYM_HW 	{ free($1.name); $$ = $1.num; }
 |
-PE_VALUE_SYM_SW
+PE_VALUE_SYM_SW		{ free($1.name); $$ = $1.num; }
 
 event_legacy_symbol:
 value_sym '/' event_config '/'
@@ -361,6 +362,24 @@ PE_NAME
 	$$ = term;
 }
 |
+PE_VALUE_SYM_HW
+{
+	struct parse_events__term *term;
+
+	ABORT_ON(parse_events__term_num(&term, PARSE_EVENTS__TERM_TYPE_USER,
+					$1.name, 1));
+	$$ = term;
+}
+|
+PE_VALUE_SYM_SW
+{
+	struct parse_events__term *term;
+
+	ABORT_ON(parse_events__term_num(&term, PARSE_EVENTS__TERM_TYPE_USER,
+					$1.name, 1));
+	$$ = term;
+}
+|
 PE_TERM '=' PE_NAME
 {
 	struct parse_events__term *term;
-- 
1.7.7.6



* [PATCH 32/33] perf, tools: Default to cpu// for events v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (30 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 31/33] perf, tools: Support generic events as pmu event names v2 Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 20:16   ` Jiri Olsa
  2012-10-26 20:30 ` [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2 Andi Kleen
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When an event fails to parse and it's not in a new style format,
try to parse it again as a cpu event.

This allows using sysfs-exported events directly without //, so I can use

perf record -e tx-aborts ...

instead of

perf record -e cpu/tx-aborts/

v2: Handle multiple events
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/parse-events.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f800765..ee6a73c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -768,6 +768,23 @@ int parse_events_name(struct list_head *list, char *name)
 	return 0;
 }
 
+static void str_append(char **s, int *len, const char *a)
+{
+	int olen = *s ? strlen(*s) : 0;
+	int nlen = olen + strlen(a) + 1;
+	if (*len < nlen) { 
+		*len = *len * 2;
+		if (*len < nlen)
+			*len = nlen;
+		*s = realloc(*s, *len);
+		if (!*s) 
+			exit(ENOMEM);
+		if (olen == 0)
+			**s = 0;
+	}
+	strcat(*s, a);
+}
+
 static int parse_events__scanner(const char *str, void *data, int start_token)
 {
 	YY_BUFFER_STATE buffer;
@@ -788,6 +805,26 @@ static int parse_events__scanner(const char *str, void *data, int start_token)
 	parse_events__flush_buffer(buffer, scanner);
 	parse_events__delete_buffer(buffer, scanner);
 	parse_events_lex_destroy(scanner);
+
+	if (ret && !strchr(str, '/')) {
+		char *o = strdup(str);
+		char *s = NULL;
+		char *t = o;
+		char *p;
+		int len = 0;
+
+		if (!o)
+			return ret;
+		while ((p = strsep(&t, ",")) != NULL) {
+			if (s)
+				str_append(&s, &len, ",");
+			str_append(&s, &len, "cpu/");
+			str_append(&s, &len, p);
+			str_append(&s, &len, "/");
+		}
+		free(o);
+		ret = parse_events__scanner(s, data, start_token);
+	}
 	return ret;
 }
 
-- 
1.7.7.6



* [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2
  2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
                   ` (31 preceding siblings ...)
  2012-10-26 20:30 ` [PATCH 32/33] perf, tools: Default to cpu// for events v2 Andi Kleen
@ 2012-10-26 20:30 ` Andi Kleen
  2012-10-27 20:20   ` Jiri Olsa
  32 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-26 20:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: acme, peterz, jolsa, eranian, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

List the kernel-supplied PMU event aliases in perf list.

It's better when the users can actually see them.

v2: Fix pattern matching
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-list.txt |    4 +-
 tools/perf/builtin-list.c              |    3 +
 tools/perf/util/parse-events.c         |    5 ++-
 tools/perf/util/pmu.c                  |   72 ++++++++++++++++++++++++++++++++
 tools/perf/util/pmu.h                  |    3 +
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index d1e39dc..826f3d6 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -8,7 +8,7 @@ perf-list - List all symbolic event types
 SYNOPSIS
 --------
 [verse]
-'perf list' [hw|sw|cache|tracepoint|event_glob]
+'perf list' [hw|sw|cache|tracepoint|pmu|event_glob]
 
 DESCRIPTION
 -----------
@@ -104,6 +104,8 @@ To limit the list use:
   'subsys_glob:event_glob' to filter by tracepoint subsystems such as sched,
   block, etc.
 
+. 'pmu' to print the kernel supplied PMU events.
+
 . If none of the above is matched, it will apply the supplied glob to all
   events, printing the ones that match.
 
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 1948ece..e79f423 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -13,6 +13,7 @@
 
 #include "util/parse-events.h"
 #include "util/cache.h"
+#include "util/pmu.h"
 
 int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 {
@@ -37,6 +38,8 @@ int cmd_list(int argc, const char **argv, const char *prefix __maybe_unused)
 			else if (strcmp(argv[i], "cache") == 0 ||
 				 strcmp(argv[i], "hwcache") == 0)
 				print_hwcache_events(NULL, false);
+			else if (strcmp(argv[i], "pmu") == 0)
+				print_pmu_events(NULL, false);
 			else if (strcmp(argv[i], "--raw-dump") == 0)
 				print_events(NULL, true);
 			else {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index ee6a73c..2c472ca 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1063,6 +1063,8 @@ int print_hwcache_events(const char *event_glob, bool name_only)
 		}
 	}
 
+	if (printed)
+		printf("\n");
 	return printed;
 }
 
@@ -1117,11 +1119,12 @@ void print_events(const char *event_glob, bool name_only)
 
 	print_hwcache_events(event_glob, name_only);
 
+	print_pmu_events(event_glob, name_only);
+
 	if (event_glob != NULL)
 		return;
 
 	if (!name_only) {
-		printf("\n");
 		printf("  %-50s [%s]\n",
 		       "rNNN",
 		       event_type_descriptors[PERF_TYPE_RAW]);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8a2229d..1e98952 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -551,6 +551,78 @@ void perf_pmu__set_format(unsigned long *bits, long from, long to)
 		set_bit(b, bits);
 }
 
+static char *format_alias(char *buf, int len, struct perf_pmu *pmu,
+			  struct perf_pmu__alias *alias)
+{
+	snprintf(buf, len, "%s/%s/", pmu->name, alias->name);
+	return buf;
+}
+
+static char *format_alias_or(char *buf, int len, struct perf_pmu *pmu,
+			     struct perf_pmu__alias *alias)
+{
+	snprintf(buf, len, "%s OR %s/%s/", alias->name, pmu->name, alias->name);
+	return buf;
+}
+
+static int cmp_string(const void *a, const void *b)
+{
+	const char * const *as = a;
+	const char * const *bs = b;
+	return strcmp(*as, *bs);
+}
+
+void print_pmu_events(const char *event_glob, bool name_only)
+{
+	struct perf_pmu *pmu;
+	struct perf_pmu__alias *alias;
+	char buf[1024];
+	int printed = 0;
+	int len, j;
+	char **aliases;
+
+	pmu = NULL;
+	len = 0;
+	while ((pmu = perf_pmu__scan(pmu)) != NULL)
+		list_for_each_entry (alias, &pmu->aliases, list)
+			len++;
+	aliases = malloc(sizeof(char *) * len);
+	if (!aliases)
+		return;
+	pmu = NULL;
+	j = 0;
+	while ((pmu = perf_pmu__scan(pmu)) != NULL)
+		list_for_each_entry (alias, &pmu->aliases, list) {
+			char *name = format_alias(buf, sizeof buf, pmu, alias);
+			bool is_cpu = !strcmp(pmu->name, "cpu");
+
+			if (event_glob != NULL &&
+			    !(strglobmatch(name, event_glob) ||
+			      (!is_cpu && strglobmatch(alias->name, event_glob))))
+				continue;
+			aliases[j] = name;
+			if (is_cpu && !name_only)
+				aliases[j] = format_alias_or(buf, sizeof buf,
+							      pmu, alias);
+			aliases[j] = strdup(aliases[j]);
+			j++;
+		}
+	len = j;
+	qsort(aliases, len, sizeof(char *), cmp_string);
+	for (j = 0; j < len; j++) {
+		if (name_only) {
+			printf("%s ", aliases[j]);
+			continue;
+		}
+		printf("  %-50s [Kernel PMU event]\n", aliases[j]);
+		free(aliases[j]);
+		printed++;
+	}
+	if (printed)
+		printf("\n");
+	free(aliases);
+}
+
 /* Simulated format definitions. */
 static struct test_format {
 	const char *name;
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 39f3aba..da53c2a 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -3,6 +3,7 @@
 
 #include <linux/bitops.h>
 #include "../../../include/uapi/linux/perf_event.h"
+#include <stdbool.h>
 
 enum {
 	PERF_PMU_FORMAT_VALUE_CONFIG,
@@ -49,5 +50,7 @@ void perf_pmu__set_format(unsigned long *bits, long from, long to);
 
 struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
 
+void print_pmu_events(const char *event_glob, bool name_only);
+
 int perf_pmu__test(void);
 #endif /* __PMU_H */
-- 
1.7.7.6



* Re: [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options
  2012-10-26 20:30 ` [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options Andi Kleen
@ 2012-10-27 19:08   ` Jiri Olsa
  2012-10-30 11:58   ` [tip:perf/core] perf " tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 19:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:06PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> The callers of parse_events usually have their own error handling.
> Move the fprintf for a bad event to parse_events_options, which
> is the only one who should need it.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/util/parse-events.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 75c7b0f..409da3e 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -827,8 +827,6 @@ int parse_events(struct perf_evlist *evlist, const char *str,
>  	 * Both call perf_evlist__delete in case of error, so we dont
>  	 * need to bother.
>  	 */
> -	fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
> -	fprintf(stderr, "Run 'perf list' for a list of valid events\n");
>  	return ret;
>  }
>  
> @@ -836,7 +834,13 @@ int parse_events_option(const struct option *opt, const char *str,
>  			int unset __maybe_unused)
>  {
>  	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
> -	return parse_events(evlist, str, unset);
> +	int ret = parse_events(evlist, str, unset);
> +
> +	if (ret) {
> +		fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
> +		fprintf(stderr, "Run 'perf list' for a list of valid events\n");
> +	}
> +	return ret;
>  }
>  
>  int parse_filter(const struct option *opt, const char *str,
> -- 
> 1.7.7.6
> 

Acked-by: Jiri Olsa <jolsa@redhat.com>


* Re: [PATCH 25/33] perf, tools: Support events with - in the name
  2012-10-26 20:30 ` [PATCH 25/33] perf, tools: Support events with - in the name Andi Kleen
@ 2012-10-27 19:32   ` Jiri Olsa
  0 siblings, 0 replies; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 19:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:07PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> - looks nicer than _, so allow - in the event names. Used for various
> of the arch perfmon and Haswell events.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/util/parse-events.l |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index c87efc1..ef602f0 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -80,7 +80,7 @@ event		[^,{}/]+
>  num_dec		[0-9]+
>  num_hex		0x[a-fA-F0-9]+
>  num_raw_hex	[a-fA-F0-9]+
> -name		[a-zA-Z_*?][a-zA-Z0-9_*?]*
> +name		[a-zA-Z_*?][a-zA-Z0-9\-_*?]*

this breaks cache event parsing, since cache events are '-' separated
and having '-' in the 'name' pattern will end up matching PE_NAME
instead of the PE_NAME_CACHE_* terms

I guess you want '-' to be usable within 'cpu/..t=v../' terms, right?
That could be done via start conditions '%x'

jirka


* Re: [PATCH 27/33] tools, perf: Add a precise event qualifier
  2012-10-26 20:30 ` [PATCH 27/33] tools, perf: Add a precise event qualifier Andi Kleen
@ 2012-10-27 19:35   ` Jiri Olsa
  2012-10-28 19:13     ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 19:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:09PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a precise qualifier, like cpu/event=0x3c,precise=1/
hm, I think this works already via 'p' modifier like:
   cpu/event=0x3c/p

jirka


* Re: [PATCH 31/33] perf, tools: Support generic events as pmu event names v2
  2012-10-26 20:30 ` [PATCH 31/33] perf, tools: Support generic events as pmu event names v2 Andi Kleen
@ 2012-10-27 19:42   ` Jiri Olsa
  2012-10-28 19:12     ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 19:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:13PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Extend the parser/lexer to allow generic event names like
> "instructions" as a sysfs supplied PMU event name.
> 
> This resolves the problem that cpu/instructions/ gives a parse
> error, even when the kernel supplies a "instructions" event
> 
> This is useful to add sysfs specified qualifiers to these
> events, for example cpu/instructions,intx=1/ and needed
> for the TSX events
> 
> Simply extend the grammar to handle this case. The lexer
> needs minor changes to save the original string.

oops, I think you need to check recent changes:

3f3a206 perf test: Add automated tests for pmu sysfs translated events
1d33d6d perf tools: Add support to specify hw event as PMU event term
3fded96 perf tools: Fix PMU object alias initialization
20550a4 perf/x86: Add hardware events translations for Intel P6 cpus
0bf79d4 perf/x86: Add hardware events translations for AMD cpus
43c032f perf/x86: Add hardware events translations for Intel cpus
8300daa perf/x86: Filter out undefined events from sysfs events attribute
a474739 perf/x86: Make hardware event translations available in sysfs

jirka


* Re: [PATCH 32/33] perf, tools: Default to cpu// for events v2
  2012-10-26 20:30 ` [PATCH 32/33] perf, tools: Default to cpu// for events v2 Andi Kleen
@ 2012-10-27 20:16   ` Jiri Olsa
  0 siblings, 0 replies; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 20:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:14PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> When an event fails to parse and it's not in a new style format,
> try to parse it again as a cpu event.
> 
> This allows to use sysfs exported events directly without //, so I can use
> 
> perf record -e tx-aborts ...

hum, seems useful and hacky ;)

would not work for modifier stuff like:

  tx-aborts:u (not sure if 'u' makes sense for 'tx-aborts'..)

but never mind, seems like a useful shortcut

> 
> instead of
> 
> perf record -e cpu/tx-aborts/
> 
> v2: Handle multiple events
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/util/parse-events.c |   37 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 37 insertions(+), 0 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index f800765..ee6a73c 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -768,6 +768,23 @@ int parse_events_name(struct list_head *list, char *name)
>  	return 0;
>  }
>  
> +static void str_append(char **s, int *len, const char *a)
> +{
> +	int olen = *s ? strlen(*s) : 0;
> +	int nlen = olen + strlen(a) + 1;
> +	if (*len < nlen) { 
> +		*len = *len * 2;
> +		if (*len < nlen)
> +			*len = nlen;
> +		*s = realloc(*s, *len);
> +		if (!*s) 

trailing whitespace

> +			exit(ENOMEM);
> +		if (olen == 0)
> +			**s = 0;
> +	}
> +	strcat(*s, a);
> +}
> +
>  static int parse_events__scanner(const char *str, void *data, int start_token)
>  {
>  	YY_BUFFER_STATE buffer;
> @@ -788,6 +805,26 @@ static int parse_events__scanner(const char *str, void *data, int start_token)
>  	parse_events__flush_buffer(buffer, scanner);
>  	parse_events__delete_buffer(buffer, scanner);
>  	parse_events_lex_destroy(scanner);
> +
> +	if (ret && !strchr(str, '/')) {
> +		char *o = strdup(str);
> +		char *s = NULL;
> +		char *t = o;
> +		char *p;
> +		int len = 0;
> +
> +		if (!o)
> +			return ret;
> +		while ((p = strsep(&t, ",")) != NULL) {
> +			if (s)
> +				str_append(&s, &len, ",");
> +			str_append(&s, &len, "cpu/");
> +			str_append(&s, &len, p);
> +			str_append(&s, &len, "/");
> +		}
> +		free(o);
> +		ret = parse_events__scanner(s, data, start_token);

any chance the above could be moved into a separate function?

> +	}
>  	return ret;
>  }
>  
> -- 
> 1.7.7.6
> 


* Re: [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2
  2012-10-26 20:30 ` [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2 Andi Kleen
@ 2012-10-27 20:20   ` Jiri Olsa
  2012-10-28 19:05     ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2012-10-27 20:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Fri, Oct 26, 2012 at 01:30:15PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> List the kernel supplied pmu event aliases in perf list
> 
> It's better when the users can actually see them.

with the HW events being part of PMU 'events' dir
we get single HW events listed twice

  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  ref-cycles OR cpu/ref-cycles/                      [Kernel PMU event]
  stalled-cycles-backend OR cpu/stalled-cycles-backend/ [Kernel PMU event]
  stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event]
  uncore_cbox_0/clockticks/                          [Kernel PMU event]
  uncore_cbox_1/clockticks/                          [Kernel PMU event]
  uncore_cbox_2/clockticks/                          [Kernel PMU event]
  uncore_cbox_3/clockticks/                          [Kernel PMU event]


jirka


* Re: [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2
  2012-10-27 20:20   ` Jiri Olsa
@ 2012-10-28 19:05     ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-28 19:05 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Sat, Oct 27, 2012 at 10:20:21PM +0200, Jiri Olsa wrote:
> On Fri, Oct 26, 2012 at 01:30:15PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > List the kernel supplied pmu event aliases in perf list
> > 
> > It's better when the users can actually see them.
> 
> with the HW events being part of PMU 'events' dir
> we get single HW events listed twice

Cannot change that, and not listing them at all is no alternative.

In theory we could do a dedup step, but I don't think it's worth it.

-Andi
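
The dedup step Andi mentions could be cheap; as a sketch (the function name and interface are illustrative, not actual perf-tools code), it only needs a membership test against the generic hardware event names before printing a sysfs alias:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Sketch of a dedup step for "perf list": skip printing a sysfs-supplied
 * alias when the same name already appears among the generic hardware
 * events.  Names here are illustrative. */
static bool is_duplicate_alias(const char *alias,
			       const char *const hw_events[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(alias, hw_events[i]) == 0)
			return true;
	return false;
}
```

With this, "cpu/instructions/" would be suppressed while "uncore_cbox_0/clockticks/" (no generic counterpart) still gets listed.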


* Re: [PATCH 31/33] perf, tools: Support generic events as pmu event names v2
  2012-10-27 19:42   ` Jiri Olsa
@ 2012-10-28 19:12     ` Andi Kleen
  2012-10-29  9:23       ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Sat, Oct 27, 2012 at 09:42:00PM +0200, Jiri Olsa wrote:
> On Fri, Oct 26, 2012 at 01:30:13PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Extend the parser/lexer to allow generic event names like
> > "instructions" as a sysfs supplied PMU event name.
> > 
> > This resolves the problem that cpu/instructions/ gives a parse
> > error, even when the kernel supplies a "instructions" event
> > 
> > This is useful to add sysfs specified qualifiers to these
> > events, for example cpu/instructions,intx=1/ and needed
> > for the TSX events
> > 
> > Simply extend the grammar to handle this case. The lexer
> > needs minor changes to save the original string.
> 
> ops, I think you need to check recent changes:

Note I wrote and posted all this before you posted last week, but the wheels
of perf review grind so slowly that you overtook me.

Peter Z., to be honest all these later patches are just caused by not having
generic TSX events/modifiers and you not liking my original approach.

I'm now at 10+ patches for the sysfs stuff and counting and have conflicts
with parallel work by Jiri and in general it's still all somewhat hacky and
actually far more code than the original patches.

I'm very tempted to go back to the original approach with generic
events and modifiers, that was far simpler and cleaner and really
did work better, and it was a far less intrusive patchkit.

Comments?

-Andi



* Re: [PATCH 27/33] tools, perf: Add a precise event qualifier
  2012-10-27 19:35   ` Jiri Olsa
@ 2012-10-28 19:13     ` Andi Kleen
  2012-10-28 19:24       ` Jiri Olsa
  0 siblings, 1 reply; 55+ messages in thread
From: Andi Kleen @ 2012-10-28 19:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andi Kleen, linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Sat, Oct 27, 2012 at 09:35:44PM +0200, Jiri Olsa wrote:
> On Fri, Oct 26, 2012 at 01:30:09PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Add a precise qualifier, like cpu/event=0x3c,precise=1/
> hm, I think this works already via 'p' modifier like:
>    cpu/event=0x3c/p

Not for kernel specified events, which is why this was needed.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 27/33] tools, perf: Add a precise event qualifier
  2012-10-28 19:13     ` Andi Kleen
@ 2012-10-28 19:24       ` Jiri Olsa
  2012-10-28 20:06         ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Olsa @ 2012-10-28 19:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, eranian, mingo, Andi Kleen

On Sun, Oct 28, 2012 at 08:13:13PM +0100, Andi Kleen wrote:
> On Sat, Oct 27, 2012 at 09:35:44PM +0200, Jiri Olsa wrote:
> > On Fri, Oct 26, 2012 at 01:30:09PM -0700, Andi Kleen wrote:
> > > From: Andi Kleen <ak@linux.intel.com>
> > > 
> > > Add a precise qualifier, like cpu/event=0x3c,precise=1/
> > hm, I think this works already via 'p' modifier like:
> >    cpu/event=0x3c/p
> 
> Not for kernel specified events, which is why this was needed.

I haven't checked deeply, but I thought the modifier is always
applied.  What do you mean by 'kernel specified events'?

jirka


* Re: [PATCH 27/33] tools, perf: Add a precise event qualifier
  2012-10-28 19:24       ` Jiri Olsa
@ 2012-10-28 20:06         ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-28 20:06 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, linux-kernel, acme, peterz, eranian, mingo

On Sun, Oct 28, 2012 at 08:24:49PM +0100, Jiri Olsa wrote:
> On Sun, Oct 28, 2012 at 08:13:13PM +0100, Andi Kleen wrote:
> > On Sat, Oct 27, 2012 at 09:35:44PM +0200, Jiri Olsa wrote:
> > > On Fri, Oct 26, 2012 at 01:30:09PM -0700, Andi Kleen wrote:
> > > > From: Andi Kleen <ak@linux.intel.com>
> > > > 
> > > > Add a precise qualifier, like cpu/event=0x3c,precise=1/
> > > hm, I think this works already via 'p' modifier like:
> > >    cpu/event=0x3c/p
> > 
> > Not for kernel specified events, which is why this was needed.
> 
> I haven't checked deeply, but I thought the modifier is always
> applied, what do you mean by 'kernel specified events' ?

Events exported in sysfs.

See the earlier patches in the patchkit.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH 31/33] perf, tools: Support generic events as pmu event names v2
  2012-10-28 19:12     ` Andi Kleen
@ 2012-10-29  9:23       ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2012-10-29  9:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, linux-kernel, acme, eranian, mingo, Andi Kleen

On Sun, 2012-10-28 at 20:12 +0100, Andi Kleen wrote:
> 
> Note I wrote and posted all this before you posted last week, but the wheels
> of perf review grind so slowly that you overtook me.
> 
> Peter Z., to be honest all these later patches are just caused by not having
> generic TSX events/modifiers and you not liking my original approach.

He posted his patches first in June, your haswell stuff came in Sep, you
completely ignored his sysfs work, your problem.


* Re: [PATCH 01/33] perf, x86: Add PEBSv2 record support
  2012-10-26 20:29 ` [PATCH 01/33] perf, x86: Add PEBSv2 record support Andi Kleen
@ 2012-10-29 10:08   ` Namhyung Kim
  2012-10-29 10:13     ` Andi Kleen
  2012-10-29 10:23     ` Peter Zijlstra
  0 siblings, 2 replies; 55+ messages in thread
From: Namhyung Kim @ 2012-10-29 10:08 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, jolsa, eranian, mingo, Andi Kleen

Hi Andi,

On Fri, 26 Oct 2012 13:29:43 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Add support for the v2 PEBS format. It has a superset of the v1 PEBS
> fields, but has a longer record so we need to adjust the code paths.
>
> The main advantage is the new "EventingRip" support which directly
> gives the instruction, not off-by-one instruction. So with precise == 2
> we use that directly and don't try to use LBRs and walking basic blocks.
> This lowers the overhead significantly.

That means it can support precise == 3?

Thanks,
Namhyung



* Re: [PATCH 01/33] perf, x86: Add PEBSv2 record support
  2012-10-29 10:08   ` Namhyung Kim
@ 2012-10-29 10:13     ` Andi Kleen
  2012-10-29 10:23     ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-29 10:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, linux-kernel, acme, peterz, jolsa, eranian, mingo,
	Andi Kleen

On Mon, Oct 29, 2012 at 07:08:04PM +0900, Namhyung Kim wrote:
> Hi Andi,
> 
> On Fri, 26 Oct 2012 13:29:43 -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> >
> > Add support for the v2 PEBS format. It has a superset of the v1 PEBS
> > fields, but has a longer record so we need to adjust the code paths.
> >
> > The main advantage is the new "EventingRip" support which directly
> > gives the instruction, not off-by-one instruction. So with precise == 2
> > we use that directly and don't try to use LBRs and walking basic blocks.
> > This lowers the overhead significantly.
> 
> That means it can support precise == 3?

No, it's still 2.

I tried to introduce precise == 3 some time ago for PDIR, but it was
rejected.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2
  2012-10-26 20:29 ` [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2 Andi Kleen
@ 2012-10-29 10:19   ` Namhyung Kim
  0 siblings, 0 replies; 55+ messages in thread
From: Namhyung Kim @ 2012-10-29 10:19 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, jolsa, eranian, mingo, Andi Kleen

Just minor nitpicks..


On Fri, 26 Oct 2012 13:29:52 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Make perf report -j aware of the new intx,notx,abort branch qualifiers.

s/report/record/

The same goes to the subject line too.

>
> v2: ABORT -> ABORTTX
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/Documentation/perf-record.txt |    3 +++
>  tools/perf/builtin-record.c              |    3 +++
>  2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index b38a1f9..4b9f477 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -172,6 +172,9 @@ following filters are defined:
>          - u:  only when the branch target is at the user level
>          - k: only when the branch target is in the kernel
>          - hv: only when the target is at the hypervisor level
> +	- intx: only when the target is in a hardware transaction
> +	- notx: only when the target is not in a hardware transaction
> +	- aborttx: only when the target is a hardware transaction abort

Mixed tabs and spaces for indentation.


>  
>  +
>  The option requires at least one branch type among any, any_call, any_ret, ind_call.
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index e9231659..88ecbbd 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -725,6 +725,9 @@ static const struct branch_mode branch_modes[] = {
>  	BRANCH_OPT("any_call", PERF_SAMPLE_BRANCH_ANY_CALL),
>  	BRANCH_OPT("any_ret", PERF_SAMPLE_BRANCH_ANY_RETURN),
>  	BRANCH_OPT("ind_call", PERF_SAMPLE_BRANCH_IND_CALL),
> +	BRANCH_OPT("aborttx", PERF_SAMPLE_BRANCH_ABORTTX),
> +	BRANCH_OPT("intx", PERF_SAMPLE_BRANCH_INTX),
> +	BRANCH_OPT("notx", PERF_SAMPLE_BRANCH_NOTX),

How about make them abort_tx, in_tx and no_tx?  I often read them like
int_x and not_x. ;)

Thanks,
Namhyung


>  	BRANCH_END
>  };


* Re: [PATCH 01/33] perf, x86: Add PEBSv2 record support
  2012-10-29 10:08   ` Namhyung Kim
  2012-10-29 10:13     ` Andi Kleen
@ 2012-10-29 10:23     ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2012-10-29 10:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, linux-kernel, acme, jolsa, eranian, mingo, Andi Kleen

On Mon, 2012-10-29 at 19:08 +0900, Namhyung Kim wrote:
> That means it can support precise == 3?

It should, the difference between 2 and 3 is allowing for !EXACT_IP
samples. Not needing the LBR based fixup we should never have that, so
HSW might indeed allow for 3.


* Re: [PATCH 16/33] perf, tools: Add support for weight v2
  2012-10-26 20:29 ` [PATCH 16/33] perf, tools: Add support for weight v2 Andi Kleen
@ 2012-10-29 10:44   ` Namhyung Kim
  2012-10-29 11:02     ` Andi Kleen
  0 siblings, 1 reply; 55+ messages in thread
From: Namhyung Kim @ 2012-10-29 10:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, jolsa, eranian, mingo, Andi Kleen

On Fri, 26 Oct 2012 13:29:58 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> perf record has a new option -W that enables weightened sampling.
>
> Add sorting support in top/report for the average weight per sample and the
> total weight sum. This allows to both compare relative cost per event
> and the total cost over the measurement period.

I expected the weight to be used for scaling a sample period somehow
(and wondered about the *somehow* part) when reading the previous patch
descriptions, but you just added sort keys.  Is that what you intended
originally?


>
> Add the necessary glue to perf report, record and the library.
>
> v2: Merge with new hist refactoring.
> Rename global_weight to weight and weight to local_weight.

But I think (total_)weight and avg_weight look more natural.


> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
[snip]
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 618d411..3800fb5 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -445,6 +445,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
>  		attr->mmap_data = track;
>  	}
>  
> +	if (opts->sample_weight)
> +		attr->sample_type	|= PERF_SAMPLE_WEIGHT;
> +
>  	if (opts->call_graph) {
>  		attr->sample_type	|= PERF_SAMPLE_CALLCHAIN;
>  
> @@ -870,6 +873,7 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
>  	data->cpu = data->pid = data->tid = -1;
>  	data->stream_id = data->id = data->time = -1ULL;
>  	data->period = 1;
> +	data->weight = 0;

Why not set it to 1 at first and ...

>  
>  	if (event->header.type != PERF_RECORD_SAMPLE) {
>  		if (!evsel->attr.sample_id_all)
> @@ -941,6 +945,12 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
>  		array++;
>  	}
>  
> +	data->weight = 0;

(It seems unnecessary)

> +	if (type & PERF_SAMPLE_WEIGHT) {
> +		data->weight = *array;
> +		array++;
> +	}
> +
>  	if (type & PERF_SAMPLE_READ) {
>  		fprintf(stderr, "PERF_SAMPLE_READ is unsupported for now\n");
>  		return -1;
[snip]
> @@ -268,13 +272,17 @@ static u8 symbol__parent_filter(const struct symbol *parent)
>  static struct hist_entry *add_hist_entry(struct hists *hists,
>  				      struct hist_entry *entry,
>  				      struct addr_location *al,
> -				      u64 period)
> +				      u64 period,
> +				      u64 weight)
>  {
>  	struct rb_node **p;
>  	struct rb_node *parent = NULL;
>  	struct hist_entry *he;
>  	int cmp;
>  
> +	if (weight == 0)
> +		weight = 1;
> +

... get rid of the above?


>  	pthread_mutex_lock(&hists->lock);
>  
>  	p = &hists->entries_in->rb_node;
> @@ -286,7 +294,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
>  		cmp = hist_entry__cmp(entry, he);
>  
>  		if (!cmp) {
> -			he_stat__add_period(&he->stat, period);
> +			he_stat__add_period(&he->stat, period, weight);
>  
>  			/* If the map of an existing hist_entry has
>  			 * become out-of-date due to an exec() or
> @@ -314,6 +322,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
>  
>  	rb_link_node(&he->rb_node_in, parent, p);
>  	rb_insert_color(&he->rb_node_in, hists->entries_in);
> +	he->stat.weight += weight;

I'd suggest that the weight should be set on the 'entry' so that it can
be added when hist_entry__new() is called.

Thanks,
Namhyung


>  out:
>  	hist_entry__add_cpumode_period(he, al->cpumode, period);
>  out_unlock:


* Re: [PATCH 21/33] perf, tools: Add support for record transaction flags
  2012-10-26 20:30 ` [PATCH 21/33] perf, tools: Add support for record transaction flags Andi Kleen
@ 2012-10-29 10:49   ` Namhyung Kim
  0 siblings, 0 replies; 55+ messages in thread
From: Namhyung Kim @ 2012-10-29 10:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, acme, peterz, jolsa, eranian, mingo, Andi Kleen

On Fri, 26 Oct 2012 13:30:03 -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Add the glue in the user tools to record transaction flags with
> --transaction (-T was already taken) and dump them.
>
> Followon patches will use them.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/Documentation/perf-record.txt |    5 ++++-
>  tools/perf/builtin-record.c              |    2 ++
>  tools/perf/perf.h                        |    1 +
>  tools/perf/util/event.h                  |    1 +
>  tools/perf/util/evsel.c                  |    9 +++++++++
>  tools/perf/util/session.c                |    3 +++
>  6 files changed, 20 insertions(+), 1 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 0ffb436..34f4f1a 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -185,12 +185,15 @@ is enabled for all the sampling events. The sampled branch type is the same for
>  The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
>  Note that this feature may not be available on all processors.
>  
> --W::
>  --weight::
>  Enable weightened sampling. When the event supports an additional weight per sample scale
>  the histogram by this weight. This currently works for TSX abort events and some memory events
>  in precise mode on modern Intel CPUs.
>  
> +-T::

No, as you said, -T was already taken. ;)

Thanks,
Namhyung


> +--transaction::
> +Record transaction flags for transaction related events.
> +
>  SEE ALSO
>  --------
>  linkperf:perf-stat[1], linkperf:perf-list[1]


* Re: [PATCH 16/33] perf, tools: Add support for weight v2
  2012-10-29 10:44   ` Namhyung Kim
@ 2012-10-29 11:02     ` Andi Kleen
  0 siblings, 0 replies; 55+ messages in thread
From: Andi Kleen @ 2012-10-29 11:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, linux-kernel, acme, peterz, jolsa, eranian, mingo

On Mon, Oct 29, 2012 at 07:44:17PM +0900, Namhyung Kim wrote:
> On Fri, 26 Oct 2012 13:29:58 -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> >
> > perf record has a new option -W that enables weightened sampling.
> >
> > Add sorting support in top/report for the average weight per sample and the
> > total weight sum. This allows to both compare relative cost per event
> > and the total cost over the measurement period.
> 
> I expected the weight is used for scaling a sample period somehow - and
> wondered the *somehow* part - when reading previous patch descriptions
> but you just added sort keys.  Is it what you intended originally?

I originally simply multiplied, but then I changed over to separate sort
keys, which give the same result and are more flexible and easier to understand.

You're right, the manpage is stale; will fix.

> 
> 
> >
> > Add the necessary glue to perf report, record and the library.
> >
> > v2: Merge with new hist refactoring.
> > Rename global_weight to weight and weight to local_weight.
> 
> But I think (total_)weight and avg_weight looks more natural.

Frankly I think your names are as arbitrary as mine, so I'll stay
with mine for now.

> >  			 * become out-of-date due to an exec() or
> > @@ -314,6 +322,7 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
> >  
> >  	rb_link_node(&he->rb_node_in, parent, p);
> >  	rb_insert_color(&he->rb_node_in, hists->entries_in);
> > +	he->stat.weight += weight;
> 
> I'd suggest that the weight should be set to the 'entry' so that it can
> be added when hist_entry__new() called.

I did that originally, but ran into some problems, so I did it this way.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
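
Namhyung's suggestion above — initialize the weight to 1 once during sample parsing so that add_hist_entry() needs no weight == 0 fixup — can be sketched as follows. The struct layout, helper name, and the PERF_SAMPLE_WEIGHT bit value are assumptions for illustration, not the exact perf-tools code:

```c
#include <assert.h>

#define PERF_SAMPLE_WEIGHT (1U << 14)	/* sample_type bit; value assumed */

struct perf_sample {
	unsigned long long period;
	unsigned long long weight;
};

/* Sketch: default the weight to 1 up front so samples from events
 * without PERF_SAMPLE_WEIGHT need no special-casing in the histogram
 * code; otherwise consume one u64 from the record body. */
static void parse_sample_weight(struct perf_sample *data, unsigned int type,
				const unsigned long long **array)
{
	data->weight = 1;		/* default for unweighted events */
	if (type & PERF_SAMPLE_WEIGHT) {
		data->weight = **array;
		(*array)++;
	}
}
```

With this, the second `data->weight = 0;` in perf_evsel__parse_sample() and the `if (weight == 0) weight = 1;` check in add_hist_entry() both become unnecessary.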


* Re: [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3
  2012-10-26 20:29 ` [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3 Andi Kleen
@ 2012-10-30  9:25   ` Gleb Natapov
  0 siblings, 0 replies; 55+ messages in thread
From: Gleb Natapov @ 2012-10-30  9:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, acme, peterz, jolsa, eranian, mingo, Andi Kleen, avi

On Fri, Oct 26, 2012 at 01:29:47PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> This is not arch perfmon, but older CPUs will just ignore it. This makes
> it possible to do at least some TSX measurements from a KVM guest
> 
> Cc: avi@redhat.com
> Cc: gleb@redhat.com
> v2: Various fixes to address review feedback
> v3: Ignore the bits when no CPUID. No #GP. Force raw events with TSX bits.
> Cc: gleb@redhat.com
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/pmu.c              |   34 ++++++++++++++++++++++++++--------
>  2 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b2e11f4..6783289 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -318,6 +318,7 @@ struct kvm_pmu {
>  	u64 global_ovf_ctrl;
>  	u64 counter_bitmask[2];
>  	u64 global_ctrl_mask;
> +	u64 cpuid_word9;
>  	u8 version;
>  	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
>  	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index cfc258a..8bc954a 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -160,7 +160,7 @@ static void stop_counter(struct kvm_pmc *pmc)
>  
>  static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
>  		unsigned config, bool exclude_user, bool exclude_kernel,
> -		bool intr)
> +		bool intr, bool intx, bool intx_cp)
>  {
>  	struct perf_event *event;
>  	struct perf_event_attr attr = {
> @@ -173,6 +173,11 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
>  		.exclude_kernel = exclude_kernel,
>  		.config = config,
>  	};
> +	/* Will be ignored on CPUs that don't support this. */
> +	if (intx)
> +		attr.config |= HSW_INTX;
> +	if (intx_cp)
> +		attr.config |= HSW_INTX_CHECKPOINTED;
>  
>  	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
>  
> @@ -206,7 +211,8 @@ static unsigned find_arch_event(struct kvm_pmu *pmu, u8 event_select,
>  	return arch_events[i].event_type;
>  }
>  
> -static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
> +static void reprogram_gp_counter(struct kvm_pmu *pmu, struct kvm_pmc *pmc, 
> +				 u64 eventsel)
>  {
>  	unsigned config, type = PERF_TYPE_RAW;
>  	u8 event_select, unit_mask;
> @@ -224,9 +230,16 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>  	event_select = eventsel & ARCH_PERFMON_EVENTSEL_EVENT;
>  	unit_mask = (eventsel & ARCH_PERFMON_EVENTSEL_UMASK) >> 8;
>  
> +	if (!(boot_cpu_has(X86_FEATURE_HLE) ||
> +	      boot_cpu_has(X86_FEATURE_RTM)) ||
> +	    !(pmu->cpuid_word9 & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
> +		eventsel &= ~(HSW_INTX|HSW_INTX_CHECKPOINTED);
If you put this check into kvm_pmu_cpuid_update() and disallow the guest
from setting those bits in the first place by choosing an appropriate
reserved mask, you will not need this check here.  This will simplify the
code and make the emulation more correct.

> +
>  	if (!(eventsel & (ARCH_PERFMON_EVENTSEL_EDGE |
>  				ARCH_PERFMON_EVENTSEL_INV |
> -				ARCH_PERFMON_EVENTSEL_CMASK))) {
> +				ARCH_PERFMON_EVENTSEL_CMASK |
> +				HSW_INTX |
> +				HSW_INTX_CHECKPOINTED))) {
>  		config = find_arch_event(&pmc->vcpu->arch.pmu, event_select,
>  				unit_mask);
>  		if (config != PERF_COUNT_HW_MAX)
> @@ -239,7 +252,9 @@ static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
>  	reprogram_counter(pmc, type, config,
>  			!(eventsel & ARCH_PERFMON_EVENTSEL_USR),
>  			!(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> -			eventsel & ARCH_PERFMON_EVENTSEL_INT);
> +			eventsel & ARCH_PERFMON_EVENTSEL_INT,
> +			(eventsel & HSW_INTX),
> +			(eventsel & HSW_INTX_CHECKPOINTED));
>  }
>  
>  static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
> @@ -256,7 +271,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 en_pmi, int idx)
>  			arch_events[fixed_pmc_events[idx]].event_type,
>  			!(en & 0x2), /* exclude user */
>  			!(en & 0x1), /* exclude kernel */
> -			pmi);
> +			pmi, false, false);
>  }
>  
>  static inline u8 fixed_en_pmi(u64 ctrl, int idx)
> @@ -289,7 +304,7 @@ static void reprogram_idx(struct kvm_pmu *pmu, int idx)
>  		return;
>  
>  	if (pmc_is_gp(pmc))
> -		reprogram_gp_counter(pmc, pmc->eventsel);
> +		reprogram_gp_counter(pmu, pmc, pmc->eventsel);
>  	else {
>  		int fidx = idx - INTEL_PMC_IDX_FIXED;
>  		reprogram_fixed_counter(pmc,
> @@ -400,8 +415,8 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
>  		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
>  			if (data == pmc->eventsel)
>  				return 0;
> -			if (!(data & 0xffffffff00200000ull)) {
> -				reprogram_gp_counter(pmc, data);
> +			if (!(data & 0xfffffffc00200000ull)) {
> +				reprogram_gp_counter(pmu, pmc, data);
>  				return 0;
>  			}
>  		}
> @@ -470,6 +485,9 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
>  	pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
>  		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
>  	pmu->global_ctrl_mask = ~pmu->global_ctrl;
> +
> +	entry = kvm_find_cpuid_entry(vcpu, 7, 0);
> +	pmu->cpuid_word9 = entry ? entry->ebx : 0;
>  }
>  
>  void kvm_pmu_init(struct kvm_vcpu *vcpu)
> -- 
> 1.7.7.6

--
			Gleb.
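
Gleb's alternative — deriving the reserved-bit mask once per guest CPUID update instead of stripping the TSX bits on every reprogram — can be sketched like this. The bit positions follow the patch (HSW_INTX and HSW_INTX_CHECKPOINTED occupy bits 32 and 33, matching the 0xffffffff00200000 to 0xfffffffc00200000 mask change); the helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HSW_INTX		(1ULL << 32)
#define HSW_INTX_CHECKPOINTED	(1ULL << 33)
/* reserved eventsel bits before the patch, from kvm_pmu_set_msr() */
#define EVENTSEL_RESERVED_BASE	0xffffffff00200000ULL

/* Sketch of Gleb's suggestion: compute the per-guest reserved mask once,
 * when guest CPUID changes (HLE/RTM advertised in CPUID.7.EBX), so a
 * guest without TSX can never set the in-TX bits at all. */
static uint64_t eventsel_reserved_mask(bool guest_has_hle, bool guest_has_rtm)
{
	uint64_t reserved = EVENTSEL_RESERVED_BASE;

	if (guest_has_hle || guest_has_rtm)
		reserved &= ~(HSW_INTX | HSW_INTX_CHECKPOINTED);
	return reserved;
}
```

kvm_pmu_set_msr() would then reject any eventsel write touching the per-guest mask, and reprogram_gp_counter() would no longer need to clear the TSX bits itself.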


* [tip:perf/core] perf tools: Move parse_events error printing to parse_events_options
  2012-10-26 20:30 ` [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options Andi Kleen
  2012-10-27 19:08   ` Jiri Olsa
@ 2012-10-30 11:58   ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 55+ messages in thread
From: tip-bot for Andi Kleen @ 2012-10-30 11:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, eranian, hpa, mingo, peterz, jolsa, ak, tglx

Commit-ID:  9175ce1f1244edd9f4d39605aa06ce5b0a50b8e0
Gitweb:     http://git.kernel.org/tip/9175ce1f1244edd9f4d39605aa06ce5b0a50b8e0
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Fri, 26 Oct 2012 13:30:06 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Sun, 28 Oct 2012 11:29:52 -0200

perf tools: Move parse_events error printing to parse_events_options

The callers of parse_events usually have their own error handling.  Move
the fprintf for a bad event to parse_events_options, which is the only
one who should need it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1351283415-13170-25-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3a3efcf..c0b785b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -827,8 +827,6 @@ int parse_events(struct perf_evlist *evlist, const char *str,
 	 * Both call perf_evlist__delete in case of error, so we dont
 	 * need to bother.
 	 */
-	fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
-	fprintf(stderr, "Run 'perf list' for a list of valid events\n");
 	return ret;
 }
 
@@ -836,7 +834,13 @@ int parse_events_option(const struct option *opt, const char *str,
 			int unset __maybe_unused)
 {
 	struct perf_evlist *evlist = *(struct perf_evlist **)opt->value;
-	return parse_events(evlist, str, unset);
+	int ret = parse_events(evlist, str, unset);
+
+	if (ret) {
+		fprintf(stderr, "invalid or unsupported event: '%s'\n", str);
+		fprintf(stderr, "Run 'perf list' for a list of valid events\n");
+	}
+	return ret;
 }
 
 int parse_filter(const struct option *opt, const char *str,


end of thread, other threads:[~2012-10-30 11:59 UTC | newest]

Thread overview: 55+ messages
2012-10-26 20:29 perf PMU support for Haswell v4 Andi Kleen
2012-10-26 20:29 ` [PATCH 01/33] perf, x86: Add PEBSv2 record support Andi Kleen
2012-10-29 10:08   ` Namhyung Kim
2012-10-29 10:13     ` Andi Kleen
2012-10-29 10:23     ` Peter Zijlstra
2012-10-26 20:29 ` [PATCH 02/33] perf, x86: Basic Haswell PMU support v2 Andi Kleen
2012-10-26 20:29 ` [PATCH 03/33] perf, x86: Basic Haswell PEBS support v3 Andi Kleen
2012-10-26 20:29 ` [PATCH 04/33] perf, x86: Support the TSX intx/intx_cp qualifiers v2 Andi Kleen
2012-10-26 20:29 ` [PATCH 05/33] perf, kvm: Support the intx/intx_cp modifiers in KVM arch perfmon emulation v3 Andi Kleen
2012-10-30  9:25   ` Gleb Natapov
2012-10-26 20:29 ` [PATCH 06/33] perf, x86: Support PERF_SAMPLE_ADDR on Haswell Andi Kleen
2012-10-26 20:29 ` [PATCH 07/33] perf, x86: Support Haswell v4 LBR format Andi Kleen
2012-10-26 20:29 ` [PATCH 08/33] perf, x86: Disable LBR recording for unknown LBR_FMT Andi Kleen
2012-10-26 20:29 ` [PATCH 09/33] perf, x86: Support LBR filtering by INTX/NOTX/ABORT v2 Andi Kleen
2012-10-26 20:29 ` [PATCH 10/33] perf, tools: Add abort,notx,intx branch filter options to perf report -j v2 Andi Kleen
2012-10-29 10:19   ` Namhyung Kim
2012-10-26 20:29 ` [PATCH 11/33] perf, tools: Support sorting by intx, abort branch flags Andi Kleen
2012-10-26 20:29 ` [PATCH 12/33] perf, x86: Support full width counting Andi Kleen
2012-10-26 20:29 ` [PATCH 13/33] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3 Andi Kleen
2012-10-26 20:29 ` [PATCH 14/33] perf, core: Add a concept of a weightened sample Andi Kleen
2012-10-26 20:29 ` [PATCH 15/33] perf, x86: Support weight samples for PEBS Andi Kleen
2012-10-26 20:29 ` [PATCH 16/33] perf, tools: Add support for weight v2 Andi Kleen
2012-10-29 10:44   ` Namhyung Kim
2012-10-29 11:02     ` Andi Kleen
2012-10-26 20:29 ` [PATCH 17/33] perf, tools: Handle XBEGIN like a jump Andi Kleen
2012-10-26 20:30 ` [PATCH 18/33] perf, x86: Support for printing PMU state on spurious PMIs v3 Andi Kleen
2012-10-26 20:30 ` [PATCH 19/33] perf, core: Add generic transaction flags Andi Kleen
2012-10-26 20:30 ` [PATCH 20/33] perf, x86: Add Haswell specific transaction flag reporting Andi Kleen
2012-10-26 20:30 ` [PATCH 21/33] perf, tools: Add support for record transaction flags Andi Kleen
2012-10-29 10:49   ` Namhyung Kim
2012-10-26 20:30 ` [PATCH 22/33] perf, tools: Point --sort documentation to --help Andi Kleen
2012-10-26 20:30 ` [PATCH 23/33] perf, tools: Add browser support for transaction flags Andi Kleen
2012-10-26 20:30 ` [PATCH 24/33] perf, tools: Move parse_events error printing to parse_events_options Andi Kleen
2012-10-27 19:08   ` Jiri Olsa
2012-10-30 11:58   ` [tip:perf/core] perf " tip-bot for Andi Kleen
2012-10-26 20:30 ` [PATCH 25/33] perf, tools: Support events with - in the name Andi Kleen
2012-10-27 19:32   ` Jiri Olsa
2012-10-26 20:30 ` [PATCH 26/33] perf, x86: Report the arch perfmon events in sysfs Andi Kleen
2012-10-26 20:30 ` [PATCH 27/33] tools, perf: Add a precise event qualifier Andi Kleen
2012-10-27 19:35   ` Jiri Olsa
2012-10-28 19:13     ` Andi Kleen
2012-10-28 19:24       ` Jiri Olsa
2012-10-28 20:06         ` Andi Kleen
2012-10-26 20:30 ` [PATCH 28/33] perf, x86: Add Haswell TSX event aliases Andi Kleen
2012-10-26 20:30 ` [PATCH 29/33] perf, tools: Add perf stat --transaction v2 Andi Kleen
2012-10-26 20:30 ` [PATCH 30/33] perf, x86: Add a Haswell precise instructions event Andi Kleen
2012-10-26 20:30 ` [PATCH 31/33] perf, tools: Support generic events as pmu event names v2 Andi Kleen
2012-10-27 19:42   ` Jiri Olsa
2012-10-28 19:12     ` Andi Kleen
2012-10-29  9:23       ` Peter Zijlstra
2012-10-26 20:30 ` [PATCH 32/33] perf, tools: Default to cpu// for events v2 Andi Kleen
2012-10-27 20:16   ` Jiri Olsa
2012-10-26 20:30 ` [PATCH 33/33] perf, tools: List kernel supplied event aliases in perf list v2 Andi Kleen
2012-10-27 20:20   ` Jiri Olsa
2012-10-28 19:05     ` Andi Kleen
