linux-kernel.vger.kernel.org archive mirror
* Add top down metrics to perf stat
@ 2016-03-22 23:08 Andi Kleen
  2016-03-22 23:08 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads, 
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can all be done with 5 counters (one of them a fixed counter).

The result is four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev
(http://github.com/andikleen/pmu-tools).

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):
    
topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation
    
These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.
    
The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available metrics.
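
For illustration, here is a rough sketch of the level 1 formulas as they
are usually stated for the Top Down method. This is not the code from
this series (the actual computation is in a later patch); the struct and
function names are made up, and all inputs are assumed to be already
scaled to slots.

```c
/* Hypothetical container for the five abstracted inputs, all in slots. */
struct td_counts {
	double total_slots;
	double slots_issued;
	double slots_retired;
	double fetch_bubbles;
	double recovery_bubbles;
};

/* The four level 1 metrics as fractions of total slots (0.0 .. 1.0). */
struct td_level1 {
	double frontend_bound;
	double bad_speculation;
	double retiring;
	double backend_bound;
};

static struct td_level1 td_compute(const struct td_counts *c)
{
	struct td_level1 r = { 0.0, 0.0, 0.0, 0.0 };

	if (c->total_slots <= 0.0)
		return r;	/* nothing measured, avoid dividing by zero */

	r.frontend_bound = c->fetch_bubbles / c->total_slots;
	/* Slots issued but never retired, plus slots lost while recovering. */
	r.bad_speculation = (c->slots_issued - c->slots_retired +
			     c->recovery_bubbles) / c->total_slots;
	r.retiring = c->slots_retired / c->total_slots;
	/* Whatever is left over is attributed to the backend. */
	r.backend_bound = 1.0 - r.frontend_bound - r.bad_speculation -
			  r.retiring;
	return r;
}
```

Because BackendBound is computed as the remainder, the four values always
sum to 100%, which matches the rows in the example output.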


Example output:

$ perf stat --topdown -I 1000 cmd
     1.000735655                   frontend bound       retiring             bad speculation      backend bound        
     1.000735655 S0-C0           2    47.84%              11.69%               8.37%              32.10%           
     1.000735655 S0-C1           2    45.53%              11.39%               8.52%              34.56%           
     2.003978563 S0-C0           2    49.47%              12.22%               8.65%              29.66%           
     2.003978563 S0-C1           2    47.21%              12.98%               8.77%              31.04%           
     3.004901432 S0-C0           2    49.35%              12.26%               8.68%              29.70%           
     3.004901432 S0-C1           2    47.23%              12.67%               8.76%              31.35%           
     4.005766611 S0-C0           2    48.44%              12.14%               8.59%              30.82%           
     4.005766611 S0-C1           2    46.07%              12.41%               8.67%              32.85%           
     5.006580592 S0-C0           2    47.91%              12.08%               8.57%              31.44%           
     5.006580592 S0-C1           2    45.57%              12.27%               8.63%              33.53%           
     6.007545125 S0-C0           2    47.45%              12.02%               8.57%              31.96%           
     6.007545125 S0-C1           2    45.13%              12.17%               8.57%              34.14%           
     7.008539347 S0-C0           2    47.07%              12.03%               8.61%              32.29%           
...


 
For Level 1, TopDown computes metrics per core instead of per logical CPU
on Core CPUs. (On Atom CPUs there is no Hyper Threading and TopDown
is per thread.)

In this case perf stat automatically enables --per-core mode and also requires
global mode (-a) and avoiding other filters (no cgroup mode).

When Hyper Threading is off this can be overridden with the --single-thread
option. When Hyper Threading is on it is enforced; the only way to
not require -a here is to offline the logical CPUs of the second
threads.

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16


* [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 02/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a way to show different sysfs event attributes depending on
whether HyperThreading is on or off. This is difficult to determine
early at boot, so we just do it dynamically when the sysfs
attribute is read.

v2:
Compute HT status only once in CPU online/offline hooks.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
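
Illustration only, not part of the patch: a user-space sketch of the
"is HT on anywhere" recomputation done in the offline hook. The sibling
masks are plain 64-bit bitmaps here, mask_weight() stands in for the
kernel's cpumask_weight(), and a zero mask models an offline CPU; all of
these are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a sibling mask. */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * sibling_mask[i] holds the online thread siblings of CPU i, including
 * CPU i itself. HT counts as "on" if any CPU still has a second sibling.
 */
static bool any_core_has_ht(const uint64_t *sibling_mask, size_t ncpus)
{
	size_t i;

	for (i = 0; i < ncpus; i++) {
		if (mask_weight(sibling_mask[i]) > 1)
			return true;
	}
	return false;
}
```

With two threads per core the masks look like 0x3, 0x3, 0xc, 0xc and the
result is true; once the second thread of every core is offlined, each
remaining mask has a single bit set and the result becomes false.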
 arch/x86/events/core.c       | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h | 15 +++++++++++++++
 include/linux/perf_event.h   |  7 +++++++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5e830d0..8762874 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1477,6 +1477,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	unsigned int cpu = (long)hcpu;
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	int i, ret = NOTIFY_OK;
+	bool ht_on;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
@@ -1496,6 +1497,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 			kfree(cpuc->kfree_on_online[i]);
 			cpuc->kfree_on_online[i] = NULL;
 		}
+		x86_pmu.ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
 		break;
 
 	case CPU_DYING:
@@ -1507,6 +1509,15 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DEAD:
 		if (x86_pmu.cpu_dead)
 			x86_pmu.cpu_dead(cpu);
+		/* Recompute HT state for all CPUs on offline */
+		ht_on = false;
+		for_each_online_cpu (cpu) {
+			if (cpumask_weight(topology_sibling_cpumask(cpu)) > 1) {
+				ht_on = true;
+				break;
+			}
+		}
+		x86_pmu.ht_on = ht_on;
 		break;
 
 	default:
@@ -1616,6 +1627,30 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 	return x86_pmu.events_sysfs_show(page, config);
 }
 
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page)
+{
+	struct perf_pmu_events_ht_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_ht_attr, attr);
+
+	/*
+	 * Report conditional events depending on Hyper-Threading.
+	 *
+	 * This is overly conservative as usually the HT special
+	 * handling is not needed if the other CPU thread is idle.
+	 *
+	 * Note this does not (cannot) handle the case when thread
+	 * siblings are invisible, for example with virtualization
+	 * if they are owned by some other guest.  The user tool
+	 * has to re-read when a thread sibling gets onlined later.
+	 */
+
+	return sprintf(page, "%s",
+			x86_pmu.ht_on ?
+			pmu_attr->event_str_ht :
+			pmu_attr->event_str_noht);
+}
+
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
 EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 68155ca..c8ce03c3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -617,6 +617,11 @@ struct x86_pmu {
 	 * Intel host/guest support (KVM)
 	 */
 	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+	/*
+	 * Hyper Threading on?
+	 */
+	bool ht_on;
 };
 
 struct x86_perf_task_context {
@@ -662,6 +667,14 @@ static struct perf_pmu_events_attr event_attr_##v = {			\
 	.event_str	= str,						\
 };
 
+#define EVENT_ATTR_STR_HT(_name, v, noht, ht)				\
+static struct perf_pmu_events_ht_attr event_attr_##v = {		\
+	.attr		= __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\
+	.id		= 0,						\
+	.event_str_noht	= noht,						\
+	.event_str_ht	= ht,						\
+}
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
@@ -927,6 +940,8 @@ int knc_pmu_init(void);
 
 ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 			  char *page);
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
 
 static inline int is_ht_workaround_enabled(void)
 {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a9d8cab..5959f3a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1165,6 +1165,13 @@ struct perf_pmu_events_attr {
 	const char *event_str;
 };
 
+struct perf_pmu_events_ht_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str_ht;
+	const char *event_str_noht;
+};
+
 ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
 			      char *page);
 
-- 
2.5.5


* [PATCH 02/11] x86, perf: Add Top Down events to Intel Core
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
  2016-03-22 23:08 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 03/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add declarations for the events needed for TopDown to the
Intel big core CPUs starting with Sandy Bridge. We need
to report different values if HyperThreading is on or off.

The only thing this patch does is to export some events
in sysfs.

TopDown level 1 uses a set of abstracted metrics which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots	  Available slots in the pipeline
topdown-slots-issued	  Slots issued into the pipeline
topdown-slots-retired	  Slots successfully retired
topdown-fetch-bubbles	  Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
			  from misspeculation

A slot is a single operation in the CPU pipeline.

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and their scaling factors (such as the pipeline width)
and perf stat then computes the formulas based on the
available metrics.  This is similar to how existing
perf metrics, such as TSC metrics or IPC, are implemented.

This abstracts all CPU pipeline specific knowledge in the
kernel driver, but still avoids the need for larger scale perf
interface changes.

For HyperThreading the any bit is needed to get accurate
values when both threads are executing. This implies that
the events can only be collected as root or with
perf_event_paranoid=-1 for now.

Hyper Threading also requires averaging events from both
threads together (the CPU cannot measure them independently).

In perf stat this is already done by the per core mode.  The
new .aggr-per-core attribute is added to the events, which
then forces perf stat to enable --per-core.

The basic scheme is based on the following paper:
A. Yasin, "A Top-Down Method for Performance Analysis and Counters
Architecture", ISPASS 2014
(PDF available via a web search)

v2: Rework scaling. Fix formulas for HyperThreading.
v3: Rename agg-per-core to aggr-per-core
Always set aggr-per-core to one to get same output for HT off.
v4: Separate between forced and advisory aggr-per-core
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
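
Illustration only, not part of the patch: a small worked example of why
the cycle based events use scale 4 with HT off and scale 2 with HT on.
The function names are made up; the width of 4 is the one stated in the
comment added by this patch.

```c
#include <stdint.h>

#define PIPELINE_WIDTH 4	/* slots per cycle assumed for Core CPUs */

/* HT off: every unhalted cycle offers PIPELINE_WIDTH slots. */
static uint64_t total_slots_noht(uint64_t cycles)
{
	return cycles * PIPELINE_WIDTH;
}

/*
 * HT on: both threads count with any=1, so each one sees the cycles in
 * which the core was active. Per core mode sums the two counts, hence
 * (t0 + t1) / 2 * PIPELINE_WIDTH, which is the sum times 2.
 */
static uint64_t total_slots_ht(uint64_t cycles_t0, uint64_t cycles_t1)
{
	return (cycles_t0 + cycles_t1) * (PIPELINE_WIDTH / 2);
}
```

For a core that was active for 1000 cycles, both variants give 4000
slots: 1000 * 4 with HT off, and (1000 + 1000) * 2 with HT on.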
 arch/x86/events/intel/core.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 68fa55b..c737a37 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -230,9 +230,65 @@ struct attribute *nhm_events_attrs[] = {
 	NULL,
 };
 
+/*
+ * TopDown events for Core.
+ *
+ * The events are all in slots, which is a free slot in a 4 wide
+ * pipeline. Some events are already reported in slots, for cycle
+ * events we multiply by the pipeline width (4).
+ *
+ * With Hyper Threading on, TopDown metrics are either summed or averaged
+ * between the threads of a core: (count_t0 + count_t1).
+ *
+ * For the average case the metric is always scaled to pipeline width,
+ * so we use factor 2 ((count_t0 + count_t1) / 2 * 4)
+ *
+ * We tell perf to aggregate per core by setting the .aggr-per-core
+ * attribute for the alias to 1 or 2. 2 means it has to be per
+ * core, while 1 means it is optional (but on by default for consistency)
+ */
+
+EVENT_ATTR_STR_HT(topdown-total-slots, td_total_slots,
+	"event=0x3c,umask=0x0",			/* cpu_clk_unhalted.thread */
+	"event=0x3c,umask=0x0,any=1");		/* cpu_clk_unhalted.thread_any */
+EVENT_ATTR_STR_HT(topdown-total-slots.scale, td_total_slots_scale, "4", "2");
+EVENT_ATTR_STR_HT(topdown-total-slots.aggr-per-core, td_total_slots_pc,
+		"1", "2");
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued,
+	"event=0xe,umask=0x1");			/* uops_issued.any */
+EVENT_ATTR_STR_HT(topdown-slots-issued.aggr-per-core, td_slots_issued_pc,
+		"1", "2");
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired,
+	"event=0xc2,umask=0x2");		/* uops_retired.retire_slots */
+EVENT_ATTR_STR_HT(topdown-slots-retired.aggr-per-core,
+		td_slots_retired_pc, "1", "2");
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles,
+	"event=0x9c,umask=0x1");		/* idq_uops_not_delivered_core */
+EVENT_ATTR_STR_HT(topdown-fetch-bubbles.aggr-per-core,
+		td_fetch_bubbles_pc, "1", "2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
+	"event=0xd,umask=0x3,cmask=1",		/* int_misc.recovery_cycles */
+	"event=0xd,umask=0x3,cmask=1,any=1");	/* int_misc.recovery_cycles_any */
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
+	"4", "2");
+EVENT_ATTR_STR_HT(topdown-recovery-bubbles.aggr-per-core,
+		td_recovery_bubbles_pc, "1", "2");
+
 struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(mem_ld_snb),
 	EVENT_PTR(mem_st_snb),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL,
 };
 
@@ -3303,6 +3359,18 @@ static struct attribute *hsw_events_attrs[] = {
 	EVENT_PTR(cycles_ct),
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(td_slots_issued),
+	EVENT_PTR(td_slots_issued_pc),
+	EVENT_PTR(td_slots_retired),
+	EVENT_PTR(td_slots_retired_pc),
+	EVENT_PTR(td_fetch_bubbles),
+	EVENT_PTR(td_fetch_bubbles_pc),
+	EVENT_PTR(td_total_slots),
+	EVENT_PTR(td_total_slots_scale),
+	EVENT_PTR(td_total_slots_pc),
+	EVENT_PTR(td_recovery_bubbles),
+	EVENT_PTR(td_recovery_bubbles_scale),
+	EVENT_PTR(td_recovery_bubbles_pc),
 	NULL
 };
 
@@ -3644,6 +3712,12 @@ __init int intel_pmu_init(void)
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 		intel_pmu_lbr_init_skl();
 
+		/* INT_MISC.RECOVERY_CYCLES has umask 1 in Skylake */
+		event_attr_td_recovery_bubbles.event_str_noht =
+			"event=0xd,umask=0x1,cmask=1";
+		event_attr_td_recovery_bubbles.event_str_ht =
+			"event=0xd,umask=0x1,cmask=1,any=1";
+
 		x86_pmu.event_constraints = intel_skl_event_constraints;
 		x86_pmu.pebs_constraints = intel_skl_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_skl_extra_regs;
-- 
2.5.5


* [PATCH 03/11] x86, perf: Add Top Down events to Intel Atom
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
  2016-03-22 23:08 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
  2016-03-22 23:08 ` [PATCH 02/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 04/11] x86, perf: Use new ht_on flag in HT leak workaround Andi Kleen
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add topdown event declarations to Silvermont / Airmont.
These cores do not support the full Top Down metrics, but a useful
subset (FrontendBound, Retiring, Backend Bound/Bad Speculation).

The perf stat tool automatically handles the missing events
and combines the available metrics.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c737a37..ca693b3 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1388,6 +1388,29 @@ static __initconst const u64 atom_hw_cache_event_ids
  },
 };
 
+EVENT_ATTR_STR(topdown-total-slots, td_total_slots_slm, "event=0x3c");
+EVENT_ATTR_STR(topdown-total-slots.scale, td_total_slots_scale_slm, "2");
+/* no_alloc_cycles.not_delivered */
+EVENT_ATTR_STR(topdown-fetch-bubbles, td_fetch_bubbles_slm,
+	       "event=0xca,umask=0x50");
+EVENT_ATTR_STR(topdown-fetch-bubbles.scale, td_fetch_bubbles_scale_slm, "2");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-issued, td_slots_issued_slm,
+	       "event=0xc2,umask=0x10");
+/* uops_retired.all */
+EVENT_ATTR_STR(topdown-slots-retired, td_slots_retired_slm,
+	       "event=0xc2,umask=0x10");
+
+static struct attribute *slm_events_attrs[] = {
+	EVENT_PTR(td_total_slots_slm),
+	EVENT_PTR(td_total_slots_scale_slm),
+	EVENT_PTR(td_fetch_bubbles_slm),
+	EVENT_PTR(td_fetch_bubbles_scale_slm),
+	EVENT_PTR(td_slots_issued_slm),
+	EVENT_PTR(td_slots_retired_slm),
+	NULL
+};
+
 static struct extra_reg intel_slm_extra_regs[] __read_mostly =
 {
 	/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
@@ -3521,6 +3544,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_slm_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_slm_extra_regs;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.cpu_events = slm_events_attrs;
 		pr_cont("Silvermont events, ");
 		break;
 
-- 
2.5.5


* [PATCH 04/11] x86, perf: Use new ht_on flag in HT leak workaround
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (2 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 03/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 05/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Now that we have a convenient ht_on flag in x86_pmu, use it
to decide whether the HT workarounds for older CPUs are needed.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ca693b3..d63ddb0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3854,16 +3854,14 @@ __init int intel_pmu_init(void)
  */
 static __init int fixup_ht_bug(void)
 {
-	int cpu = smp_processor_id();
-	int w, c;
+	int c;
 	/*
 	 * problem not present on this CPU model, nothing to do
 	 */
 	if (!(x86_pmu.flags & PMU_FL_EXCL_ENABLED))
 		return 0;
 
-	w = cpumask_weight(topology_sibling_cpumask(cpu));
-	if (w > 1) {
+	if (x86_pmu.ht_on) {
 		pr_info("PMU erratum BJ122, BV98, HSD29 worked around, HT is on\n");
 		return 0;
 	}
-- 
2.5.5


* [PATCH 05/11] perf, tools: Parse an .aggr-per-core event attribute
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (3 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 04/11] x86, perf: Use new ht_on flag in HT leak workaround Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 06/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add the basic code to parse an .aggr-per-core event attribute.
The attribute means that the event needs to be measured in
per core mode.

v2: Support integer values for aggr-per-core
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
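
Illustration only, not part of the patch: the two pieces of parsing
logic in stand-alone form. has_suffix() generalizes the hard-coded
"len > 14" check (".aggr-per-core" is 14 characters long), and
parse_aggr_flag() reads the integer from a string instead of a sysfs
file. Both helper names are made up for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if name is strictly longer than suffix and ends with it. */
static bool has_suffix(const char *name, const char *suffix)
{
	size_t len = strlen(name);
	size_t slen = strlen(suffix);

	return len > slen && strcmp(name + len - slen, suffix) == 0;
}

/*
 * Parse attribute contents such as "2\n". Returns 0, meaning "not set",
 * when no integer can be read.
 */
static int parse_aggr_flag(const char *contents)
{
	int flag;

	if (sscanf(contents, "%d", &flag) == 1)
		return flag;
	return 0;
}
```

A name that consists of the suffix alone is rejected, which matches the
strict "len > 14" comparison in pmu_alias_info_file().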
 tools/perf/util/evsel.h        |  1 +
 tools/perf/util/parse-events.c |  1 +
 tools/perf/util/pmu.c          | 24 ++++++++++++++++++++++++
 tools/perf/util/pmu.h          |  2 ++
 4 files changed, 28 insertions(+)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 501ea6e..8a746d2 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -112,6 +112,7 @@ struct perf_evsel {
 	bool			tracking;
 	bool			per_pkg;
 	bool			precise_max;
+	int			aggr_per_core;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4c19d5e..efa7717 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1213,6 +1213,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
 		evsel->unit = info.unit;
 		evsel->scale = info.scale;
 		evsel->per_pkg = info.per_pkg;
+		evsel->aggr_per_core = info.aggr_per_core;
 		evsel->snapshot = info.snapshot;
 	}
 
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index adef23b..38f4d13 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -202,6 +202,24 @@ perf_pmu__parse_per_pkg(struct perf_pmu_alias *alias, char *dir, char *name)
 	return 0;
 }
 
+static void
+perf_pmu__parse_aggr_per_core(struct perf_pmu_alias *alias, char *dir, char *name)
+{
+	char path[PATH_MAX];
+	FILE *f;
+	int flag;
+
+	snprintf(path, PATH_MAX, "%s/%s.aggr-per-core", dir, name);
+
+	f = fopen(path, "r");
+	if (f) {
+		if (fscanf(f, "%d", &flag) == 1)
+			alias->aggr_per_core = flag;
+		fclose(f);
+	}
+}
+
+
 static int perf_pmu__parse_snapshot(struct perf_pmu_alias *alias,
 				    char *dir, char *name)
 {
@@ -251,6 +269,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 		perf_pmu__parse_scale(alias, dir, name);
 		perf_pmu__parse_per_pkg(alias, dir, name);
 		perf_pmu__parse_snapshot(alias, dir, name);
+		perf_pmu__parse_aggr_per_core(alias, dir, name);
 	}
 
 	list_add_tail(&alias->list, list);
@@ -285,6 +304,8 @@ static inline bool pmu_alias_info_file(char *name)
 		return true;
 	if (len > 9 && !strcmp(name + len - 9, ".snapshot"))
 		return true;
+	if (len > 14 && !strcmp(name + len - 14, ".aggr-per-core"))
+		return true;
 
 	return false;
 }
@@ -864,6 +885,7 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 	int ret;
 
 	info->per_pkg = false;
+	info->aggr_per_core = 0;
 
 	/*
 	 * Mark unit and scale as not set
@@ -887,6 +909,8 @@ int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 
 		if (alias->per_pkg)
 			info->per_pkg = true;
+		if (alias->aggr_per_core)
+			info->aggr_per_core = alias->aggr_per_core;
 
 		list_del(&term->list);
 		free(term);
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 5d7e844..95e9e64 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -32,6 +32,7 @@ struct perf_pmu_info {
 	double scale;
 	bool per_pkg;
 	bool snapshot;
+	int  aggr_per_core;
 };
 
 #define UNIT_MAX_LEN	31 /* max length for event unit name */
@@ -44,6 +45,7 @@ struct perf_pmu_alias {
 	double scale;
 	bool per_pkg;
 	bool snapshot;
+	int aggr_per_core;
 };
 
 struct perf_pmu *perf_pmu__find(const char *name);
-- 
2.5.5


* [PATCH 06/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (4 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 05/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 07/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When an event alias is used that the kernel marked as .aggr-per-core, force
--per-core mode (and also require -a and forbid cgroups or per thread mode).
This in turn means that --topdown forces --per-core mode.

This is needed for TopDown in SMT mode, because it needs to measure
all threads in a core together and merge the values to compute the correct
percentages of how the pipeline is limited.

We do this if any alias is aggr-per-core.

The main stat code does the necessary checks and forces per core mode.

v2: Rename agg-per-core to aggr-per-core
v3: Split patch into parse and use
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
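
Illustration only, not part of the patch: the decision made in
cmd_stat() written as a pure function. The enums are simplified
stand-ins for perf's aggregation modes, and the target__has_cpu() check
is reduced to a single boolean; both are assumptions of this sketch.

```c
#include <stdbool.h>

enum aggr_mode { MODE_GLOBAL, MODE_SOCKET, MODE_CORE, MODE_NONE };

enum td_check { TD_OK, TD_ERR_MODE, TD_ERR_TARGET };

/*
 * Called once some event carries aggr-per-core. *mode is switched to
 * MODE_CORE as soon as the mode check passes, in the same order as the
 * patch does it.
 */
static enum td_check force_per_core(enum aggr_mode *mode, bool system_wide,
				    int nr_cgroups)
{
	if (*mode != MODE_GLOBAL && *mode != MODE_CORE)
		return TD_ERR_MODE;	/* e.g. per socket mode was requested */
	*mode = MODE_CORE;
	if (nr_cgroups > 0 || !system_wide)
		return TD_ERR_TARGET;	/* needs -a and no cgroup filter */
	return TD_OK;
}
```

Starting from the default global mode with -a and no cgroups, the
function returns TD_OK and leaves the mode set to MODE_CORE.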
 tools/perf/builtin-stat.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 1f19f2f..447a434 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2156,6 +2156,7 @@ static int __cmd_report(int argc, const char **argv)
 
 int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 {
+	struct perf_evsel *counter;
 	const char * const stat_usage[] = {
 		"perf stat [<options>] [<command>]",
 		NULL
@@ -2299,6 +2300,23 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (add_default_attributes())
 		goto out;
 
+	evlist__for_each (evsel_list, counter) {
+		/* Enable per core mode even if only a single event requires it. */
+		if (counter->aggr_per_core) {
+			if (stat_config.aggr_mode != AGGR_GLOBAL &&
+			    stat_config.aggr_mode != AGGR_CORE) {
+				pr_err("per core event configuration requires per core mode\n");
+				goto out;
+			}
+			stat_config.aggr_mode = AGGR_CORE;
+			if (nr_cgroups || !target__has_cpu(&target)) {
+				pr_err("per core event configuration requires system-wide mode (-a)\n");
+				goto out;
+			}
+			break;
+		}
+	}
+
 	target__validate(&target);
 
 	if (perf_evlist__create_maps(evsel_list, &target) < 0) {
-- 
2.5.5


* [PATCH 07/11] perf, tools, stat: Avoid fractional digits for integer scales
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (5 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 06/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 08/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

When the scaling factor is a whole number, don't display fractional
digits. This avoids unnecessary .00 output for topdown metrics
with scale factors.

v2: Remove redundant check.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
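
Illustration only, not part of the patch: the format selection in
stand-alone form. Note that this sketch tests for a whole number with a
truncating cast instead of floor(), so that it builds without libm; the
two checks agree for scales that fit in a long long. The helper names
are made up.

```c
#include <stdio.h>

/* Nonzero if sc has no fractional part (sc must fit in a long long). */
static int scale_is_integer(double sc)
{
	return (double)(long long)sc == sc;
}

/* Pick the format: no fractional digits for whole-number scales. */
static const char *count_format(double sc)
{
	return scale_is_integer(sc) ? "%.0f" : "%.2f";
}

/* Format a scaled count into buf; returns what snprintf() returns. */
static int format_count(char *buf, size_t len, double value, double sc)
{
	return snprintf(buf, len, count_format(sc), value);
}
```

With the old "sc != 1.0" test a scale of 4 printed two fractional
digits; with the whole-number test only fractional scales such as 2.5
do.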
 tools/perf/builtin-stat.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 447a434..46c268e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -66,6 +66,7 @@
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
+#include <math.h>
 
 #define DEFAULT_SEPARATOR	" "
 #define CNTR_NOT_SUPPORTED	"<not supported>"
@@ -978,12 +979,12 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
 	const char *fmt;
 
 	if (csv_output) {
-		fmt = sc != 1.0 ?  "%.2f%s" : "%.0f%s";
+		fmt = floor(sc) != sc ?  "%.2f%s" : "%.0f%s";
 	} else {
 		if (big_num)
-			fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
+			fmt = floor(sc) != sc ? "%'18.2f%s" : "%'18.0f%s";
 		else
-			fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
+			fmt = floor(sc) != sc ? "%18.2f%s" : "%18.0f%s";
 	}
 
 	aggr_printout(evsel, id, nr);
-- 
2.5.5


* [PATCH 08/11] perf, tools, stat: Scale values by unit before metrics
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (6 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 07/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 09/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Scale values by unit before passing them to the metrics printing functions.
This is needed for TopDown, because it needs to scale the slots correctly
by pipeline width / SMTness.

For existing metrics it shouldn't make any difference, as those generally
use events that don't have any units.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
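
Illustration only, not part of the patch: the scaling step in
isolation. The helper name is made up; the result is truncated to an
integer, mirroring the u64 temporary used in the patch.

```c
#include <stdint.h>

/* Apply an event's scale factor to its raw count before metrics use it. */
static uint64_t scaled_count(double scale, uint64_t raw)
{
	return (uint64_t)(scale * (double)raw);
}
```

For example, 1000 raw cycles with scale 4 become 4000 slots, so a
fetch-bubbles count of 2000 slots works out to 50% rather than 200%.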
 tools/perf/util/stat.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4d9b481..ffa1d06 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -307,6 +307,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	struct perf_counts_values *aggr = &counter->counts->aggr;
 	struct perf_stat_evsel *ps = counter->priv;
 	u64 *count = counter->counts->aggr.values;
+	u64 val;
 	int i, ret;
 
 	aggr->val = aggr->ena = aggr->run = 0;
@@ -346,7 +347,8 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	/*
 	 * Save the full runtime - to allow normalization during printout:
 	 */
-	perf_stat__update_shadow_stats(counter, count, 0);
+	val = counter->scale * *count;
+	perf_stat__update_shadow_stats(counter, &val, 0);
 
 	return 0;
 }
-- 
2.5.5
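
For illustration, the scaling step in this patch can be sketched in plain
userspace C. This is a minimal sketch, not the perf code: struct
counter_sketch and scaled_count() are hypothetical stand-ins for the evsel
fields and the "val = counter->scale * *count" expression above. It assumes
the scale is a double and the count a 64-bit unsigned value, as in the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two evsel fields used by the patch. */
struct counter_sketch {
	double scale;	/* unit scale from sysfs, e.g. 4 for a 4-wide pipeline */
	uint64_t count;	/* raw aggregated count */
};

/*
 * Scale the raw count before it is handed to the metric code.
 * The product is computed in double and converted back to an integer,
 * so any fractional part of the scaled value is truncated.
 */
static uint64_t scaled_count(const struct counter_sketch *c)
{
	return (uint64_t)(c->scale * (double)c->count);
}
```

With a scale of 1 the value is unchanged, which is why existing metrics
should not be affected. A fractional scale loses the digits after the
decimal point in the conversion back to an integer.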

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 09/11] perf, tools, stat: Basic support for TopDown in perf stat
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (7 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 08/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 10/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add basic plumbing for TopDown in perf stat.

Add a new --topdown option to enable events.
When --topdown is specified set up events for all topdown
events supported by the kernel.
Add topdown-* as a special case to the event parser, as is
needed for all events containing -.

The actual code to compute the metrics is in follow-on patches.

v2: Use standard sysctl read function.
v3: Move x86 specific code to arch/
v4: Enable --metric-only implicitly for topdown.
v5: Add --single-thread option to not force per core mode
v6: Fix output order of topdown metrics
v7: Allow combining with -d
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  24 +++++++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/group.c       |  27 ++++++++
 tools/perf/builtin-stat.c              | 115 ++++++++++++++++++++++++++++++++-
 tools/perf/util/group.h                |   7 ++
 tools/perf/util/parse-events.l         |   1 +
 6 files changed, 172 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/group.c
 create mode 100644 tools/perf/util/group.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 04f23b4..0b05d1f 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -204,6 +204,30 @@ Aggregate counts per physical processor for system-wide mode measurements.
 --no-aggr::
 Do not aggregate counts across all monitored CPUs.
 
+--topdown::
+Print top down level 1 metrics if supported by the CPU. This allows
+determining bottlenecks in the CPU pipeline for CPU bound workloads,
+by breaking it down into frontend bound, backend bound, bad speculation
+and retiring. All metrics are printed, and colored when crossing a threshold.
+
+The top down metrics may be collected per core instead of per
+CPU thread. In this case per core mode is automatically enabled
+and -a may need to be set, requiring root rights or
+kernel.perf_event_paranoid=-1.
+
+This enables --metric-only, unless overridden with --no-metric-only.
+
+To interpret the results it is usually necessary to know which
+CPUs the workload runs on. If needed the CPUs can be forced using
+taskset.
+
+--single-thread::
+Don't force global per core mode with --topdown. The workload should
+run single threaded, otherwise the results may be unpredictable.
+
+This is generally only possible when the CPU does not use Hyper Threading
+or the secondary CPU threads have been offlined through
+/sys/devices/system/cpu/cpu*/online
 
 EXAMPLES
 --------
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 4659703..4cd8a16 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,6 +3,7 @@ libperf-y += tsc.o
 libperf-y += pmu.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
+libperf-y += group.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
new file mode 100644
index 0000000..f3039b5
--- /dev/null
+++ b/tools/perf/arch/x86/util/group.c
@@ -0,0 +1,27 @@
+#include <stdio.h>
+#include "api/fs/fs.h"
+#include "util/group.h"
+
+/*
+ * Check whether we can use a group for top down.
+ * Without a group we may get bad results due to multiplexing.
+ */
+bool check_group(bool *warn)
+{
+	int n;
+
+	if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
+		return false;
+	if (n > 0) {
+		*warn = true;
+		return false;
+	}
+	return true;
+}
+
+void group_warn(void)
+{
+	fprintf(stderr,
+		"nmi_watchdog enabled with topdown. May give wrong results.\n"
+		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
+}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 46c268e..6d8ce72 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,13 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
+#include "util/group.h"
 #include "util/session.h"
 #include "util/tool.h"
+#include "util/group.h"
 #include "asm/bug.h"
 
+#include <api/fs/fs.h>
 #include <stdlib.h>
 #include <sys/prctl.h>
 #include <locale.h>
@@ -98,6 +101,15 @@ static const char * transaction_limited_attrs = {
 	"}"
 };
 
+static const char * topdown_attrs[] = {
+	"topdown-total-slots",
+	"topdown-slots-retired",
+	"topdown-recovery-bubbles",
+	"topdown-fetch-bubbles",
+	"topdown-slots-issued",
+	NULL,
+};
+
 static struct perf_evlist	*evsel_list;
 
 static struct target target = {
@@ -112,6 +124,8 @@ static volatile pid_t		child_pid			= -1;
 static bool			null_run			=  false;
 static int			detailed_run			=  0;
 static bool			transaction_run;
+static bool			topdown_run			= false;
+static bool			single_thread			= false;
 static bool			big_num				=  true;
 static int			big_num_opt			=  -1;
 static const char		*csv_sep			= NULL;
@@ -124,6 +138,7 @@ static unsigned int		initial_delay			= 0;
 static unsigned int		unit_width			= 4; /* strlen("unit") */
 static bool			forever				= false;
 static bool			metric_only			= false;
+static bool			force_metric_only		= false;
 static struct timespec		ref_time;
 static struct cpu_map		*aggr_map;
 static aggr_get_id_t		aggr_get_id;
@@ -1507,6 +1522,14 @@ static int stat__set_big_num(const struct option *opt __maybe_unused,
 	return 0;
 }
 
+static int enable_metric_only(const struct option *opt __maybe_unused,
+			      const char *s __maybe_unused, int unset)
+{
+	force_metric_only = true;
+	metric_only = !unset;
+	return 0;
+}
+
 static const struct option stat_options[] = {
 	OPT_BOOLEAN('T', "transaction", &transaction_run,
 		    "hardware transaction statistics"),
@@ -1565,8 +1588,12 @@ static const struct option stat_options[] = {
 		     "aggregate counts per thread", AGGR_THREAD),
 	OPT_UINTEGER('D', "delay", &initial_delay,
 		     "ms to wait before starting measurement after program start"),
-	OPT_BOOLEAN(0, "metric-only", &metric_only,
-			"Only print computed metrics. No raw values"),
+	OPT_CALLBACK_NOOPT(0, "metric-only", &metric_only, NULL,
+			"Only print computed metrics. No raw values", enable_metric_only),
+	OPT_BOOLEAN(0, "topdown", &topdown_run,
+			"measure topdown level 1 statistics"),
+	OPT_BOOLEAN(0, "single-thread", &single_thread,
+			"don't force per core mode"),
 	OPT_END()
 };
 
@@ -1759,12 +1786,61 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	return 0;
 }
 
+static void filter_events(const char **attr, char **str, bool use_group)
+{
+	int off = 0;
+	int i;
+	int len = 0;
+	char *s;
+
+	for (i = 0; attr[i]; i++) {
+		if (pmu_have_event("cpu", attr[i])) {
+			len += strlen(attr[i]) + 1;
+			attr[i - off] = attr[i];
+		} else
+			off++;
+	}
+	attr[i - off] = NULL;
+
+	*str = malloc(len + 1 + 2);
+	if (!*str)
+		return;
+	s = *str;
+	if (i - off == 0) {
+		*s = 0;
+		return;
+	}
+	if (use_group)
+		*s++ = '{';
+	for (i = 0; attr[i]; i++) {
+		strcpy(s, attr[i]);
+		s += strlen(s);
+		*s++ = ',';
+	}
+	if (use_group) {
+		s[-1] = '}';
+		*s = 0;
+	} else
+		s[-1] = 0;
+}
+
+__weak bool check_group(bool *warn)
+{
+	*warn = false;
+	return false;
+}
+
+__weak void group_warn(void)
+{
+}
+
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
  */
 static int add_default_attributes(void)
 {
+	int err;
 	struct perf_event_attr default_attrs0[] = {
 
   { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK		},
@@ -1883,7 +1959,6 @@ static int add_default_attributes(void)
 		return 0;
 
 	if (transaction_run) {
-		int err;
 		if (pmu_have_event("cpu", "cycles-ct") &&
 		    pmu_have_event("cpu", "el-start"))
 			err = parse_events(evsel_list, transaction_attrs, NULL);
@@ -1896,6 +1971,31 @@ static int add_default_attributes(void)
 		return 0;
 	}
 
+	if (topdown_run) {
+		char *str = NULL;
+		bool warn = false;
+
+		if (!force_metric_only)
+			metric_only = true;
+		filter_events(topdown_attrs, &str, check_group(&warn));
+		if (topdown_attrs[0] && str) {
+			if (warn)
+				group_warn();
+			err = parse_events(evsel_list, str, NULL);
+			if (err) {
+				fprintf(stderr,
+					"Cannot set up top down events %s: %d\n",
+					str, err);
+				free(str);
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "System does not support topdown\n");
+			return -1;
+		}
+		free(str);
+	}
+
 	if (!evsel_list->nr_entries) {
 		if (perf_evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
 			return -1;
@@ -2304,6 +2404,15 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	evlist__for_each (evsel_list, counter) {
 		/* Enable per core mode if only a single event requires it. */
 		if (counter->aggr_per_core) {
+			if (counter->aggr_per_core == 1 &&
+				single_thread)
+				continue;
+			else if (counter->aggr_per_core == 2 &&
+				single_thread) {
+				pr_err("single thread mode cannot be used with Hyper Threading enabled\n");
+				goto out;
+			}
+
 			if (stat_config.aggr_mode != AGGR_GLOBAL &&
 			    stat_config.aggr_mode != AGGR_CORE) {
 				pr_err("per core event configuration requires per core mode\n");
diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
new file mode 100644
index 0000000..daad3ff
--- /dev/null
+++ b/tools/perf/util/group.h
@@ -0,0 +1,7 @@
+#ifndef GROUP_H
+#define GROUP_H 1
+
+bool check_group(bool *warn);
+void group_warn(void);
+
+#endif
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 1477fbc..744ebe3 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -259,6 +259,7 @@ cycles-ct					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 cycles-t					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 mem-loads					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 mem-stores					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
+topdown-[a-z-]+					{ return str(yyscanner, PE_KERNEL_PMU_EVENT); }
 
 L1-dcache|l1-d|l1d|L1-data		|
 L1-icache|l1-i|l1i|L1-instruction	|
-- 
2.5.5
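
The string building in filter_events() can be tried outside of perf with a
small standalone sketch. join_events() below is a hypothetical helper, not
part of the patch. It leaves out the pmu_have_event() filtering and only
shows how the surviving names are joined, with or without the {} group
braces that check_group() selects.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join a NULL-terminated list of event names into one parse_events()
 * style string: "a,b,c", or "{a,b,c}" when use_group is set.
 * Returns a malloc'ed string (empty for an empty list) or NULL on
 * allocation failure. The caller frees the result.
 */
static char *join_events(const char *const *names, bool use_group)
{
	size_t len = 0, n = 0, i;
	char *str, *s;

	for (i = 0; names[i]; i++) {
		len += strlen(names[i]) + 1;	/* name plus separator */
		n++;
	}

	str = malloc(len + 3);	/* room for '{', '}' and the NUL */
	if (!str)
		return NULL;
	s = str;
	if (n == 0) {
		*s = '\0';
		return str;
	}
	if (use_group)
		*s++ = '{';
	for (i = 0; names[i]; i++) {
		size_t l = strlen(names[i]);

		memcpy(s, names[i], l);
		s += l;
		*s++ = ',';
	}
	if (use_group) {
		s[-1] = '}';	/* overwrite the trailing ',' */
		*s = '\0';
	} else {
		s[-1] = '\0';	/* drop the trailing ',' */
	}
	return str;
}
```

Putting all events into one group means they are scheduled together, which
is what avoids the multiplexing problem mentioned in group.c. When the NMI
watchdog is enabled the patch falls back to the ungrouped form.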

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 10/11] perf, tools, stat: Add computation of TopDown formulas
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (8 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 09/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-22 23:08 ` [PATCH 11/11] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
  2016-03-27 11:27 ` Add top down metrics to perf stat Jiri Olsa
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Implement the TopDown formulas in perf stat. The topdown basic metrics
reported by the kernel are collected, and the formulas are computed
and output as normal metrics.

See the kernel commit exporting the events for details on the used
metrics.

v2: Always print all metrics, only use thresholds for coloring.
v3: Mark retiring over threshold green, not red.
v4:
Only print one decimal digit
Fix color printing of one metric
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/util/stat-shadow.c | 160 ++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.c        |   5 ++
 tools/perf/util/stat.h        |   5 ++
 3 files changed, 170 insertions(+)

diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index b33ffb2..ff6745a 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -36,6 +36,11 @@ static struct stats runtime_dtlb_cache_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_cycles_in_tx_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_transaction_stats[NUM_CTX][MAX_NR_CPUS];
 static struct stats runtime_elision_stats[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_total_slots[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_issued[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_slots_retired[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_fetch_bubbles[NUM_CTX][MAX_NR_CPUS];
+static struct stats runtime_topdown_recovery_bubbles[NUM_CTX][MAX_NR_CPUS];
 static bool have_frontend_stalled;
 
 struct stats walltime_nsecs_stats;
@@ -82,6 +87,12 @@ void perf_stat__reset_shadow_stats(void)
 		sizeof(runtime_transaction_stats));
 	memset(runtime_elision_stats, 0, sizeof(runtime_elision_stats));
 	memset(&walltime_nsecs_stats, 0, sizeof(walltime_nsecs_stats));
+	memset(runtime_topdown_total_slots, 0, sizeof(runtime_topdown_total_slots));
+	memset(runtime_topdown_slots_retired, 0, sizeof(runtime_topdown_slots_retired));
+	memset(runtime_topdown_slots_issued, 0, sizeof(runtime_topdown_slots_issued));
+	memset(runtime_topdown_fetch_bubbles, 0, sizeof(runtime_topdown_fetch_bubbles));
+	memset(runtime_topdown_recovery_bubbles, 0, sizeof(runtime_topdown_recovery_bubbles));
+	have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend");
 }
 
 /*
@@ -104,6 +115,16 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 *count,
 		update_stats(&runtime_transaction_stats[ctx][cpu], count[0]);
 	else if (perf_stat_evsel__is(counter, ELISION_START))
 		update_stats(&runtime_elision_stats[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS))
+		update_stats(&runtime_topdown_total_slots[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED))
+		update_stats(&runtime_topdown_slots_issued[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED))
+		update_stats(&runtime_topdown_slots_retired[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES))
+		update_stats(&runtime_topdown_fetch_bubbles[ctx][cpu], count[0]);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
+		update_stats(&runtime_topdown_recovery_bubbles[ctx][cpu], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_stats(&runtime_stalled_cycles_front_stats[ctx][cpu], count[0]);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_BACKEND))
@@ -303,6 +324,104 @@ static void print_ll_cache_misses(int cpu,
 	out->print_metric(out->ctx, color, "%7.2f%%", "of all LL-cache hits", ratio);
 }
 
+/*
+ * High level "TopDown" CPU core pipeline bottleneck breakdown.
+ *
+ * Basic concept following
+ * Yasin, A Top-Down Method for Performance Analysis and Counters Architecture
+ * ISPASS14
+ *
+ * The CPU pipeline is divided into 4 areas that can be bottlenecks:
+ *
+ * Frontend -> Backend -> Retiring
+ * BadSpeculation in addition means out of order execution that is thrown away
+ * (for example branch mispredictions)
+ * Frontend is instruction decoding.
+ * Backend is execution, like computation and accessing data in memory
+ * Retiring is good execution that is not directly bottlenecked
+ *
+ * The formulas are computed in slots.
+ * A slot is an entry in the pipeline, one for each unit of pipeline width
+ * (for example a 4-wide pipeline has 4 slots for each cycle)
+ *
+ * Formulas:
+ * BadSpeculation = ((SlotsIssued - SlotsRetired) + RecoveryBubbles) /
+ *			TotalSlots
+ * Retiring = SlotsRetired / TotalSlots
+ * FrontendBound = FetchBubbles / TotalSlots
+ * BackendBound = 1.0 - BadSpeculation - Retiring - FrontendBound
+ *
+ * The kernel provides the mapping to the low level CPU events and any scaling
+ * needed for the CPU pipeline width, for example:
+ *
+ * TotalSlots = Cycles * 4
+ *
+ * The scaling factor is communicated in the sysfs unit.
+ *
+ * In some cases the CPU may not be able to measure all the formulas due to
+ * missing events. In this case multiple formulas are combined, where possible.
+ *
+ * With SMT the slots of thread siblings need to be combined to get meaningful
+ * results. This is implemented by the kernel forcing per-core mode with
+ * the .aggr-per-core sysfs attribute.
+ *
+ * Full TopDown supports more levels to sub-divide each area: for example
+ * BackendBound into computing bound and memory bound. For now we only
+ * support Level 1 TopDown.
+ */
+
+static double td_total_slots(int ctx, int cpu)
+{
+	return avg_stats(&runtime_topdown_total_slots[ctx][cpu]);
+}
+
+static double td_bad_spec(int ctx, int cpu)
+{
+	double bad_spec = 0;
+	double total_slots;
+	double total;
+
+	total = avg_stats(&runtime_topdown_slots_issued[ctx][cpu]) -
+		avg_stats(&runtime_topdown_slots_retired[ctx][cpu]) +
+		avg_stats(&runtime_topdown_recovery_bubbles[ctx][cpu]);
+	total_slots = td_total_slots(ctx, cpu);
+	if (total_slots)
+		bad_spec = total / total_slots;
+	return bad_spec;
+}
+
+static double td_retiring(int ctx, int cpu)
+{
+	double retiring = 0;
+	double total_slots = td_total_slots(ctx, cpu);
+	double ret_slots = avg_stats(&runtime_topdown_slots_retired[ctx][cpu]);
+
+	if (total_slots)
+		retiring = ret_slots / total_slots;
+	return retiring;
+}
+
+static double td_fe_bound(int ctx, int cpu)
+{
+	double fe_bound = 0;
+	double total_slots = td_total_slots(ctx, cpu);
+	double fetch_bub = avg_stats(&runtime_topdown_fetch_bubbles[ctx][cpu]);
+
+	if (total_slots)
+		fe_bound = fetch_bub / total_slots;
+	return fe_bound;
+}
+
+static double td_be_bound(int ctx, int cpu)
+{
+	double sum = (td_fe_bound(ctx, cpu) +
+		      td_bad_spec(ctx, cpu) +
+		      td_retiring(ctx, cpu));
+	if (sum == 0)
+		return 0;
+	return 1.0 - sum;
+}
+
 void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				   double avg, int cpu,
 				   struct perf_stat_output_ctx *out)
@@ -310,6 +429,7 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 	void *ctxp = out->ctx;
 	print_metric_t print_metric = out->print_metric;
 	double total, ratio = 0.0, total2;
+	const char *color = NULL;
 	int ctx = evsel_context(evsel);
 
 	if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
@@ -452,6 +572,46 @@ void perf_stat__print_shadow_stats(struct perf_evsel *evsel,
 				     avg / ratio);
 		else
 			print_metric(ctxp, NULL, NULL, "CPUs utilized", 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) {
+		double fe_bound = td_fe_bound(ctx, cpu);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) {
+		double retiring = td_retiring(ctx, cpu);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) {
+		double bad_spec = td_bad_spec(ctx, cpu);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) {
+		double be_bound = td_be_bound(ctx, cpu);
+		const char *name = "backend bound";
+		static int have_recovery_bubbles = -1;
+
+		/* In case the CPU does not support topdown-recovery-bubbles */
+		if (have_recovery_bubbles < 0)
+			have_recovery_bubbles = pmu_have_event("cpu",
+					"topdown-recovery-bubbles");
+		if (!have_recovery_bubbles)
+			name = "backend bound/bad spec";
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		if (td_total_slots(ctx, cpu) > 0)
+			print_metric(ctxp, color, "%8.1f%%", name,
+					be_bound * 100.);
+		else
+			print_metric(ctxp, NULL, NULL, name, 0);
 	} else if (runtime_nsecs_stats[cpu].n != 0) {
 		char unit = 'M';
 		char unit_buf[10];
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index ffa1d06..c1ba255 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -79,6 +79,11 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TRANSACTION_START,	cpu/tx-start/),
 	ID(ELISION_START,	cpu/el-start/),
 	ID(CYCLES_IN_TX_CP,	cpu/cycles-ct/),
+	ID(TOPDOWN_TOTAL_SLOTS, topdown-total-slots),
+	ID(TOPDOWN_SLOTS_ISSUED, topdown-slots-issued),
+	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
+	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
+	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
 };
 #undef ID
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 0150e78..c29bb94 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -17,6 +17,11 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TRANSACTION_START,
 	PERF_STAT_EVSEL_ID__ELISION_START,
 	PERF_STAT_EVSEL_ID__CYCLES_IN_TX_CP,
+	PERF_STAT_EVSEL_ID__TOPDOWN_TOTAL_SLOTS,
+	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_ISSUED,
+	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
 	PERF_STAT_EVSEL_ID__MAX,
 };
 
-- 
2.5.5
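
The four level 1 formulas from the comment in stat-shadow.c can be checked
by hand with a standalone sketch. struct td_counts, struct td_level1 and
td_compute() are hypothetical names used only for this illustration. The
patch itself keeps the counts in per-CPU stats arrays and computes each
ratio in its own td_*() helper.

```c
/* The five already-scaled topdown counts for one measurement. */
struct td_counts {
	double total_slots;
	double slots_issued;
	double slots_retired;
	double fetch_bubbles;
	double recovery_bubbles;
};

/* The four level 1 metrics, each as a fraction of total slots. */
struct td_level1 {
	double frontend_bound;
	double bad_speculation;
	double retiring;
	double backend_bound;
};

static struct td_level1 td_compute(const struct td_counts *c)
{
	struct td_level1 r = { 0.0, 0.0, 0.0, 0.0 };

	/* Without total slots nothing can be computed: report zeros. */
	if (c->total_slots <= 0)
		return r;

	r.frontend_bound = c->fetch_bubbles / c->total_slots;
	r.bad_speculation = (c->slots_issued - c->slots_retired +
			     c->recovery_bubbles) / c->total_slots;
	r.retiring = c->slots_retired / c->total_slots;
	/* Backend bound is whatever the other three do not explain. */
	r.backend_bound = 1.0 - r.frontend_bound - r.bad_speculation -
			  r.retiring;
	return r;
}
```

As a worked example, 1000 cycles on a 4-wide pipeline give 4000 total
slots. With 2200 slots issued, 2000 retired, 800 fetch bubbles and 200
recovery bubbles the result is 20% frontend bound, 10% bad speculation,
50% retiring and 20% backend bound.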

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 11/11] perf, tools, stat: Add extra output of counter values with -v
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (9 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 10/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
@ 2016-03-22 23:08 ` Andi Kleen
  2016-03-27 11:27 ` Add top down metrics to perf stat Jiri Olsa
  11 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-22 23:08 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, eranian, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add debug output of raw counter values per CPU when
perf stat -v is specified, together with their cpu numbers.
This is very useful to debug problems with per core counters,
where we can normally only see aggregated values.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6d8ce72..0ee224b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -314,6 +314,14 @@ static int read_counter(struct perf_evsel *counter)
 					return -1;
 				}
 			}
+
+			if (verbose) {
+				fprintf(stat_config.output,
+					"%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+						perf_evsel__name(counter),
+						cpu,
+						count->val, count->ena, count->run);
+			}
 		}
 	}
 
-- 
2.5.5
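
The line format used by the new -v output can be reproduced in a few lines
of standalone C. format_raw_count() is a hypothetical helper for this
illustration only. The patch prints directly to stat_config.output with
fprintf().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one raw counter line as "<event>: <cpu>: <val> <ena> <run>\n".
 * Returns what snprintf() returns: the length the full line needs,
 * which is larger than or equal to size if the output was truncated.
 */
static int format_raw_count(char *buf, size_t size, const char *name,
			    int cpu, uint64_t val, uint64_t ena,
			    uint64_t run)
{
	return snprintf(buf, size,
			"%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
			name, cpu, val, ena, run);
}
```

The last two fields are the enabled and running times of the counter. A
running time smaller than the enabled time shows that the counter was
multiplexed on that CPU.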

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Add top down metrics to perf stat
  2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
                   ` (10 preceding siblings ...)
  2016-03-22 23:08 ` [PATCH 11/11] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
@ 2016-03-27 11:27 ` Jiri Olsa
  2016-03-27 15:22   ` Andi Kleen
  11 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2016-03-27 11:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, peterz, jolsa, eranian, mingo, linux-kernel

On Tue, Mar 22, 2016 at 04:08:46PM -0700, Andi Kleen wrote:

SNIP

> In this case perf stat automatically enables --per-core mode and also requires
> global mode (-a) and avoiding other filters (no cgroup mode)
> 
> When Hyper Threading is off this can be overriden with the --single-thread
> option. When Hyper Threading is on it is enforced, the only way to
> not require -a here is to off line the logical CPUs of the second
> threads.
> 
> One side effect is that this may require root rights or a
> kernel.perf_event_paranoid=-1 setting. 
> 
> Full tree available in 
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-16
> 

can't see this one (-16):

[jolsa@krava perf]$ git remote update ak
Fetching ak
[jolsa@krava perf]$ git branch -r | grep top-down
  ak/perf/top-down-10
  ak/perf/top-down-11
  ak/perf/top-down-13
  ak/perf/top-down-2

thanks,
jirka

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Add top down metrics to perf stat
  2016-03-27 11:27 ` Add top down metrics to perf stat Jiri Olsa
@ 2016-03-27 15:22   ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-03-27 15:22 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, peterz, jolsa, eranian, mingo, linux-kernel

> can't see this one (-16):
> 
> [jolsa@krava perf]$ git remote update ak
> Fetching ak
> [jolsa@krava perf]$ git branch -r | grep top-down
>   ak/perf/top-down-10
>   ak/perf/top-down-11
>   ak/perf/top-down-13
>   ak/perf/top-down-2

Please try again, I pushed it again.

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status
  2016-04-28  8:06   ` Peter Zijlstra
@ 2016-04-28  8:17     ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-04-28  8:17 UTC (permalink / raw)
  To: Andi Kleen
  Cc: acme, jolsa, linux-kernel, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar

On Thu, Apr 28, 2016 at 10:06:09AM +0200, Peter Zijlstra wrote:
> On Wed, Apr 27, 2016 at 01:00:41PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Add a way to show different sysfs events attributes depending on
> > HyperThreading is on or off. This is difficult to determine
> > early at boot, so we just do it dynamically when the sysfs
> > attribute is read.
> 
> Thomas would like to have this in the general (x86) topology bits.
> 
> Because having SMT enabled (or not) is not something perf specific (nor
> Intel, because apparently AMD is going to do proper SMT too with their
> Zen micro-arch), and might very well be useful for other thingies to
> know.
> 
> So I suppose it should end up somewhere around:
> 
>   /sys/devices/system/cpu/

FWIW I think we already can tell if SMT is enabled by doing something
like:

for i in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list;
do
	SMT=`cat $i | awk --field-separator "," '{print NF}'`;
	if [ $SMT -gt 1 ] ; then
		echo ENABLED;
		break;
	fi;
done

With pretty much the same races as this interface has, no?

In any case, a single value is certainly easier than the above.
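
For comparison, the same check in C, as a minimal sketch. cpulist_count()
is a hypothetical helper. It assumes thread_siblings_list uses the usual
cpulist format, where siblings can show up as a comma list ("0,4") or as a
range ("0-1"). The awk one-liner above counts only comma-separated fields,
so a range would count as a single CPU there.

```c
#include <stdlib.h>

/*
 * Count the CPUs named in a cpulist string such as "0,4", "0-1" or
 * "0-3,8". A trailing newline, as read from sysfs, is accepted.
 * Returns the number of CPUs, or -1 if the string cannot be parsed.
 */
static int cpulist_count(const char *list)
{
	const char *p = list;
	char *end;
	int count = 0;

	while (*p && *p != '\n') {
		long first = strtol(p, &end, 10);
		long last = first;

		if (end == p)
			return -1;	/* no number found */
		p = end;
		if (*p == '-') {
			last = strtol(p + 1, &end, 10);
			if (end == p + 1 || last < first)
				return -1;	/* malformed range */
			p = end;
		}
		count += (int)(last - first + 1);
		if (*p == ',')
			p++;
	}
	return count;
}
```

SMT would then be considered enabled if any CPU's thread_siblings_list
yields a count greater than 1. The races are the same as with the shell
loop.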

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status
  2016-04-27 20:00 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
@ 2016-04-28  8:06   ` Peter Zijlstra
  2016-04-28  8:17     ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-04-28  8:06 UTC (permalink / raw)
  To: Andi Kleen
  Cc: acme, jolsa, linux-kernel, Andi Kleen, Thomas Gleixner,
	H. Peter Anvin, Ingo Molnar

On Wed, Apr 27, 2016 at 01:00:41PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add a way to show different sysfs events attributes depending on
> HyperThreading is on or off. This is difficult to determine
> early at boot, so we just do it dynamically when the sysfs
> attribute is read.

Thomas would like to have this in the general (x86) topology bits.

Because having SMT enabled (or not) is not something perf specific (nor
Intel, because apparently AMD is going to do proper SMT too with their
Zen micro-arch), and might very well be useful for other thingies to
know.

So I suppose it should end up somewhere around:

  /sys/devices/system/cpu/

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status
  2016-04-27 20:00 Add top down metrics to perf stat Andi Kleen
@ 2016-04-27 20:00 ` Andi Kleen
  2016-04-28  8:06   ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2016-04-27 20:00 UTC (permalink / raw)
  To: acme; +Cc: peterz, jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a way to show different sysfs event attributes depending on
whether HyperThreading is on or off. This is difficult to determine
early at boot, so we just do it dynamically when the sysfs
attribute is read.

v2:
Compute HT status only once in CPU online/offline hooks.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c       | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h | 19 +++++++++++++++++++
 include/linux/perf_event.h   |  7 +++++++
 3 files changed, 61 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 41d93d0e972b..f1411062ccfb 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1477,6 +1477,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	unsigned int cpu = (long)hcpu;
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	int i, ret = NOTIFY_OK;
+	bool ht_on;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
@@ -1496,6 +1497,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 			kfree(cpuc->kfree_on_online[i]);
 			cpuc->kfree_on_online[i] = NULL;
 		}
+		x86_pmu.ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
 		break;
 
 	case CPU_DYING:
@@ -1507,6 +1509,15 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DEAD:
 		if (x86_pmu.cpu_dead)
 			x86_pmu.cpu_dead(cpu);
+		/* Recompute HT state for all CPUs on offline */
+		ht_on = false;
+		for_each_online_cpu (cpu) {
+			if (cpumask_weight(topology_sibling_cpumask(cpu)) > 1) {
+				ht_on = true;
+				break;
+			}
+		}
+		x86_pmu.ht_on = ht_on;
 		break;
 
 	default:
@@ -1616,6 +1627,30 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr, cha
 }
 EXPORT_SYMBOL_GPL(events_sysfs_show);
 
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page)
+{
+	struct perf_pmu_events_ht_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_ht_attr, attr);
+
+	/*
+	 * Report conditional events depending on Hyper-Threading.
+	 *
+	 * This is overly conservative as usually the HT special
+	 * handling is not needed if the other CPU thread is idle.
+	 *
+	 * Note this does not (cannot) handle the case when thread
+	 * siblings are invisible, for example with virtualization
+	 * if they are owned by some other guest.  The user tool
+	 * has to re-read when a thread sibling gets onlined later.
+	 */
+
+	return sprintf(page, "%s",
+			x86_pmu.ht_on ?
+			pmu_attr->event_str_ht :
+			pmu_attr->event_str_noht);
+}
+
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
 EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7d62a02f49a4..6ae84bb91402 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -622,6 +622,11 @@ struct x86_pmu {
 	 * Intel host/guest support (KVM)
 	 */
 	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+	/*
+	 * Hyper Threading on?
+	 */
+	bool ht_on;
 };
 
 struct x86_perf_task_context {
@@ -667,6 +672,14 @@ static struct perf_pmu_events_attr event_attr_##v = {			\
 	.event_str	= str,						\
 };
 
+#define EVENT_ATTR_STR_HT(_name, v, noht, ht)				\
+static struct perf_pmu_events_ht_attr event_attr_##v = {		\
+	.attr		= __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\
+	.id		= 0,						\
+	.event_str_noht	= noht,						\
+	.event_str_ht	= ht,						\
+}
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
@@ -937,6 +950,12 @@ int p6_pmu_init(void);
 
 int knc_pmu_init(void);
 
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
 static inline int is_ht_workaround_enabled(void)
 {
 	return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a090700cccca..b293b89a276f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1205,6 +1205,13 @@ struct perf_pmu_events_attr {
 	const char *event_str;
 };
 
+struct perf_pmu_events_ht_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str_ht;
+	const char *event_str_noht;
+};
+
 ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
 			      char *page);
 
-- 
2.5.5
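
For readers unfamiliar with the pattern used by events_ht_sysfs_show(), here is a minimal userspace sketch of it. It is not part of the patch. `struct fake_attr`, `struct fake_ht_attr`, and `ht_show()` are hypothetical stand-ins for `struct device_attribute`, `struct perf_pmu_events_ht_attr`, and the show callback, and the HT state is passed as a parameter instead of being read from x86_pmu.ht_on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device_attribute. */
struct fake_attr {
	const char *name;
};

/* Mirrors the field order of perf_pmu_events_ht_attr in the patch. */
struct fake_ht_attr {
	struct fake_attr attr;
	unsigned long long id;
	const char *event_str_ht;
	const char *event_str_noht;
};

/*
 * Recover the enclosing structure from the embedded attribute and
 * copy whichever event string matches the current HT state.
 * Returns the number of characters written, like snprintf().
 */
static int ht_show(struct fake_attr *attr, bool ht_on, char *page, size_t len)
{
	struct fake_ht_attr *ha = container_of(attr, struct fake_ht_attr, attr);

	assert(page && len > 0);
	return snprintf(page, len, "%s",
			ht_on ? ha->event_str_ht : ha->event_str_noht);
}
```

The callback only receives a pointer to the embedded attribute, so container_of() is what lets a single show function serve every attribute declared with EVENT_ATTR_STR_HT.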


* [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status
  2016-04-04 20:41 Andi Kleen
@ 2016-04-04 20:41 ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2016-04-04 20:41 UTC (permalink / raw)
  To: peterz, acme; +Cc: jolsa, linux-kernel, mingo, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add a way to show different sysfs event attributes depending on
whether HyperThreading is on or off. This is difficult to determine
early at boot, so we just do it dynamically when the sysfs
attribute is read.

v2:
Compute HT status only once in CPU online/offline hooks.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/core.c       | 35 +++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h | 19 +++++++++++++++++++
 include/linux/perf_event.h   |  7 +++++++
 3 files changed, 61 insertions(+)
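
The offline-path recomputation described in the v2 note can be modeled in userspace roughly as follows. This is an illustrative sketch, not kernel code. `struct cpu_model`, `mask_weight()`, and `recompute_ht_on()` are hypothetical names, and the model assumes each sibling mask contains only online threads, including the CPU itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one CPU's topology state. */
struct cpu_model {
	bool online;
	unsigned long sibling_mask;	/* bit n set: CPU n shares this core */
};

/* Count set bits, standing in for cpumask_weight(). */
static unsigned int mask_weight(unsigned long mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * HT counts as "on" if any online CPU has more than one thread in
 * its sibling mask. This is the same test the CPU_DEAD hunk applies
 * inside for_each_online_cpu().
 */
static bool recompute_ht_on(const struct cpu_model *cpus, size_t ncpus)
{
	size_t i;

	assert(cpus || ncpus == 0);
	for (i = 0; i < ncpus; i++) {
		if (cpus[i].online && mask_weight(cpus[i].sibling_mask) > 1)
			return true;
	}
	return false;
}
```

Scanning all online CPUs matters on the offline path because removing one thread does not by itself tell you whether some other core still has two threads online.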

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 002b2eadd600..4a9523426990 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1477,6 +1477,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	unsigned int cpu = (long)hcpu;
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 	int i, ret = NOTIFY_OK;
+	bool ht_on;
 
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_UP_PREPARE:
@@ -1496,6 +1497,7 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 			kfree(cpuc->kfree_on_online[i]);
 			cpuc->kfree_on_online[i] = NULL;
 		}
+		x86_pmu.ht_on = cpumask_weight(topology_sibling_cpumask(cpu)) > 1;
 		break;
 
 	case CPU_DYING:
@@ -1507,6 +1509,15 @@ x86_pmu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 	case CPU_DEAD:
 		if (x86_pmu.cpu_dead)
 			x86_pmu.cpu_dead(cpu);
+		/* Recompute HT state for all CPUs on offline */
+		ht_on = false;
+		for_each_online_cpu (cpu) {
+			if (cpumask_weight(topology_sibling_cpumask(cpu)) > 1) {
+				ht_on = true;
+				break;
+			}
+		}
+		x86_pmu.ht_on = ht_on;
 		break;
 
 	default:
@@ -1616,6 +1627,30 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr, cha
 }
 EXPORT_SYMBOL_GPL(events_sysfs_show);
 
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page)
+{
+	struct perf_pmu_events_ht_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_ht_attr, attr);
+
+	/*
+	 * Report conditional events depending on Hyper-Threading.
+	 *
+	 * This is overly conservative as usually the HT special
+	 * handling is not needed if the other CPU thread is idle.
+	 *
+	 * Note this does not (cannot) handle the case when thread
+	 * siblings are invisible, for example with virtualization
+	 * if they are owned by some other guest.  The user tool
+	 * has to re-read when a thread sibling gets onlined later.
+	 */
+
+	return sprintf(page, "%s",
+			x86_pmu.ht_on ?
+			pmu_attr->event_str_ht :
+			pmu_attr->event_str_noht);
+}
+
 EVENT_ATTR(cpu-cycles,			CPU_CYCLES		);
 EVENT_ATTR(instructions,		INSTRUCTIONS		);
 EVENT_ATTR(cache-references,		CACHE_REFERENCES	);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a6771e2303d2..86538527def3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -617,6 +617,11 @@ struct x86_pmu {
 	 * Intel host/guest support (KVM)
 	 */
 	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+
+	/*
+	 * Hyper Threading on?
+	 */
+	bool ht_on;
 };
 
 struct x86_perf_task_context {
@@ -662,6 +667,14 @@ static struct perf_pmu_events_attr event_attr_##v = {			\
 	.event_str	= str,						\
 };
 
+#define EVENT_ATTR_STR_HT(_name, v, noht, ht)				\
+static struct perf_pmu_events_ht_attr event_attr_##v = {		\
+	.attr		= __ATTR(_name, 0444, events_ht_sysfs_show, NULL),\
+	.id		= 0,						\
+	.event_str_noht	= noht,						\
+	.event_str_ht	= ht,						\
+}
+
 extern struct x86_pmu x86_pmu __read_mostly;
 
 static inline bool x86_pmu_has_lbr_callstack(void)
@@ -928,6 +941,12 @@ int p6_pmu_init(void);
 
 int knc_pmu_init(void);
 
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
+ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
+			  char *page);
+
 static inline int is_ht_workaround_enabled(void)
 {
 	return !!(x86_pmu.flags & PMU_FL_EXCL_ENABLED);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4065ca2d7149..e02ab677ab3c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1170,6 +1170,13 @@ struct perf_pmu_events_attr {
 	const char *event_str;
 };
 
+struct perf_pmu_events_ht_attr {
+	struct device_attribute attr;
+	u64 id;
+	const char *event_str_ht;
+	const char *event_str_noht;
+};
+
 ssize_t perf_event_sysfs_show(struct device *dev, struct device_attribute *attr,
 			      char *page);
 
-- 
2.5.5
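
The comment in events_ht_sysfs_show() notes that a user tool has to re-read the attribute when a thread sibling is onlined later. A minimal sketch of such a read is shown below. `read_event_string()` is a hypothetical helper, not perf tool code, and the caller supplies the attribute path. The assumption is that the attribute lives under the PMU's sysfs events directory, such as /sys/bus/event_source/devices/cpu/events/.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of the file at path into buf and strip a
 * trailing newline if present. Returns 0 on success and -1 if the
 * file cannot be opened or is empty.
 */
static int read_event_string(const char *path, char *buf, size_t len)
{
	FILE *f;
	char *ok;

	assert(path && buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fgets(buf, (int)len, f);
	fclose(f);
	if (!ok)
		return -1;
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A tool that caches the result would call this again after a CPU hotplug event, since the string returned by the kernel can change between the HT and non-HT variants.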


end of thread, other threads:[~2016-04-28  8:17 UTC | newest]

Thread overview: 18+ messages
2016-03-22 23:08 Add top down metrics to perf stat Andi Kleen
2016-03-22 23:08 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
2016-03-22 23:08 ` [PATCH 02/11] x86, perf: Add Top Down events to Intel Core Andi Kleen
2016-03-22 23:08 ` [PATCH 03/11] x86, perf: Add Top Down events to Intel Atom Andi Kleen
2016-03-22 23:08 ` [PATCH 04/11] x86, perf: Use new ht_on flag in HT leak workaround Andi Kleen
2016-03-22 23:08 ` [PATCH 05/11] perf, tools: Parse an .aggr-per-core event attribute Andi Kleen
2016-03-22 23:08 ` [PATCH 06/11] perf, tools, stat: Force --per-core mode for .aggr-per-core aliases Andi Kleen
2016-03-22 23:08 ` [PATCH 07/11] perf, tools, stat: Avoid fractional digits for integer scales Andi Kleen
2016-03-22 23:08 ` [PATCH 08/11] perf, tools, stat: Scale values by unit before metrics Andi Kleen
2016-03-22 23:08 ` [PATCH 09/11] perf, tools, stat: Basic support for TopDown in perf stat Andi Kleen
2016-03-22 23:08 ` [PATCH 10/11] perf, tools, stat: Add computation of TopDown formulas Andi Kleen
2016-03-22 23:08 ` [PATCH 11/11] perf, tools, stat: Add extra output of counter values with -v Andi Kleen
2016-03-27 11:27 ` Add top down metrics to perf stat Jiri Olsa
2016-03-27 15:22   ` Andi Kleen
2016-04-04 20:41 Andi Kleen
2016-04-04 20:41 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
2016-04-27 20:00 Add top down metrics to perf stat Andi Kleen
2016-04-27 20:00 ` [PATCH 01/11] x86, perf: Support sysfs files depending on SMT status Andi Kleen
2016-04-28  8:06   ` Peter Zijlstra
2016-04-28  8:17     ` Peter Zijlstra
